Scrubbing Source Data at the Local Level

by Michael Jacoby

First responders and private citizens are the first line of defense in their local communities in times of crisis or need. Ensuring that those people and local response units receive accurate and reliable information during a sudden emergency is therefore extremely important. However, computer data errors and discrepancies – in names, addresses, site locations, phone numbers, and other contact information – during a dangerous event or incident can lead to a response unit being dispatched to the wrong location, or to responders and other citizens being totally unaware of hazardous conditions that require special attention. Waiting for out-of-area assistance to arrive, or for the initial responders to be re-routed to the correct location, could mean the loss not only of valuable minutes but also, in some situations, of human lives.

The Envirofacts website of the U.S. Environmental Protection Agency (EPA) says clearly that it “provides access to several EPA databases to provide you with information about environmental activities that may affect air, water, and land anywhere in the United States [emphasis added].” Those “activities” include but are not necessarily limited to “toxic chemical releases, water discharge permit compliance, hazardous waste handling processes, Superfund status, and air emission estimates.” Unfortunately, numerous locational errors have been found over the past five years in the very EPA databases that are supposed to provide the helpful and precisely accurate information needed.

More specifically: Most but not all responders and planners referred to this type of data as FRS (Facility Registry Services) information. After the FRS addresses – which are based on collected or provided data – were plotted by the EPA’s own people and/or other (non-government) researchers, a disturbingly large number of so-called “sites of interest” were found to be positioned in such improbable locations as the middle of intersections, on interstate highways, and even in farm fields. Among the other erroneous data found were a number of properties plotted as much as 20 to 40 miles or more away from their correct locations.
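
The kind of plotting error described above can be screened for once a site's position has been confirmed locally. What follows is only a minimal sketch, not the EPA's method: it assumes each record carries both the listed coordinates and a locally verified pair, and it flags any site whose two positions are more than a chosen distance apart. The field names, the sample records, and the one-mile threshold are all hypothetical and chosen for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def flag_misplotted(records, threshold_miles=1.0):
    """Return (name, miles_off) for sites listed far from their verified position."""
    flagged = []
    for rec in records:
        miles_off = haversine_miles(
            rec["listed_lat"], rec["listed_lon"],
            rec["verified_lat"], rec["verified_lon"],
        )
        if miles_off > threshold_miles:
            flagged.append((rec["name"], round(miles_off, 1)))
    return flagged


# Hypothetical records: the first is plotted almost exactly where it belongs,
# the second is plotted tens of miles from its verified position.
sample = [
    {"name": "Example Plant A",
     "listed_lat": 39.9626, "listed_lon": -76.7277,
     "verified_lat": 39.9630, "verified_lon": -76.7280},
    {"name": "Example Warehouse B",
     "listed_lat": 40.2732, "listed_lon": -76.8867,
     "verified_lat": 39.9626, "verified_lon": -76.7277},
]

for name, miles_off in flag_misplotted(sample):
    print(f"{name}: listed about {miles_off} miles from its verified location")
```

A check of this kind only narrows the list of records worth reviewing. Confirming the correct location still depends on someone who knows the site.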

These data discrepancies were brought to the attention of U.S. Representative Todd R. Platts (R-Pa.), EPA Administrator Lisa P. Jackson, and other senior officials in the U.S. Department of Homeland Security (DHS) and the Centers for Disease Control and Prevention (CDC), as well as members of various private-sector groups. Last year, in a letter dated 7 January 2011, EPA Assistant Administrator/Chief Information Officer Malcolm Jackson concurred that the EPA data “is vital for the public, and should be as accurate as possible.”

It is usually assumed, of course, that official databases such as Envirofacts are indeed “factual.” The problem with that assumption is that the data stored in Envirofacts or other official databases can be only as accurate as the data that has been provided (by any number of sources) and then entered into the database. In practice, the locational source data for certain sites of interest is provided by a broad spectrum of state and local government agencies and organizations as well as private-sector groups and other “stakeholders” – e.g., state departments of labor and environmental departments that may have their own separate (and frequently different) filing and data requirements.

For that reason alone, it is particularly important that the information being provided by government systems – and shared not only with emergency services agencies but also with the general public – be as accurate as possible; that goal may best be achieved through incorporation into the current system of a rigorous validation process. However, verifying and updating such an extremely large volume of vital data also requires much more, and more effective, public-private collaboration – on a continuing basis – in order to fully and effectively address the obvious deficiencies within the current system.
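
As one illustration of what a validation step might check, the sketch below tests a single facility record for the kinds of problems mentioned earlier: missing names or addresses, malformed ZIP codes or phone numbers, and impossible coordinates. It is an assumption-laden example rather than a description of any existing EPA process. The field names, the required-field list, and the U.S.-style ZIP and phone patterns are all hypothetical.

```python
import re

# Hypothetical field names; a real export may label these differently.
REQUIRED_FIELDS = ("name", "street", "city", "state", "zip", "phone")
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")
PHONE_RE = re.compile(r"^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")


def _text(rec, field):
    """Return the field as stripped text, treating None as empty."""
    value = rec.get(field)
    return "" if value is None else str(value).strip()


def validate_record(rec):
    """Return a list of problems found in one facility record (empty if none)."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not _text(rec, f)]

    zip_code = _text(rec, "zip")
    if zip_code and not ZIP_RE.match(zip_code):
        problems.append("malformed ZIP code")

    phone = _text(rec, "phone")
    if phone and not PHONE_RE.match(phone):
        problems.append("malformed phone number")

    lat, lon = rec.get("lat"), rec.get("lon")
    if lat is None or lon is None:
        problems.append("missing coordinates")
    elif not (-90 <= lat <= 90 and -180 <= lon <= 180):
        problems.append("coordinates out of range")

    return problems


# Hypothetical records for illustration only.
sample_records = [
    {"name": "Example Plant A", "street": "1 Main St", "city": "York",
     "state": "PA", "zip": "17401", "phone": "(717) 555-0100",
     "lat": 39.9626, "lon": -76.7277},
    {"name": "Example Warehouse B", "street": "", "city": "York",
     "state": "PA", "zip": "1740", "phone": "555-0100",
     "lat": 399.6, "lon": -76.7},
]

for rec in sample_records:
    print(rec["name"], "->", validate_record(rec) or "no problems found")
```

Format checks like these catch only records that are visibly incomplete or malformed. A well-formed address that points to the wrong place still has to be verified by people who know the site.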

Millions of Records – Each and Every One of Them “Unique”

According to the EPA’s own website, the FRS now has available “over 2.8 million unique facility records linking over 3.0 million program interests, including data from over 25 national environmental data systems and over 45 state systems.” However, after numerous examples of locational errors had been brought to the attention of both the EPA and state government officials, it was obvious that at least some of the available data is not as accurate as it should be for operational purposes. A data-scrubbing process was therefore started in south-central Pennsylvania to ensure that the locational data for any “site of interest” in that area would be both accurate and complete.

Knowing how to check the data and how to report an error to the EPA can help reduce delays during future emergency-response efforts. When creating a risk management plan (RMP), therefore, it is important to check the vital information already available for local facilities to ensure that such data is both accurate and up-to-date. In addition to the dangers that can affect the general public, there are also many cases where exposure to a substance may affect only a select group of citizens, who may not be recognized by other organizations or individual citizens because that group falls outside their respective “domains” of control. For example, persons with “special needs” – or with hypersensitivity or allergies to certain chemicals – may need additional assistance if they live or work near one of the facilities listed as having created an RMP.

Some local governments maintain lists of special needs residents – e.g., ECRIN is used in York County, Pa., to “Evacuate County Residents In Need.” Other persons, afflicted with an even higher level of sensitivity, might already be on a state’s “Hypersensitivity Registry” list. Having those lists available can help the response efforts considerably in sudden times of crisis.

DV, OTIS, OSWER & VZIS

The first step needed to correct current government data is to acquire basic knowledge about Data Verification (DV) procedures. A government employee sitting at his or her desk at EPA Headquarters in Washington, D.C., cannot, at present, accurately determine whether a site’s locational data is correct, because that information usually can be verified only at the local level by persons familiar with the site’s correct location. To rectify errors discovered when data is reviewed or verified, the federal government has established a process, managed by the EPA, for reporting an error through the EPA’s Integrated Error Correction Process and Online Tracking Information System (OTIS). Among the principal users of such data are the EPA’s Office of Solid Waste and Emergency Response (OSWER) and other agencies and departments, “Environmental Justice” organizations, and the general public.

Another tool offered through the EPA website by the Office of Emergency Management is the Vulnerable Zone Indicator System (VZIS), which provides a quick way to determine whether a particular location might be affected by a chemical accident or is in the “vulnerable zone” of a facility submitting an RMP. The 1986 Emergency Planning and Community Right-to-Know Act, and certain chemical-accident “prevention provisions” in the 1990 Clean Air Act Amendments, help ensure that certain information on possibly hazardous chemicals stored or warehoused at various businesses and other local facilities is publicly available from state and local governments.

True community preparedness requires the earnest and continuing efforts of all persons who live and work within the boundaries of that same community. When information related to various sites of interest in the community is in error – e.g., plotted in the wrong location, or listed with incomplete or incorrect contact information such as phone numbers and addresses – the EPA’s reporting process can help significantly, not only by reducing the time individual citizens need to report an error but also by increasing the processing time available to, and needed by, the EPA to correct any errors that have been discovered.

One example: After verifying the large number of locational errors that had been researched in south-central Pennsylvania, officials of York County committed to scrubbing the EPA data for their jurisdiction, as already listed, in alphabetical order. By learning more about the process and the accuracy of federal databases, other local governments, agencies, and individual citizens can determine whether the data about their own facilities and sites of interest also should be thoroughly scrubbed. Restoring trust in data systems that are used in times of crisis or unusual need must be a whole-community effort if total community protection is the goal.


To verify and correct information for sites of interest in FRS, use the following procedure: (a) visit the EPA – Envirofacts – Multisystem Query site; (b) in the “Geography Search” area, enter a local-area ZIP Code; (c) click “View Facility Information” next to the known facility name and address; (d) if the mapped location of the facility is incorrect, click the “Report an Error” button in the top right corner of the page; and (e) follow the instructions provided by the EPA.

Vulnerable Zone Indicator System (VZIS)

Article on “EPA Errors on Environmental Hazards Map Send York County Man – And Government – On a Quest”

EPA’s flowchart describing the Error Correction Process

________________________

Michael J. Jacoby is a resident of York County, Pennsylvania, who has been actively concerned for some time about various environmental protection and safety issues. York County is a major community in EPA Region III, and is represented in Congress by U.S. Representative Todd Platts (R-Pa.).