Head Count: Census Bureau Taps Data Management Tools
Following the lead of the corporate world, the federal government is harnessing the power of data management tools to improve its operational efficiency and to furnish information for itself and the public.
By Carolyn Hirschman
The government is using data management and data warehousing technologies for applications ranging from a military health care system to the 2000 census. Since 1997, agencies awarding data warehousing contracts have included the Air Force, Army and Navy, as well as the departments of Commerce, Housing and Urban Development and Treasury, according to Federal Sources Inc., a consulting firm in McLean, Va.
"The federal government is using data warehousing to an extent, but they're not using it to the extent they could," said Payton Smith, manager of strategic studies at Federal Sources. "There's a tremendous opportunity for benefit that's not being tapped."
Federal Sources, which tracks federal IT spending, has no breakdown for data warehousing expenditures. However, government customers of all types will account for about 8 percent of the much larger data solutions market in 2002, or more than $9 billion, according to a 1998 study by the Palo Alto Management Group, a Mountain View, Calif., market research and consulting firm.
Worldwide sales of data warehousing hardware, software and services are projected to shoot from $11.5 billion in 1997 to $29 billion in 2002, a 30 percent average annual growth rate, according to International Data Corp., Framingham, Mass.
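As a rough check on the arithmetic behind that projection, the sketch below computes the compound annual growth rate implied by the two endpoint figures. It assumes simple year-over-year compounding across the five years from 1997 to 2002; the function and variable names are illustrative only. On that assumption the endpoints imply growth of roughly 20 percent a year, so the 30 percent figure as reported may rest on a different base or definition.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate, returned as a fraction (0.20 means 20%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Endpoint figures as reported in the article, in billions of dollars.
start_billion = 11.5   # 1997 sales
end_billion = 29.0     # projected 2002 sales
years = 2002 - 1997    # five compounding periods (an assumption)

implied_rate = cagr(start_billion, end_billion, years)
print(f"Growth rate implied by the endpoints: {implied_rate:.1%}")

# For comparison: where 30 percent a year would take the 1997 figure.
projected_at_30 = start_billion * (1.30 ** years)
print(f"1997 sales compounded at 30% a year: ${projected_at_30:.1f} billion")
```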
Data warehousing is growing "because of the economic benefits that can be derived from this technology. There are many, many applications," said Michael Burwen, president of the Palo Alto Management Group.
The Government Performance and Results Act of 1993, which requires federal agencies to report their performance to Congress, helped to push adoption of the technology, said John Bender, a data warehousing support technologist with Oracle Corp., Redwood Shores, Calif. Many agencies have set up data warehouses to help track their activities and budgets; some are further along than others, he said.
The Census Bureau already is in gear to use data warehousing for the 2000 census. The decennial survey, conducted every 10 years, reaches virtually every household in America, covering a projected 275 million U.S. residents, and generates a wealth of data about the population's age, income, race, housing, employment and other characteristics.
Data warehousing technology has come a long way since the last census in 1990, and the bureau will take advantage of it this time around, said Enrique Gomez, who manages the bureau's American Factfinder project.
"In 1990, there was a lot less data warehousing technology here in the bureau. For example, we didn't have Oracle," he said. "A lot of [analysis was based on] customized Fortran programming code. SAS [a statistical software tool] was just catching on."
In January 1998, the Census Bureau awarded TRW Inc., Cleveland, the prime contract for data capture services for the 2000 census. The three-year contract, valued at more than $187 million, marks the first time the bureau has outsourced data capture for a decennial census using state-of-the-art imaging technology. The company has established data capture processing centers in Baltimore, Phoenix and Pomona, Calif.
The data collected and imaged from more than 79 million census forms will be stored in an Oracle data warehouse, estimated at 4 to 5 terabytes in size. The warehouse is housed on a mainframe supported by IBM RS/6000 servers at the bureau's computer center in Bowie, Md., Gomez said.
Once the data is edited and processed for accuracy and readability, it will be stored on two servers: one that keeps the information confidential and another for public dissemination. The latter data ultimately will be published on CD-ROM and posted on the bureau's Web site (www.census.gov) under American Factfinder.
Initially, census workers will sort the population data geographically, so that by April 1, 2001, states can get the information they need to redraw legislative districts, Gomez said. To do that, the Census Bureau will use mapping software from Environmental Systems Research Institute Inc. of Redlands, Calif.
Next, the bureau's statisticians and demographers will analyze the data and churn out the many reports and tables relied on by researchers, the media, state and local governments and others.
Because the raw data is too voluminous to be posted on the Web, all analysis will be done in-house, using statistical software from SAS Institute, of Cary, N.C., and tabulation software from the Australian firm STR, Gomez said. However, the research reports will end up online, starting in the first quarter of 2001.
The final reports are only as accurate as the data on which they are based. New data capture technology will help the bureau process a flood of questionnaires more quickly, accurately and cheaply than before: about 120 million forms in 100 days, or more than 1 million per day.
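The processing rate quoted above follows directly from the two figures given. The short sketch below works it out; the per-second rate additionally assumes round-the-clock operation, which the article does not state, and all names are illustrative.

```python
# Figures as reported in the article.
forms_total = 120_000_000   # about 120 million questionnaires
window_days = 100           # processing window in days

forms_per_day = forms_total / window_days
print(f"Average forms per day: {forms_per_day:,.0f}")

# Assumption for illustration only: the centers run 24 hours a day.
seconds_per_day = 24 * 60 * 60
forms_per_second = forms_per_day / seconds_per_day
print(f"Average forms per second (24-hour operation): {forms_per_second:.1f}")
```

At 1.2 million forms a day, the average sits comfortably above the "more than 1 million per day" cited.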
The bureau is outsourcing its forms processing to take advantage of the latest optical character recognition technology, said Dick Taylor, a senior system architect at Lockheed Martin Mission Systems in Gaithersburg, Md. Lockheed Martin was awarded a $150 million contract in 1997 to develop the data capture system for the 2000 census.
"Our job is to turn paper into ASCII (American Standard Code for Information Interchange)," he said. It is the same job as in 1990, but things will be much different. In 1990, only information in checked-off boxes, such as sex and race, was processed automatically, Taylor said. Everything else was fed into a windshield-wiperlike device that flipped pages as they were photographed. The information was then put on microfilm and keyed into computers.
By contrast, only 20 percent of the data on the 2000 census forms must be keyed in, Taylor said. The rest, including handwritten information such as names and telephone numbers and information in several languages, will be scanned, imaged and read by Lockheed Martin's integrated system. The system, which runs on Microsoft Windows NT 4.0, features off-the-shelf hardware and software from Electronic Data Systems, Kodak and other vendors.
"Although the forms are structured and predictable, how they're filled out is not," said Reynolds Bish, chief executive officer of Captiva Software Corp. of San Diego, a Lockheed Martin subcontractor. People write information on the wrong lines, draw arrows and correct mistakes, so a certain amount of information will always need to be verified manually, he said. It all boils down to getting the cleanest, more accurate data possible.
The Census Bureau will realize undetermined savings because it will need only about 2,600 data-entry operators at its data-capture centers, Taylor said. In 1990, it used at least five times that many operators.
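One common way to divide work between machine reading and human keying is a confidence threshold: fields the recognizer reads with high confidence are accepted automatically, and the rest go to an operator. The sketch below illustrates that general idea only. It is not a description of the Lockheed Martin or Captiva system; the field names, scores, sample records and the 0.90 threshold are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class RecognizedField:
    form_id: str
    name: str
    text: str
    confidence: float  # hypothetical recognizer score between 0 and 1


def route_fields(fields, threshold=0.90):
    """Split fields into auto-accepted and those needing manual keying."""
    accepted, needs_keying = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_keying.append(field)
    return accepted, needs_keying


# Invented sample data, for illustration only.
sample = [
    RecognizedField("F001", "last_name", "SMITH", 0.98),
    RecognizedField("F001", "phone", "555-01O0", 0.62),
    RecognizedField("F002", "last_name", "GARCIA", 0.95),
    RecognizedField("F002", "age", "34", 0.91),
    RecognizedField("F003", "last_name", "NGUY3N", 0.55),
]

accepted, needs_keying = route_fields(sample)
keyed_share = len(needs_keying) / len(sample)
print(f"Accepted automatically: {len(accepted)}")
print(f"Sent to manual keying: {len(needs_keying)} ({keyed_share:.0%})")
```

Raising the threshold sends more fields to operators and lowers the risk of accepting a misread; lowering it does the reverse.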
Will the census survey be totally electronic one day, with people filing forms online?
"By the next time, that will be the norm," Taylor said. For now, the bureau is conducting a "small experiment" with online filing for the 2000 census and working out the security issues, he said.