Head Count: Census Bureau Taps Data Management Tools
By Carolyn Hirschman
Following the lead of the corporate world, the federal government is harnessing the power of data management tools to improve its operational efficiency and to furnish information for itself and the public.
The government is using data management and data warehousing technologies for applications ranging from a military health care system to the 2000 census. Since 1997, agencies awarding data warehousing contracts have included the Air Force, Army and Navy, as well as the departments of Commerce, Housing and Urban Development, and Treasury, according to Federal Sources Inc., a consulting firm in McLean, Va.
"The federal government is using data warehousing to an extent, but they're not using it to the extent they could," said Payton Smith, manager of strategic studies at Federal Sources. "There's a tremendous opportunity for benefit that's not being tapped."
Federal Sources, which tracks federal IT spending, has no breakdown for data warehousing expenditures. However, government customers of all types will account for about 8 percent of the much larger data solutions market in 2002, or more than $9 billion, according to a 1998 study by the Palo Alto Management Group, a Mountain View, Calif., market research and consulting firm.
Worldwide sales of data warehousing hardware, software and services are projected to shoot from $11.5 billion in 1997 to $29 billion in 2002, a 30 percent average annual growth rate, according to International Data Corp., Framingham, Mass.
Data warehousing is growing "because of the economic benefits that can be derived from this technology. There are many, many applications," said Michael Burwen, president of the Palo Alto Management Group.
The Government Performance and Results Act of 1993, which requires federal agencies to report their performance to Congress, helped to push adoption of the technology, said John Bender, a data warehousing support technologist with Oracle Corp., Redwood Shores, Calif. Many agencies have set up data warehouses to help track their activities and budgets; some are further along than others, he said.
The Census Bureau is already gearing up to use data warehousing for the 2000 census. The decennial (occurring every 10 years) survey of virtually every household in America, a projected 275 million U.S. residents, generates a wealth of data about the population's age, income, race, housing, employment and other characteristics.
Data warehousing technology has come a long way since the last census in 1990, and the bureau will take advantage of it this time around, said Enrique Gomez, who manages the bureau's American Factfinder project.
"In 1990, there was a lot less data warehousing technology here in the bureau. For example, we didn't have Oracle," he said. "A lot of [analysis was based on] customized Fortran programming code. SAS [a statistical software tool] was just catching on."
In January 1998, the Census Bureau awarded TRW Inc., Cleveland, the prime contract for data capture services for the 2000 census. The three-year contract, valued at more than $187 million, marks the first time the bureau has outsourced decennial census data capture, which will use state-of-the-art imaging technology. The company has established data capture processing centers in Baltimore, Phoenix and Pomona, Calif.
The data collected and imaged from more than 79 million census forms will be stored in an Oracle data warehouse, estimated at 4 to 5 terabytes in size. That is housed on a mainframe supported by IBM RS/6000 servers at the bureau's computer center in Bowie, Md., Gomez said.
Once the data is edited and processed for accuracy and readability, it will be stored on two servers: one that keeps the information confidential and another for public dissemination. The latter data ultimately will be published on CD-ROM and posted on the bureau's Web site (www.census.gov) under American Factfinder.
Initially, census workers will sort the population data geographically, so that by April 1, 2001, states can get the information they need to redraw legislative districts, Gomez said. To do that, the Census Bureau will use mapping software from Environmental Systems Research Institute Inc. of Redlands, Calif.
Next, the bureau's statisticians and demographers will analyze the data and churn out the many reports and tables relied on by researchers, the media, state and local governments and others.
Because the raw data is too voluminous to be posted on the Web, all analysis will be done in-house, using statistical software from SAS Institute, of Cary, N.C., and tabulation software from the Australian firm STR, Gomez said. However, the research reports will end up online, starting in the first quarter of 2001.
The accuracy of the final reports is only as good as the data on which they are based. New data capture technology will help the bureau process a flood of questionnaires (about 120 million forms in 100 days, or more than 1 million per day) more quickly, accurately and cheaply than before.
The bureau is outsourcing its forms processing to take advantage of the latest optical character recognition technology, said Dick Taylor, a senior system architect at Lockheed Martin Mission Systems in Gaithersburg, Md. Lockheed Martin was awarded a $150 million contract in 1997 to develop the data capture system for the 2000 census.
"Our job is to turn paper into ASCII (American Standard Code for Information Interchange)," he said. It is the same job as in 1990, but things will be much different. In 1990, only information in checked-off boxes, such as sex and race, was processed automatically, Taylor said. Everything else was fed into a windshield-wiperlike device that flipped pages as they were photographed. The information was then put on microfilm and keyed into computers.
By contrast, only 20 percent of the data on the 2000 census forms must be keyed in, Taylor said. The rest, including handwritten information such as names and telephone numbers and information in several languages, will be scanned, imaged and read by Lockheed Martin's integrated system. The system, which runs on Microsoft Windows NT 4.0, features off-the-shelf hardware and software from Electronic Data Systems, Kodak and other vendors.
"Although the forms are structured and predictable, how they're filled out is not," said Reynolds Bish, chief executive officer of Captiva Software Corp. of San Diego, a Lockheed Martin subcontractor. People write information on the wrong lines, draw arrows and correct mistakes, so a certain amount of information will always need to be verified manually, he said. It all boils down to getting the cleanest, most accurate data possible.
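The split between automated recognition and manual keying described above typically hinges on recognition confidence. The sketch below is purely illustrative, not Lockheed Martin's actual system; the field names, data and the 0.90 cutoff are assumptions for the example.

```python
# Hypothetical sketch of confidence-based routing: fields the OCR engine
# reads with high confidence are accepted automatically, while low-confidence
# fields are queued for a data-entry operator to key in by hand.

CONFIDENCE_THRESHOLD = 0.90  # assumed cutoff; the real value is not public


def route_fields(recognized_fields):
    """Split OCR output into auto-accepted text and fields needing manual keying.

    recognized_fields: list of (field_name, text, confidence) tuples.
    """
    accepted, needs_keying = {}, []
    for name, text, confidence in recognized_fields:
        if confidence >= CONFIDENCE_THRESHOLD:
            accepted[name] = text          # trust the OCR result
        else:
            needs_keying.append(name)      # route to manual verification
    return accepted, needs_keying


# Checked-box answers read reliably; messy handwriting falls below the cutoff.
fields = [("sex", "F", 0.99), ("surname", "Hirsch?an", 0.62)]
accepted, manual = route_fields(fields)
```

In practice a system like this would tune the threshold so that roughly the 20 percent of fields cited in the article end up in the manual queue.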
The Census Bureau will realize undetermined savings because it will need only about 2,600 data-entry operators at its data-capture centers, Taylor said. In 1990, it used at least five times that many operators.
Will the census survey be totally electronic one day, with people filing forms online?
"By the next time, that will be the norm," Taylor said. For now, the bureau is conducting a "small experiment" with online filing for the 2000 census and working out the security issues, he said.
Data Warehousing Made Simple
By Carolyn Hirschman
Data warehousing, the powerful computer technology that captures information and makes it accessible and useful, is neither cheap nor easy to install, but its payoff may be tremendous, according to experts.
Often touted as a system for decision support or business intelligence, data warehousing is a way to gather, store and transform an organization's computerized data so that workers can retrieve information in a useful way. The ultimate goal is to use that information to perform analyses, improve decision-making and set strategy.
"The data warehouse is the foundation for informed decision-making, which will lead to changing the character of operations," said Henry Morris, vice president of data warehousing research at International Data Corp., Framingham, Mass. "The goal is to improve future actions."
Rooted in the first database-management systems developed more than 40 years ago, data warehouses are large, central repositories of information. Typically housed on Unix-based or mainframe servers, they act as a critical link between transaction processing systems and data retrieval systems. Their value lies in uncovering and analyzing any number of trends and patterns.
"You're taking a series of snapshots to see how things have changed and why they've changed," Morris said. For example, a data warehouse can help an investment brokerage look at a set of daily trades to compare the risk of various financial instruments. A retailer can track customers' purchases to figure out which items are bought at the same time, then stock its shelves accordingly.
Before data warehousing caught on, databases were routinely updated and deleted, wiping out old information that would have helped show patterns over time. More commonly today, an organization has many databases but they aren't linked, making it hard for workers to get the exact information they need.
Designing, building and maintaining a data warehouse is no simple job. "It's a very daunting task. It can take a year or two," said Priscilla Emery, senior vice president at the Association for Information and Image Management in Silver Spring, Md. The average cost of a new data warehouse is $1.8 million, according to the Palo Alto Management Group, a Mountain View, Calif., research firm.
The average size of a data warehouse will shoot from 272 gigabytes now to 6.5 terabytes by 2001, according to the firm. Today's government data warehouses are in the 500-gigabyte to 1-terabyte range, the group noted.
Data warehousing's hallmark is versatility. Spanning many industries and applications, it is used mostly by the telecommunications and financial services sectors, according to a 1999 report by the Palo Alto Management Group.
The leading applications are in finance and marketing for customer identification and retention, category management and vendor performance.