Census Bureau uses XML for fast economic data collection
The Census Bureau is using extensible markup language to speed the layout and assembly of economic census forms that will go to millions of businesses this December.
Developed by Fenestra Technologies Corp. of Germantown, Md., the new Generalized Instrument Design System has a central metadata repository plus four software applications that use the metadata to automate forms layout.
Every five years, the economic census collects data about 6.5 million U.S. businesses, said Larry Blum, assistant division chief for collection activities in the bureau's Economic Planning and Coordination Division in Suitland, Md. Commerce Department analysts use the results to compute the gross domestic product and other measures of U.S. economic health.
About 1.5 million of the companies are small enough that the Census Bureau can draw the data it needs from Internal Revenue Service payroll and income tax files, Blum said. Most of the rest do business in only one location, so they fill out a single set of economic census forms.
The questions answered by these single-establishment businesses depend on their categories under the North American Industry Classification System. The bureau creates a different form for each NAICS category so businesses needn't wade through a single gigantic form full of inapplicable questions, said Steven Schafer, Fenestra's chief technology officer.
Large companies that do business in multiple locations fill out one set of census forms per location, Blum said. With paper forms running six to 20 pages each, some big corporations must answer hundreds or thousands of pages of questions.
The forms are "sometimes delivered on pallets," said Rick Rogers, Fenestra's chief executive officer.
Without GIDS, census workers never would have been able to design electronic and paper forms for each of the 650 NAICS categories, Blum said.
Laying out each form with a graphics design package produces good-looking questionnaires, but it's time-consuming and tedious, said Dennis Wagner, special assistant in the Census Bureau's economic planning division and GIDS team leader.
So Fenestra created four applications: a forms designer for layout, an auto-format application that assembles the pages into forms, a preview tool, and a surveyor tool that displays electronic versions of the forms and collects responses.
Although the 650 forms are tailored to different industries, many questions apply to multiple forms, Wagner said. GIDS automates the layout of more than 90 percent of the pages. The XML metadata attached to each question in the repository controls the typography and placement of questions on each page.
"If the question's the same, you only have to design the content once for all the forms," Blum said.
Past economic censuses used legal-size paper for many forms, but surveys found that employers preferred standard letter-size paper. "We're going to give them what they want, but I don't think they're going to be too happy about it," Blum said. The smaller page size has substantially increased the number of pages in some forms.
The software had to lay out pages to exacting specifications, with tolerances as small as 0.001 inch, for compatibility with the economic census' revamped data-capture system.
The last economic census, in 1997, used a Digital Equipment Corp. key-from-paper system to capture data, Blum said. For this year's tally, workers will scan paper forms using some leftover equipment from Census 2000, then key the data from electronic images of the forms.
That system is now being built in the bureau's National Processing Center in Jeffersonville, Ind.
The key-from-image system will use special templates that block out portions of the questionnaire and show data entry workers only the fields they need to read, Rogers said. The templates get the precise coordinates of the data fields from GIDS.
Bureau officials are also using GIDS layout and content metadata to design downloadable electronic versions of many of the economic census forms. The metadata repository includes so-called behavioral metadata that controls how the online forms respond to inconsistent entries.
In 1997, Fenestra conducted an electronic-filing pilot for 20 of the 600 forms used in that year's economic census. Because data was checked as it was entered, the results were cleaner, with higher data integrity than manually keyed responses, Rogers said.
About 300,000 companies answered the 1997 economic census electronically, but bureau officials hope to increase that figure to between 550,000 and 700,000 this year, Blum said.
Although some past projects have used metadata, the economic metadata repository, housed in an Oracle9i database management system, is the bureau's first such centralized database, Wagner said.
"If you don't have reusable metadata, you've spent a lot of time doing something and don't realize its full potential," Rogers said.
Beyond the 2002 economic census, the bureau and Fenestra are working on a broader metadata repository to speed future projects, Wagner said.
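The article does not publish the GIDS metadata format, so the Python sketch below is only an illustration of the general idea it describes. Layout attributes are stored once per question in XML and reused by every form that includes that question, and an automatic formatter places the questions on pages. The XML element and attribute names, question IDs, form names, question heights and the 1-inch margin are all hypothetical, not Fenestra's design. Decimal arithmetic keeps positions exact to the 0.001 inch the article mentions.
```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical question repository. Each question and its typography are
# defined once; every form that includes the question reuses this metadata.
REPOSITORY_XML = """
<repository>
  <question id="Q_PAYROLL" font="Helvetica" size="9" height="3.500">
    <text>Total annual payroll</text>
  </question>
  <question id="Q_EMPLOYEES" font="Helvetica" size="9" height="3.000">
    <text>Number of employees</text>
  </question>
  <question id="Q_SALES" font="Helvetica" size="9" height="4.000">
    <text>Total sales</text>
  </question>
</repository>
"""

# Hypothetical industry forms that share question IDs.
FORMS = {
    "FORM_A": ["Q_PAYROLL", "Q_EMPLOYEES", "Q_SALES"],
    "FORM_B": ["Q_PAYROLL", "Q_EMPLOYEES"],
}

PAGE_HEIGHTS = {"letter": Decimal("11"), "legal": Decimal("14")}  # inches
MARGIN = Decimal("1.000")  # assumed top and bottom margin, inches


def load_questions(xml_text):
    """Parse question metadata into {question_id: attributes}.

    Decimal keeps positions exact to 0.001 inch instead of accumulating
    binary floating-point error.
    """
    root = ET.fromstring(xml_text.strip())
    return {
        q.get("id"): {
            "text": q.findtext("text"),
            "font": q.get("font"),
            "size": int(q.get("size")),
            "height": Decimal(q.get("height")),
        }
        for q in root.iter("question")
    }


def layout(form_id, questions, paper="letter"):
    """Stack a form's questions top to bottom, starting a new page when the
    next question would run past the bottom margin.

    Returns (page, y_inches_from_top, question_id, font, size) tuples.
    """
    bottom = PAGE_HEIGHTS[paper] - MARGIN
    page, y = 1, MARGIN
    placed = []
    for qid in FORMS[form_id]:
        q = questions[qid]
        if y > MARGIN and y + q["height"] > bottom:
            page, y = page + 1, MARGIN  # break to a fresh page
        placed.append((page, y, qid, q["font"], q["size"]))
        y += q["height"]
    return placed


questions = load_questions(REPOSITORY_XML)
for paper in ("legal", "letter"):
    pages = layout("FORM_A", questions, paper)[-1][0]
    print(paper, "pages:", pages)
```
With this made-up metadata, FORM_A fits on one legal-size page but needs two letter-size pages. That mirrors, on a toy scale, the article's point that the switch to letter-size paper lengthened some forms. FORM_B reuses the payroll and employee questions without redefining them.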
Photo: Larry Blum, with Dennis Wagner, at the Census Bureau's Economic Planning and Coordination Division in Suitland, Md.
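The article says only that GIDS's behavioral metadata "controls how the online forms respond to inconsistent entries." As a hedged illustration of that idea, the Python sketch below stores consistency rules as data rather than as form code, then evaluates them against whatever a respondent has entered so far. The `Rule` structure, the check names, the field names and the messages are all hypothetical and are not real census items or the GIDS format.
```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """One piece of hypothetical behavioral metadata."""
    rule_id: str
    field_names: Tuple[str, ...]  # fields the rule reads, in order
    kind: str                     # name of the check in CHECKS
    message: str                  # what the online form shows on failure


def sum_equals(values):
    """All fields but the last must add up to the last (a reported total)."""
    *parts, total = values
    return sum(parts) == total


def zero_implies_zero(values):
    """If the first field is zero, the second must be zero too."""
    first, second = values
    return first != 0 or second == 0


CHECKS = {"sum_equals": sum_equals, "zero_implies_zero": zero_implies_zero}

# Hypothetical rules; field names are illustrative only.
RULES = (
    Rule("R1", ("payroll_q1", "payroll_q2", "payroll_q3", "payroll_q4", "payroll_total"),
         "sum_equals", "Quarterly payroll must add up to the annual total."),
    Rule("R2", ("employees", "payroll_total"),
         "zero_implies_zero", "Payroll is reported but no employees are listed."),
)


def validate(responses: Dict[str, int], rules: Sequence[Rule] = RULES) -> List[str]:
    """Return a message for each rule that fails.

    Rules whose fields are not all answered yet are skipped, so the form
    can re-run validation after every entry.
    """
    problems = []
    for rule in rules:
        if not all(name in responses for name in rule.field_names):
            continue
        values = [responses[name] for name in rule.field_names]
        if not CHECKS[rule.kind](values):
            problems.append(f"{rule.rule_id}: {rule.message}")
    return problems
```
Because each rule is a record rather than code, adding a consistency check means adding an entry to the metadata, not rewriting the form. That is in the spirit of Rogers's point about reusable metadata.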