Next-Generation Data Warehouses Make Their Debut
By James Schultz
It's an archivist's nightmare: information made unusable because of disorganization or neglect. But there's a solution for bulging file drawers and reams of uncollated paper gathering dust.
To citizens and government agencies alike, data warehouses offer the tantalizing prospect of providing instantaneous, fingertip access to information that has been electronically converted and stored.
"Companies and government are working to recast their IT infrastructure," said Bob Samson, vice president for sales and strategy for the Storage Systems Group at IBM Corp. of Armonk, N.Y. "Once that infrastructure is built, the next logical step is how to effectively use the information you've collected. When you build a data warehouse, you can do a lot of things with it."
The Internet's rapid maturation is spurring development of next-generation data warehouses, according to Wayne Eckerson, director of education and research for the Seattle-based Data Warehousing Institute, which provides training and education on data warehousing and business intelligence.
In a recent report, Eckerson noted that the hallmark of a successful electronic enterprise, in government or out, will be the ability of implementers to bring information directly and individually to users, while linking internal systems to those of suppliers, distributors and other necessary parties.
"Successful companies will shift their focus to adding intelligence to their e-businesses," he wrote. "These companies will integrate e-commerce and business intelligence capabilities to optimize customer interactions through both online and offline channels, and speed the flow of information and products across an extended supply chain."
Data warehousing is more than creating a simple repository of information, however. Given steady, sometimes spectacular technological advance, electronic storage is perhaps the simplest task confronting warehousing designers.
Much more difficult are the tasks of data standardization, mining, analysis and leveraging. For example, for government agencies to provide seamless online services, information must be "cleaned" and categorized, a task usually complicated by government's multiplicity of data-containing legacy systems. Planning and care must be taken as the information in hard-copy documents is either typed or scanned, and existing electronic archives are transferred to a single data repository.
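The cleaning-and-consolidation step described above can be sketched in miniature. The field names, date formats and records below are invented purely for illustration; the point is that each legacy system needs its own standardization routine before its records can land in a single repository:

```python
from datetime import datetime

# Hypothetical legacy records: the same kind of data, held in two
# systems with different field names and date formats.
legacy_a = [{"NAME": "SMITH, JOHN", "DOB": "04/15/1962"}]
legacy_b = [{"full_name": "Jane Doe", "birth_date": "1970-09-03"}]

def standardize_a(rec):
    # System A stores "LAST, FIRST" in caps and US-style dates.
    last, first = [p.strip() for p in rec["NAME"].split(",")]
    dob = datetime.strptime(rec["DOB"], "%m/%d/%Y").date()
    return {"name": f"{first.title()} {last.title()}", "dob": dob.isoformat()}

def standardize_b(rec):
    # System B already uses ISO dates; only the field names differ.
    dob = datetime.strptime(rec["birth_date"], "%Y-%m-%d").date()
    return {"name": rec["full_name"], "dob": dob.isoformat()}

# One repository, one schema.
warehouse = [standardize_a(r) for r in legacy_a] + [standardize_b(r) for r in legacy_b]
```

Each new legacy source adds another small translation routine, which is one reason the iterative, start-small approach Hill describes tends to win out.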
"Data warehousing is a strong challenge for government," said Jennifer Hill, director of the Public Sector Technology Center at SAS Institute Inc. in Cary, N.C. "Getting into legacy systems is not easy, particularly for 20, 30, 50-year-old information that has never been standardized. You need to attack it as an iterative process: Start small and grow. Bite off more than you can chew, and you won't be able to achieve the results you want and need."Given that there are potentially thousands of terabytes of archived paper documents, a go-slow approach may seem paradoxical. Rapid conversion and electronic storage would seem preferable, given economies of scale and the potential for much more effective access.
Nevertheless, data re-engineering must be deliberate, or planners will create the very informational mess they were hired to clean up.
"The issue now is how to take legacy systems and allow them to evolve to a more graceful architecture," said Alan Alborn, vice president of Science Applications International Corp. of San Diego. "Data warehousing gives you a lot in return for your investment, but you can't do it all at once. The essential step is getting the customer to agree to a definition of the data their enterprise cares about. If you don't start with a good architecture, you end up paving the cow paths. You'll move data of dubious quality into the data warehouse."
Building exceptional data quality was SAIC's mandate in a warehousing project to upgrade and sustain the worldwide seismic sensor network used to monitor nuclear weapons testing, which is overseen by the Air Force Technical Applications Center.
According to the terms of the four-year project, begun last summer, SAIC will install new seismic stations and upgrade existing ones with modern computer hardware and software technology. These stations include seismometers and computer systems to digitize, store, analyze and telemeter the seismic data to AFTAC's headquarters facility at Patrick Air Force Base, Fla.
The new and upgraded stations are intended to improve AFTAC's capability to detect a clandestine nuclear test and distinguish it from other seismic events (primarily earthquakes and industrial explosions) that occur worldwide several times per hour.
While not exactly of the same order as the SAIC Air Force effort, a legacy project undertaken by Unisys Corp. of Blue Bell, Pa., for the Education Department posed complex problems of its own.
The Education Department wished to integrate existing information on hundreds of thousands of students, thousands of schools and dozens of standardized tests. But because the data existed in several different formats and systems, integration seemed nearly impossible. In a prototype project that lasted about a month, though, Unisys was able to combine the information into a single data repository that department officials were able to easily navigate.
"They were having a lot of problems getting information from their legacy systems," said Solomon John, a software engineer with Unisys' U.S. Federal Government Group. "They were spending too much money, taking too much time and not getting very much usable data. In 40 days, we were able to take their legacy products and provide them with something they were able to access routinely."
According to Dan Kaplan, another Unisys software engineer, government is in the early stages of putting the power of data warehousing to use. It's a trend he thinks will accelerate, given the need and advantages.
"You're able to see in a few seconds what would normally take a few days, if at all," he said. "Your job becomes far easier. [With a warehouse] you're not just moving a few feet, but miles forward."Maintaining the momentum will require adding new capabilities to data warehousing architectures. Already, Kaplan said, adding graphical interfaces is making it easier for even the uninitiated to traverse the data thickets.
Further refinements, including advanced data mining and sophisticated data analysis and prediction, should speed the process. Also on the horizon are intensive text-searching features and the use of programmable "intelligent agents" to retrieve otherwise unattainable information.
For budget planners in particular, these tools will likely prove a godsend, especially in procurement, where online purchasing and reverse auctions are becoming routine.
In-house supply data warehouses enable managers to arrange substantial discounts from vendors by committing to specific quantities that will refresh inventories, even as they track vendor-product availability and past price quotes, and negotiate online for the best terms available.
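The tracking logic described here can be illustrated with a toy query. The vendor names, quotes and quantities below are invented for the example; the sketch simply picks the cheapest vendor able to fill a committed reorder quantity:

```python
# Hypothetical past price quotes per vendor for one stock item,
# along with each vendor's current availability.
quotes = [
    {"vendor": "Acme", "unit_price": 9.80, "available_qty": 5000},
    {"vendor": "Globex", "unit_price": 9.25, "available_qty": 800},
    {"vendor": "Initech", "unit_price": 9.40, "available_qty": 2000},
]

def best_quote(quotes, needed_qty):
    # Only vendors that can fill the committed quantity qualify;
    # among those, take the lowest past unit price.
    eligible = [q for q in quotes if q["available_qty"] >= needed_qty]
    return min(eligible, key=lambda q: q["unit_price"]) if eligible else None

# Globex is cheapest but cannot fill 1,500 units, so Initech wins.
choice = best_quote(quotes, needed_qty=1500)
```

A real supply warehouse would run this kind of query across thousands of items and vendors at once, which is where the negotiating leverage comes from.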
"Data warehousing will be a necessity for reverse auctions," said SAS' Hill. "Data mining will help you go through all those nuggets of information. You have to learn about what you're collecting to realize the maximum value."
SAS has built several data warehouses for federal government clients, including a crucial component of the Environmental Protection Agency's Aerometric Information Retrieval System, or AIRS. Containing billions of pollution values and related data, AIRS is the most extensive collection of air pollution data known to exist, providing information for the entire United States.
For the Treasury Department, the company developed a means of storing and transferring complex, legacy financial information between its internal auditing systems. The product, the Treasury Information Executive Repository, or TIER, stores more than 300 general ledger accounts, 530 Treasury fund symbols and budget object-class codes. Included within the large volume of numbers is historical information for the past fiscal year.
Before the project was undertaken, data quality standards and data definitions were difficult to set and maintain because of the number and variety of the Treasury Department's financial systems.
The experts say data warehousing stands out as an especially strong application in areas where social services are delivered to diverse and large populations, for financial transactions, and, using a warehouse's analytical and predictive capabilities, as a means to detect fraud and abuse.
Data warehousing is particularly favored by agencies that must quickly and effectively handle enormous volumes of interrelated and interdependent information, while navigating complex regulations and serving the needs of diverse constituencies.
"There are certain kinds of information that governments can warehouse and analyze that have the greatest financial and programmatic impact," said Jack Ginsburg, vice president of Boston-based Bull Information Systems public-sector business. "They're high-visibility, high-impact programs: health and human services, tax and revenue, Medicaid, criminal justice and child welfare. All are areas where a good data warehouse will enable managers to efficiently manage and understand their populations."
Bull has helped the states of Illinois, Michigan, Minnesota, New York and Utah create Medicaid-related databases to help oversee caseloads, supervise costs and improve delivery of services.
In Iowa, the company is assisting in designing the state's first consolidated data warehouse. Officials have two goals: to encourage interagency sharing of information to improve operational efficiency, and to enable electronic delivery of services to citizens. The project's motto, "100 percent E by 2003," reflects the goal of making state information widely available online to the public.
"What we have tried to do is go into a lot of different agencies and show them what can be done," Ginsburg said. "People right now are still trying to figure out how to deploy [these warehousing systems]. We're just beginning to scratch the surface."
Down the Road

Citizens should expect that a new generation of data warehouses eventually would make government information widely available to the computer-enabled. But that happy day may not be soon.
According to Rishi Sood, principal analyst with GartnerGroup Inc. of Stamford, Conn., a market research firm, implementation will take time. Even though state and local governments were expected to spend $600 million on data warehouses in 2000, a figure expected to more than double to $1.26 billion by 2005, Sood said governments are still trying to figure out how best to use and then deploy the new information.
"The problem with data warehousing is that it tends to be a back-end system. It's really more of a strategic planning tool," Sood said. "As a citizen online, you only get the benefits a couple of iterations down the road ? maybe even years later."
For advocates, the Holy Grail of data warehousing is data virtualization, the creation of vast online warehouses that can be updated in real time. Such an advance presupposes not just simultaneous, though likely, advances in multiple related technologies, such as storage and broadband communications, but also a degree of legacy-system data conversion that has yet to be achieved.
Nevertheless, as technologies converge and merge, warehouse architects should be able to incorporate powerful tools for data standardization, analysis and prediction. At that point, data warehouses won't simply be static repositories, but interactive, even proactive knowledge nodes useful both to governments and the taxpayers that support them.
"Governments around the world have all this information, and they're using it in new customer relationship modes," said IBM's Samson. "Over time, data warehousing will allow governments to leverage the information they have. I think there's a very bright future for data storage, leveraging and virtualization across an enterprise."Data warehousing was originally defined as the electronic collection, standardization and organization of enterprise-critical information in a computerized repository accessible to governments, businesses or the general public. More recently, the term has expanded to include other related capabilities, such as easy-to-navigate graphical interfaces, advanced search techniques, known as data mining and drilling, and sophisticated analytical and predictive features.