Distinctions Get FUZZY Between
Database Management and Data Warehousing
By John Makulowich ike the soft line marking the horizon in an impressionist painting, the division between database management system and data mart and warehouse begins to blur the closer you get.
Driven by the diversity of data - from true color images and rich audio to full-blown multimedia - and the demand for packaged information from knowledge workers, the future shape of database management systems is beginning to show up among the major players. One sign of what's ahead appeared last September when IBM Corp., Armonk, N.Y., released its DB2 Universal Database, proudly proclaimed by its publicists as the industry's first multimedia, Web-enabled database.
The product, they noted with enough IT clichés and acronyms to choke a freshman in Advertising 101, combined "power for business intelligence (decision support, data warehousing and data mining) with industry-leading performance and reliability for transaction processing. From OLAP [On-line Analytical Processing] to OLTP [On-line Transaction Processing], fast, reliable and secure application solutions can be developed easily and quickly, turning technology investment into competitive asset."
Visual Warehouse and DB2
Beyond the hard sell, a case for an IBM solution called Visual Warehouse that uses DB2 comes through Don Schau, an information consultant with Winnipeg, the capital of Manitoba, Canada. In early 1996, he orchestrated the purchase of the IBM database product, which he is now upgrading.
At the time, Schau and his colleagues were studying data warehousing, exploring its benefits and possible uses. Their effort focused on trying to find an area in which to apply it that made business sense.
One irksome problem Schau faced was to substantiate with statistics and analysis Winnipeg's claim that it was an equitable employer and had dealt correctly with gender bias and racial discrimination. It was then that he discovered a report on equity and diversity in the workplace prepared by the city's Equity and Diversity Office.
"It almost screamed data warehouse, requiring decision support and analysis. And the report outlined performance metrics that the office wanted to measure," notes Schau. "It became for us an excellent pilot, to build a data warehouse with IBM Visual Warehouse and collect information on the city's employment practices. It was also excellent because it involved much less investment to get it delivered."
The warehouse's main data source was the city's DB2 payroll database. Data was extracted monthly, blended with survey data and put into a DB2 data warehouse using Visual Warehouse. Now a query can be processed, on average, in 45 seconds; previously it took weeks because data were scattered on different systems.
The data warehouse for the first time allowed the city to comprehensively collect and analyze data on gender, ethnicity and job type, not only for existing employees but also for potential staff. With the ability to track applicant profiles and to determine the kind of work force it's attracting, the software allows the city to proactively develop a work force that is representative of the general populace.
Now Schau is moving to the Visual Warehouse for Microsoft's Windows NT. His next step is to expand on what has been done by creating a warehouse for the city's human resources data.
The product for NT comes with agent technology, which lets the user shift the data extraction process to a separate server. Thus, administering the warehouse can continue on one server, while on another the agent extracts data from the source database and populates the warehouse.
The Fine Line When asked where he draws the line between database management system and data warehousing, Schau admits that, more and more, it depends on the organization.
However, he reserves the notion of database management system for the simple recording of data and for its use on the micro level - that is, as an operational function pairing a customer with an invoice on a given date or for a given amount.
On the other hand, data warehousing focuses on how a business is performing, on trend analysis and on the strategic information that can be gleaned from the analysis.
"Data warehousing involves performance issues. Even though you are physically talking about the same pieces of data, when you are trying to get strategic information, you need to reorganize the data. You discover that it is sometimes difficult to write queries to get the information you want," says Schau.
While Schau admits what he is doing is very small compared to the systems of other IBM customers, his program has become an IBM reference site. That's because of the innovative approach he takes: the decision not to write any code for the tasks the software must perform. In essence, it puts the Visual Warehouse through its paces and shows where its scope must be broadened.
Jeff Jones, IBM
Jeff Jones, program manager of data management marketing for IBM Software Solutions, says that while the government applications for DB2, in general, will have many of the same requirements of enterprises, the functions performed may be different; for example, the data stores. However, the basic goal remains the same: bring content to the right venue at the right time.
From his standpoint, there are six areas in which DB2 excels, areas that he says all database software must address to meet the needs of today's public or private customer. The four editions of DB2 each target a different competitor, from Microsoft to Oracle, as well as different platforms, from Windows NT to Unix.
The six areas include:
- Accessibility, the ease of access from any place with any equipment.
- Applicability, the ability of any application to run well on any platform and equipment, from stand-alone laptop or 24-node cluster.
- Extensibility, the ability to support such media as video, image, audio and spatial as well as security. Lots of extenders.
- Manageability, the ease of installation, administration, management and use.
- Reliability, the necessary quality and enhancements to assure availability, including faster recovery utilities and increased granularity of utilities.
- Scalability, the ability to handle richer content and more unpredictable workloads.
Another perspective on the demands for database management systems comes from the acknowledged market leader, Oracle Corp., based in Redwood Shores, Calif.
According to Timothy Hoechst, vice president for technology in Oracle's Government, Education and Health Division, the primary IT data challenges for the government include the year 2000 problem and meeting a new level of demand from end users that has arisen with the advent of the World Wide Web.
"In the issue of Y2K, the question many government agencies are facing is, should I fix the old or buy new. It's almost too late to be asking that question," says Hoechst. "But another issue not receiving as much attention is end-user demand for a whole new level of access to information. The problem is the enterprises have not consolidated the data in the way the users want. It does not exist."
With user communities now more focused on what they want, Oracle is also focusing on the problem of access to information to the people who need it in a way that is timely, secure and cheap.
"As an executive at Netscape noted, the challenge is to get people just enough information. It used to be that technology could not keep up with need," notes Hoechst. "Frankly, we have far surpassed what enterprises are able to implement technologically. The key issue now is, how can we use it to solve business problems?"
When asked about the future, Hoechst answers in mantra fashion: "Network computing, network computing, network computing."
For Hoechst, that translates into focusing not on the data management problem but on the access problem, to make more information available in a cost-effective way.
"First, we had centralized computing. Professional computer people were administering the system. The problem was it lacked enough flexibility," explains Hoechst. "Now we have the client/server platform. The problem here is that it is incredibly complicated and expensive to deploy in the modern world. There is too much on the desktop."
According to the Oracle vision, with network computing you move the applications' complexity from the desktop to the network, where it can be professionally managed. This essentially places all the complicated logic in the network.
In this scenario, the Web browser, such as Microsoft's Internet Explorer or Netscape's Communicator, becomes the universal interface.
"There's a more important issue here," says Hoechst. "When the applications are on the network, they can be shared. In the past, you had to hook databases into one another. That does not solve the problem. You begin to realize that the data is not relevant. What is relevant is the application."
With all the talk of network computing, a concept both IBM and Oracle support with different hues, the heavy hitter of the desktop ever looms in the distance - the massive presence of Microsoft and its pitch for SQL Server 6.5.
Tim McCaffrey, application development team manager for Microsoft Federal, claims there is a big surge in federal organizations deploying enterprise-class applications built on SQL Server 6.5, especially for data marts and intranets.
"Users are attracted to the quick and easy development and deployment cycles that SQL Server and the BackOffice platform allow, especially when they are used with Visual Basic. We see many organizations standardizing on the complete BackOffice architecture because the products are integrated," says McCaffrey.
As evidence that SQL Server is ready for prime time - that is, enterprisewide, mission-critical applications - Microsoft points to the Federal Aviation Administration's deployment to about 120 worldwide sites for use by more than 3,500 flight standards inspectors.
The inspectors complete forms on the status of flight worthiness, for example, equipment and crew, and file them into the local airport's SQL Server database. This can then be accessed by the FAA and commercial airline personnel to assess the status of readiness. The information then is replicated to other SQL Server sites on the FAA network using SQL's built-in replication.
Other prime-time demonstrations of SQL scalability include the Department of Housing and Urban Development and the Federal Reserve. HUD, which recently standardized on SQL Server, has current applications with requirements for 2,000 to 10,000 users on databases that range from 40 megabytes to 10 gigabytes. More than 64 SQL Server databases are in various phases of development, test/pilot and production at HUD, according to Microsoft.
And the Federal Reserve uses SQL Servers to store financial information that is accessed by all 13 Federal Reserve banks nationwide. The number of users has reached 1,000, with the database scaling to 5 gigabytes.
Running alongside the major players is one database company struggling to resurrect its former fortunes.
Publicly traded Sybase Inc., headquartered in Emeryville, Calif., has seen its share price descend from $40 per share to $10 per share over the last three years. In the last year alone, its range has been $23 to $6 a share.
While Sybase has faced some difficulties lately in the commercial arena, its Public Sector Group is among the strongest performers in the firm, according to a company spokesperson. That group offers open software architecture, and its products include Adaptive Server Enterprise, Adaptive Server Anywhere and Adaptive Server IQ for OLTP.
| ||"The amount of data at the back end is getting larger and larger. The challenge is primarily to have that data stored and accessed in reasonably quick fashion."|
-Steven Morck, Sybase Public Sector Group
It also features the Enterprise-Connect family of middleware tools and Powersoft application development tools to integrate legacy systems and develop enterprise client servers, as well as Internet applications.
The company sports a fairly sizable, installed base in the intelligence and defense communities and provides the back-end database and tools for the Defense Department's Standard Procurement System. It also provides infrastructure for the Treasury Department, the Federal Communications Commission, the Securities and Exchange Commission and the Bureau of the Census.
For Stephen Morck, director for public service professional services for Sybase Public Sector Group in Bethesda, Md., and head of the company's systems integration activities in the government, the government is migrating from the mainframe to smaller footprint technology and building whole suites of applications around those small technologies.
Among bigger problems he sees are the year 2000 programming flaw and creating data warehouses to bring together large amounts of data to connect disparate systems.
"The amount of data at the back end is getting larger and larger. The challenge is primarily to have that data stored and accessed in reasonably quick fashion," notes Morck.
He freely admits there no longer is a discrete break point between database management systems and data warehouses.
Morck himself views data warehouses as two types: One, a homogeneous collection of information on a specific subject; the other, different database management systems tied together.
Kim Weatherford, deputy director for information systems for the Texas Workforce Commission in Austin, is one customer using the Sybase solution to the state's advantage. The commission is a state agency that administers work force programs such as unemployment, Job Training Partnership Act and the Employment Service Program.
The program, which uses Sybase, is known as TWIST for The Workforce Information System of Texas. It amounts to a database integration solution for Texas employment and training agencies to help citizens move from welfare back to work by integrating the information services of all Texas agencies responsible for job training and career services.
Started in 1996, TWIST ties together agencies with incompatible computer systems so they can share information.
This allows them to function as a seamless organization and offers communication and collaboration for caseworkers from agencies in the Texas Workforce Commission. Before TWIST, clients often got frustrated going through the eligibility process over and over because computer systems didn't share data.
TWIST supports more than 2,000 users and 16,000 transactions daily in the peak summer season. TWIST is paid for by public funds comprised of Federal Government One-Stop, JOBS and JTPA grants. The grants' purpose is to empower states to find solutions to move citizens from welfare to work.
According to Weatherford, the development effort began as an answer to a one-stop grant from the Department of Labor, allowing users to go to one location to file information. But there was a problem with multiple and incompatible computer systems.
That all changed. Since its beginning, the project has won two national awards and gained the attention of several states.
"As far as development goes, this has been the smoothest, as well as the best received, computer project," Weatherford says. "It's an anomaly, because usually there's a lot of initial enthusiasm, and interest fades as the project moves ahead. In this case, we began with little fanfare, and as the system has grown, interest has grown."
Weatherford says the commission has even gotten the attention of the Texas Legislature and the governor's office, and it has demonstrated the project for North Carolina, Illinois, Michigan and California.
The secret? It came along at the right time and was the first of its kind, says Weatherford.