Find the right box for your data
Saving the president's mail
Open standards hold key to sharing, moving and preserving
Years ago, the Pentagon turned over huge amounts of raw data on the Vietnam War to the National Archives and Records Administration to be stored in perpetuity. But that wasn't the end of it.
The military later wanted copies of some of the archives, which contained information on soldiers' health and medical care as well as Agent Orange data, to help medical officers prepare studies on the health of soldiers at war.
Kenneth Thibodeau, who directs NARA's electronic archives project, anticipates many similar requests from federal agencies, starting in 2007 when NARA's Electronic Records Archive is up and running. The amount of data that will flow in and out of the archive from government agencies likely will be staggering. It will require greater compatibility and interoperability between the archives and other government IT storage systems, he said.
"I don't think people from the Pentagon and other agencies will be coming in to click and download individual files. They'll want entire databases," Thibodeau said. "Interoperability could be important."
Many government agencies and corporations are seeking greater interoperability in storage, to let storage systems interact with other systems and allow use of different vendors and tools. For the data storage industry, the ability to respond to those needs and foster flexible use across multiple storage systems comes down mostly to one thing: open standards.
Several industry efforts are under way to develop open standards available to all, creating common code for storage system management tools, and to improve the ability of IT managers to integrate storage more easily while using multiple vendors.
Despite their common goal, storage industry standardization groups recently have splintered. That means customers, including government IT managers, may continue to sit on the sidelines while industry members wrangle over who will take the lead in solving the problem.
The squabble may give IT managers an opportunity to play a larger role in influencing the path toward open standards. More than one in four storage professionals surveyed by the Storage Management Industry Association in December said they were struggling with a lack of interoperability and integrated solutions. This challenge was pegged as one of the four most significant findings of the survey.
Continued feedback from government and corporate IT managers could spur industry to solve these problems, many experts said. "In writing standards, it's a good idea to talk to the users," said Melonie Warfel, director of worldwide standards for Adobe Systems Inc. "The users will drive what the technology companies can do."
Dynamic Change
Interoperability is one of the latest hot issues to hit data storage. The industry is large, complex and fragmented, with sectors devoted to hardware, software, management and services. Globally, the storage market was valued at $65 billion in 2005 and is expected to grow to $80 billion by 2009, according to research firm International Data Corp. of Framingham, Mass.
Recent years have brought dynamic change, with enormous increases in the volume of data to be stored, including an explosion in the number of electronic records such as e-mails. New regulations such as the Sarbanes-Oxley and Health Insurance Portability and Accountability acts are requiring data to be more accessible over longer periods of time. Corporate scandals have focused attention on the importance of electronic records in lawsuits and in influencing public opinion. And Hurricane Katrina swept in a renewed demand for data for disaster response and recovery.
A sense of urgency following the Sept. 11, 2001, terrorist attacks has in recent years increased the government's focus on information-sharing, information-centric environments and information security. To handle data more strategically, many IT managers are applying a lifecycle approach, assessing data's value now and in the future, and designing systems to manage and store the data based on those priorities.
All this activity contributes to a rapidly changing marketplace for storage. "The storage software market continues to see strong growth driven by fundamental business and operational forces," said Laura DuBois, research director of IDC storage software.
Many government agencies store electronic documents in proprietary formats, including Adobe's Portable Document Format and Microsoft Corp.'s Word. More recently, technologies such as Extensible Markup Language (XML) have emerged that are interoperable with multiple applications and formats.
Common standards for e-records storage and archiving are just beginning to have an impact.
The largest industry group is the Storage Networking Industry Association, whose members include industry heavy hitters EMC Corp. of Hopkinton, Mass.; Symantec Corp. of Cupertino, Calif.; and Hewlett-Packard Co. of Palo Alto, Calif. The association released the initial Storage Management Initiative Specification (SMI-S) in 2003 and an updated version in September 2005. The specification provides for common storage management tools and is designed to support interoperability, in which at least basic functionality of one vendor's storage equipment is made available to another's.
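The idea of a common management layer can be illustrated with a minimal Python sketch. It is not the SMI-S object model or any vendor's API: the class names, methods and unit conventions below are hypothetical, chosen only to show how a management tool written against one shared interface can report on equipment from different vendors.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class StorageArray(ABC):
    """Hypothetical vendor-neutral view of a storage device (not SMI-S itself)."""

    @abstractmethod
    def capacity_gb(self) -> int:
        """Return total capacity in gigabytes."""

    @abstractmethod
    def used_gb(self) -> int:
        """Return used capacity in gigabytes."""


class VendorAArray(StorageArray):
    """Adapter for an imaginary vendor that reports sizes in gigabytes."""

    def __init__(self, total_gb: int, used_gb: int) -> None:
        self._total_gb = total_gb
        self._used_gb = used_gb

    def capacity_gb(self) -> int:
        return self._total_gb

    def used_gb(self) -> int:
        return self._used_gb


class VendorBArray(StorageArray):
    """Adapter for an imaginary vendor that reports sizes in megabytes."""

    def __init__(self, total_mb: int, used_mb: int) -> None:
        self._total_mb = total_mb
        self._used_mb = used_mb

    def capacity_gb(self) -> int:
        # Convert the vendor's native unit to the shared unit.
        return self._total_mb // 1024

    def used_gb(self) -> int:
        return self._used_mb // 1024


def free_space_report(arrays: List[StorageArray]) -> Dict[str, int]:
    """Summarize free space per device using only the shared interface."""
    return {
        f"{type(array).__name__}-{index}": array.capacity_gb() - array.used_gb()
        for index, array in enumerate(arrays)
    }
```

The reporting function never needs to know which vendor built a device. That is the kind of basic cross-vendor functionality the specification is described as aiming for, although the real specification is far broader than this toy interface.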
"SMI-S is doing well, it's growing, and vendors would be wise to use it," said Kenneth Steinhardt, chief technology officer of global products sales for EMC.
Another industry effort has emerged as well. Aperi, spearheaded by IBM Corp.; Brocade Communications Systems Inc., San Jose, Calif.; Network Appliance Inc., Sunnyvale, Calif.; Sun Microsystems Inc. and other companies, is a consortium that debuted in October to write common code that implements SMI-S.
"What SMI-S doesn't do is give you guidance on implementation. We're providing a vision to extend SMI-S," said Jamie Gruener, IBM-Tivoli storage manager.
However, in some corners Aperi is viewed as a fringe effort led by vendors trying to gain market share. The largest storage vendors, EMC, HP and Symantec, have not joined the group. "Some vendors want to try to drive the use of their own technology," EMC's Steinhardt said of Aperi.
IBM's Gruener declined to comment except to say that "IBM is not driving Aperi." The group is still evolving, and vendors and customers will decide for themselves if Aperi provides value to the market, he said.
Natural Discord
Industry observers and small storage vendors say such discord over standards is common in the storage industry, citing similar infighting over the Internet Small Computer System Interface (iSCSI) standard released in 2003. But here's the bottom line: Vendors' slowness in implementing SMI-S may have left the door open for the new effort, Aperi, to gain converts and possibly speed standards development.
"There's no question that there will be something of a battle on who's leading the industry on standards," said Garth Gibson, co-founder of Panasas Inc., a Fremont, Calif., storage vendor. "If Aperi does a good job, it could take a leadership role."
Meanwhile, storage providers say government IT managers will benefit from open standards and interoperability, no matter which industry group takes charge.
"For customers, it will reduce complexity, allow them to use best of breed and make it less difficult to use five different companies for storage," said David Kresse, general manager of storage management applications for Network Appliance.
"Customers will have to weigh in with the right approach. The government has some of the biggest challenges because it has some of the largest data centers," IBM's Gruener said.
The promise of XML for long-term storage may not be realized without open standards, experts say. Some customers are not aware that they may lose some of their XML functionality if they have proprietary hardware or software limitations, said George Sullivan, owner of small storage company Overtones Software Inc. in Bethesda, Md.
Freedom and interoperability in storage contribute to better ownership of data.
"Do you want control of your data?" Sullivan asked. "Or do you want vendors' proprietary formats to control your data?"
Staff Writer Alice Lipowicz can be reached at email@example.com
Kenneth Thibodeau is expecting a big blip in the number of White House e-mails to be archived in 2008.
That's the year Thibodeau, director of the National Archives and Records Administration's Electronic Records Archive, expects to complete electronic storage of e-mails from George W. Bush's second presidential term. He is already preparing for the onslaught: up to 100 million e-mails, more than twice as many as the 38 million generated by the entire Clinton administration.
Most classified documents once were on paper, but no longer. For the first time, Thibodeau will handle the archiving of millions of top-secret White House e-mails.
"The requirements for security are a lot higher for the classified documents," Thibodeau said.
Today, the e-record archive is just getting started; initial funding is rising to $45 million in 2007 from $36 million this year. But it will need substantially higher investments in coming years to support the growing holdings of the White House presidential libraries, among many other critical documents, Thibodeau said.
The e-archiving project began in 2002 as a technology demonstration but recently has gained momentum. Lockheed Martin Corp. in September won a contract to design the IT architecture for the archive, slated to expand to multiple petabytes within a few years.
Lockheed Martin recently turned in a 1,000-page design for the IT system architecture, which must be able to evolve and be updatable, expandable and secure. Federal officials are reviewing the design, a service-oriented architecture that uses Extensible Markup Language, and considering specific data formats for storage.
Data quality also is a major issue, including preserving the availability and integrity of the records over time and managing copies that are made accessible to the public, Thibodeau said.
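One common way to watch over integrity is to record a checksum for every record when it enters the archive and compare it after each copy or migration. The Python sketch below illustrates that general technique only. It is not a description of NARA's procedures, and the record IDs and function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a record's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(records: Dict[str, bytes]) -> Dict[str, str]:
    """Map each record ID to its digest at the time of ingest."""
    return {record_id: fingerprint(body) for record_id, body in records.items()}


def audit(records: Dict[str, bytes], manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare a later copy against the manifest.

    Reports record IDs that have gone missing and record IDs whose
    content no longer matches the digest recorded at ingest.
    """
    missing = sorted(rid for rid in manifest if rid not in records)
    altered = sorted(
        rid
        for rid in manifest
        if rid in records and fingerprint(records[rid]) != manifest[rid]
    )
    return {"missing": missing, "altered": altered}


# Hypothetical example: a migration drops one message and truncates another.
original = {"msg-1": b"hello", "msg-2": b"budget memo", "msg-3": b"schedule"}
manifest = build_manifest(original)
migrated = {"msg-1": b"hello", "msg-3": b"schedul"}
result = audit(migrated, manifest)
```

Here `result` lists `msg-2` as missing and `msg-3` as altered, so a gap shows up as soon as the audit runs rather than when someone later asks for the records.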
What gives an archiving manager nightmares? There was the time during the Clinton administration when the White House switched e-mail systems, copying millions of e-mails to the new system. In the process, six months' worth of Vice President Al Gore's e-mails were lost, Thibodeau said.
"A six-month gap in the records is something you couldn't have predicted," he said.
Fortunately, the White House was able to recover most of the e-mails in that incident. Although such experiences are becoming rare as the archive continues to test procedures to ensure data quality, similar episodes have occurred.
For example, in a test the archive converted 4 million patents from the U.S. Patent Office into a newer data format related to XML. A year later, it converted all the data again, this time to XML.
The conversions worked well, Thibodeau said.
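A format conversion of this kind can be checked by converting each record, reading it back and comparing the result with the original. The Python sketch below shows that round-trip check using the standard library's XML module. The record layout and function names are hypothetical, and it assumes every field name is a valid XML element name. It does not represent the archive's actual test.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_xml(record: Dict[str, str]) -> str:
    """Serialize a flat record as a <record> element with one child per field."""
    root = ET.Element("record")
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text: str) -> Dict[str, str]:
    """Parse the XML produced by to_xml back into a flat record."""
    root = ET.fromstring(xml_text)
    # An empty element parses with text None; treat it as an empty string.
    return {child.tag: child.text or "" for child in root}


def find_conversion_errors(records: List[Dict[str, str]]) -> List[int]:
    """Return the indexes of records that do not survive a round trip."""
    return [
        index
        for index, record in enumerate(records)
        if from_xml(to_xml(record)) != record
    ]


# Hypothetical sample records, including characters XML must escape.
sample = [
    {"number": "1234567", "title": "Improved <widget> & gear"},
    {"number": "7654321", "title": ""},
]
errors = find_conversion_errors(sample)
```

A check like this only shows that the conversion added no new errors. Flaws already present in the source data pass through unchanged and need a separate review.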
"We only had two data errors, and it turns out both errors were present in the original formats," he said.