The really big picture

The hulking size of the multiple-petabyte Electronic Records Archives, when it becomes operational in 2007, does not seem to faze Project Director Kenneth Thibodeau.

           


           
And he is unruffled as he describes the volume of the National Archives
and Records Administration's project as likely to be close to a thousand
petabytes a few years later.


           
"It will need to scale up to one exabyte by 2021," Thibodeau said.


           
What is mind-boggling to him is that the archive eventually will need to
manage trillions of separate data objects, ranging from tiny files such as
single e-mails to huge databases containing terabytes of information.


           
"We may need to manage up to 10 trillion objects," Thibodeau said.
"That is one of the most frightening things about it. ... That is beyond
current capabilities."


           
The Electronic Records Archives is one of the largest projects in the
world in the field of long-term, accessible data storage and archiving. It is
important not only for its size, but also for its strategic approach, which is
being closely watched by industry as it meets a demand for long-term, accessible
storage solutions.


           
Two teams, headed by Lockheed Martin Corp. of Bethesda, Md., and by Harris
Corp. of Melbourne, Fla., are competing for the archives contract.


           
The project started in 1998 as pure research and stayed that way until 2002,
when it was demonstrated that it is possible to build a logical architecture
to archive data that is "scalable without limits," Thibodeau said. That
solution, which uses Extensible Markup Language, helped spur the project
forward, although XML is not required as part of the solution, he said.


           
"The information must be independent of the system," Thibodeau said.
"XML is a format that makes that possible."


           
The solution must be compatible across all agencies. Although, on
average, only about 2 percent of all federal records are deposited in the
National Archives, for the White House it's 98 percent, Thibodeau said.


           
The e-records project is extremely complex, dealing with an enormous
range of applications and formats including e-mail, photographs, imaged
documents, audio files, geospatial records and multimedia presentations, among
others.


           
The State Department alone produces about 1 million digital diplomatic
messages a year that require archiving, many in telegram form, he said.


           
Another challenge lies with the Navy, which typically has archived sets
of blueprints and drawings for each vessel. But with most ships relying on
digitally machined parts, there is a quest to preserve records of those
machining applications so they can be reproduced in 50 years, Thibodeau said.


           
"No one knows how to archive the manufacturing processing data," he
said. "But if I cannot remanufacture the part, I cannot keep the system
working."


- Alice Lipowicz

