E-documents need e-preservation



"The lack of a common, recognized and accepted standard preserving electronic records has led to the loss of valuable and historical records and has also created a growing sense of urgency within the archiving community." -- Paul Hughes, Adobe Systems Inc.

Henrik G. de Gyor

"The key to making [the standard effort] work is leadership and organization. I think everyone involved will see that this will go through." -- Basil Manns, senior physical scientist in the Preservation Directorate of the Library of Congress

Henrik G. de Gyor

Government, industry group to build on Adobe's PDF formatting

The Information Age has spawned an archivist's nightmare. Computers make it easier to create records, whether e-mail or formal documents. But storing them for the long haul wasn't part of the original plan.

"The lack of a common, recognized and accepted standard preserving electronic records has led to the loss of valuable and historical records and has also created a growing sense of urgency within the archiving community," said Paul Hughes, Adobe Systems Inc.'s director of public affairs, at a Feb. 26 program on electronic records preservation in Washington.

But work is underway by a group of public- and private-sector information management and archiving specialists to create a new, international standard for the preservation of electronic documents.

The group plans to pursue International Organization for Standardization certification for PDF/A, the new standard. The standard will set guidelines for archiving and preserving digital documents in Adobe's Portable Document Format. The standard will be designed to meet archivists' needs in several ways. For example, the standard should require elimination of passwords or encryption, discourage the use of embedded, executable code and encourage the embedding of fonts in the document, according to officials involved with the project.

If fonts aren't embedded, for example, "font substitution could take place that would alter the look of that record. Many people believe there is a lot more to preserving records than the content -- it's also the original look and feel of the record," said Melonie Warfel, Adobe's director of World Wide Government Programs and Standards.
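The three archival goals officials described (no encryption, no embedded executable code, embedded fonts) can be illustrated with a rough check. The Python sketch below is only a heuristic under stated assumptions: the function name and warning labels are hypothetical, and it searches a file's raw bytes for name keys defined in the published PDF specification (`/Encrypt`, `/JavaScript`, `/JS`, `/FontFile`). It is not a PDF/A validator, and it can miss keys that sit inside compressed streams.

```python
import re


def archival_concerns(pdf_bytes: bytes) -> list:
    """Return heuristic warnings for a PDF supplied as raw bytes.

    Hypothetical helper: the labels below mirror the goals described in the
    article, not the wording of any published standard.
    """
    concerns = []

    # An /Encrypt entry signals password protection or encryption.
    if re.search(rb"/Encrypt\b", pdf_bytes):
        concerns.append("encrypted")

    # /JavaScript and /JS keys signal embedded executable code.
    if re.search(rb"/(JavaScript|JS)\b", pdf_bytes):
        concerns.append("executable-code")

    # Embedded font programs are referenced as /FontFile, /FontFile2
    # or /FontFile3. Fonts used without any of them are flagged.
    uses_fonts = re.search(rb"/Font\b", pdf_bytes) is not None
    embeds_fonts = re.search(rb"/FontFile[23]?\b", pdf_bytes) is not None
    if uses_fonts and not embeds_fonts:
        concerns.append("fonts-not-embedded")

    return concerns


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py some_document.pdf
    with open(sys.argv[1], "rb") as handle:
        print(archival_concerns(handle.read()))
```

An empty list means only that none of these markers were found in the uncompressed parts of the file; a real conformance check would need a full PDF parser.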

The PDF/A standard effort is being led by two nonprofit organizations: the Silver Spring, Md., Association for Information and Image Management International and Reston, Va., NPES: The Association for Suppliers of Printing, Publishing and Converting Technologies. Group members hope the new ISO standard will be publicly available in 2005.

The need for an archival standard is particularly acute in the federal government, where 4,800 records formats are used, Hughes said.

Desktop computers have enabled the capture, alteration and dissemination of much more information than was possible 20 years ago, when paper filing and capturing documents on film were common preservation methods. But the ease of document creation has made archiving more difficult, because so many more documents are produced today, officials said.

The 2000 Census, for example, generated 600 million pages of information that will transfer to the National Archives and Records Administration -- more than five times the amount of data that the Archives has ever captured and processed, according to the agency. In addition, federal agencies face an Oct. 21 deadline under the Government Paperwork Elimination Act to allow the public, when practicable, to submit, maintain and disclose required information electronically.

"In an environment where you can create information so easily, how do you make sure that information will be accessible to people down the road?" asked John Mancini, president of AIIM. "If we were going back to the moon today, it would be very difficult to do because a lot of the information is not accessible by modern machines."

Mancini said PDF/A could be one solution to the problem. The PDF format, which retains the content and format of documents as they were created, is widely used in the public and private sectors. More than 20 million PDF documents are publicly available on the Internet, according to San Jose, Calif., Adobe.

"The idea is that you could take PDF, define the core technological characteristics appropriate to the archiving environment, and make that so it can stand up independent of the means by which it was created," Mancini said.

Most documents are dependent on the software used to create them. For example, Mancini said, a document created using WordStar, a word-processing program popular in the 1970s and 1980s, could not be opened and read today in the way it was originally conceived.

"The trick with PDF/A is to create a platform-independent version of PDF that would be possible to look at in 30 years if you didn't have Acrobat any more," he said. Adobe Acrobat is the software suite that Adobe sells to create, view and enhance PDF documents. Other vendors have used the publicly available specification for PDF to create their own PDF viewers.

Government officials said there is no one-size-fits-all method for archiving federal electronic documents, but PDF/A is promising.

"The key to making [the standard effort] work is leadership and organization. I think everyone involved will see that this will go through," said Basil Manns, senior physical scientist in the Preservation Directorate of the Library of Congress. Manns worked on a similar project in 1994. Because of slow government action, that public-private effort didn't take off, he said.

Hundreds of people are involved in the PDF/A effort, including National Archives staff. The agency will issue on March 31 its own guidance for transfer of PDF documents to the archives as part of a cross-agency e-government project for electronic records management, said Mark Giguere, lead for information technology policy and planning at the agency.

Then, instead of beginning to develop future design requirements for transfer of PDF documents, the archives will take advantage of the work being done by the PDF/A working group, Giguere said.

"At a minimum, whatever comes out of it might serve as a valid starting point for what NARA might want to consider," Giguere said. "If [the PDF/A standard] is good enough for agencies with long-term temporary records, it would probably be good enough for us for archival purposes, although we haven't made that determination yet," he said.

Some contractors are skeptical that the PDF/A standard will be widely accepted by government.

Jan Scholtes, president of ZyLAB Technologies BV, said his company favors converting documents to Extensible Markup Language, or XML. The Amsterdam company's software allows users to file, retrieve and manage documents.

"One of our largest customers is the U.N. International Tribunal on the former Yugoslavia," Scholtes said. "In their cases, it may take 40 years before they finally find a suspect. It's unacceptable for them to store all their data in a format that is owned by a company. They want to store their documents in a format they can download from the Internet. Then it doesn't matter if ZyLAB is around or not, because any other vendor can access the information," he said.

Carl Muller, senior vice president of systems integration services for Vredenburg, said some government agencies are hesitant to adopt a standard put forward by a vendor. Vredenburg of Reston, Va., helps numerous agencies manage their electronic records, including the National Archives.

"PDF/A is a good idea in that we need to start looking at these ideas," Muller said. "We need someone to say 'Here is a solution,' whether that is the ultimate solution or not. It's like having an 8-track recording. It's no good anymore, but if you can move it to something where you can listen to it, that's good."

Steve Levenson, co-chairman of the PDF/A working group, said he's confident the standard will be used far into the future because it is being built off the published PDF specification.

"We won't have the ease of use if Adobe goes away, but we will be able to go in and re-render these records, not in today's technology, but whatever the new technology is," said Levenson, judiciary records officer at the Administrative Office of the U.S. Courts.

And if the standard changes in the future to add new capabilities, the ISO will have final say on those changes, not Adobe, he said.

Staff Writer Gail Repsher Emery can be reached at gemery@postnewsweektech.com.