Get the record straight

IT storage managers should be happy: storage costs per gigabyte are plummeting. If only demand for capacity weren't soaring even faster.

That leaves agencies with a two-pronged problem: how to cut storage costs while still giving users timely access to data.

When it comes to access, the U.S. Geological Survey's data center in Middleton, Wis., is ahead of the game. The center uses 40G of solid-state disk devices from Texas Memory Systems Inc. of Houston to hold its most active databases in RAM.

"The solid-state disks hold the data that is high priority to give to customers fast, or data files that are hot and get hit a lot," said data center director Harry House. "If you are input/output bound, solid-state disk is a godsend. You can achieve some real performance breakthroughs with it."

At the National Archives and Records Administration, long-term preservation of electronic data is paramount. NARA awarded Lockheed Martin Corp. a $308 million contract to build an electronic records archive system.

"NARA's in the business of archiving information for the life of the republic, and the electronic records will continue to grow," said Clyde Relick, Lockheed Martin's program director for the contract. "We are building a system that has to be able to incorporate new technology and be scalable [to include] unlimited amounts of storage."

Whether the concern is having enough disk for today's needs or an archive built to last millennia, proper system design is key.

The term "archiving" is used in two senses. It can refer to part of a standard backup or disaster recovery program, or to keeping data available for long-term access.

One increasingly popular approach to balancing storage costs and availability is multitiered storage.
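
The cost side of that balance is easy to quantify. The Python sketch below prices a hypothetical 10,000G holding three ways, using per-gigabyte figures in line with the 451 Group estimates quoted below. The $4 primary-disk price (the midpoint of the $2 to $6 range) and the 20/30/50 percent split across tiers are assumptions chosen for illustration, not figures from any agency.

```python
# Illustrative per-gigabyte prices in line with the 451 Group estimates:
# primary disk $2 to $6 (midpoint assumed here), ATA about $0.50, tape about $0.12.
PRICE_PER_GB = {
    "primary_disk": 4.00,
    "ata_disk": 0.50,
    "tape": 0.12,
}

def storage_cost(allocation_gb: dict[str, float]) -> float:
    """Return the total dollar cost of holding the given gigabytes on each tier."""
    return sum(PRICE_PER_GB[tier] * gb for tier, gb in allocation_gb.items())

# Hypothetical 10,000G holding, priced three ways.
all_disk = storage_cost({"primary_disk": 10_000})
all_tape = storage_cost({"tape": 10_000})
tiered = storage_cost({"primary_disk": 2_000, "ata_disk": 3_000, "tape": 5_000})

print(f"All primary disk: ${all_disk:,.0f}")
print(f"All tape:         ${all_tape:,.0f}")
print(f"Tiered 20/30/50:  ${tiered:,.0f}")
```

Run as written, it reports $40,000 for all primary disk, $1,200 for all tape and $10,100 for the tiered mix: most of the tape savings, with the hottest fifth of the data still on fast disk.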
According to the 451 Group Inc., a New York consultancy, primary disk storage costs around $2 to $6 per gigabyte, secondary ATA drives around 50 cents per gigabyte, and tape only 12 cents per gigabyte.

For the highest availability, everything would be on disk. For the tightest cost containment, everything would be on tape. The first route strains the budget; the second sacrifices availability.

Tape in general is increasingly seen as substandard for backups, said Simon Robinson, storage research director at the 451 Group.

"Users like it because it's cheap, but apart from that, it's inherently unreliable and delivers poor performance," he said.

The trick is to find the optimum balance between tape and disk. This is where information lifecycle management comes in.

Information lifecycle management is a strategy for automatically moving data from one storage tier to another to cut the cost of storing infrequently accessed material.

Consider, for instance, how a bank handles a customer's deposit, said Dorian Cougias, co-author of "Backup Book: Disaster Recovery from Desktop to Data Center" (Schaser-Vartan Books).

For the first few weeks, the teller can produce a copy of the transaction, said Cougias, who also is CEO of Network Frontiers LLC of Oakland, Calif. Then the record moves to a system that the bank manager can access. After six months, the data is archived, and the customer has to put in a request and wait to receive a copy. Eventually, the data is erased or destroyed.

The Ninth Circuit Court of Appeals relies on 400G of single-tier storage, which is backed up on tape, said Robert Eckstein, the court's assistant network manager. That system is adequate for now but may not meet future needs, he said.

"We are looking at information lifecycle management," he said. "We expect that our storage will increase significantly when our court is on the new Case Management/Electronic Case Files system. At that point, we will need a much more in-depth storage system."

The simplest way to implement information lifecycle management is to store all data on tier-one storage and, over time, move it to other, cheaper tiers. Because this approach doesn't meet all business needs, vendors have proposed more complex sets of rules based on the types of documents being held or how often they are accessed. This approach, too, has its limitations.

The industry has used information lifecycle management to put forth the notion "that data will be created on a certain class of storage, [and subsequently], based on policies, age or something else, will dynamically migrate to lower-cost storage," said Manish Goel, vice president and general manager of data protection and retention for Network Appliance Inc. of Sunnyvale, Calif.

"That is an administratively complex architectural solution and has never really taken off."

That doesn't mean the basic concept of moving data through different tiers is bad. But each organization needs a strategy suited to its own needs.

For example, databases must be handled differently from documents. The Transportation Department's Research and Innovative Technology Administration has 15T of storage for data warehousing databases used for analysis and reporting, for data collection and processing systems, and for Web sites that make the data publicly available.

"At this time, all of our data are maintained on a single tier. We have not archived anything," said Terry Klein, deputy CIO and director of the Office of Information Technology. "We do, of course, back up all our data to tape."

The Geological Survey data center also keeps its databases active but migrates them to different types of storage based on the volume of requests for the data.

The data most frequently requested, or requiring the fastest response times, stay on the tier-one solid-state disk devices.
Data with fewer input/output requests stay on tier-two disks. This middle tier consists of several terabytes of databases on Network Appliance storage devices, which are backed up to about 10T of disk storage from Excel Meridien Data Inc. of Carrollton, Texas. Finally, the data is archived to tape and moved offsite.

The University of New Mexico's Health Sciences Center, by contrast, uses the conventional information lifecycle management criterion of age to migrate documents.

The center has 10T of storage for general information, most of it about 10 million files that users have uploaded to central storage to back up their hard drives. About two-thirds of that is primary storage.

The center uses hierarchical storage management software from CaminoSoft Corp. of Westlake Village, Calif. The software automatically moves data off the primary tier and leaves in its place a stub file, a small file that points to the document's location on the secondary tier.

"We do it on a simple rule," said IT systems manager Barney Metzner. "If the file creation date and last-access date goes longer than, on average, six months, we migrate it, though there are a number of exceptions for files such as databases and PowerPoints."

Such systems are complicated, Network Appliance's Goel said. Metzner added that some technicians who must deal with the complexity view information lifecycle management as a negative.

"I still weigh it as a positive," Metzner said. "It continues to keep us in business as our storage needs grow."

Archiving for long-term access has challenges of its own.
Archived data must remain accessible despite changing technologies and deteriorating storage media.

"Government agencies face the same problems everyone does: maintaining secure and cost-effective long-term readability, physically and logically," said Michael Peterson, program director of the Storage Networking Industry Association's Data Management Forum. "Media has to be migrated every three to five years to ensure physical readability, and application data formats have to be maintained throughout revision changes, application changes, and reader changes."

There is also the matter of finding data once it has been archived, which is not the same as restoring a file from a backup tape.

"Backup is for mass restoration," Cougias said. "Archiving is 'Give me the needle in the haystack, and I want it in a readable format.' "
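
The age-based rule Metzner describes at the Health Sciences Center can be sketched in a few lines of Python. This is a minimal illustration, not CaminoSoft's implementation. The 182-day limit approximates his "six months," the exempt file extensions are hypothetical stand-ins for the databases and PowerPoint files he mentions, and the stub is simply a text file recording the new location; a commercial product would make the stub transparent to users, which this sketch does not attempt. Timestamps are passed in rather than read from disk because file-system creation times vary by platform.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path

# Hypothetical policy values modeled on the rule Metzner describes.
AGE_LIMIT = timedelta(days=182)  # roughly six months
# Hypothetical exceptions standing in for "databases and PowerPoints".
EXEMPT_SUFFIXES = {".mdb", ".accdb", ".ppt", ".pptx"}

def should_migrate(path: Path, created: datetime, last_access: datetime,
                   now: datetime) -> bool:
    """Migrate only non-exempt files whose creation AND last access are both old."""
    if path.suffix.lower() in EXEMPT_SUFFIXES:
        return False
    return (now - created > AGE_LIMIT) and (now - last_access > AGE_LIMIT)

def migrate_with_stub(path: Path, secondary_root: Path) -> Path:
    """Move a file to the secondary tier and leave a small stub pointing to it.

    Simplification: files with the same name in different folders would collide.
    """
    secondary_root.mkdir(parents=True, exist_ok=True)
    target = secondary_root / path.name
    shutil.move(str(path), str(target))
    # The stub occupies the original path and records where the data now lives.
    path.write_text(f"MIGRATED-TO: {target}\n")
    return target
```

For example, a document created in September 2004 and last opened in December 2004 would qualify for migration on July 1, 2005, while a PowerPoint file of the same age would stay put under the exception list.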

RFP Checklist: IT Storage

» How much primary storage does your customer need? How much secondary and tertiary storage? Can you identify areas of duplication?

» What data formats need to be stored?

» Does the data move automatically or manually from one storage tier to another?

» Do different types of data reside on different tiers, such as databases on tier one? Or does the data migrate from one tier to another based on age, frequency of access or some other criteria?

» Will the data be archived on disk to keep it readily available, or will it be sent to tape?

» For what will the archived data be used? Through what type of application or interface will users access it?

» How will that data be indexed and searched?

» What policies will determine which data gets archived and what gets deleted?

» How long must the data be kept available? Is it the same amount of time for everything, or do different types of data have different life spans?

» What type of data classification tools will work best?

» How will you ensure data security requirements are met as the data migrates from one storage tier to another? Do you need to maintain separate physical systems for classified data, or will software-based security suffice?

» What about maintaining the confidentiality of medical or personnel records, where access depends not on a security level but on a need to know?

» If the data is being stored on tape or optical disk, how long will the data on that medium be readable? What mechanisms need to be in place to copy the data onto new storage media and on what time schedule? Who is responsible for doing this?

» How will you handle changes in data formats, applications and hardware over the years to maintain data accessibility?

» Will your customer be the project lead in coordinating the different hardware and software vendors, or will there be a single primary contractor?

By Drew Robb

Drew Robb is a free-lance writer in Los Angeles.
