For Tech's Sake: Storage Means Getting What You Want Now
- By Gary Arlen
- Sep 29, 2004
Time and space are not just for theoretical physicists.
They are fundamental factors in the expanding universe of data storage, which has become an increasingly essential ingredient in government operations. Thanks to retention policies, compliance requirements and security concerns, effective storage procedures are climbing the "must-do" lists of agencies.
The plunging price of storage capacity and the concurrent tsunami of files (from e-mail to transaction records) complicate the implementation process for solution providers, who are trying to create future-proof systems.
Moreover, in the storage equation, time has at least two dimensions: the speed of access and the age of the retained data. With hundreds, sometimes thousands, of distributed workers tracking down information within their systems, storage designers confront contentious issues about access time and availability of old case files.
For example, at the Department of Agriculture's Food and Nutrition Service (FNS), nearly 800 employees look up records; 70 percent of the agency's work involves food stamp management and school lunch programs.
Like many organizations, FNS cannot throw away aging data, nor can its staff members wait two to three minutes every time they need to dip into older files, said Harold Russell, a project manager for Wyandotte Network Inc., which recently revamped the FNS storage facility.
"We moved the [older] files to a storage area network (SAN)," Russell said. "Anything not touched in one year was moved to secondary storage."
On behalf of FNS, Wyandotte spent about three months evaluating options. The actual installation of the new storage structure took nearly three more months, largely because the agency was simultaneously installing new hardware, including a NetApp drive array plus EMC and Hitachi devices. FNS also used the period to install new back-up systems.
The storage dilemma is confronting agencies and private industry, as they grapple with the data deluge. Time is moving fast for all these organizations. In the first quarter of this year, about 247 petabytes (i.e., 247 million gigabytes) of disk storage system capacity was shipped worldwide.
That's nearly 40 percent more than the same period a year ago, according to market analysts at International Data Corp., who point out that much of this capacity is earmarked for data warehousing.
As government agencies evaluate how to determine what can be put into storage, "one year old" seems to be a consistent benchmark at many locations. Another criterion is size (e.g., two-megabyte files that have not been accessed for a year), which defines what can be moved to a remote storage facility.
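A rule of that kind is simple to express in code. The following is a minimal sketch, not any agency's or vendor's actual policy engine: the function names are hypothetical, and the one-year and two-megabyte thresholds are taken only from the examples above.

```python
import os
import time

# Thresholds from the examples in the text; real policies vary by agency.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60
MIN_SIZE_BYTES = 2 * 1024 * 1024  # the "two megabyte" example


def is_archive_candidate(size_bytes, last_access_ts, now=None,
                         max_idle=ONE_YEAR_SECONDS, min_size=MIN_SIZE_BYTES):
    """Return True when a file is both large enough and idle long enough."""
    now = time.time() if now is None else now
    return size_bytes >= min_size and (now - last_access_ts) >= max_idle


def find_archive_candidates(root, now=None):
    """Walk a directory tree and yield paths that meet the policy."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if is_archive_candidate(info.st_size, info.st_atime, now):
                yield path
```

One caveat with this approach: it trusts the file system's last-access timestamp, which some systems update lazily or not at all, so a production policy would likely need a more reliable record of access.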
There's no common formula, though.
"What is one person's critical data is another person's trash," said Greg Hilsenrath, vice president of Overtone Software, a Bethesda, Md., software firm that creates storage solutions widely used by government agencies.
Furthermore, how and where to structure the storage process is still a work in progress.
"Storage is not one size fits all," said Kris Domich, senior solutions architect at Dimension Data (DiData), which also works with federal and local agencies. "You cannot mix and match the storage. Interoperability has come a long way, but it is not there yet."
Domich noted that applications determine where storage systems should be installed; his company has used network-attached devices for some installations. He said that the most important considerations in selecting a storage solution are how available you can keep the data and how much you can afford to lose if it fails.
Those vital questions lead back to the issues of time and space.
For example, the New York City Department of Transportation has faced such issues in its data overhaul. Storage has figured prominently as the department has built a disaster recovery site as well as a business continuity site, according to David Shatzkes, vice president of Computer Horizons Corp.'s government solutions unit, which is handling the project.
"If we're going to do this right, we need SANs on both sides," Shatzkes said. "You replicate everything."
Much of the New York project was triggered by post-9/11 considerations for the city's Department of Transportation, whose authority ranges from buses, subways and highways to street lamps, parking permits and traffic signage.
In a remarkably candid description, Shatzkes explained that the city's Compaq SAN "didn't do what it was supposed to" and suffered "lots of failures," which meant that city employees stopped using it.
As a result, Computer Horizons spent a year and a half on its evaluation before coming up with a solution that included Dimension Data and EMC equipment.
Shatzkes said that the two EMC machines are sixth-generation models, offering comforting reliability; each has eight terabytes of storage capacity.
Calculating Future Needs
To accommodate the time and space issues, storage developers are pushing different approaches that facilitate access to stored files. For example, Overtone Software starts with "hierarchical storage management" as part of its Information Lifecycle Management (ILM) solution for government clients.
The company characterizes its software as a "data organizing tool" that gives users access to files "without knowing that it has been moved to less expensive storage," Overtone's Hilsenrath said.
"We use tools that provide the bookmark or dotfile that is similar to a shortcut on your PC desktop," Hilsenrath said. "The dotfiles don't carry a link into where the data is. ... You can open up a folder that has 100 files with shared items in it. Many of those files can be stored elsewhere, but the dotfile is sitting there" on a user's local PC. The architecture allows a user to move among files seamlessly.
FNS is among Overtone's customers. Russell, the Wyandotte integrator who is handling that project, said that Overtone's value is that its technology "links to the users' directory."
"When a user clicks on the link, it pulls from secondary storage so it's like moving from disk to disk," Russell said.
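The general placeholder pattern behind hierarchical storage management can be illustrated in a few lines. This is a generic sketch and not Overtone's implementation: the `.stub` suffix, the function names and the in-memory catalog are all assumptions made for illustration. In keeping with Hilsenrath's description, the stub here holds only an identifier, and a separate catalog records where the data actually lives.

```python
import json
import os
import shutil
import uuid

STUB_SUFFIX = ".stub"  # hypothetical marker for placeholder files


def archive_file(path, secondary_dir, catalog):
    """Move a file to secondary storage and leave a small stub behind."""
    file_id = uuid.uuid4().hex
    os.makedirs(secondary_dir, exist_ok=True)
    target = os.path.join(secondary_dir, file_id)
    shutil.move(path, target)
    catalog[file_id] = target  # the catalog, not the stub, knows the location
    stub_path = path + STUB_SUFFIX
    with open(stub_path, "w", encoding="utf-8") as stub:
        json.dump({"id": file_id, "name": os.path.basename(path)}, stub)
    return stub_path


def recall_file(stub_path, catalog):
    """Resolve a stub through the catalog and restore the original file."""
    with open(stub_path, encoding="utf-8") as stub:
        file_id = json.load(stub)["id"]
    original_path = stub_path[: -len(STUB_SUFFIX)]
    shutil.move(catalog.pop(file_id), original_path)
    os.remove(stub_path)
    return original_path
```

In a real product the recall step would be triggered transparently when the user opens the placeholder, and the catalog would live in a durable database rather than a Python dictionary.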
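Another method for handling large files is hashing, which many vendors are developing. Hashing is not compression in the strict sense: it reduces a file to a short, fixed-size digest, and because identical content yields identical digests, duplicate data can be identified and stored only once. DiData's Domich said that hashing "takes less disk space" and "holds promise for the future," although he said it's "not that great right now."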
Nonetheless, he said, hashing "becomes valuable because you don't have to back up as much information." He said its major value stems from its authenticity and authentication characteristics.
"Hashing's integrity [means] that you can make sure the data is what you think it is," Domich said. For example, a piece of e-mail that is archived and hashed can be compared to the original mail to assure its authenticity if it is ever questioned.
He cited organizations such as the Internal Revenue Service and Homeland Security Department, where hashing can be useful to guarantee that content has not been altered after it is electronically stored.
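The integrity check Domich describes can be sketched with a standard cryptographic hash. The example below assumes SHA-256 from Python's standard library; the article does not say which algorithm any vendor uses, and the function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_digest: str) -> bool:
    """Compare the content's current digest with the one recorded at archive time."""
    return hmac.compare_digest(fingerprint(data), recorded_digest)


# Record the digest when the message is archived ...
archived_mail = b"Subject: case file\n\nOriginal message body."
digest_at_archive_time = fingerprint(archived_mail)

# ... and check it again if the message is ever questioned.
print(is_unaltered(archived_mail, digest_at_archive_time))
print(is_unaltered(archived_mail + b" (edited)", digest_at_archive_time))
```

The check is only as trustworthy as the stored digest, so the digest would need to be kept somewhere an intruder could not rewrite it along with the file.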
Just as dial tone is always available in the conventional phone network, "storage tone" is becoming an operative principle in the new data access and retention environment. But just as mobile and Internet telephony revise the concept of dial tone, so will the meaning of storage tone change as technology affects the installation of systems.
For example, designers are already weighing the relative merits of SANs versus network-attached storage (NAS). For high-performance implementations, developers often choose a SAN, but as DiData's Domich said: "That's not to say that information on a NAS is any less critical than info on a SAN."
"If you back up data from a thousand servers, you'll need some device, probably a network-based arrangement," Domich said. "The problem is that your network during backup is unusable, and the servers are going to take a performance hit, which means the user [experience] is going to drop."
With the growing reliance on remotely stored data, look for even greater focus on effective solutions. DiData, Overtone and their competitors and partners are reshaping the storage landscape.
And they're resolving the time and space conundrum while they're at it.
Gary Arlen is president of Arlen Communications Inc., a Bethesda, Md., research firm. His e-mail address is GaryArlen@columnist.com