Deduplication and storage tiering can keep ever-expanding data storage requirements at bay
Storage optimization is no longer optional: if your data center doesn't adopt it, you will have a problem.
Efficient use of storage capacity and devices is rapidly shifting from a nice-to-have capability to a must-have tool for government data centers, according to a new survey by the 1105 Government Information Group. Storage optimization usually takes the form of storage virtualization and applies a variety of capabilities, such as deduplication and storage tiering.
Government agencies expect to optimize their storage infrastructure to tame the torrent of data that threatens to overwhelm their storage resources and budget. Even through the Great Recession of 2008-09, many organizations experienced 40 percent annual increases in storage demand, according to IDC, a market research firm, and demand is already climbing back toward the 60 percent annual growth rate typical of a healthier economy.
Organizations are not only storing more data but also accumulating different types of data: structured and unstructured. In addition to the growing number of transactions in their traditional financial and operational structured databases, agencies must handle a deluge of data from social networks and sensors, which are increasingly showing up on street corners, in equipment sheds and in countless other locations. And that's on top of the myriad other forms of unstructured information that organizations must handle.
The unrelenting data growth led more than 80 percent of survey respondents to report that they are very or somewhat concerned about continuously expanding data storage requirements.
Their concerns are not misplaced. “Data growth is placing continued demands on the IT infrastructure by requiring more processing, network, I/O bandwidth, and data storage capacity,” says Greg Schulz, senior analyst at StorageIO, a consulting firm, and author of several books on IT management, including the recently published "Cloud and Virtual Data Storage Networking." The problem: more data is being generated, processed, moved, stored and retained in multiple copies for longer periods of time, he observes.
A large majority of the 321 officials of federal civilian, military, state and local agencies who responded to the 1105 Government Information Group survey indicated that their organizations were adopting or investigating storage virtualization as a solution to the growing gap between storage demand and storage budget. For more information about the demographic characteristics of the survey respondents, read the "Data Center Optimization Survey Methodology" section of the first article in this series, "How to relieve the pressure of data center consolidation."
Storage virtualization is a software capability that abstracts physical storage resources, hiding both the location of stored data and the physical attributes of the underlying devices, Schulz notes. With virtualized storage, applications simply see sufficient storage capacity; they have no knowledge of which specific drives actually hold their data. Behind the scenes, storage administrators and automated storage management tools can move data around depending on its need for protection, performance, backup or other attributes. Data that does not require high-performance, costly disk or semiconductor storage can be moved to lower-cost tape or optical storage. Applications and users don't need to know; the storage system retrieves and delivers stored data as it is requested. “The application doesn’t care where the data resides as long as it gets the data it needs when it needs it,” Schulz adds.
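The abstraction Schulz describes can be illustrated with a minimal sketch in Python. The class, device names and block layout below are all hypothetical, not any vendor's API: the point is that the application reads and writes logical blocks through a mapping table, while a migrate step moves the physical copy between devices without the application noticing.

```python
class VirtualVolume:
    """Illustrative sketch of storage virtualization: applications address
    logical blocks; a mapping table decides which physical device (backend)
    actually holds each block. All names here are made up for the example."""

    def __init__(self, backends):
        self.backends = backends   # device name -> {physical block: data}
        self.mapping = {}          # logical block -> (device name, physical block)

    def write(self, lblock, data, backend="fast_disk"):
        # New data lands on a default device (an assumption for this sketch).
        self.backends[backend][lblock] = data
        self.mapping[lblock] = (backend, lblock)

    def read(self, lblock):
        # The caller never names a device; the mapping resolves it.
        backend, pblock = self.mapping[lblock]
        return self.backends[backend][pblock]

    def migrate(self, lblock, target):
        """Move a block to another device; reads keep working unchanged."""
        src, pblock = self.mapping[lblock]
        self.backends[target][pblock] = self.backends[src].pop(pblock)
        self.mapping[lblock] = (target, pblock)


backends = {"fast_disk": {}, "virtual_tape": {}}
vol = VirtualVolume(backends)
vol.write(0, b"payroll records")
vol.migrate(0, "virtual_tape")     # administrator or SRM tool moves the data
print(vol.read(0))                 # the application still gets its data back
```

The read after the migrate returns the same bytes, which is exactly the transparency the article describes: the application neither knows nor cares that its data now lives on cheaper media.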
Although demands for storage are growing dramatically, storage budgets in general are not increasing nearly enough to keep pace. Hence, the interest in storage virtualization. In fact, storage virtualization promises to improve storage device utilization rates as much as server virtualization increases server utilization.
In addition to storage virtualization, organizations are turning to deduplication — 25 percent are using it now or plan to use it within six months, while an equal number plan to deploy it within the next few years. Many are using both inline and post-processing deduplication. Inline deduplication occurs as the data is being initially stored to disk and, as such, slows down the storage process and consumes additional server resources. Post-processing deduplication occurs after the data has been stored. With post-processing, there is no waiting for the hash calculations and lookup to be completed before storing the data, which avoids degrading storage performance and allows other processing options, though it might unnecessarily store duplicated data for a short period of time.
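The hash-then-lookup step mentioned above is the heart of deduplication. The following sketch, with invented class and method names, shows the inline variant: each incoming block is hashed before it is stored, and a block whose hash has been seen before is recorded only as a reference to the existing copy.

```python
import hashlib

class DedupStore:
    """Minimal sketch of inline, content-hash deduplication.
    The class and its layout are illustrative, not a real product's API."""

    def __init__(self):
        self.blocks = {}   # hash -> block data (unique blocks only)
        self.refs = []     # logical write order, as a list of hashes

    def write(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:      # inline check before storing
            self.blocks[digest] = block    # store the block only once
        self.refs.append(digest)           # duplicates become cheap references
        return digest

    def read(self, index: int) -> bytes:
        return self.blocks[self.refs[index]]


store = DedupStore()
store.write(b"report-2011.pdf contents")
store.write(b"report-2011.pdf contents")   # duplicate: only a reference is kept
store.write(b"budget.xlsx contents")
print(len(store.refs), len(store.blocks))  # 3 logical writes, 2 stored blocks
```

Because the hash must be computed and looked up on every write, this inline approach adds latency to the write path, which is precisely the trade-off the article contrasts with post-processing deduplication, where the same check runs after the data has already landed on disk.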
Storage tiering, another strategy that takes advantage of storage virtualization, automatically moves data among various types of storage. In the past, hierarchical storage management (HSM) meant shifting little-used data to magnetic tapes. Infrequently used data was stored on optical drives, and the most frequently used data was stored on the most expensive and fastest media: hard drives. Unlike HSM of the past, tiered storage today relies on low-cost, high-capacity disk drives, which might act as virtual tape. With tiered storage, organizations classify their data and define policies for handling each class. An automated storage resource management (SRM) tool then moves the data according to the applicable policy. Policies may be built around frequency of data access, age of the data, ownership of the data, compliance with regulatory mandates or other attributes. This technology is still in an early stage of adoption. Just over one-third of the organizations surveyed noted that they were using or considering storage tiering, while the jury was still out for almost half the respondents.
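A tiering policy of the kind described above can be sketched as a simple rule function. The tier names and thresholds below are assumptions invented for illustration; a real SRM tool would let administrators configure them.

```python
from datetime import datetime, timedelta

# Hypothetical tiers, ordered fastest/costliest first (illustrative names).
TIERS = ["ssd", "fast_disk", "capacity_disk", "virtual_tape"]

def assign_tier(last_access: datetime, accesses_per_day: float,
                now: datetime) -> str:
    """Pick a tier from access recency and frequency.
    Thresholds are made-up examples of the policies the article describes."""
    age = now - last_access
    if accesses_per_day >= 10 and age < timedelta(days=7):
        return "ssd"                    # hot data stays on the fastest media
    if age < timedelta(days=30):
        return "fast_disk"
    if age < timedelta(days=365):
        return "capacity_disk"          # cheap, high-capacity disk
    return "virtual_tape"               # disk posing as tape for cold data


now = datetime(2012, 1, 1)
print(assign_tier(now - timedelta(days=1), 50.0, now))    # ssd
print(assign_tier(now - timedelta(days=400), 0.1, now))   # virtual_tape
```

An SRM tool would periodically evaluate such a rule for each data set and migrate anything whose assigned tier no longer matches where it currently resides; policies keyed to ownership or regulatory mandates would add further conditions to the same decision.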
As with server virtualization, not every organization is ready to embrace storage virtualization. More than one-quarter of the respondents whose organizations were not interested in storage virtualization said they plan to stay with what they have because it works for them. Others cite factors ranging from costs and application support to the need for training as barriers to storage virtualization.
However, the majority of respondents are clearly embracing storage virtualization. In fact, it will become a cornerstone of government agency data storage management in the next few years and a vital part of data center optimization.