Disaster recovery: Recovering from the unthinkable
- By Heather B. Hayes
- Jun 22, 2007
"You start thinking about this stuff, and it just gets so overwhelming that it almost makes your hair hurt. And then you start wondering: 'How do I ever convey this stuff to everybody?' It's not an easy task." ? Steve Hunt, SI International
"Although disaster recovery is a hot topic in contracting circles, that doesn't mean everybody has an effective plan." Tony Cole, Symantec Corp.
In the summer of 2005, Tom Shelman had good reason to be confident of his ability to prepare for any type of calamity. As vice president and chief information officer of Northrop Grumman Corp., Shelman had spent years developing and fine-tuning a disaster recovery plan that had seen his company through a number of natural disasters and other disruptions with minimal downtime. But that was before Hurricane Katrina.
The storm destroyed the data center at Northrop Grumman's Pascagoula Shipyard in Mississippi and forced the employees' evacuation from its designated backup data center at the New Orleans Shipyard. Moreover, the contractor had no way to communicate locally. Public infrastructure was non-existent there, and employees were facing their own personal disasters.
"Until then, whenever I talked about the recovery plan, I talked about data center recovery, network recovery, getting the systems back online," Shelman said. "Katrina changed my worldview, because before we could begin trying to recover the IT infrastructure, we had to recover the people."
Northrop Grumman, which returned to operational status in just two weeks, has since modified its preparedness plan to include more options for communications, mirrored redundancy for e-mail systems and provisions for trained backup employees to recover a site remotely if on-site personnel are incapacitated.
The company is not alone. With the 2001 terrorist attacks and Hurricane Katrina occurring less than five years apart, thinking about effective disaster recovery plans has changed.
Plans have become simpler in scope and more global in perspective. There's greater recognition that regional and organizational interdependencies must be included. Assets and processes are being better prioritized, and companies are making better use of advanced, automated backup and failover tools.
Government contractors' increasing responsibility for maintaining their federal customers' mission-critical and sensitive information is driving many of these changes, and contractors can't afford to be blindsided by unanticipated events.
"There's a lot more at stake," said Tony Cole, director of federal consulting at Symantec Corp. Government contractors have more issues to consider in a disaster recovery plan than run-of-the-mill companies, he said.
These issues include ensuring that the company remains in regulatory and security compliance even if there is a disruption in operations. Contractors also need to be able to locate government information within backup systems without delays, and they must put contract support systems at or near the top of the priority list.
Although disaster recovery is a hot topic in contracting circles, Cole said, that doesn't mean everybody has an effective plan. Some companies have not devoted the resources necessary to move beyond the initial stages. Some have started but were sidetracked before finishing. And still others end up with an attractive set of three-ring binders full of plans that are never re-visited and quickly become outdated and irrelevant.
"There is definitely room for improvement," Cole said.Think simple
For many contractors, enthusiasm for disaster recovery starts high. But it wanes when companies come face-to-face with the complexity, time and cost required to develop and maintain plans for every possible contingency, including tornadoes, cyberattacks, electrical outages, fire, loss of key employees and even simple human error.
"You start thinking about this stuff, and it just gets so overwhelming that it almost makes your hair hurt," said Steve Hunt, vice president and CIO of SI International Inc. of Reston, Va. "And then you start wondering: 'How do I ever convey this stuff to everybody?' It's not an easy task."
But it's not impossible, said Cameron Matthews, chief technology officer at Sentek Consulting Inc. of San Diego.
The key is to keep it simple, be realistic and get senior management involved at the earliest possible stage.
The latter may be a cliché, but it is critical to long-term success in disaster recovery planning, Matthews said, adding that the level of executive involvement needs to be deeper and more intense than on other projects.
"If you don't find the right drivers to get them truly interested, they may give you the go-ahead, but what I've found is they'll want you to produce the plan and then [they will] put it on the shelf," Matthews said. "In this situation, true executive buy-in means that they're fully involved in the process, they understand what it all means and they understand the implications of what you're doing now and what you have to keep doing."
SI has revamped its approach in recent years by developing a universal set of steps that work in any disaster situation, Hunt said. That goes against traditional thinking on preparedness planning, which focuses on developing separate tactics for each possible scenario, he added.
"That [approach] gets to be very, very complicated, because you have to keep track of which plan covers which circumstance," Hunt said. "What we've learned is you really do have to keep it simple and keep it organized."
The company's master plan covers basic procedures for mobilization, such as making the decision to declare a disaster, communicating to employees and senior management, determining which steps need to be taken from the start and knowing how to prioritize assets effectively.
These procedures, among other things, make it much easier to train employees ahead of time. The approach also results in less confusion during what can be a high-pressure event and enables effective action in the face of an unimagined disaster.
SI has dealt with 30 separate incidents that disrupted operations during the past three years, including Hurricane Katrina and a lightning strike that destroyed the phone system at a critical facility supporting the Space Command in Colorado. "It's a startling list when you look at how many diverse things you have to deal with, but we ... handled each of those situations essentially with the same structure," Hunt said.
Details are also critical. For example, SI's disaster recovery plan includes supplementary online information for dealing with certain specific events, such as hurricanes, fires and cyberattacks.
It also considers interdependencies across facilities, contracts and customers. An example, Hunt said, is the company's work with the Space Command. SI hosts mission-critical functions for the agency at a facility in Colorado Springs. Employees work on-site at Peterson Air Force Base. There also are other projects in the region.
"Any circumstance could affect multiple different organizations, multiple contracts and multiple customers, and we have to find a way to make sure we keep all of those things in mind and not just concentrate on one plan at a time," Hunt said.
Frank Jablonski, director of product marketing for storage solutions at CA Inc., said overlapping organizational complexity and connectivity require a three-pronged strategy for backing up data. There should be a copy on-site for quick recovery (in the case of a deleted file, for example). Another copy needs to be off-site, preferably in another region. In the case of a major storm or power outage, operations can switch to that site without interruption. A third copy should be in a secure vault in case something such as an enterprisewide cyberattack affects both the operational site and the backup site.
Sticker shock
A stumbling block that can stop disaster recovery planning cold crops up early in the process when planners examine different scenarios and realize the extent of the company exposure. "You get to the point where you start determining costs for mitigating risks, and that's when sticker shock hits," Matthews said.
Instead of letting that stall the effort, Matthews said, it's critical for companies to come up with strategies to deal with resource limitations. There are several that work well.
First, prioritize assets, he said. Which systems are most important to your business? For government contractors, having access to contract details and systems that support their work, including human resources and payroll systems, is critical.
Another cost-coping strategy, Matthews said, is to develop disaster recovery in levels. That's especially important if the contractor is offering availability guarantees to its government customers. "You've got to be realistic about what you can offer."
For example, companies could start with a silver level in the beginning, in which hardware, software, processes and training are prepared for disaster, but only the most critical lines of business are mirrored and replicated in real time, Matthews said. From there, the company could try to work up to gold and then platinum levels.
"You can basically develop an effective disaster recovery plan over time and then also spread the cost out over time," he said.
Companies should also not go it alone. Every company, no matter how experienced, can benefit from a third-party perspective. SI brought in an outside consultant specializing in emergency preparedness and response. "We have opened all of our procedures and all of our documentation to that individual and said, 'Help us figure out what we can do to do it better,' " Hunt said.
Companies also can purchase disaster recovery kits, which generally include templates. Cole said companies should consult the National Institute of Standards and Technology's SP 800-34 publication, which provides help with disaster recovery planning.
Leave nothing to chance
Once the plan has been developed, companies need to periodically put it through its paces. "Unless you test your disaster recovery plan and do it on a regular basis, then you really don't have a disaster recovery plan," Jablonski said. "You've got to make sure that the data is recoverable at the remote site, and there are no issues with data corruption or with personnel knowing how to perform the recovery."
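One small, automatable piece of such a test is checking that each backup copy still matches the original byte for byte. The sketch below is a minimal illustration, not a method described by anyone quoted here: the function names and the copy labels (borrowed from the on-site, off-site and vault copies discussed earlier) are assumptions, and a real test would also exercise the restore procedure and the people performing it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def corrupted_copies(original: Path, copies: dict[str, Path]) -> list[str]:
    """Return the labels of copies that are missing or differ from the original."""
    expected = sha256_of(original)
    return [
        label
        for label, copy in copies.items()
        if not copy.exists() or sha256_of(copy) != expected
    ]
```

An empty result means every listed copy is present and identical to the original; any label returned points to a copy that would fail a recovery.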
In addition, companies need to continually test their plans against disaster scenarios, even very unlikely ones. SI conducted a succession-planning test to see how senior management and the rest of the company would hold up if its chief executive officer were killed and its vice president of corporate communications critically injured in a train wreck. Hunt is also developing a test that would put the plan through a simulated pandemic.
"The [tests] have to be meaningful examples that you may not want to deal with, but are plausible," Hunt said. "We hope that they never happen, but if they do, we've got to know whether the plan and the company can handle it."Heather B. Hayes is a freelance writer in Clifford, Va.