Disaster Recovery

There are six main steps to designing any good disaster recovery plan. These steps concentrate on how the CIO or CTO can enable the IT department to properly execute each step.

  1. Define (un)acceptable loss. How does the CEO/CIO decide how much to budget for a disaster recovery project? By first deciding how much it will cost if they don’t have one.
  2. Back up everything. Do you know how much of your data is not being backed up, and why?
  3. Organize everything. Your company has everything on tape. Can you find the tape you need when disaster strikes?
  4. Protect against disasters. Most people think only about natural disasters when creating a disaster recovery plan. There are nine other types of disasters, and you have to protect against all of them. Learn what types of disasters strike your area and how your company can protect against them.
  5. Document what you have done. Learn innovative ways to document your company’s disaster recovery plan so that the documentation is still available after a disaster.
  6. Test, test, test. Most disaster recovery plans fail because they are not tested. Learn how other companies are testing their disaster recovery plans. There are ways to do this that won’t swallow the entire IT budget!
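
As a rough illustration of step 1, the cost of having no plan can be estimated by weighting the cost of each outage scenario by how likely it is in a given year. The sketch below is a minimal example under stated assumptions: the `OutageScenario` class, the `expected_annual_loss` function and every figure in it are hypothetical placeholders, not data from any real organization.

```python
from dataclasses import dataclass


@dataclass
class OutageScenario:
    """One hypothetical outage scenario; every figure is a placeholder."""
    name: str
    hourly_cost: float         # estimated revenue and productivity lost per hour
    expected_hours: float      # estimated outage duration without a DR plan
    annual_probability: float  # estimated chance of occurring in a given year


def expected_annual_loss(scenario: OutageScenario) -> float:
    """Cost of one occurrence, weighted by its yearly probability."""
    return scenario.hourly_cost * scenario.expected_hours * scenario.annual_probability


# Assumed example figures, for illustration only.
scenarios = [
    OutageScenario("loss of power", hourly_cost=10_000, expected_hours=8, annual_probability=0.5),
    OutageScenario("flooding", hourly_cost=10_000, expected_hours=72, annual_probability=0.05),
]

total = sum(expected_annual_loss(s) for s in scenarios)
print(f"Expected annual loss without a DR plan: {total:,.0f}")
```

A figure like this is only as good as the estimates behind it, but it gives management a starting point for deciding how much a disaster recovery budget is worth.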

Disaster recovery (DR) plans need to be flexible and scalable to address a broad range of disruption scenarios. The same goes for business continuity (BC) plans. Both plans also need to be tested regularly to ensure that the technology, processes and people all work together with as little disruption to the business as possible when a disaster strikes.

This data center disaster recovery planning guide focuses on best practices for setting up a DR plan. Discover the most important factors in a successful data center DR plan, who should be involved in the planning process and how to get started.

Developing a data center disaster recovery plan

Once you have analyzed the data center and identified potential risks to operations, prioritize the risk scenarios in order of severity, potential damage and likelihood of occurrence. This ranking can then be used to sequence the plan’s response activities appropriately for each situation.
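
One way to make this prioritization concrete is to score each risk on the three factors and sort by the result. The following is a minimal sketch, assuming a 1-to-5 rating scale and a multiplicative score; the `Risk` class, the `prioritize` function and the sample ratings are hypothetical, and other scoring schemes are equally valid.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    severity: int    # 1 (minor) to 5 (critical); the scale is an assumption
    damage: int      # 1 (low) to 5 (high) potential damage
    likelihood: int  # 1 (rare) to 5 (frequent)

    @property
    def score(self) -> int:
        # Multiplying the factors is one common convention, not the only one.
        return self.severity * self.damage * self.likelihood


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Assumed example ratings, for illustration only.
risks = [
    Risk("severe weather", severity=4, damage=4, likelihood=2),
    Risk("loss of power", severity=4, damage=3, likelihood=4),
    Risk("human error", severity=2, damage=2, likelihood=5),
]

for r in prioritize(risks):
    print(f"{r.score:>3}  {r.name}")
```

The ratings themselves should come from the risk analysis, not from the tool; the sort only keeps the ordering consistent once the ratings are agreed.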

Using the structure described in the National Institute of Standards and Technology’s SP 800-34 publication, “Contingency Planning Guide for Information Technology Systems,” we can expand those activities into the following sequence:

1. The data center plan development team should meet with the internal technology team, facilities department, utility service providers and relevant vendors to establish the scope of the activity, e.g., internal and external threats, internal and external assets, third-party resources, and linkages to other offices/clients/vendors. Be sure to brief senior management on these meetings so they are properly informed.

2. Gather all relevant infrastructure documents, e.g., building floor plans, building site plans, utility diagrams, HVAC diagrams, network diagrams and equipment configurations.

3. Obtain copies of existing IT disaster recovery plans. If these do not exist, proceed with the following steps:

  • A. Work with management to determine the most serious threats to the data center infrastructure, e.g., fire, human error, loss of power, flooding, system failure, severe weather.
  • B. Identify what management perceives as the most serious vulnerabilities to the data center, e.g., insufficient backup power, minimal building security, proximity of the data center to a flood plain.
  • C. Review the history of data center outages and disruptions and how the firm handled them.
  • D. Identify what management perceives as the most critical data center assets, e.g., server farms, storage systems, network infrastructure, staffing.
  • E. Determine the maximum outage time management can accept if the identified data center assets are unavailable.
  • F. Identify the operational procedures currently used to respond to critical data center outages.
  • G. Determine when these procedures were last tested to validate their relevance.
  • H. Identify emergency response team(s) for all critical data center disruptions. Determine their level of training, especially in emergencies.
  • I. Identify vendor emergency response capabilities: whether they have ever been used and, if so, whether they worked properly; how much the company is paying for these services; the status of data center maintenance contracts; and whether service-level agreements are in place.

4. Compile results from all the assessments into a gap analysis report that identifies what is currently done versus what ought to be done, with recommendations as to how to achieve the required level of data center preparedness, and the investment required.

5. Have management review the report and agree on the recommended actions.

6. Prepare data center disaster recovery plan(s) to address critical assets, e.g., hardware and software, data storage, networks.

7. Conduct tests of plans and system recovery assets to validate their operation.

8. Update data center DR plan documentation to reflect changes.

9. Schedule next review/audit of data center disaster recovery capabilities.
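
Part of the gap analysis in step 4 can be sketched as a comparison between the maximum outage time management accepts for each critical asset (item E under step 3) and the recovery time observed in the most recent test. This is a minimal illustration under stated assumptions: the `Asset` class, the `gap_analysis` function and all of the hours shown are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    name: str
    max_outage_hours: float                 # limit agreed with management
    tested_recovery_hours: Optional[float]  # last test result; None if never tested


def gap_analysis(assets: list[Asset]) -> list[str]:
    """Return one finding per asset that is untested or misses its limit."""
    findings = []
    for a in assets:
        if a.tested_recovery_hours is None:
            findings.append(f"{a.name}: recovery never tested")
        elif a.tested_recovery_hours > a.max_outage_hours:
            gap = a.tested_recovery_hours - a.max_outage_hours
            findings.append(f"{a.name}: exceeds limit by {gap:g} hours")
    return findings


# Assumed example values, for illustration only.
assets = [
    Asset("storage systems", max_outage_hours=4, tested_recovery_hours=6),
    Asset("network infrastructure", max_outage_hours=2, tested_recovery_hours=1.5),
    Asset("server farms", max_outage_hours=8, tested_recovery_hours=None),
]

for finding in gap_analysis(assets):
    print(finding)
```

Assets that meet their limit produce no finding, so the output lists only the items that need a recommendation or investment in the report.
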

Important data center disaster recovery plan caveats

When building a data center DR plan, keep in mind the following guidance:

  1. Obtain senior management support so your plans can be funded.
  2. Take the data center DR planning process seriously: Plans don’t have to be dozens of pages long; rather, they need the right information, and that information should be current and accurate.
  3. Consider using standards as part of the process, including NIST SP 800-34, ISO/IEC 24762:2008 and BS 25777:2008, as they provide a useful structured format for plans, as well as guidance on the issues to address. This aspect is particularly important if plans will be audited.
  4. Keep the planning process simple by gathering and organizing accurate information.
  5. Review results with key departments, such as IT and facilities, to ensure your assumptions are correct.

Data center disaster plans help protect a significant investment for most organizations. While some firms address data center recovery by building a second data center or leasing specially equipped space at a third-party facility, a careful assessment of data center operations and risks is an important starting point in a DR program.