Developing A Contingency Planning Policy Statement Information Technology Essay

Published: November 30, 2015 Words: 3496

Disaster Recovery Planning plays a vital part in major industries where stored information, or data, plays the key role. Every business organization can be subjected to serious incidents or accidents that prevent it from continuing its normal day-to-day operations and may cause huge losses of time as well as money. These incidents can happen on any day and at any time; their causes include natural calamities, human errors and system malfunctions. All Disaster Recovery planning needs to encompass how employees will communicate, where they will go and how they will keep doing their jobs. The details can vary greatly, depending on the size and scope of an organization and the way it does business. For some businesses, issues such as supply chain logistics are most crucial and are the focus of the plan. For others, information technology may play a more pivotal role, and the Disaster Recovery plan may focus more on systems recovery. This paper primarily discusses the steps needed to implement an actual disaster recovery plan. In brief, the steps are:

Developing a contingency planning policy statement

Conducting the business impact analysis (BIA)

Identifying preventive controls

Developing recovery strategies

Developing a contingency plan

Planning, testing, training and exercises

Planning maintenance activities

All of the above steps are planned and performed taking all factors of the business into consideration. We shall also discuss the limitations of implementing such a plan, and include real-world examples and the successful results yielded by implementing a Disaster Recovery Plan. Such a plan acts as a 'backup recovery process' or a kind of 'business continuity solution' while the actual system is offline or corrupted.

WHAT IS DISASTER RECOVERY PLANNING?

Disaster Recovery Planning prepares an organization to resume operations after a serious incident, whether its cause is a natural calamity, a human error or a system malfunction. The terrorist attacks of September 11, 2001 on the United States are one of the great examples in history that led many organizational decision makers to focus on the need for disaster recovery. A disaster recovery plan may be written for a specific business process or may address all mission-critical business processes. Business continuity and disaster recovery are critical components used to ensure that systems essential to the operation of the organization are available when needed. Before September 11, 2001, most organizations thought of a disaster in terms of natural calamities that disrupt operations because essential personnel cannot get to work. Recent events have made it clear that the word "disaster" has an entirely different definition in terms of business continuity. Events may occur in such a way that the organization can take months or even years to recover.

During the late 1960s and into the 1970s, consumers of communications, computing, and information technology (IT) began to recognize that their rapidly growing IT operation centers were becoming Single Points of Failure (SPOFs). People realized that IT interruptions could have significant impacts on the continuity of critical operational functions. The continuity of the business itself could even be threatened. Computing hardware, supporting network infrastructures, and software platforms were full of SPOFs during those times. IT engineers of the 1960s and 1970s knew how to build resilient computer systems. Actual operational examples of such configurations were developed and were used in government, military, or research settings. The typical business or government agency of that time could not, however, justify the high investment needed to eliminate SPOFs in existing technologies. More economical and practical alternatives were needed as information technology became essential to many public and private organizations. The more an organization depends on computers, the greater its exposure to computer failure. In the late 1970s, vendors began offering contracted, shared-use access to computing recovery environments. The fees were substantially less than the cost to customers of duplicating critical computing resources. Though SPOFs were not totally eliminated, this cost-effective means of recovering from major outages became the standard for IT recovery during that period, and it remains a major sector of the industry today. Through the 1980s, several companies of various sizes entered the market, offering similar IT operational recovery services. IBM entered the market in a big way in 1990, further adding to the available options. IBM's entry legitimized the industry by effectively endorsing these services as a practical means for customers to recover major and critical portions of their IT infrastructures.

The role of business continuity planner or disaster recovery planner evolved from the need to integrate hot site services into a customer's operational environment. This was accomplished by documenting required capabilities, and it usually included off-site backup data storage services, which were used for testing and in case a real disaster occurred. Whether the documentation was done by the customer's personnel or by consultants, the planner's role was to translate these services into real operational recovery capabilities for a customer's critical IT assets. Disaster recovery is essential in restoring systems and data to the state of normalcy that existed prior to the incident.

The disaster recovery process consists of defining rules, processes, and disciplines to ensure that the critical business processes will continue to function if there is a failure of one or more of the information processing or telecommunications resources upon which their operations depend. The following are the key elements of a disaster recovery plan:

Developing a contingency planning policy statement

Conducting the business impact analysis (BIA)

Identifying preventive controls

Developing recovery strategies

Developing a contingency plan

Planning, testing, training and exercises

Planning maintenance activities

CONTINGENCY PLANNING POLICY STATEMENT

This is probably one of the most important of the seven key elements. In basic terms, the policy statement provides the authority and guidance necessary to develop an effective contingency plan. The phrase "authority and guidance necessary" in this definition is vital to the success of the planning venture. The policy statement is really about communication between management and those responsible for developing the plan. By stating the driving goals of the project, the level of financial and other resources the effort commands, and the particular people who are to be responsible, the policy statement gives planners everything they need to work out options that can achieve the organization's goals. It also provides a basis for planners to communicate back to management either their success or the need to reassess the goals or the resources, should that be necessary. The importance of this step extends well beyond the stage of DR plan development and implementation, because most of the cost is sometimes incurred after the initial phase: during testing and maintenance and, in the worst case, during and after a disaster that proves the plan inadequate. This is a good time to point out that you may need a couple of cycles through the steps. The first version of the policy may set goals that turn out to be impossible under the resource constraints specified. You will then need to reevaluate the policy and scale down goals, scale up resources, or attempt some radical rethinking. The important point to remember in disaster recovery planning is that reality is your partner and, like it or not, you must cooperate with it, not fight it.

Here are the key points that the policy statement should address:

What kinds of disaster do we intend to cover?

What do we want to accomplish?

How much time will we need to get things back to normal?

Where does the responsibility of the plan and the planners end?

How can the crisis be used to improve the organization's image with stakeholders?

What level of system should be covered in a crisis?

What is the maximum level of resources that the plan can command during preparation, implementation, testing and maintenance?

BUSINESS IMPACT ANALYSIS

The purpose of this step is to ensure that you protect everything that needs protection without wasting resources. The goal is to determine what must be recovered and how fast; this information will be used to develop recovery strategies. The output of this step is a prioritized list of critical data, roles and IT resources that support your organization's business processes, together with maximum outage times for each of the critical systems. We need to identify the key business processes that form the backbone of the organization's ability to carry out its business, and the requirements that drive these processes. It is very important that this be done from the outside in, starting from the standpoint of the stakeholders who depend on the IT services you provide, whether they are customers of the company, outside suppliers, or internal departments within the company. It is also important that those actually involved in the business processes be engaged in the planning process; this includes external stakeholders, the internal staff who deal with them and even those who provide operational support for the process. The remaining analysis is carried out for each of the processes identified, in two distinct phases: one that works from the outside in, the other from the inside out.

The Outside-In Analysis

The outside-in phase of the analysis focuses on whole systems and is similar to peeling layers of an onion. At each layer, we consider the current process or system as distinct both from the users or other systems that depend on it and from other systems on which it depends. Depending on the overall complexity of your business and how it makes the best sense to divide things up in your context, you may end up with just a single layer or with many of them.

The Inside-Out Analysis

The inside-out phase focuses on the resources that are required in each layer in order to provide the services that were identified in the previous phase. Beginning from the deepest system or layer, list all IT and infrastructure resources that are required for it to function. Next, for each of these resources, determine the impact that a disruption in its availability would have on the functioning of the system and on its ability to deliver the services on which outer layers depend. In particular, determine the maximum allowable outage time for each resource before it causes unacceptable disruption in essential functions. This is essentially the point at which the availability of the system falls below the most stringent requirement of all the systems that depend on it. Be sure to include in the analysis any indirect impact that may occur through related or dependent systems.
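As a rough illustration of the inside-out rule, the sketch below pushes the most stringent outage requirement down a dependency map, so that each resource ends up with the tightest limit of anything that depends on it. The service names, the hour values and the helper `max_allowable_outage` are all hypothetical and are not taken from any particular methodology.

```python
from typing import Dict, List

# Hypothetical example data: the outage (in hours) each outward-facing
# service can tolerate, and what each service or resource depends on.
service_tolerance_hours: Dict[str, float] = {
    "order_entry": 4.0,
    "reporting": 48.0,
}
depends_on: Dict[str, List[str]] = {
    "order_entry": ["app_server", "database"],
    "reporting": ["database", "file_store"],
    "app_server": ["network"],
    "database": ["network", "storage_array"],
    "file_store": ["storage_array"],
}


def max_allowable_outage(
    tolerance: Dict[str, float], deps: Dict[str, List[str]]
) -> Dict[str, float]:
    """Give every resource the tightest limit of anything depending on it."""
    limits = dict(tolerance)
    pending = list(tolerance)
    while pending:
        node = pending.pop()
        for resource in deps.get(node, []):
            # Tighten the limit only when this dependent is more demanding,
            # then revisit the resource so the change reaches deeper layers.
            if resource not in limits or limits[node] < limits[resource]:
                limits[resource] = limits[node]
                pending.append(resource)
    return limits


limits = max_allowable_outage(service_tolerance_hours, depends_on)

# Prioritized list: the most time-critical items come first.
for name, hours in sorted(limits.items(), key=lambda item: item[1]):
    print(f"{name}: recover within {hours} hours")
```

In this made-up data the database serves both a 4-hour and a 48-hour service, so it inherits the 4-hour limit, while the file store, used only by reporting, keeps the 48-hour limit.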

IDENTIFY PREVENTIVE MEASURES

A simple formula for estimating the financial risk associated with a given type of disaster (i.e. how much it is worth investing in a plan to mitigate that risk) is R$ = P × C × T, where P is the probability that the disaster will occur, C is the hourly or daily cost of downtime in lost productivity, lost revenue, etc., and T is the time that systems are expected to be down. One way to minimize this risk is to reduce the downtime, which is the primary purpose of the disaster recovery planning exercise. However, it is not the only way. The risk can also be reduced by lowering the probability that the disaster will occur or by lowering the cost that will be incurred if it does. Both of these are types of preventive measures. Very often the cost of preventing a problem is far lower than the cost of fixing it after it occurs. Measures that reduce the probability of a disaster range from the fairly drastic, like physically moving the organization out of reach of threats such as hurricanes or floods, to the fairly mundane, such as ensuring that regular maintenance is performed on critical systems; that redundant components are built in; that sensors are installed to monitor environmental factors; that performance monitors are installed to give early warning of server malfunction; or even something as simple as keeping plastic tarps available to throw over computer equipment to protect it from water damage.

It is sometimes even possible to reduce the cost of downtime by reducing your organization's dependence on the system. The basic idea is to examine the potential gain from removing or replacing a system entirely. What is sometimes forgotten when new equipment and systems are implemented is that the total cost of any system includes not just the upfront cost and the ongoing maintenance, but also the risk associated with it. There are times when it is better to replace a system with one that, while lower in performance, exposes the organization to significantly lower risk. While we don't have any particular procedure to offer, it is potentially very useful to spend some time on this step, both for all the types of disaster that you wish to protect against and for all the systems being protected, at both the full-system and component levels.
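The risk formula is easy to try out with numbers. The sketch below computes R$ = P × C × T for a baseline case and then changes one factor at a time, mirroring the three ways of reducing risk described above. All figures are invented for illustration, and the function name `financial_risk` is an assumption of this example.

```python
def financial_risk(
    probability: float, cost_per_hour: float, downtime_hours: float
) -> float:
    """Return R$ = P x C x T for one type of disaster."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * cost_per_hour * downtime_hours


# Hypothetical baseline: 5% chance of the disaster, $10,000 lost per hour
# of downtime, and 24 hours of expected downtime.
baseline = financial_risk(0.05, 10_000.0, 24.0)

# One mitigation per factor in the formula (all values are made up).
scenarios = {
    "faster recovery (lower T)": financial_risk(0.05, 10_000.0, 6.0),
    "prevention (lower P)": financial_risk(0.01, 10_000.0, 24.0),
    "less dependence (lower C)": financial_risk(0.05, 4_000.0, 24.0),
}

print(f"baseline risk: ${baseline:,.0f}")
for label, risk in scenarios.items():
    # The reduction is a rough ceiling on what the measure is worth spending.
    print(f"{label}: ${risk:,.0f} (reduction ${baseline - risk:,.0f})")
```

Comparing the reduction in R$ with the cost of each measure gives a first, rough indication of which measures are worth pursuing.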

DEVELOP RECOVERY STRATEGIES

The primary task of this step is to determine how you will achieve your disaster recovery goals for each of the systems and system components that were identified in the Business Impact Analysis. It is here that you do the core work of balancing the costs and benefits of the available approaches, before diving into the complexities of the full plan. This step is not about selecting specific vendors, determining exact costs, or developing detailed procedures. Rather, the purpose of this stage is to select the types of solution that you will use and to determine the scale of the costs involved. Thus, for example, you may determine that a small, critical subset of systems requires a fully mirrored and staffed alternate site ready to take over in minutes, while other systems can use a more traditional backup strategy that trades longer recovery time for much reduced expense.
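To make the idea of choosing a type of solution concrete, the sketch below sorts systems into the two kinds of strategy mentioned in the example above, based on how long each system can be down. The one-hour threshold, the system names and the function `choose_strategy_type` are assumptions made for illustration; a real plan would weigh costs and benefits rather than apply a single cut-off.

```python
from typing import Dict

MIRRORED_SITE = "fully mirrored, staffed alternate site"
TRADITIONAL_BACKUP = "traditional backup and restore"


def choose_strategy_type(
    max_outage_hours: float, mirror_threshold_hours: float = 1.0
) -> str:
    """Pick a solution type from the maximum allowable outage time.

    The threshold is a hypothetical cut-off, not a recommended value.
    """
    if max_outage_hours <= mirror_threshold_hours:
        return MIRRORED_SITE
    return TRADITIONAL_BACKUP


# Hypothetical maximum outage times (hours) from a business impact analysis.
systems: Dict[str, float] = {"payments": 0.25, "email": 8.0, "archive": 72.0}

strategy_by_system = {
    name: choose_strategy_type(hours) for name, hours in systems.items()
}
for name, strategy in strategy_by_system.items():
    print(f"{name}: {strategy}")
```

Only the system that must be back within minutes lands on the expensive mirrored site; the others fall back to the cheaper, slower option.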

DEVELOP THE CONTINGENCY PLAN

This step is the culmination of all your work. It is not, unfortunately, an easy step, but neither is it too complicated, as long as you have been thorough in the previous steps and you approach it systematically. The outcome of the step is both a documented plan and the completed implementation of the entire infrastructure required to enable the plan. The documentation includes background information on the assumptions and constraints that went into making the plan, as well as written documentation on specific procedures. The implementation side includes purchasing and installing hardware and software, setting up alternative locations, contracting for alternative sources of network or other communication services and so on. This step is a major project all by itself, even if the previous steps have been carried out perfectly. It will require a significant amount of time on the part of the person or team responsible for leading development of the plan, but it will also require time and effort by everyone whose systems are involved since their expertise will be required both to develop recovery procedures and of course, to test them.

The exact organization of the plan document is not important - it should be adapted to best serve your needs - but all the types of information discussed here should be present in the plan. The sections used here are as follows:

1. Introduction: Here the main task is to document the goals and scope of the plan, along with any requirements that must be taken into account whenever the plan is updated.

2. Operational Overview: The purpose of this section is to provide a concise picture of the plan's overall approach. It contains essentially two types of information: (1) a high-level overview of the systems being protected and the recovery strategies employed and (2) a description of the recovery teams and their roles.

3. Notification/Activation Phase: This phase defines the initial actions taken once a system disruption or emergency has been detected or appears to be imminent. This phase includes activities to notify recovery personnel, assess system damage and implement the plan. At the completion of this phase, recovery staff will be prepared to perform contingency measures to restore system functions on a temporary basis.

4. Recovery Phase: This is the second of the three major sections documenting actual recovery operations, but it is the one that most of us have in mind when we talk about a DR plan. This is the section of the plan that documents in detail the solutions to be used to recover each system and the procedures required to carry out the recovery and restore operational capabilities.

5. Reconstitution Phase: This is the last of the three sections of the plan that document recovery operations. In this phase, recovery activities are terminated and normal operations are transferred back to the organization's facility. If the original facility is unrecoverable, the activities in this phase can also be applied to preparing a new facility to support system processing requirements.

6. Appendices: The appendices should contain any information that (a) is necessary as reference material during recovery, (b) may be necessary during any revision of the plan, or (c) documents legal agreements.

PLANNING, TESTING, TRAINING AND EXERCISES

Over time, things change. Hardware components are replaced, software is upgraded, networks are reconfigured, data sizes grow, and people come and go. All this is a normal part of the life of an IT environment. And all of it can impact the performance of your disaster recovery systems. Although these systems were fully tested when first installed, the dynamic nature of the environment makes it critical that testing continues to take place regularly and that personnel training be up to date.

There are many different types and levels of testing. Generally speaking, they span two key dimensions: scope and realism. Scope refers to the degree to which you are testing a full system or just individual components. Realism refers to the degree to which you are performing exactly the procedures that you would during a disaster - a classroom role-playing test in which you talk through steps without actually doing them is one extreme and a full execution is the other. In both dimensions, one end of the spectrum tends to be less expensive and less disruptive to day-to-day operations but also less reliable in its results.

Proper training is equally vital. Training in disaster recovery procedures should be considered part of the regular orientation of new hires if they have any role at all in implementing the plan. Key disaster recovery personnel should undergo frequent enough training that they are intimately familiar with the procedures that they will have to carry out under the plan.

PLAN MAINTENANCE

After developing a disaster recovery plan, it is also worth the effort to ensure that the plan continues to reflect current requirements and systems accurately. There are three natural points at which the plan can be reviewed: during testing, in a regular annual or semiannual review devoted specifically to that task, and when changes are made either in the IT systems being protected or in the business processes they support. The first two fall directly under the purview of those responsible for disaster recovery planning and so can be planned for directly. The last requires that the impact of changes on the disaster recovery plan become a standard consideration in procedures that lie outside the direct concern of those responsible for the DR plan.
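The three review points can be expressed as a simple check that a change-management or scheduling procedure might run. This is only a sketch: the function `review_reasons`, its parameters and the 182-day (roughly semiannual) default interval are assumptions of this example.

```python
from datetime import date, timedelta
from typing import List


def review_reasons(
    last_review: date,
    today: date,
    review_interval_days: int = 182,  # assumed semiannual schedule
    test_completed: bool = False,
    system_or_process_changed: bool = False,
) -> List[str]:
    """List which of the three natural review points currently apply."""
    reasons = []
    if test_completed:
        reasons.append("test completed")
    if today - last_review >= timedelta(days=review_interval_days):
        reasons.append("scheduled review due")
    if system_or_process_changed:
        reasons.append("system or business process changed")
    return reasons


# Hypothetical dates: the last review was long ago, so the schedule triggers.
overdue = review_reasons(date(2015, 1, 1), date(2015, 11, 30))

# A recent review, but a protected system has since been changed.
after_change = review_reasons(
    date(2015, 11, 1), date(2015, 11, 30), system_or_process_changed=True
)

print(overdue)
print(after_change)
```

An empty list means that none of the three review points applies at the moment.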

SUMMARY

The world is changing, and organizations need to prepare for natural or manmade disasters that could disrupt business processes. Customers and millions of dollars could be lost and never recovered if business processes are disrupted. The Business Continuity Plan resumes business processes and the Disaster Recovery Plan resumes the IT systems. The objective of a Disaster Recovery Plan is to restore the systems that support mission-critical and critical business processes to normal operation as quickly as possible. Business continuity planning integrates the business resumption plan, occupant emergency plan, incident management plan, continuity of operations plan, and disaster recovery plan. Personnel from each major business unit should be included as members of the team and take part in all disaster recovery planning activities. These people need to understand the business processes and the technology, networks, and systems behind those processes in order to create the disaster recovery plan. The team identifies the applications and systems that are mission-critical and critical to the organization. The disaster recovery team will be responsible for training, implementing, and maintaining the plan. Its members will possess unique skills, knowledge, and abilities, which should be kept up to date in the plan. A Disaster Recovery Plan that is well developed, trained on, and maintained will minimize loss and ensure continuity of critical business processes in the event of disaster.