EXERCISING THE PLAN

Revision as of 10:32, 13 April 2014 by WikiSysop

GENERAL

Testing the Emergency Response Plan is an essential element of preparedness. It is the only way to be certain that the Plan will work as expected when needed, and it gives us the opportunity to verify that the procedures fit the incident.

Partial tests of individual components and of the recovery plans of specific teams will be carried out regularly, usually annually or semi-annually. A comprehensive exercise of the continuity capabilities, including support from the designated recovery facilities, should be performed at least annually.

After exercise procedures have been developed, the plan should first be tested with a structured walk-through. This exercise will reveal any further steps that need to be included, procedures that are not effective and must be changed, and other appropriate adjustments. The plan should then be updated to correct the problems identified during the exercise, and it should be tested again. Initially, the plan should be tested in sections, and outside normal business hours if testing could disrupt the overall operations of the organization.

The knowledge and procedures of all your emergency response teams must be tested regularly, as must the teams' ability to work together and communicate effectively. Paper-and-pencil simulation testing has proven an effective way to keep team membership lists current and to test team members’ knowledge and decision-making abilities.

It is essential that the plan be thoroughly tested and evaluated on a regular basis. Procedures to test the plan should be documented. The tests will provide the organization with the assurance that all necessary steps are included in the plan. Other reasons for testing include:

  • Determining the feasibility and compatibility of backup facilities and procedures
  • Identifying areas in the plan that need modification
  • Providing training to the team managers and team members
  • Demonstrating the ability of the organization to recover
  • Providing motivation for maintaining and updating the Emergency Response Plan

EMERGENCY DRILLS

Drills and exercises will become an integral part of the Emergency Response Plan. They are required in order to test the functionality and effectiveness of each emergency response while simulating an actual emergency environment. Planning, coordinating, and evaluating emergency drills and exercises can be a major undertaking involving significant resources, both within and outside The Company.

Exercises consist of duties, tasks, or operations carried out much as they would be in a real emergency, but in response to a simulated event. They therefore require inputs to emergency personnel that prompt realistic action.

The Team Leader of the Exercise Management Team will work closely with each of the other teams. This Team will have the responsibility to plan, prepare, organize, and develop the emergency drill and exercise process for the Emergency Response Plan and to report to management on the outcome of the exercises. Scheduled drills should occur semi-annually during development stages of the total Emergency Response Plan and annually after implementation. A variety of exercises will ensure operational readiness.

EXERCISES

Exercises are activities designed to promote emergency preparedness; test or evaluate emergency operations, policies, plans, procedures or facilities; train personnel in emergency management duties; and demonstrate operational capability. Depending on the nature of the exercise, representatives from the appropriate Support Teams and Response Teams will be involved.

Walk-Through Exercise

A Walk-Through Exercise is a high-level exercise in which the team talks through what its members will do in various scenarios. The team closely follows the logical flow of the documented DR Plan but does not actually execute the steps.

Team members will practice problem-solving for emergency situations. Participants will practice a coordinated, effective response while conducting ongoing discussions and critiques of the appropriateness of the actions taken and decisions made. This approach permits breaks before new messages are delivered so that the proper response can be discussed.

Typical objectives of this type of test are any one or combination of the following:

  • to verify the components of the Emergency Response Plan being developed prior to delivery to ensure completeness
  • to prepare for a simulation of component or integration testing to:
      • ensure readiness for ‘live’ test
      • ensure full integration of interfaces
      • ensure team member preparedness
  • to train new team members
  • to maintain Emergency Response team members’ preparedness

Functional Exercise

A functional exercise is a simulation of an emergency that includes a description of the situation, a timed sequence of messages, and communication between players and simulators. Emergency Response Team members will practice a coordinated, effective response in a time-pressured, realistic emergency simulation while individual and system performance are evaluated.

Simulation Exercise

A Simulation Exercise, following a predetermined scenario, actually executes Emergency Response Plan steps for a single component or integrated components, e.g.,

  • Call team members on contact list to see if phone numbers are correct
  • Execute Emergency Response jobs and compare results to validate procedures
  • Execute recovery of a single piece of equipment

Typical objectives of this type of exercise are any one or combination of the following:

  • to demonstrate the accuracy of the execution of the Emergency Response Plan
  • to train and cross-train team members
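
A simulation exercise that calls every member on the contact list is only useful if its results are recorded and acted on. The Python sketch below shows one way to flag entries that need follow-up. It assumes a simple record per contact: the team role, the listed number, and whether the call reached the person. The `ContactCheck` class, the `contact_list_issues` function, and the sample entries are all hypothetical and are not part of any Plan template. Contacts are identified by role, in keeping with identifying participants by function.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ContactCheck:
    role: str                # team function, e.g. "Network Lead" (hypothetical)
    phone: str               # number as listed in the Plan's contact list
    reached: Optional[bool]  # True/False after the test call; None if not yet called


def contact_list_issues(checks: List[ContactCheck]) -> List[Tuple[str, str]]:
    """Return (role, reason) pairs for contact-list entries needing follow-up."""
    issues = []
    for c in checks:
        if not c.phone.strip():
            issues.append((c.role, "no phone number listed"))
        elif c.reached is None:
            issues.append((c.role, "not called during the exercise"))
        elif not c.reached:
            issues.append((c.role, "listed number did not reach the team member"))
    return issues


# Hypothetical call-test results (555-01xx numbers are fictional).
checks = [
    ContactCheck("Team Leader", "555-0100", True),
    ContactCheck("Network Lead", "555-0101", False),
    ContactCheck("Database Lead", "", None),
    ContactCheck("Facilities Coordinator", "555-0103", None),
]

for role, reason in contact_list_issues(checks):
    print(f"{role}: {reason}")
```

The resulting list becomes the correction work for the contact list before the next exercise.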

Full-Scale Exercise

A full-scale exercise adds a field component that interacts with a functional exercise through simulated messages. Its purpose is to test the deployment of a seldom-used resource.

Hot-Site Testing

Testing at the hot site is unique: you go off-site to the hot site and exercise all components of your plan to ensure that they will work when needed. Suggestions on how to do that follow:

Confirm a test date with the hot-site provider. Then, about six weeks before the test, begin holding weekly meetings of your affected teams and business units. At these meetings, determine what to test (usually the critical applications that must be running within 24 hours of a disaster). Also determine the hardware and software requirements, any licensing issues, Emergency Response Team staffing requirements, team responsibilities, travel requirements, reservations, and even catering.
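
The planning cadence above is simple date arithmetic: one meeting per week, starting about six weeks before the confirmed test date. The sketch also keeps a running checklist of planning topics still open. The topic list paraphrases the paragraph above. The names `PLANNING_TOPICS`, `weekly_meeting_dates`, and `open_topics`, as well as the sample test date, are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List

# Planning topics paraphrased from the text; adjust to your own test.
PLANNING_TOPICS = (
    "applications to test",
    "hardware and software requirements",
    "licensing issues",
    "Emergency Response Team staffing",
    "team responsibilities",
    "travel and reservations",
    "catering",
)


def weekly_meeting_dates(test_date: date, weeks_before: int = 6) -> List[date]:
    """One planning meeting per week, the first `weeks_before` weeks ahead of the test."""
    return [test_date - timedelta(weeks=w) for w in range(weeks_before, 0, -1)]


def open_topics(covered: Iterable[str]) -> List[str]:
    """Planning topics not yet settled in any meeting so far."""
    done = set(covered)
    return [t for t in PLANNING_TOPICS if t not in done]


test_day = date(2014, 9, 15)  # hypothetical date confirmed with the provider
for meeting in weekly_meeting_dates(test_day):
    print(meeting.isoformat())
print(open_topics(["applications to test", "licensing issues"]))
```

The last planning meeting falls one week before the test, which leaves time to send the final requirements form to the provider.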

For the weekly meetings, put the hot-site provider's team members assigned to your test on a conference call with your affected team members (and business units, if appropriate) to review every detail of your plans. The call should include the mainframe, Unix, NT, and telecommunications departments of both companies, which gives them the opportunity to exchange information. This is often the best way to avoid missing details of hardware configurations. Have the hot-site provider's senior technical coordinator on the call as well, so that they know what you expect from their staff and what they expect from yours. This makes certain that everyone knows the common goals of the test.

Before the test, fill out a hardware and software requirements form that lists your exact configuration in detail, and send it to the hot-site provider. This leaves little margin for error once you get to the recovery site.

An important part of the exercise is recording everything that happens. The coordinator is usually present for the entire test, which can be exhausting, but take notes anyway. Notes ensure that you know what happened. They also help you update your recovery procedures and make the necessary documentation changes before the next test or an actual disaster. Most hot-site tests are very long, and you cannot rely on tired team members to remember every detail the next day or a week later, so it is better to have notes to refer to when you return home. Create a form to hand out before the test starts and insist that it be used. Your team members will find it very useful for recording problems and issues. When the test is completed, collect the forms. They will allow you to write a better executive summary for management and to hold a detailed post-mortem meeting to discuss your successes and learning experiences (there are no failures in testing).

Remember, a successful exercise is one in which you know your objectives, carry them out as best you can, learn from the experience, and update all affected documentation to the best of your knowledge.

SCOPE OF THE EXERCISE

Within each type of exercise, the following are examples of how you might state the scope of the exercise:

Component Exercise

The scope of this exercise is one part of the Emergency Response Plan that we will be testing, e.g.:

  • contact lists
  • off-site storage media accuracy

Plan Exercise

The scope of this exercise is the testing of a plan in its entirety for a specific element, e.g.:

  • systems application
  • work center

Process Exercise

The scope of this exercise is to test the integration of the plans supporting this process, e.g.:

  • an integrated message-billing process, tested from the collection of the call data, through rate assignment and customer identification, to the Bill Print and delivery function and the receipt of the customer remittance

Exercise Description

When describing the testing being performed, both the type and category should be specified, e.g.:

  • a Walk-through Component test
  • a Walk-through Plan test
  • a Walk-through Process test
  • a Simulation Component test
  • a Simulation Plan test
  • a Simulation Process test
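
Each description in the list above pairs an exercise type with a scope category, so the full set is the cross product of the two. The Python sketch below generates the descriptions that way, which keeps naming consistent in schedules and reports. The enum names `ExerciseType` and `ExerciseCategory` are hypothetical. Only the two types used in the list are included; other types such as Functional or Full-Scale could be added the same way.

```python
from enum import Enum
from itertools import product


class ExerciseType(Enum):
    WALK_THROUGH = "Walk-through"
    SIMULATION = "Simulation"


class ExerciseCategory(Enum):
    COMPONENT = "Component"
    PLAN = "Plan"
    PROCESS = "Process"


def describe(kind: ExerciseType, category: ExerciseCategory) -> str:
    """Build an exercise description that names both the type and the category."""
    return f"a {kind.value} {category.value} test"


# Every type/category pairing, in the same order as the list above.
descriptions = [describe(k, c) for k, c in product(ExerciseType, ExerciseCategory)]
for d in descriptions:
    print(d)
```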

EXERCISE FREQUENCY

Emergency Response Plans must be tested as often as necessary to support the business objectives and to control the risks associated with the loss of a process or function. This could be semi-annually, annually, or even more often if the organization changes rapidly.

EXERCISE RESPONSIBILITY

Developing and conducting the tests is the responsibility of the Exercise Management Team. This team is usually composed of members of several other teams and decides what should be tested.

The responsibilities for and during testing are as follows:

Exercise Management Team

  • Determine what to test
  • Set exercise schedules
  • Coordinate or participate in all meetings before, during, and after the exercise
  • Manage and observe the exercises
  • Coordinate completion of the Application Information Form from existing available data, then gather additional information from the application ER team
  • Coordinate the development of the application scope, objectives and team list for the exercise
  • Coordinate the development of the application recovery timeline using data supplied by the application ER team
  • Participate in problem resolution that may impact the exercise schedule
  • Monitor application recovery progress to ensure that documented recovery procedures are being followed correctly
  • Gather the Test Information Forms completed during the exercise
  • Analyze the Test Information Forms in conjunction with the teams involved in the exercise
  • Conduct the application portion of the post-exercise review
  • Track and provide status for all outstanding application issues resulting from the exercise
  • Report to management on the exercise results

Application Team(s)

  • Participate as needed in all meetings before, during, and after the exercise
  • Provide input to the preparation of all exercise documents (scope, objectives, timeline)
  • Provide support for application-related problems identified during the test
  • Participate in the application recovery where specified in the Plan

DATA COLLECTION

The Exercise Management Team will ensure adequate documentation from all participants in the drill or exercise. Evaluators of an exercise will complete data collection documents based on observations regarding the overall effectiveness of their portion of the exercise. Guidelines for effective data collection are as follows:

Data collection forms should be as straightforward as possible. Forms should be developed to evaluate each group of players within the exercise. A simple form with brief instructions and plenty of room to write is best.

A checklist can be used for the evaluator to rate various activities. Alternatively, a free-form document may be used, on which the evaluator decides what to look for and how to rate each activity.

All forms should identify the evaluator, location, activity observed, participants observed, times, and dates. Participants should be identified by function rather than by name, to minimize personal criticism.

All documentation of an Emergency Response Plan scheduled exercise should be promptly submitted to the Exercise Management Team for review.
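
The required fields listed above map naturally onto a simple record. A completeness check before forms are submitted to the Exercise Management Team catches gaps while the evaluator can still fill them in. The Python sketch below is illustrative only. The `EvaluationForm` fields follow the guideline above, while the optional 1-5 rating scale, the `form_problems` helper, and the sample values are assumptions. Start and end are full date-times, so observations that run past midnight, as long hot-site tests can, are still checked correctly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class EvaluationForm:
    evaluator: str
    location: str
    activity_observed: str
    participants: List[str]       # functions, e.g. "Database Recovery Team", not names
    start: datetime               # date and time the observation began
    end: datetime                 # date and time the observation ended
    rating: Optional[int] = None  # checklist-style score; the 1-5 scale is an assumption
    comments: str = ""            # free-form observations


def form_problems(form: EvaluationForm) -> List[str]:
    """Return the reasons a submitted form is incomplete or inconsistent."""
    problems = []
    for field_name in ("evaluator", "location", "activity_observed"):
        if not getattr(form, field_name).strip():
            problems.append(f"missing {field_name}")
    if not form.participants:
        problems.append("no participants identified")
    if form.end <= form.start:
        problems.append("end time is not after start time")
    if form.rating is not None and not 1 <= form.rating <= 5:
        problems.append("rating outside 1-5")
    return problems


form = EvaluationForm(
    evaluator="Evaluator 3",
    location="",
    activity_observed="Restore of billing database",
    participants=["Database Recovery Team"],
    start=datetime(2014, 9, 15, 9, 0),
    end=datetime(2014, 9, 15, 8, 30),
    rating=4,
)
print(form_problems(form))
```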

INTERNAL REVIEWS & CRITIQUES

The internal review and critique process is an evaluation function of the Emergency Response Plan that addresses both simulated exercises and actual emergency events. The same standards of promptness, documentation, reporting, and internal tracking apply to both. The Exercise Management Team will evaluate problems that require improvement, then promptly document its observations and submit them to the Command Coordinator. The two forms of documentation are:

Evaluation

The Exercise Management Team will be convened to conduct debriefing interviews of key personnel and to examine records generated during the planned exercise or real emergency. An evaluation report containing critiques will be sent to the Team Leader for information and action as required. This report should address the scope and objectives of the event, including the major deficiencies observed. It should also discuss whether the objectives were fulfilled and, where problems occurred in meeting them, why. Proposed solutions to the documented problems should be included where possible. This report forms the basis for future improvement-tracking guidelines.

Internal Tracking

This aspect of event review ensures that the necessary corrective actions are taken to address the major issues identified in an event, whether simulated or actual. The Exercise Management Team should inform the Team Leaders of the teams and business/operational units that were tested of deficiencies and suggestions for improvement in their areas of responsibility. The Exercise Management Team should also identify action items, then assign and track a close-out date for each deficiency.

A meeting will be scheduled in which the review process and any proposed solutions are recorded and compiled into a master list of deficiencies and recommendations. These should be logically organized in accordance with the event objectives.
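
The master list described above can be kept as a set of action items grouped by event objective, each with an owner and a close-out date. This makes it easy to report which items are overdue before the next exercise. The Python sketch below is one minimal way to do that. The `ActionItem` record, the `master_list` and `overdue` helpers, and the sample deficiencies are hypothetical. The same structure would also serve for action items raised by outside emergency response groups.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class ActionItem:
    objective: str                   # event objective the deficiency relates to
    deficiency: str
    owner: str                       # responsible team or unit, identified by function
    due: date                        # assigned close-out date
    closed_on: Optional[date] = None


def master_list(items: List[ActionItem]) -> Dict[str, List[ActionItem]]:
    """Group items by event objective, earliest due date first within each group."""
    grouped: Dict[str, List[ActionItem]] = defaultdict(list)
    for item in sorted(items, key=lambda i: i.due):
        grouped[item.objective].append(item)
    return dict(grouped)


def overdue(items: List[ActionItem], today: date) -> List[ActionItem]:
    """Open items whose close-out date has already passed."""
    return [i for i in items if i.closed_on is None and i.due < today]


items = [
    ActionItem("Restore billing within 24 hours", "Tape catalog out of date",
               "Storage Team", date(2014, 10, 1)),
    ActionItem("Notify all team leaders", "Two contact numbers wrong",
               "Exercise Management Team", date(2014, 9, 22), closed_on=date(2014, 9, 20)),
    ActionItem("Restore billing within 24 hours", "Restore job missing from plan",
               "Application Team", date(2014, 9, 25)),
]

for objective, group in master_list(items).items():
    print(objective)
    for item in group:
        status = "closed" if item.closed_on else f"due {item.due.isoformat()}"
        print(f"  {item.deficiency} ({item.owner}, {status})")
print([i.deficiency for i in overdue(items, date(2014, 9, 30))])
```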

EXTERNAL REVIEWS & CRITIQUES

Depending on the complexity of the event, many emergency response groups and support organizations other than The Company may be involved (e.g., fire and rescue, law enforcement/security, volunteer assistance, hazardous material response). Like the internal review, the external review and critique process addresses both simulated exercises and actual emergency events. Representatives from these organizations may be participants or observers, and this area of the Emergency Response Plan allows them to take part in the review and critique process. The Response Management Team Leader, in conjunction with the Exercise Management Team Leader, will schedule a meeting with key participants from these groups. The meeting will evaluate how well these groups interfaced and functioned with The Company during the simulated or actual emergency event. It should be planned according to the following guidelines:

Schedule

The external review and critique should be conducted as soon after the exercise or emergency event as possible while allowing sufficient time to review internal forms and comments. Following use of the Emergency Response Plan, the Exercise Management Team will contact outside groups who were involved and request written and oral comments regarding The Company's performance. Those comments will be factored into the evaluation report transmitted to the Command Coordinator.

Location

Locations for critiques involving organizations outside The Company should provide adequate accommodations to discuss event comments. Ideally, the meeting should be held outside any secured area of The Company to permit easier access, and the venue should allow for attendance often being greater than anticipated.

Participants

The critique should include participation by all major Emergency Response Plan personnel, evaluators, appropriate management, and staff. Comments and recommendations for improvement of the emergency response should be encouraged.

Agenda

The post-event debriefing need not be formal; however, the evaluation critique should follow a formal agenda. The agenda should ensure opening comments and introductions by appropriate management, staff comments and suggestions, and timely review and discussion of the lead evaluator's report of critique items and event objectives.

Action Item Tracking

Following the evaluation process, action items raised by emergency response groups outside The Company should be incorporated into the tracking process. Responsibility should be assigned for every action item, and a date set for resolution and closure, to ensure that all problem areas are addressed before the next exercise. A copy of the action item list should be sent to the appropriate control and evaluation staff members representing those outside groups.

Acknowledgements

Letters of acknowledgement may appropriately be sent to key organizations and other emergency response groups, such as local officials and volunteer fire and medical groups, that contributed to the overall success of a simulated emergency event at The Company or of the response to an actual emergency. Positive recognition will strengthen and motivate future efforts to improve the Emergency Response Plan. A news release highlighting the event may be considered.

TEST SUGGESTIONS

Some tips on performing an Emergency Response test that doesn't spell disaster for your organization's routine:

  • When Emergency Response is viewed as an occasional nuisance, rather than an important part of day-to-day activities, testing is prone to failure, either in whole or in part. Commitment to Emergency Response must begin with senior management and be impressed upon all management levels down to the unit manager level.
  • Recovery documentation is the most important tool brought to the disaster site. Everyone responsible for a portion of the recovery process must be able to update it easily. It should be updated often, as part of the daily routine, whenever changes are made to the IT environment. It must be written as detailed step-by-step instructions so that a non-technical person can accomplish each task.
  • Testing at the backup site refines the plan. Testing at a hot site is expensive (so much so that most IT departments contract for only one or two tests per year), but testing at the backup site is significantly cheaper and can be accomplished in a shared environment.
  • Written procedures in support of daily activities should include a section that details the steps necessary to keep disaster plans current as a result of the activity. An independent auditing department should regularly audit all documents, programs and procedures for currency.
  • Less reliance on technical skills reduces the risk of DR failure. Recovery plans riddled with technical jargon and dependent on highly technical personnel from multiple departments are at risk of failure. Plans centered on as few processes as necessary are more successful because non-technical staff can replace technical staff in the event they are not available.
  • Automation simplifies the Emergency Response process. Automating any task that can be automated, such as submitting recovery jobs through a scheduling system, frees staff members to perform concurrent recovery tasks. Be careful, however, not to "over-automate" to the point that recovery depends on special in-house code. Stick with third-party vendors that offer 24/7 support.
  • Eliminate human error by eliminating those manual tasks subject to error, such as the need to manually register vaulted tape data set names with a tape management software package. Proper naming conventions and automated adherence through the use of third-party software are worth it when they ensure that all tape media required for the recovery is vaulted and available.
  • Recovery prioritization gets you back on track quickly. Identifying critical systems and their recovery time objectives (how soon the systems need to be available) allows recovery tasks to be prioritized. For example, if your business relies on an online system being available within six hours of the disaster, the recovery plan must be tailored to meet that objective, even if it means reworking your backup and recovery strategies.
  • Success should be measured by the overall commitment made by IS departments and senior level management. This commitment is evident in budget preparation, staffing, and the overall attitude. Businesses that commit to a systematic approach to Emergency Response testing are well positioned to quickly return to normal business operations when disaster strikes. And that is the most critical measure of success.
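
The recovery-prioritization tip can be made concrete with a small calculation. Order systems by recovery time objective (RTO), add up the recovery times measured in the last test, and check whether each system is back before its RTO expires. The Python sketch below assumes a single recovery stream in which each system starts only after the previous one finishes; real recoveries often run some work in parallel. The `System` record, the `recovery_schedule` function, and the sample hours are hypothetical.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import List, Tuple


@dataclass
class System:
    name: str
    rto_hours: int       # recovery time objective: how soon it must be available
    recovery_hours: int  # hands-on recovery time measured in the last test


def recovery_schedule(systems: List[System]) -> List[Tuple[str, int, int, bool]]:
    """Recover systems one after another, tightest RTO first.

    Returns (name, hour recovery finishes, RTO, RTO met?) for each system.
    Assumes a single recovery stream with no parallel work.
    """
    schedule = []
    elapsed = 0
    for s in sorted(systems, key=attrgetter("rto_hours")):
        elapsed += s.recovery_hours
        schedule.append((s.name, elapsed, s.rto_hours, elapsed <= s.rto_hours))
    return schedule


# Hypothetical inventory; the six-hour online system mirrors the example above.
systems = [
    System("Billing", rto_hours=24, recovery_hours=18),
    System("Online order entry", rto_hours=6, recovery_hours=4),
    System("Reporting", rto_hours=72, recovery_hours=8),
    System("E-mail", rto_hours=12, recovery_hours=3),
]

for name, finish, rto, met in recovery_schedule(systems):
    print(f"{name}: recovered by hour {finish} (RTO {rto}h) {'OK' if met else 'MISSED'}")
```

With these sample figures, Billing finishes at hour 25 against a 24-hour RTO. A gap like this is what the tip means by tailoring the plan, for example by shortening that recovery or running it alongside other work.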