Operation Risk Consulting Services from Risk Concepts, Ltd.


More About OPERATIONAL RECOVERY PROCESSES

This document describes the services of Risk Concepts, Ltd. and its corporate partner CyberCommunication, Inc. (collectively the "RCL/CyberComm Team") in the specialized area of Operational Recovery Processes.

WHY HAVE EFFECTIVE OPERATIONAL RECOVERY PROCESSES?

While the need for Disaster Recovery and Business Continuity Plans (jointly referred to in this document as "ORP" standing for Operational Recovery Processes) is obvious to most, the criticality of these processes may not be as well understood, particularly by non-technical executive managers and Directors. Consider the following:

In today's technology environment, the top three needs are:
1. Interoperability - one platform to communicate with all other platforms
2. Security
3. Disaster Recovery Planning & Business Continuity Planning
Veritas Corp. Dec. 2002
 
2 of every 5 businesses that experience a disaster will be out of business within 5 years (disaster = lose information for a period of time, but you can restore it). If you lose data and can't restore, 85% of companies will be out of business in 2 years.
Gartner - Aug 2002
 
Senior management must see the cost of downtime: (a) the average cost of downtime is $85,000 per hour from lost revenue, opportunity cost and the cost of repair work, and (b) 97% uptime means 3% downtime, or roughly $22,000,000 per year at that hourly rate. Actual costs in other markets:

Downtime      Retailer         Financial
8 Days        $9.75 million    $3 billion
73 Hours      $3.9 million     $1.2 billion
3.5 Hours     $195,000         $58,000

Meta Group, 12 June 2002
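
The $22,000,000 figure in item (b) can be checked with simple arithmetic. The sketch below assumes the 3% downtime applies to a full year of continuous 24x7 operation (8,760 hours), which the quoted figure does not state explicitly; the function and variable names are illustrative only.

```python
# Assumption: the business operates 24x7, so a year has 365 * 24 hours.
HOURS_PER_YEAR = 365 * 24


def annual_downtime_cost(uptime_fraction, cost_per_hour):
    """Return the yearly cost of downtime for a given uptime fraction."""
    downtime_hours = (1 - uptime_fraction) * HOURS_PER_YEAR
    return downtime_hours * cost_per_hour


# Figures quoted in the text: 97% uptime and $85,000 per hour of downtime.
cost = annual_downtime_cost(0.97, 85_000)
print(f"Estimated annual downtime cost: ${round(cost):,}")
```

Under that assumption the result is roughly $22.3 million per year, which is consistent with the rounded figure quoted above.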

Thus the business justifications for effective and fully-tested ORP are compelling - as are the returns-on-investment in this area, especially when one considers the indirect costs of downtime on reputation, customer satisfaction and market share, lost productivity, and, ultimately, stock price.

THE RCL/CYBERCOMM TEAM'S APPROACH TO DEVELOPING ORP

Most organizations require the development of an Operational Recovery Strategy, a process best carried out in three phases.

Phase 1 - ASSESS REQUIREMENTS

This typically consists of:

  • Gathering information on existing infrastructure (software, hardware and facilities).
  • Assessing the requirements, including preparing appropriate questions to address the needs of internal and external customers and conducting surveys with internal and external customers via email or in person.

Phase 2 - PREPARE BUSINESS IMPACT ASSESSMENT

The results of the survey are analyzed and a Business Impact Assessment is prepared identifying both operational (qualitative) and financial (quantitative) impacts of inoperable or inaccessible functions on an entity's abilities to conduct critical business processes.

The Business Impact Assessment is the basis for formulating Operational Recovery Strategies and guides the selection of recovery tactics to restore operations within required time frames. As part of these activities, a clear understanding of each business unit's Maximum Acceptable Outage is developed. This, in turn, is directly linked to well-reasoned "reaction requirements" for each supporting element of the organization (e.g., Information Technology, Facilities, Incident Response units). The diagram below provides further details of these activities.

ORP should consider at least four categories of disaster:

  1. Natural disasters - Fire, flood, high winds, tsunamis and earthquakes, as may be applicable to business locations.
  2. Human error - Fatigue, drug abuse, inability to recognize a disaster, and poor training.
  3. Acts of malice - Employee sabotage, violence in the workplace, theft, vandalism, computer crime, viruses and terrorism.
  4. Hardware failure - Inadequate maintenance, improper handling of media, power outages, lack of climate control, and poor manufacturing.

Best practices for ORP also require that a broader scope of risks be considered as depicted below:

Phase 3 - DESIGN OPERATIONAL RECOVERY STRATEGIES AND TEST ORP

In designing ORP, the first priority of the RCL/CyberComm Team is protecting the entity's staff, then protecting the organization. This is accomplished through the following primary objectives:

  1. Identify sources of disaster.
  2. Follow preventive practices that will minimize the risk or impact of disaster.
  3. Set criteria for making the decision to recover at a cold site, hot site, or repair the affected site.
  4. Describe an organizational structure for carrying out the component ORP.
  5. Provide information concerning personnel, including computing expertise, that will be required to carry out each component of the ORP.
  6. Identify the equipment, floor plan, procedures, and other items necessary for the recoveries.
  7. Provide detailed procedures for staff to follow.
  8. Train staff in following the ORP, and carry out simulated disasters (also known as "fire drills") to test the ORP effectiveness.

In addition, the RCL/CyberComm Team uses the following definitions for the Levels of Criticality ("LC") that are assigned to the resulting components of the ORP as a guide to their implementation and testing:

  • (LC-0) Conventional Processing: Business functions can be interrupted and integrity of the data is not essential. To the system user, work stops and an uncontrolled shutdown occurs. Data may be lost or corrupted. Operational recovery = days to weeks.
  • (LC-1) Highly Reliable: Business functions can be interrupted as long as integrity of the data is assured. To the system user, work stops and an uncontrolled shutdown occurs. Operational recovery = days.
  • (LC-2) Highly Available: Business functions can allow only minimal interruptions during essential time periods, or during most hours of the day or week throughout the year. To the system user, work is interrupted but they can quickly log back onto the system. However, some transactions may need to be rerun from a journal file and users may experience performance degradation. Operational recovery = hours up to a day.
  • (LC-3) Fault Resilient: Business functions require uninterrupted computing during essential time periods, or during most hours of the day or week throughout the year. This means that the user stays on-line. However, the current transaction may need restarting and users may experience performance degradation. Operational recovery = minutes to hours.
  • (LC-4) Fault Tolerant: Business functions that demand continuous computing and where any failure is transparent to the user. This means no interruption of work, no transactions lost, no degradation in performance and continuous 24x7 operation. Operational recovery = minutes.
  • (LC-5) Disaster Tolerant: Business functions that absolutely must be available to the user and where any failure must be transparent to the user. This means no interruption of work, no transactions lost, no degradation in performance and continuous computing services because computing capability is available in multiple data centers/sites. Operational recovery = instantaneous.
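
As an illustration only, the Levels of Criticality can be held in a small lookup table and matched against a business unit's Maximum Acceptable Outage. The sketch below is not the RCL/CyberComm Team's tooling. The recovery wording is taken from the definitions above, but the numeric ceilings in `assumed_max_recovery_hours` are hypothetical values chosen for the example (the definitions give only qualitative ranges), and the function name `lowest_level_for` is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalityLevel:
    code: str
    name: str
    recovery: str                      # wording taken from the LC definitions
    assumed_max_recovery_hours: float  # HYPOTHETICAL ceiling, for illustration


# Ordered from least to most demanding. The hour values are assumptions.
LEVELS = [
    CriticalityLevel("LC-0", "Conventional Processing", "days to weeks", 336),
    CriticalityLevel("LC-1", "Highly Reliable", "days", 72),
    CriticalityLevel("LC-2", "Highly Available", "hours up to a day", 24),
    CriticalityLevel("LC-3", "Fault Resilient", "minutes to hours", 4),
    CriticalityLevel("LC-4", "Fault Tolerant", "minutes", 0.5),
    CriticalityLevel("LC-5", "Disaster Tolerant", "instantaneous", 0),
]


def lowest_level_for(max_acceptable_outage_hours):
    """Return the least demanding level whose assumed recovery ceiling
    fits within the given Maximum Acceptable Outage (in hours)."""
    for level in LEVELS:
        if level.assumed_max_recovery_hours <= max_acceptable_outage_hours:
            return level
    return LEVELS[-1]


# Example: a business unit that can tolerate at most one day of outage.
level = lowest_level_for(24)
print(level.code, level.name, "- operational recovery:", level.recovery)
```

In practice the thresholds would come from the Business Impact Assessment rather than from fixed numbers such as these.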

Readers should note that all ORP assume a certain amount of risk, the primary one being how much data is lost in the event of a disaster. There is a trade-off between (i) the time, effort and money spent planning and preparing for a disaster and (ii) the amount of data loss the organization can sustain while remaining operational after a disaster. Time also enters the equation, since many organizations simply cannot function without the computers they use to conduct business. Consequently, their Operational Recovery Strategies must focus on quick recovery, or even zero downtime, by duplicating and maintaining computer systems in separate facilities.

Routine testing is critical to the recoverability of operations and the intent of any test will be to find ways to improve ORP, not just to validate their effectiveness. Consequently, ORP must include specific testing, typically at three levels:

  1. Walk-throughs are used to exercise the logic of the Operational Recovery Strategies and supporting procedures before Component Exercises are conducted thereby providing important results at very nominal cost.
  2. Component Exercises test a single plan component such as off-site storage contents, specific business unit procedures or compatibility of alternate sites.
  3. Integrated Tests (i.e., simulated disasters) are the most complex level of testing, wherein two or more components are exercised in concert, thereby verifying the functions of interfaces between plan components and assuring that other aspects of the component Operational Recovery Strategies mesh properly.

Procedures and evidence of testing ensure that the ORP are executable. Details of testing scenarios, results of tests performed, key learnings from tests, and planned changes based on test results are documented in detail. In addition, documentation of ORP should include a testing calendar based on a two- to three-year cycle to ensure that all areas of the ORP are appropriately covered and evaluated.
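
A testing calendar of this kind can be sketched as a simple schedule that spreads plan components across the cycle so that each one is covered once. The example below is a minimal illustration under stated assumptions: quarterly test slots, a three-year cycle, a hypothetical start year of 2008, and a hypothetical function name `build_testing_calendar`. The component names are taken from the Component Exercises description above.

```python
def build_testing_calendar(components, start_year, cycle_years=3, slots_per_year=4):
    """Spread components evenly over (year, slot) test slots in one cycle.

    Assumption: tests are scheduled in equal slots per year (quarters by
    default). Every component is assigned to exactly one slot.
    """
    slots = [
        (start_year + year_offset, slot)
        for year_offset in range(cycle_years)
        for slot in range(1, slots_per_year + 1)
    ]
    calendar = {slot: [] for slot in slots}
    for index, component in enumerate(components):
        # Integer scaling keeps the slot index within range for any count.
        slot_index = index * len(slots) // len(components)
        calendar[slots[slot_index]].append(component)
    return calendar


components = [
    "off-site storage contents",
    "business unit procedures",
    "alternate site compatibility",
]
calendar = build_testing_calendar(components, start_year=2008)
for (year, quarter), items in calendar.items():
    if items:
        print(f"{year} Q{quarter}: {', '.join(items)}")
```

A real calendar would also record the test level (Walk-through, Component Exercise or Integrated Test) and link each entry to its documented results.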

THE RCL/CYBERCOMM TEAM'S PHILOSOPHY, EXPERIENCE AND EXPERTISE

While leading the development of ORP, the RCL/CyberComm Team emphasizes the skills and knowledge of Certified Information Systems Security Professionals (CISSP). We believe in the transfer of knowledge to our clients, empowering them with the latest technology and skills. This practice results in personal growth for client staff members. Our clients include commercial, high- and low-tech manufacturing companies, financial entities, and state/local government agencies. Moreover, developing an ORP involves more than a knowledge management system; it requires experience and expertise in specialized fields. Every member of the RCL/CyberComm Team specializes in one or more of the following areas:

Safety Management

Network Security

Network Administration

Physical Security

Backup and Recovery

Cyber Security

Civil Engineering / Architecture

Crisis Intervention / Disaster Response

HAZMAT / WMD Decontamination

Firefighting

Training / Documentation

Survival Training

For further information on Operational Recovery Processes and/or the related services of the RCL/CyberComm Team, contact us at:

Risk Concepts, Ltd.
3 Jekyll Court
Bluffton, SC 29910

Phone: 1 (843) 706-3878
Cell:     1 (540) 840-7450
Represented in the United States, Central and South America and the Caribbean.

Click here to email RCL for client references or to request our complete brochure.

(Last updated: February 20, 2008 )