More About OPERATIONAL RECOVERY PROCESSES
This document describes the services of Risk Concepts, Ltd. and its corporate partner CyberCommunication, Inc. (collectively the "RCL/CyberComm Team") in the specialized area of Operational Recovery Processes.
WHY HAVE EFFECTIVE OPERATIONAL RECOVERY PROCESSES?
While the need for Disaster Recovery and Business Continuity Plans (jointly referred to in this document as "ORP", for Operational Recovery Processes) is obvious to most, how critical these processes are may be less well understood, particularly by non-technical executive managers and Directors. Consider the following:
- In today's technology environment, top three needs are:
  1. Interoperability - one platform to communicate with all other platforms
  2. Security
  3. Disaster Recovery Planning & Business Continuity Planning
  (Veritas Corp., Dec. 2002)
- 2 of every 5 businesses that experience a disaster will be out of business within 5 years (disaster = lose information for a period of time, but you can restore it). If you lose data and can't restore, 85% of companies will be out of business in 2 years.
  (Gartner, Aug. 2002)
- Senior management must see the cost of downtime: (a) the average cost of downtime = $85,000 per hour from lost revenue, opportunity cost and costs of repair work and (b) 97% uptime = 3% down time = $22,000,000. Actual costs in other markets:
  Downtime     Retailer         Financial
  8 Days       $9.75 million    $3 billion
  73 Hours     $3.9 million     $1.2 billion
  3.5 Hours    $195,000         $58,000
  (Meta Group, 12 June 2002)
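
The downtime arithmetic quoted above can be checked with a few lines of Python. This is only a sketch: it assumes the $22,000,000 figure is an annual cost over a 365-day, 24-hour year, which the quoted source does not state, and the function name `annual_downtime_cost` is ours.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes round-the-clock operation, no leap year


def annual_downtime_cost(uptime_fraction: float, cost_per_hour: float) -> float:
    """Return the yearly cost of downtime for a given uptime fraction."""
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    downtime_hours = (1.0 - uptime_fraction) * HOURS_PER_YEAR
    return downtime_hours * cost_per_hour


# 97% uptime at the quoted average of $85,000 per hour of downtime
cost = annual_downtime_cost(0.97, 85_000)
print(f"Estimated annual downtime cost: ${cost:,.0f}")
```

Under these assumptions, 3% downtime is about 263 hours a year, or roughly $22.3 million, which is consistent with the rounded $22,000,000 quoted.
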
Thus the business justifications for effective and fully tested ORP are compelling, as are the returns on investment in this area, especially when one considers the indirect costs of downtime on reputation, customer satisfaction and market share, lost productivity, and, ultimately, stock price.
THE RCL/CYBERCOMM TEAM'S APPROACH TO DEVELOPING ORP
Most ORP require the development of an Operational Recovery Strategy, a process best done in a three-phase approach.
Phase 1 - ASSESS REQUIREMENTS
This typically consists of:
- Gathering information on existing infrastructure (software, hardware and facilities).
- Assessing the requirements, including preparing appropriate questions to address the needs of internal and external customers and conducting surveys with internal and external customers via email or in person.
Phase 2 - PREPARE BUSINESS IMPACT ASSESSMENT
The results of the survey are analyzed and a Business Impact Assessment is prepared identifying both operational (qualitative) and financial (quantitative) impacts of inoperable or inaccessible functions on an entity's abilities to conduct critical business processes.
The Business Impact Assessment is the basis for formulating Operational Recovery Strategies and guides the selection of recovery tactics to restore operations within required time frames. As part of these activities, a clear understanding of each business unit's Maximum Acceptable Outage is developed that, in turn, is directly linked to well-reasoned "reaction requirements" for each supporting element of the organization (e.g., Information Technology, Facilities, Incident Response units). The diagram below provides further details of these activities.
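
One way to picture the link between each business unit's Maximum Acceptable Outage and the "reaction requirements" of the supporting elements is the short Python sketch below. The rule it applies (a supporting element must react within the shortest Maximum Acceptable Outage of any unit it serves) is our assumption, not a stated RCL/CyberComm method, and the business units and hour values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BusinessUnit:
    name: str
    mao_hours: float              # Maximum Acceptable Outage, in hours
    depends_on: Tuple[str, ...]   # supporting elements the unit relies on


def reaction_requirements(units: Iterable[BusinessUnit]) -> Dict[str, float]:
    """Map each supporting element to the shortest MAO among the units it serves."""
    requirements: Dict[str, float] = {}
    for unit in units:
        for element in unit.depends_on:
            current = requirements.get(element)
            if current is None or unit.mao_hours < current:
                requirements[element] = unit.mao_hours
    return requirements


# Hypothetical business units and MAO values, for illustration only
units = [
    BusinessUnit("Order Processing", 4, ("Information Technology", "Facilities")),
    BusinessUnit("Payroll", 72, ("Information Technology",)),
    BusinessUnit("Customer Service", 24, ("Facilities", "Incident Response")),
]

for element, hours in sorted(reaction_requirements(units).items()):
    print(f"{element}: must react within {hours} hours")
```

In this illustration, Information Technology and Facilities inherit the 4-hour requirement of Order Processing, while Incident Response inherits the 24-hour requirement of Customer Service.
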
ORP should consider at least four categories of disaster:
- Natural disasters - Fire, flood, high winds, tsunamis and earthquakes, as may be applicable to business locations.
- Human error - Fatigue, drug abuse, inability to recognize a disaster, and poor training.
- Acts of malice - Employee sabotage, violence in the workplace, theft, vandalism, computer crime, viruses and terrorism.
- Hardware failure - Inadequate maintenance, improper handling of media, power outages, lack of climate control, and poor manufacturing.
Best practices for ORP also require that a broader scope of risks be considered as depicted below:
Phase 3 - DESIGN OPERATIONAL RECOVERY STRATEGIES AND TEST ORP
In designing ORP, the first priority of the RCL/CyberComm Team is protecting the entity's staff, then protecting the organization. This is accomplished through the following primary objectives:
- Identify sources of disaster.
- Follow preventive practices that will minimize the risk or impact of disaster.
- Set criteria for making the decision to recover at a cold site, recover at a hot site, or repair the affected site.
- Describe an organizational structure for carrying out the component ORP.
- Provide information concerning personnel, including computing expertise, that will be required to carry out each component of the ORP.
- Identify the equipment, floor plan, procedures, and other items necessary for the recoveries.
- Provide detailed procedures for staff to follow.
- Train staff in following the ORP, and carry out simulated disasters (also known as "fire drills") to test the ORP's effectiveness.
In addition, the RCL/CyberComm Team uses the following definitions for the Levels of Criticality ("LC") that are assigned to the resulting components of the ORP as a guide to their implementation and testing:
- (LC-0) Conventional Processing: Business functions can be interrupted and integrity of the data is not essential. To the system user, work stops and uncontrolled shutdown occurs. Data may be lost or corrupted. Operational recovery = days to weeks.
- (LC-1) Highly Reliable: Business functions can be interrupted as long as integrity of the data is assured. To the system user, work stops and uncontrolled shutdown occurs. Operational recovery = days.
- (LC-2) Highly Available: Business functions can allow only minimal interruptions during essential time periods, or during most hours of the day or week throughout the year. To the system user, work is interrupted but they can quickly log back onto the system. However, some transactions may need to be rerun from a journal file and users may experience performance degradation. Operational recovery = hours up to a day.
- (LC-3) Fault Resilient: Business functions require uninterrupted computing during essential time periods, or during most hours of the day or week throughout the year. This means that the user stays on-line. However, the current transaction may need restarting and users may experience performance degradation. Operational recovery = minutes to hours.
- (LC-4) Fault Tolerant: Business functions that demand continuous computing and where any failure is transparent to the user. This means no interruption of work, no transactions lost, no degradation in performance and continuous 24x7 operation. Operational recovery = minutes.
- (LC-5) Disaster Tolerant: Business functions that absolutely must be available to the user and where any failure must be transparent to the user. This means no interruption of work, no transactions lost, no degradation in performance and continuous computing services because computing capability is available in multiple data centers/sites. Operational recovery = instantaneous.
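
As an illustration of how the Levels of Criticality above might be applied, the Python sketch below stores the six levels in a lookup table and picks the least demanding level whose recovery window fits a given Maximum Acceptable Outage. The numeric hour limits are our assumptions, chosen only to make the qualitative ranges ("days to weeks", "minutes to hours", and so on) comparable; they are not part of the definitions, and the function name `minimum_level` is hypothetical.

```python
from typing import List, NamedTuple


class CriticalityLevel(NamedTuple):
    code: str
    name: str
    recovery: str               # recovery window as worded in the definitions
    max_recovery_hours: float   # ASSUMED upper bound, for illustration only


# Ordered from least demanding (LC-0) to most demanding (LC-5)
LEVELS: List[CriticalityLevel] = [
    CriticalityLevel("LC-0", "Conventional Processing", "days to weeks", 14 * 24),
    CriticalityLevel("LC-1", "Highly Reliable", "days", 7 * 24),
    CriticalityLevel("LC-2", "Highly Available", "hours up to a day", 24),
    CriticalityLevel("LC-3", "Fault Resilient", "minutes to hours", 4),
    CriticalityLevel("LC-4", "Fault Tolerant", "minutes", 0.5),
    CriticalityLevel("LC-5", "Disaster Tolerant", "instantaneous", 0),
]


def minimum_level(mao_hours: float) -> CriticalityLevel:
    """Return the least demanding level whose assumed recovery bound fits the MAO."""
    if mao_hours < 0:
        raise ValueError("mao_hours must not be negative")
    for level in LEVELS:
        if level.max_recovery_hours <= mao_hours:
            return level
    # Not reached for non-negative input, because LC-5 has a bound of 0 hours;
    # kept as a safe default.
    return LEVELS[-1]


level = minimum_level(48)  # a business unit that can tolerate two days of outage
print(f"{level.code} ({level.name}): operational recovery = {level.recovery}")
```

With these assumed limits, a unit that can tolerate a 48-hour outage maps to LC-2, since the assumed bounds for LC-0 and LC-1 exceed 48 hours. Different thresholds would give different assignments, so the limits should be agreed with each business unit.
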
Readers should note that all ORP assume a certain amount of risk, the primary one being how much data is lost in the event of a disaster. There is a trade-off between (i) the time, effort and money spent in planning and preparing for a disaster and (ii) the data loss that an organization can sustain and still remain operational following a disaster. Time also enters the equation, since many organizations simply cannot function without the computers they use to conduct business. Consequently, their Operational Recovery Strategies must focus on quick recovery, or even zero downtime, by duplicating and maintaining computer systems in separate facilities.
Routine testing is critical to the recoverability of operations, and the intent of any test is to find ways to improve ORP, not just to validate their effectiveness. Consequently, ORP must include specific testing, typically at three levels:
- Walk-throughs are used to exercise the logic of the Operational Recovery Strategies and supporting procedures before Component Exercises are conducted, thereby providing important results at very nominal cost.
- Component Exercises test a single plan component such as off-site storage contents, specific business unit procedures or compatibility of alternate sites.
- Integrated Tests (i.e., simulated disasters) are the most complex level of testing, wherein two or more components are exercised in concert, thereby verifying the functions of interfaces between plan components and assuring that other aspects of the component Operational Recovery Strategies mesh properly.
Procedures and evidence of testing ensure that the ORP are executable. Testing scenarios, results of tests performed, key learnings from tests, and planned changes based on test results are documented in detail. In addition, documentation of ORP should include a testing calendar based on a 2- to 3-year cycle to ensure that all areas of the ORP are appropriately covered and evaluated.
THE RCL/CYBERCOMM TEAM'S PHILOSOPHY, EXPERIENCE AND EXPERTISE
While leading the development of ORP, the RCL/CyberComm Team emphasizes the skills and knowledge of Certified Information Systems Security Professionals (CISSP). We believe in the transfer of knowledge to our clients, empowering them with the latest technology and skills. This practice results in personal growth for client staff members. Our clients include commercial, high- and low-tech manufacturing companies, financial entities, and state/local government agencies. Moreover, developing an ORP takes more than a knowledge management system; it requires experience and expertise in specialized fields. Every member of the RCL/CyberComm Team specializes in one or more of the following areas:
- Safety Management
- Network Security
- Network Administration
- Physical Security
- Backup and Recovery
- Cyber Security
- Civil Engineering / Architecture
- Crisis Intervention / Disaster Response
- HAZMAT / WMD Decontamination
- Firefighting
- Training / Documentation
- Survival Training
For further information on Operational Recovery Processes and/or the related services of the RCL/CyberComm Team, contact us at:
Risk Concepts, Ltd.
3 Jekyll Court
Bluffton, SC 29910
Phone: 1 (843) 706-3878
Cell: 1 (540) 840-7450
Represented in the United States, Central and South America and the Caribbean.
Email RCL for client references or to request our complete brochure.
(Last updated: February 20, 2008)