Disaster recovery involves a set of policies, tools and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Disaster recovery focuses on the IT or technology systems supporting critical business functions, as opposed to business continuity, which involves keeping all essential aspects of a business functioning despite significant disruptive events. Disaster recovery can therefore be considered as a subset of business continuity.
IT Service Continuity (ITSC) is a subset of Business Continuity Planning (BCP) and encompasses IT disaster recovery planning and wider IT resilience planning. It also incorporates those elements of IT infrastructure and services which relate to communications, such as voice telephony and data communications.
Planning includes arranging for backup sites, be they hot, warm, cold or standby sites with hardware as needed for continuity.
In 2008 the British Standards Institution launched BS 25777, a standard connected to and supporting the Business Continuity Standard BS 25999, specifically to align computer continuity with business continuity. It was withdrawn following the publication in March 2011 of ISO/IEC 27031, "Security techniques -- Guidelines for information and communication technology readiness for business continuity".
The recovery time objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.
In accepted business continuity planning methodology, the RTO is established during the Business Impact Analysis (BIA) by the owner of a process, including identifying time frames for alternate or manual workarounds.
In a good deal of the literature on this subject, RTO is spoken of as a complement of recovery point objective (RPO), with the two metrics describing the limits of acceptable or "tolerable" ITSC performance in terms of time lost (RTO) from normal business process functioning, and in terms of data lost or not backed up during that period of time (RPO) respectively.
A Forbes overview noted that it is recovery time actual (RTA) which is "the critical metric for business continuity and disaster recovery."
RTA is established during exercises or actual events. The business continuity group times rehearsals (or actuals) and makes needed refinements.
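The comparison between a measured RTA and the target RTO can be sketched in a few lines. This is a minimal illustration only: the function names, the 4-hour RTO, and the drill timestamps are hypothetical values chosen for the example, not figures from any standard.

```python
from datetime import datetime, timedelta


def recovery_time_actual(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Elapsed time between the start of the outage (or drill) and restoration."""
    return service_restored - outage_start


def meets_rto(rta: timedelta, rto: timedelta) -> bool:
    """True when the measured recovery time is within the targeted RTO."""
    return rta <= rto


# Hypothetical rehearsal: a 4-hour RTO and a drill that took 5.5 hours.
rto = timedelta(hours=4)
drill_start = datetime(2024, 3, 1, 9, 0)
drill_end = datetime(2024, 3, 1, 14, 30)

rta = recovery_time_actual(drill_start, drill_end)
print(f"RTA = {rta}, within RTO: {meets_rto(rta, rto)}")
```

A rehearsal that misses the target, as in this made-up drill, is the kind of result that would prompt the refinements mentioned above.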
A recovery point objective (RPO) is defined by business continuity planning. It is the maximum targeted period in which data (transactions) might be lost from an IT service due to a major incident.
If RPO is measured in minutes (or even a few hours), then in practice off-site mirrored backups must be continuously maintained; a daily off-site backup on tape will not suffice.
Recovery that is not instantaneous will restore data/transactions over a period of time; the goal is to do so without incurring significant risks or significant losses.
RPO measures the maximum time period in which recent data might have been permanently lost in the event of a major incident; it is not a direct measure of the quantity of such loss. For instance, if the BC plan is "restore up to last available backup", then the RPO is the maximum interval between such backups that have been safely vaulted offsite.
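For a "restore up to last available backup" plan, the worst-case interval can be read directly off a vaulting log. The sketch below assumes such a log is available as a list of timestamps; the function name and the sample times are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def worst_case_gap(vaulted_times: Iterable[datetime]) -> timedelta:
    """Longest interval between consecutive backups that reached the offsite vault."""
    ordered = sorted(vaulted_times)
    if len(ordered) < 2:
        raise ValueError("need at least two vaulted backups to measure an interval")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical vaulting log: one backup was delayed by half a day.
log = [
    datetime(2024, 3, 1, 0, 0),
    datetime(2024, 3, 2, 0, 0),
    datetime(2024, 3, 3, 12, 0),
]
print(worst_case_gap(log))
```

If the gap observed this way exceeds the RPO set by the business, the backup regime needs to change rather than the objective.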
Two points should be noted. Firstly, business impact analysis is used to determine RPO for each service; RPO is not determined by the existing backup regime. Secondly, when any level of preparation of off-site data is required, the period during which data might be lost often starts near the time the work to prepare backups begins, not the time the backups are taken off-site.
Although a data synchronization point is a point in time, the timing for performing the physical backup must be included. One approach used is to halt processing of an update queue, while a disk-to-disk copy is made. The backup reflects the earlier time of that copy operation, not when the data is copied to tape or transmitted elsewhere.
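The distinction between the synchronization point and the time a copy reaches the offsite location can be made concrete with a small sketch. The `Backup` record and its field names are assumptions made for illustration: a copy only helps if it was already offsite when the incident occurred, but the data it restores is only as recent as its synchronization point.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Backup:
    sync_point: datetime   # when processing was halted and the disk-to-disk copy was taken
    offsite_at: datetime   # when that copy was safely offsite (tape vaulted or transmitted)


def data_loss_window(backups: Sequence[Backup], incident: datetime) -> Optional[timedelta]:
    """Time span of data lost if recovery uses the newest copy that was offsite at the incident.

    Returns None when no copy had reached the offsite location yet.
    """
    usable = [b for b in backups if b.offsite_at <= incident]
    if not usable:
        return None
    latest = max(usable, key=lambda b: b.sync_point)
    # Loss is measured from the synchronization point, not from the offsite time.
    return incident - latest.sync_point


# Hypothetical schedule: copy taken at 01:00, offsite by 06:00 each day.
backups = [
    Backup(datetime(2024, 3, 1, 1, 0), datetime(2024, 3, 1, 6, 0)),
    Backup(datetime(2024, 3, 2, 1, 0), datetime(2024, 3, 2, 6, 0)),
]
# Incident at 03:00 on day two: the newer copy exists but is not yet offsite.
print(data_loss_window(backups, datetime(2024, 3, 2, 3, 0)))
```

In this made-up schedule the loss window is 26 hours even though backups run daily, because the most recent copy had not yet left the primary site.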
RTO and the RPO must be balanced, along with all the other major system design criteria.
RPO is tied to the times backups are sent offsite. Offsiting via synchronous copies to an offsite mirror allows for most unforeseen difficulty. Use of physical transportation for tapes (or other transportable media) comfortably covers some backup needs at a relatively low cost. Recovery can be enacted at a predetermined site. Shared offsite space and hardware complete the package needed.
For high volumes of high value transaction data, the hardware can be split across two or more sites; splitting across geographic areas adds resiliency.
Planning for disaster recovery and information technology (IT) developed in the mid- to late 1970s as computer center managers began to recognize the dependence of their organizations on their computer systems.
Regulatory agencies became involved even before the rapid growth of the Internet during the 2000s; objectives of 2, 3, 4 or 5 nines (99.999%) were often mandated, and high-availability solutions for hot-site facilities were sought.
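The "nines" shorthand translates directly into a downtime allowance: n nines means an availability of 1 - 10^(-n). The sketch below works this out per year; the function names are illustrative, and a 365-day year is assumed.

```python
def availability_for_nines(nines: int) -> float:
    """Availability as a fraction, e.g. 5 nines -> 0.99999."""
    return 1.0 - 10.0 ** (-nines)


def allowed_downtime_minutes_per_year(nines: int, days_per_year: float = 365.0) -> float:
    """Downtime budget implied by an availability target of the given number of nines."""
    minutes_per_year = days_per_year * 24 * 60
    return minutes_per_year * 10.0 ** (-nines)


for n in (2, 3, 4, 5):
    print(f"{n} nines: {availability_for_nines(n):.3%} available, "
          f"{allowed_downtime_minutes_per_year(n):.2f} minutes of downtime per year")
```

Under these assumptions, five nines leaves roughly 5.26 minutes of downtime per year, while two nines leaves about 3.65 days.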
IT Service Continuity is essential for many organizations in the implementation of Business Continuity Management (BCM) and Information Security Management (ICM), and as part of the implementation and operation of information security management and business continuity management as specified in ISO/IEC 27001 and ISO 22301 respectively.
The rise of cloud computing since 2010 continues that trend: nowadays, it matters even less where computing services are physically served, just so long as the network itself is sufficiently reliable (a separate issue, and less of a concern since modern networks are highly resilient by design). 'Recovery as a Service' (RaaS) is one of the security features or benefits of cloud computing being promoted by the Cloud Security Alliance.
Disasters can be classified into two broad categories. The first is natural disasters such as floods, hurricanes, tornadoes or earthquakes. While preventing a natural disaster is impossible, risk management measures such as avoiding disaster-prone situations and good planning can help. The second category is man-made disasters, such as hazardous material spills, infrastructure failure, bio-terrorism, and disastrous IT bugs or failed change implementations. In these instances, surveillance, testing and mitigation planning are invaluable.
Recent research supports the idea that implementing a more holistic pre-disaster planning approach is more cost-effective in the long run. Every $1 spent on hazard mitigation (such as a disaster recovery plan) saves society $4 in response and recovery costs.
2015 disaster recovery statistics suggest that downtime lasting for one hour can cost
As IT systems have become increasingly critical to the smooth operation of a company, and arguably the economy as a whole, the importance of ensuring the continued operation of those systems, and their rapid recovery, has increased. For example, of companies that had a major loss of business data, 43% never reopen and 29% close within two years. As a result, preparation for continuation or recovery of systems needs to be taken very seriously. This involves a significant investment of time and money with the aim of ensuring minimal losses in the event of a disruptive event.
Control measures are steps or mechanisms that can reduce or eliminate various threats for organizations. Different types of measures can be included in a disaster recovery plan (DRP).
Disaster recovery planning is a subset of a larger process known as business continuity planning and includes planning for resumption of applications, data, hardware, electronic communications (such as networking) and other IT infrastructure. A business continuity plan (BCP) includes planning for non-IT related aspects such as key personnel, facilities, crisis communication and reputation protection, and should refer to the disaster recovery plan (DRP) for IT related infrastructure recovery / continuity.
IT disaster recovery control measures can be classified into the following three types:
Good disaster recovery plan measures dictate that these three types of controls be documented and exercised regularly using so-called "DR tests".
Prior to selecting a disaster recovery strategy, a disaster recovery planner first refers to their organization's business continuity plan which should indicate the key metrics of recovery point objective (RPO) and recovery time objective (RTO) for various business processes (such as the process to run payroll, generate an order, etc.). The metrics specified for the business processes are then mapped to the underlying IT systems and infrastructure that support those processes.
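The mapping from business processes to the systems that support them can be sketched as follows. This assumes, as a simplification, that a system shared by several processes inherits the tightest RTO and RPO among them; the process names, system names, and objectives are hypothetical.

```python
from datetime import timedelta
from typing import Dict, List, Tuple

Objectives = Tuple[timedelta, timedelta]  # (RTO, RPO)


def derive_system_objectives(
    process_objectives: Dict[str, Objectives],
    dependencies: Dict[str, List[str]],
) -> Dict[str, Objectives]:
    """Give each IT system the strictest RTO and RPO of any process depending on it."""
    result: Dict[str, Objectives] = {}
    for process, systems in dependencies.items():
        rto, rpo = process_objectives[process]
        for system in systems:
            if system in result:
                current_rto, current_rpo = result[system]
                result[system] = (min(current_rto, rto), min(current_rpo, rpo))
            else:
                result[system] = (rto, rpo)
    return result


# Hypothetical business continuity plan entries.
process_objectives = {
    "run payroll": (timedelta(hours=24), timedelta(hours=4)),
    "generate an order": (timedelta(hours=2), timedelta(minutes=15)),
}
dependencies = {
    "run payroll": ["payroll_app", "shared_db"],
    "generate an order": ["order_app", "shared_db"],
}

for system, (rto, rpo) in derive_system_objectives(process_objectives, dependencies).items():
    print(f"{system}: RTO {rto}, RPO {rpo}")
```

Here the shared database ends up with the order process's 2-hour RTO and 15-minute RPO, because it must be back in time for the more demanding of the two processes.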
Incomplete RTOs and RPOs can quickly derail a disaster recovery plan. Every item in the DR plan requires a defined recovery point and time objective, as failure to create them may lead to significant problems that can extend the disaster's impact. Once the RTO and RPO metrics have been mapped to IT infrastructure, the DR planner can determine the most suitable recovery strategy for each system. The organization ultimately sets the IT budget and therefore the RTO and RPO metrics need to fit with the available budget. While most business unit heads would like zero data loss and zero time loss, the cost associated with that level of protection may make the desired high availability solutions impractical. A cost-benefit analysis often dictates which disaster recovery measures are implemented.
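One way to picture the trade-off between objectives and budget is to filter a catalogue of candidate strategies by what each can achieve and pick the cheapest that qualifies. Everything in this sketch is an assumption made for illustration: the strategy names, the achievable RTO and RPO values, and the costs are invented, and a real cost-benefit analysis would weigh many more factors.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Strategy:
    name: str
    achievable_rto: timedelta
    achievable_rpo: timedelta
    annual_cost: float


def cheapest_adequate(
    strategies: Sequence[Strategy],
    rto: timedelta,
    rpo: timedelta,
    budget: float,
) -> Optional[Strategy]:
    """Cheapest strategy that meets both objectives within budget, or None if none does."""
    candidates = [
        s for s in strategies
        if s.achievable_rto <= rto and s.achievable_rpo <= rpo and s.annual_cost <= budget
    ]
    return min(candidates, key=lambda s: s.annual_cost, default=None)


# Invented catalogue; the figures are placeholders, not market data.
catalogue = [
    Strategy("tape vaulting, cold site", timedelta(hours=72), timedelta(hours=24), 10_000),
    Strategy("warm site, periodic replication", timedelta(hours=8), timedelta(hours=4), 60_000),
    Strategy("hot site, synchronous mirror", timedelta(minutes=15), timedelta(0), 250_000),
]

choice = cheapest_adequate(catalogue, rto=timedelta(hours=12), rpo=timedelta(hours=6), budget=100_000)
print(choice.name if choice else "no strategy fits; revisit the objectives or the budget")
```

A `None` result corresponds to the situation described above, where the desired objectives cannot be met within the available budget and one of the two has to give.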
Traditionally, a disaster recovery system involved cutover or switch-over recovery systems. Such measures would allow an organization to preserve its technology and information, by having a remote disaster recovery location that produced backups on a regular basis. However, this strategy proved to be expensive and time-consuming. Therefore, more affordable and effective cloud-based systems were introduced.
Some of the most common strategies for data protection include:
In many cases, an organization may elect to use an outsourced disaster recovery provider to provide a stand-by site and systems rather than using their own remote facilities, increasingly via cloud computing.
In addition to preparing for the need to recover systems, organizations also implement precautionary measures with the objective of preventing a disaster in the first place. These may include:
Several large hardware vendors have developed mobile/modular solutions that can be installed and made operational in a very short time.