I would like to know what general emergency and recovery steps are taken when there is a network and server outage.
One of the big issues with outages is being prepared before the event occurs. Here are some general steps:
- Management buy-in: Management must make the decision to invest funds to deal with potential outages.
- Business impact analysis: With management's decision to invest the funds, a team must be appointed to determine which systems are critical and what can be done to reduce outages. For all practical purposes this is a risk assessment.
- Recovery strategy planning: With critical systems identified, methods must be found to reduce outages and deal with their potential effect. As an example, a loss of power can bring operations to a standstill. Having a backup generator on standby would be one way to deal with this identified risk.
- Implementation: This is where funds are spent and the agreed-upon recovery alternatives are put in place. This can include options like RAID, offsite backup, backup generators, etc.
- Testing and maintenance: Whatever is implemented needs to be tested. That's the only way to know that systems and procedures will work as planned. There also need to be mechanisms in place to update the plan. After all, life is about change.
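As a rough illustration of the business impact analysis step, the risk assessment can be reduced to simple arithmetic: estimate how often each system fails, how long it stays down, and what an hour of downtime costs, then rank systems by the product. The sketch below is only one possible way to do this. The `System` class, its field names, and every figure in `inventory` are hypothetical placeholders, not real data, and a real assessment would weigh factors that a single number cannot capture.

```python
from dataclasses import dataclass


@dataclass
class System:
    """One entry in a hypothetical system inventory (all fields are estimates)."""
    name: str
    outages_per_year: float   # assumed: how often the system is expected to fail
    hours_per_outage: float   # assumed: typical time to restore service
    cost_per_hour: float      # assumed: business cost of one hour of downtime

    @property
    def annual_exposure(self) -> float:
        # Expected yearly downtime cost = frequency x duration x hourly cost
        return self.outages_per_year * self.hours_per_outage * self.cost_per_hour


def rank_by_exposure(systems):
    """Return systems ordered from highest to lowest expected yearly cost."""
    return sorted(systems, key=lambda s: s.annual_exposure, reverse=True)


# Made-up example figures, for illustration only
inventory = [
    System("mail server", outages_per_year=2.0, hours_per_outage=4.0, cost_per_hour=1500.0),
    System("core switch", outages_per_year=0.5, hours_per_outage=8.0, cost_per_hour=10000.0),
    System("file server", outages_per_year=1.0, hours_per_outage=6.0, cost_per_hour=3000.0),
]

if __name__ == "__main__":
    for system in rank_by_exposure(inventory):
        print(f"{system.name}: {system.annual_exposure:,.0f} per year")
```

A ranking like this gives the team a starting point for deciding where recovery funds go first. The estimates should be revisited whenever the plan is tested or updated.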
Therefore, what matters most in a disaster is what you did beforehand to prepare for it: have alternate plans, test the plan, and have key employees trained and ready to deal with the situation.
For complete steps toward recovery, view this network disaster recovery checklist.
This was first published in September 2007