

What steps should I take in the event of a network failure?

Don't panic when the network goes down. Instead, use this checklist to go through five factors that could cause network failure, and follow tips on how to plan for them in advance.

No single series of steps can deal with every network failure, because failures differ. Perhaps no traffic is moving at all, or maybe the problem is confined to just one part of the network. Perhaps access to a critical application is blocked. In that case, does the application run on a local server or at a SaaS provider?

Recovery steps vary based on the nature of the failure, which is why it's important to plan for each type of problem. Recovery planning depends on understanding how each type of failure could occur. Start with the broadest case: Why would the entire network become inoperable?

Possible reasons for network failure

Power issues. The most obvious problem is that power to the switches and routers has failed. By planning in advance for this possibility, you can add battery backup and run a secondary link to the power supplier so your network doesn't go down when someone drives into a power pole.

Updates and configurations. Another potential reason for network failure is a problem with a software update or a network configuration change. Were switches updated overnight with new software, or did a network-wide configuration change roll out? No update should be rolled out without testing. Even with testing, the team must retain the previous software version and network configuration and be ready to restore them quickly.
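As a rough illustration of keeping the previous configuration ready to restore, here is a minimal Python sketch. The `ConfigHistory` class and the setting names are hypothetical and stand in for whatever backup mechanism your network devices or management tools actually provide.

```python
class ConfigHistory:
    """Keeps every applied configuration so the previous one can be restored.

    Hypothetical helper for illustration; real devices and management tools
    have their own backup and rollback mechanisms.
    """

    def __init__(self, initial):
        self._versions = [dict(initial)]

    @property
    def current(self):
        # Return a copy so callers cannot alter the stored history.
        return dict(self._versions[-1])

    def apply(self, changes):
        """Record a new version made of the current settings plus the changes."""
        new_version = {**self._versions[-1], **changes}
        self._versions.append(new_version)
        return dict(new_version)

    def rollback(self):
        """Discard the latest version and return the one before it."""
        if len(self._versions) < 2:
            raise RuntimeError("no previous version to restore")
        self._versions.pop()
        return self.current


history = ConfigHistory({"firmware": "1.0", "mtu": 1500})
history.apply({"firmware": "1.1"})   # overnight update
restored = history.rollback()        # update caused trouble, so restore
```

After the rollback, `restored` holds the original settings again, which is the property the recovery plan depends on.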

Teams should also log all changes to the network. Each change must include an explanation of what was changed, why it was changed and who made the change. Organizations should carefully control access to admin passwords so only qualified personnel can introduce changes.
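A change log entry along these lines could be sketched in Python as follows. The `ChangeRecord` schema and the `log_change` helper are assumptions made for illustration, not a standard format. The sketch simply refuses to record a change unless the what, why and who are all filled in.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One network change log entry (hypothetical schema)."""
    what: str  # what was changed
    why: str   # why it was changed
    who: str   # who made the change
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        # Reject entries that leave any of the three explanations blank.
        for name in ("what", "why", "who"):
            if not getattr(self, name).strip():
                raise ValueError(f"change record is missing '{name}'")


def log_change(log, what, why, who):
    """Validate a change and append it to the log (a plain list here)."""
    record = ChangeRecord(what=what, why=why, who=who)
    log.append(record)
    return record


change_log = []
log_change(
    change_log,
    what="Raised MTU on core switch uplink",
    why="Jumbo frames needed for storage traffic",
    who="jdoe",
)
```

In practice the log would live somewhere durable and access-controlled rather than in an in-memory list.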

[Figure: Network outage checklist, a set of steps to pinpoint what might be causing a network failure]

Hardware problems. Hardware device failure is always a possibility when only part of the network is down. Network teams should design the network with redundancy so alternate paths exist for any failing device, but performance may degrade to the extent that the network is unusable. Network monitors should make it quickly apparent which device has failed.
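One way to check a design for redundancy is to remove each device in turn and see whether traffic can still get through. The Python sketch below does this with a breadth-first search over a list of links. The topology, device names and function names are invented for the example, and the check says nothing about how much performance degrades on the alternate path.

```python
from collections import deque


def has_path(links, src, dst, failed=frozenset()):
    """Return True if dst is reachable from src without using failed devices."""
    if src in failed or dst in failed:
        return False
    # Build an undirected adjacency map from the list of links.
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in graph.get(node, ()):
            if neighbor not in seen and neighbor not in failed:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(links, src, dst):
    """List intermediate devices whose failure alone cuts src off from dst."""
    devices = {d for link in links for d in link} - {src, dst}
    return sorted(d for d in devices if not has_path(links, src, dst, failed={d}))


# Hypothetical topology: two distribution switches, but only one core switch.
links = [
    ("access1", "dist1"), ("access1", "dist2"),
    ("dist1", "core"), ("dist2", "core"),
    ("core", "server"),
]
weak_spots = single_points_of_failure(links, "access1", "server")  # ["core"]
```

Here either distribution switch can fail without cutting off the server, but the lone core switch cannot, so that is where an alternate path is still missing.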

Server troubles. Failure of a single application may be due to an issue with the server on which it runs or because of a network failure on the path to the server. Teams should plan in advance to ensure enough extra server capacity is available to move the application elsewhere. If the network is the issue, it may be due to a hardware failure, a software update or a configuration change.

SaaS provider issues. Failure of a SaaS provider can be more difficult to handle. While your business may depend on the provider, its operation is out of your hands. Again, advance planning is vital. Before signing up, insist on examining the provider's backup plans and on choosing service-level agreements that provide adequate guarantees.
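When comparing service-level agreements, it can help to translate an availability percentage into the downtime it actually permits. The short Python sketch below does that arithmetic. The function name and the 30-day billing period are assumptions, and a real agreement may measure availability differently or exclude scheduled maintenance.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Minutes of downtime an availability guarantee permits over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


# Compare a few commonly quoted availability levels over 30 days.
for level in (99.0, 99.9, 99.99):
    print(f"{level}% allows about {allowed_downtime_minutes(level):.1f} minutes")
```

Over 30 days, 99.9% availability still permits roughly 43 minutes of outage, so decide whether that is an adequate guarantee for the application in question.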

Investigate disaster recovery-as-a-service providers. They may offer a way to deal with a SaaS failure. Make sure you can gain access to an updated copy of your data and to the software required to access it.

Finally, don't panic when a problem occurs. When users report the network is down, it could actually be down, or the problem could be confined to one application. Determine the extent of the problem, take out the appropriate plan and then follow it.
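Determining the extent of a problem can be sketched as a handful of reachability checks followed by a simple classification. The Python below is one hedged way to do it: `is_reachable` uses the standard library's `socket.create_connection`, while the target names and the category labels are invented for the example and would map onto your own recovery plans.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify_outage(checks):
    """Classify an outage from a dict of target name -> reachable (bool)."""
    if not checks:
        raise ValueError("no checks supplied")
    failed = sorted(name for name, ok in checks.items() if not ok)
    if not failed:
        return "no failure detected", failed
    if len(failed) == len(checks):
        return "network-wide outage", failed
    if len(failed) == 1:
        return "single target unreachable", failed
    return "partial outage", failed


# Hypothetical results; in practice each value would come from is_reachable().
results = {"core switch": True, "file server": True, "SaaS CRM": False}
scope, failed_targets = classify_outage(results)
```

With these sample results, only the SaaS application is unreachable, which points to the SaaS recovery plan rather than the plan for a full network outage.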


This was last published in July 2019
