Scott M. Ballew
Redundant links in your network provide you with more ways to get your data to its destination, should something fail. But wait...they also give you more things that can fail. This tip, excerpted from Managing IP Networks with Cisco Routers, published by O'Reilly and Associates, gives details.
One failure mode you are likely to consider is a cable failure. Someone digging a trench may cut through a fiber bundle, or a construction crew in a remote building may cut a copper cable, or someone may trip over a drop cable and damage a data outlet in the wall. All of these are potentially disruptive to your network operation, and the natural response is to consider putting in redundant links. You may also be tempted to put in redundant links to deal with the possibility of router failure in a remote building.
However, before you simply start drawing redundant links on your network design, stop and think carefully about what effect each link will have. In many cases, adding a redundant link may have the opposite effect from what you intended. Each new link means a new way things can go wrong, especially in the complex decisions involved in IP routing. For example, if you have redundant paths between two routers, you must be running a dynamic routing protocol, or your routers won't be able to detect a failure and take advantage of an alternate path. Routing protocols can be extremely complex. If a link fails, it takes a finite amount of time for the routers to detect the failure and agree on a new set of routes to bypass the failure. This is known as convergence time. In some cases, the convergence time can be several minutes. When the failed link returns to service, the routers must again detect this and recompute the optimal paths through the network.
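To make the dependence on a dynamic routing protocol concrete, here is a deliberately simplified Python sketch. It is a toy model, not how any real router or protocol is implemented: the `Path` class, the metric values, and the two selection functions are all invented for illustration. It contrasts a router that has no way of learning link state with one that only considers paths it believes are up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    name: str
    metric: int       # lower is preferred (values here are arbitrary)
    up: bool = True   # link state, as a router with link-state knowledge sees it


def fixed_choice(paths: List[Path]) -> Path:
    """Model a router with no knowledge of link state:
    it always picks the lowest-metric path, even if that link is down."""
    return min(paths, key=lambda p: p.metric)


def dynamic_choice(paths: List[Path]) -> Optional[Path]:
    """Model a router running a dynamic protocol:
    it picks the lowest-metric path among those it believes are up."""
    usable = [p for p in paths if p.up]
    return min(usable, key=lambda p: p.metric) if usable else None


primary = Path("primary", metric=10)
backup = Path("backup", metric=20)
paths = [primary, backup]

primary.up = False  # the primary link fails

print(fixed_choice(paths).name)    # still "primary": traffic is sent into a dead link
print(dynamic_choice(paths).name)  # "backup": the redundant path is actually used
```

The sketch ignores convergence time entirely. In a real network the switch to the backup path is not instantaneous, which is exactly why the delay matters.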
So what happens if the link is failing solely because it has too much traffic on it? When the routers detect the failure, they will begin routing traffic around the failed link. This reduces the traffic on the link, causing it to return to service. The routers will detect this and begin sending traffic out the link again, increasing its load and causing it to fail yet again. If this happens consistently, and especially if it happens faster than the router convergence time, the routers will always be in an inconsistent state and will spend increasing amounts of their CPU time computing and recomputing routes. This is known as route flap. If the redundant link had never been there, the rest of the network would have noticed the main link go away, and would have simply started dropping traffic destined for that link.
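The feedback loop described above can be sketched as a small simulation. This is a minimal model under stated assumptions, not a description of any real protocol: the link is assumed to fail whenever its load exceeds a fixed capacity and to recover as soon as the load is removed, and the routers are assumed to react within a single time step. The function names and the numbers are hypothetical.

```python
from typing import List


def simulate_flap(offered_load: int, capacity: int, steps: int) -> List[bool]:
    """Return the primary link's up/down state after each time step."""
    link_up = True
    history = []
    for _ in range(steps):
        # Routers send traffic over the primary only while they see it as up;
        # otherwise the traffic goes around it and the primary carries nothing.
        load_on_primary = offered_load if link_up else 0
        # Assumption: the link fails when overloaded and recovers once idle.
        link_up = load_on_primary <= capacity
        history.append(link_up)
    return history


def count_flaps(history: List[bool]) -> int:
    """Count the number of up/down transitions in the history."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)


overloaded = simulate_flap(offered_load=120, capacity=100, steps=6)
print(overloaded)               # [False, True, False, True, False, True]
print(count_flaps(overloaded))  # 5

healthy = simulate_flap(offered_load=80, capacity=100, steps=6)
print(count_flaps(healthy))     # 0
```

In this model the overloaded link changes state on every step, so routers that need more than one step to converge would never reach a stable set of routes.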
This is not to say that redundant links are bad. On the contrary, they can be the lifeline of a network faced with a catastrophic failure. However, they need to be placed carefully, and you need to monitor the main link so you know when it has failed. I have, from time to time, been surprised to find that an FDDI ring has been in a wrap state, meaning that one of the primary links had failed and a backup was in use. The ring continued to function, hiding the fact that the main link had failed.
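One way to think about the monitoring requirement is as a check for failures that redundancy is hiding. The sketch below assumes you already collect link status by some means (it does not say how). The `LinkStatus` record, the "primary" and "backup" role labels, and the link names are all hypothetical, invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkStatus:
    name: str
    role: str   # "primary" or "backup" (labels invented for this sketch)
    up: bool


def hidden_failures(statuses: List[LinkStatus]) -> List[str]:
    """Return the names of primary links that are down while at least one
    backup is still up, i.e. failures users may not have noticed yet."""
    backup_alive = any(s.up for s in statuses if s.role == "backup")
    return [
        s.name
        for s in statuses
        if s.role == "primary" and not s.up and backup_alive
    ]


statuses = [
    LinkStatus("ring-primary", "primary", up=False),
    LinkStatus("ring-backup", "backup", up=True),
]
print(hidden_failures(statuses))  # ['ring-primary']
```

A check like this would need to raise an alert rather than just print, since the whole point is that nothing else in the network will draw attention to the failed link.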