
Troubleshooting routing: Strategies for fast problem solving

Network troubleshooting can be challenging for a number of reasons, not the least of which is the lack of a standard methodology. In this tip, Tom Lancaster looks at some of the best and fastest ways to find out where your problem lies.


Formal approaches

Typically, when someone mentions a methodology, we think of something like the scientific method, adapted a bit for our purposes. We might go through distinct phases in our troubleshooting. First, we prepare by understanding the network's normal, steady-state operation. Then, when trouble occurs, we define the problem based on its symptoms (e.g., "the network is slow" or "I cannot connect to the VAX"). Next, we identify the current state of the network, for example by checking whether the WAN circuits are up or by collecting device logs as appropriate. Finally, we form a hypothesis and test it.
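
The "prepare" and "identify the current state" phases boil down to comparing what you see now against what normal looks like. Here is a minimal Python sketch of that comparison. It assumes you already have baseline numbers for a few metrics; the metric names, values, the 50% tolerance and the absolute floor are all hypothetical and not taken from any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical steady-state values recorded while the network was healthy.
BASELINE: Dict[str, float] = {
    "wan1_utilization_pct": 35.0,
    "core_rtt_ms": 4.0,
    "interface_errors_per_min": 0.0,
}


@dataclass
class Deviation:
    metric: str
    baseline: float
    current: float


def find_deviations(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.5,  # allowed change as a fraction of the baseline
    floor: float = 1.0,      # minimum absolute change, for near-zero baselines
) -> List[Deviation]:
    """Flag metrics that moved further from the baseline than the larger of
    tolerance * baseline and floor. Metrics with no current reading are skipped."""
    deviations = []
    for metric, normal in baseline.items():
        if metric not in current:
            continue
        value = current[metric]
        allowed = max(abs(normal) * tolerance, floor)
        if abs(value - normal) > allowed:
            deviations.append(Deviation(metric, normal, value))
    return deviations


if __name__ == "__main__":
    now = {"wan1_utilization_pct": 97.0, "core_rtt_ms": 5.0, "interface_errors_per_min": 12.0}
    for d in find_deviations(BASELINE, now):
        print(f"{d.metric}: normally {d.baseline}, now {d.current}")
```

With these sample readings, the WAN utilization and the interface error rate are flagged, while the small change in round-trip time is not. Without a recorded baseline, there is nothing to compare against, which is why the formal approach starts with preparation.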

While a formal methodology does lend some scientific rigor to an otherwise artsy process, and it does increase the odds of success, it also has drawbacks. First, it's slow. It takes time to work through the initial steps, which necessarily cover much more ground than is relevant to the problem, since we don't yet know what the problem is. Second, it doesn't account for the natural process of learning, e.g., "It took me two hours to figure out why the network was slow the first time Bob in Accounting ran his application, but now it's the first thing I check when users call."

The OSI model

7: The application layer
Identifies communication partners and quality of service, considers user authentication and privacy, and identifies any constraints on data syntax.

6: The presentation layer
Usually part of an operating system, this layer converts incoming and outgoing data from one presentation format to another. Sometimes called the "syntax layer."

5: The session layer
Sets up, coordinates and terminates conversations, exchanges and dialogs between the applications at each end. It deals with session and connection coordination.

4: The transport layer
Manages end-to-end control and error checking. With a connection-oriented protocol such as TCP, it ensures complete data transfer.

3: The network layer
Handles routing and forwarding of data.

2: The data-link layer
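Provides synchronization for the physical layer and, in HDLC-style framing, performs bit stuffing: it inserts a 0 after any run of five consecutive 1s so that data is never mistaken for a frame delimiter. It furnishes transmission protocol knowledge and management.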

1: The physical layer
Conveys the bit stream through the network at the electrical and mechanical level. Provides the hardware means of sending and receiving data on a carrier.
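
The bit stuffing mentioned under the data-link layer is easy to see in code. This sketch works on strings of '0' and '1' characters for readability and follows the HDLC convention of inserting a 0 after five consecutive 1s, so the flag pattern 01111110 never appears inside a frame's data. It assumes the input to bit_unstuff is validly stuffed data with the flags and any abort sequences already removed.

```python
def bit_stuff(bits: str) -> str:
    """Insert a '0' after every run of five consecutive '1' bits."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        if b == "1":
            ones += 1
            if ones == 5:
                out.append("0")  # stuffed bit: data can never contain six 1s in a row
                ones = 0
        else:
            ones = 0
    return "".join(out)


def bit_unstuff(bits: str) -> str:
    """Remove the '0' the sender inserted after each run of five '1' bits.
    Assumes validly stuffed input (no flags or aborts)."""
    out = []
    ones = 0
    skip_next = False
    for b in bits:
        if skip_next:
            skip_next = False  # this is the stuffed zero; drop it
            ones = 0
            continue
        out.append(b)
        if b == "1":
            ones += 1
            if ones == 5:
                skip_next = True
        else:
            ones = 0
    return "".join(out)


if __name__ == "__main__":
    data = "0111111"
    stuffed = bit_stuff(data)
    print(data, "->", stuffed, "->", bit_unstuff(stuffed))
```

For the input 0111111, the sender transmits 01111101, and the receiver recovers the original 0111111.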

Another set of methodologies with a lot of proponents is based on the seven-layer OSI model. These suggest attacking the problem from either the top or the bottom. For example, start by testing the application layer. If that layer checks out, move to the next lower layer you have a way to test, and so on until you get down to the physical layer, where you find yourself crawling under desks and through closets. Methods like these rely on a process of elimination: you figure out what the problem isn't, and whatever's left must be the problem. Again, it's not a bad approach, and it's popular enough that I've even seen certification tests with questions where it was the correct answer.
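
The process of elimination described above can be sketched as a walk over per-layer health checks. In this minimal Python sketch, each check is a hypothetical placeholder that returns True when its layer looks healthy; you only supply checks for the layers you actually have a way to test. The walk rules out each passing layer and stops at the first one that fails.

```python
from typing import Callable, List, Optional, Tuple

# (layer number, layer name, health check). Checks return True when the layer
# looks healthy. Real checks might open a socket, ping, or read interface counters.
LayerCheck = Tuple[int, str, Callable[[], bool]]


def eliminate(checks: List[LayerCheck], top_down: bool = True) -> Optional[Tuple[int, str]]:
    """Walk the layers top-down (7 toward 1) or bottom-up (1 toward 7),
    ruling out each layer whose check passes. Return the first layer that
    fails, or None if every check passes."""
    ordered = sorted(checks, key=lambda c: c[0], reverse=top_down)
    for number, name, check in ordered:
        if not check():
            return number, name
    return None


if __name__ == "__main__":
    checks = [
        (7, "application", lambda: True),
        (4, "transport", lambda: True),
        (3, "network", lambda: False),
        (1, "physical", lambda: True),
    ]
    print(eliminate(checks))                  # top-down
    print(eliminate(checks, top_down=False))  # bottom-up
```

Both directions arrive at the network layer here; they differ only in which layers get ruled out along the way, and therefore in how long it takes to get there.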

Still, as you get more experience troubleshooting networks in general, and your current network in particular, you'll find this process a little tedious. So, my tip to help you troubleshoot faster is to understand the benefits of several methods and use the best of each together.

Faster methods

When you first become aware of a problem, make a conscious effort to understand its severity or complexity. Ask yourself: "Based on the symptoms and a minute or two of investigation, is this something I've seen before? Can I fix this quickly, or would I benefit from the structure of a formal methodology?" If you go for the quick fix but the issue remains elusive, revisit this question periodically.

Next, as you work a problem, I'd suggest not starting from the top or bottom of any list and proceeding in order. Rather, do the fastest items first. For instance, starting in the middle of the OSI model with a ping is fast. If it succeeds, you immediately know that Layers 1 through 3 are working along that path; if it fails, no amount of diddling at the application layer will result in connectivity. Another fast start is checking a network management console. What's red? What's green? Ideally, you have an array of such tools that give you a quick, dashboard-style view of your network.
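
The "fastest items first" rule amounts to sorting your checks by how long each takes. Here is a minimal Python sketch; the check names, time estimates and results are hypothetical, and each check's run function is assumed to return True when it turns up the cause of the problem.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    est_seconds: float        # your rough guess at how long the check takes
    run: Callable[[], bool]   # returns True if this check turned up the cause


def fastest_first(checks: List[Check]) -> Optional[str]:
    """Run checks from cheapest to most expensive and stop at the first one
    that turns up the cause. Returns that check's name, or None if none did."""
    for check in sorted(checks, key=lambda c: c.est_seconds):
        if check.run():
            return check.name
    return None


if __name__ == "__main__":
    # Hypothetical checks with made-up costs; real ones would call out to
    # ping, an SNMP poller, or your management console.
    checks = [
        Check("walk the cable plant", 7200, lambda: False),
        Check("review application config", 1800, lambda: False),
        Check("ping the destination", 5, lambda: False),
        Check("look for red on the NMS dashboard", 30, lambda: True),
    ]
    print(fastest_first(checks))
```

Here the ping runs first and the dashboard second, which finds the cause, so the two slow checks never run.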

As an example of the things I'd check for a routing problem where the symptom is loss of connectivity, I'd start a ping to show that it's not working, followed quickly by a traceroute to give me a general idea of where the problem might be. Once logged into the last router to respond to the traceroute, I'd check the routing table to see whether it has an entry for the destination and whether the next hop points in the right direction.
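
Two pieces of that procedure can be sketched in Python: picking the last router that answered the traceroute, and working out which routing-table entry a router would use for the destination. The sketch assumes the traceroute output has already been parsed into a list of hop addresses (None for a hop shown as "*"), and it models the routing table as a hypothetical list of (prefix, next hop) pairs. The lookup is a plain longest-prefix match using Python's standard ipaddress module.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (prefix, next hop)


def last_responding_hop(hops: List[Optional[str]]) -> Optional[str]:
    """Return the address of the last hop that answered, skipping timeouts."""
    for addr in reversed(hops):
        if addr is not None:
            return addr
    return None


def lookup(table: List[Route], destination: str) -> Optional[Route]:
    """Longest-prefix match: among routes containing the destination,
    return the one with the longest prefix, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    best: Optional[Route] = None
    best_len = -1
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if dest in net and net.prefixlen > best_len:
            best, best_len = (prefix, next_hop), net.prefixlen
    return best


if __name__ == "__main__":
    hops = ["10.0.0.1", "10.1.1.1", "192.0.2.9", None, None]
    print("log into:", last_responding_hop(hops))
    table = [
        ("0.0.0.0/0", "203.0.113.1"),
        ("10.0.0.0/8", "10.255.0.1"),
        ("10.20.0.0/16", "10.255.0.2"),
    ]
    print(lookup(table, "10.20.30.40"))
```

If the best match turns out to be only the default route, or no route at all, that is often the first clue. You still have to judge whether the chosen next hop actually points toward the destination.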

If it doesn't, or if the cause isn't immediately obvious (such as "an interface is down"), I'd switch to a more ordered approach. That would involve checking the routing protocol's database (assuming you're running OSPF, EIGRP or BGP) to see whether an advertisement was received but not installed in the routing table; then checking for interference from IDS devices, firewalls, ACLs, distribute lists, prefix lists and so on; then debugging; and finally, long conversations with the router vendor's help desk.
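
The "received but not installed" check, followed by the search for a filter, can be sketched as a set difference plus a prefix-list evaluation. In this Python sketch, the protocol database and routing table are simply sets of prefix strings (hypothetical data, not pulled from a real router), and the filter is modeled loosely on Cisco IOS prefix-list behavior: entries are evaluated in sequence order, the first match wins, ge/le set the allowed prefix-length range, and there is an implicit deny at the end. Other vendors and other filter types (distribute lists, route maps, ACLs) are not modeled.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class PrefixListEntry:
    seq: int
    action: str                # "permit" or "deny"
    prefix: str                # e.g. "10.0.0.0/8"
    ge: Optional[int] = None
    le: Optional[int] = None

    def matches(self, candidate: str) -> bool:
        entry = ipaddress.ip_network(self.prefix)
        cand = ipaddress.ip_network(candidate)
        if cand.version != entry.version or not cand.subnet_of(entry):
            return False
        if self.ge is None and self.le is None:
            return cand.prefixlen == entry.prefixlen  # exact length only
        low = self.ge if self.ge is not None else entry.prefixlen
        high = self.le if self.le is not None else entry.max_prefixlen
        return low <= cand.prefixlen <= high


def permitted(plist: List[PrefixListEntry], candidate: str) -> bool:
    """First matching entry (by sequence number) decides; no match means deny."""
    for entry in sorted(plist, key=lambda e: e.seq):
        if entry.matches(candidate):
            return entry.action == "permit"
    return False


def diagnose(protocol_db: Set[str], rib: Set[str],
             inbound_filter: List[PrefixListEntry]) -> Dict[str, str]:
    """For each prefix the protocol learned but the routing table lacks,
    say whether the filter accounts for it."""
    report = {}
    for prefix in sorted(protocol_db - rib):
        if permitted(inbound_filter, prefix):
            report[prefix] = "not explained by prefix-list"
        else:
            report[prefix] = "denied by prefix-list"
    return report


if __name__ == "__main__":
    plist = [
        PrefixListEntry(5, "deny", "10.99.0.0/16", le=32),
        PrefixListEntry(10, "permit", "10.0.0.0/8", le=24),
    ]
    db = {"10.1.0.0/16", "10.2.0.0/16", "10.99.5.0/24"}
    rib = {"10.1.0.0/16"}
    for prefix, reason in diagnose(db, rib, plist).items():
        print(prefix, "->", reason)
```

A prefix the filter permits but that is still missing from the routing table points you toward the later, slower steps: other filters, debugging and, eventually, the vendor.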

The point is that each of these steps takes longer than the previous one. Do what you can do quickly first. Then, as your efforts become more involved, apply a miniature version of the scientific method within each step. And throughout the process, keep notes, and make them just a little more detailed than you think is necessary.

Tom Lancaster, CCIE# 8829 CNX# 1105, is a consultant with 15 years of experience in the networking industry and co-author of several books on networking, most recently CCSP™: Secure PIX and Secure VPN Study Guide, published by Sybex.

This was last published in February 2006
