In a previous tip, I discussed formal and informal troubleshooting methodologies using routing as an example. In this tip, I want to look at another dimension that is critical to timely troubleshooting -- a solid understanding of systems and their components, using switching as an example. An understanding of systems is critical to troubleshooting simply because you have to know what to peek and poke at, and though it's perhaps a statement of the obvious, it's becoming more important and more difficult by the day.
More specifically, many of the ways we attempt to envision today's complex networks (or explain them to non-technical parties) simply don't lend themselves to troubleshooting. Case in point: switching. In reality, your switches probably perform many very distinct functions, most of which have nothing to do with the academic definition of "switching." In order to troubleshoot your environment with a minimum of effort, you need to understand these functions -- how they work in isolation and how they interact with the other functions in the switch -- so you can eliminate the unlikely causes of failure and quickly get to the root cause.
The following list of systems and components, each providing extra functionality that is typically built on top of the basic switching function, is far from comprehensive, but it illustrates the point:
Addressing and routing
In order to know which port to use for forwarding a frame, the switch needs a database of MAC addresses. This is usually called the Forwarding Database (Cisco calls it the CAM). In addition, most vendors have proprietary ways of caching this information at the port to minimize demands on the CPU. There are plenty of technologies out there, such as Microsoft's Network Load Balancing, which when "properly" configured result in interesting behavior from your switch, like flooding frames out of all the ports.
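To make the learning-and-flooding behavior concrete, here is a minimal sketch of a forwarding database. It is a toy model rather than any vendor's implementation: the `LearningSwitch` class, its `receive` method, and the shortened MAC strings are all hypothetical names chosen for illustration.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy model of a switch's forwarding database (FDB)."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        self.fdb: Dict[str, int] = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Learn the source MAC, then return the ports the frame is sent out of."""
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood out of every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination lives on the ingress port, so nothing is forwarded.
            return []
        return [out_port]


switch = LearningSwitch(ports=[1, 2, 3, 4])
print(switch.receive(1, "aa:aa", "bb:bb"))  # bb:bb not learned yet: [2, 3, 4]
print(switch.receive(2, "bb:bb", "aa:aa"))  # aa:aa was learned on port 1: [1]
print(switch.receive(1, "aa:aa", "bb:bb"))  # bb:bb is now known on port 2: [2]
```

Note that a MAC address is learned only when it appears as a source. A destination address that never shows up as a source stays unknown, so every frame sent to it is flooded, which is one way a switch can end up sending frames out of all its ports.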
Access and security
Features like IEEE's 802.1x Port Authentication and Cisco's Port Security and Layer 2 ACLs can be extremely useful in some circumstances, but they also present dozens of new ways to misconfigure devices or introduce unexpected downtime.
Port and switch configuration
A decade ago, switches that could automatically detect and set link speed and negotiate the duplex were pretty spiffy. TechTarget readers are no doubt well aware of the legendary problems this caused. Today's switches have many more protocols to detect and configure links. These include UniDirectional Link Detection (UDLD), Port Aggregation Protocol (PAgP), and other negotiation methods built into specific technologies -- for example, the method built into the Power over Ethernet spec to keep it from sending power to a device that isn't expecting it. Beyond individual ports, components such as VTP can even push configuration (VLANs, in this case) to other switches.
Of course, we can't avoid mentioning Spanning Tree Protocol (STP) and all the various recent incarnations like Per-VLAN Spanning Tree (PVST).
The point of all this is that if you have a simple problem -- for instance, "I can't ping your PC from my PC and they're both on the same subnet, separated by several switches" -- then almost any of the components above could be the culprit.
- There could be an Ethernet loop due to an STP issue or a misbehaving MLT/EtherChannel link.
- Your PC may not be allowed onto the network because you didn't authenticate, or you put a hub at your desk and connected more PCs than Port Security permits.
- Perhaps there are far more MACs on the network than can fit into a switch's FDB, and the subsequent flooding is causing congestion.
- Perhaps QoS is configured to put ICMP into a low-priority queue, so regular traffic is working but I just can't ping you.
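The hub-at-the-desk case in the list above can be sketched as a per-port MAC limit. This is a simplified model under stated assumptions, not Cisco's Port Security implementation: the `SecuredPort` class, its `admit` method, and the shut-the-port-on-violation policy are hypothetical choices for illustration, and real features offer more modes and timers.

```python
from typing import Set


class SecuredPort:
    """Toy model of a switch port that limits how many source MACs it accepts."""

    def __init__(self, max_macs: int) -> None:
        self.max_macs = max_macs
        self.seen: Set[str] = set()
        self.disabled = False

    def admit(self, src_mac: str) -> bool:
        """Return True if a frame from src_mac would be accepted on this port."""
        if self.disabled:
            return False
        if src_mac in self.seen:
            return True
        if len(self.seen) >= self.max_macs:
            # Violation: in this sketch the port shuts down for every device.
            self.disabled = True
            return False
        self.seen.add(src_mac)
        return True


port = SecuredPort(max_macs=2)
for mac in ["pc-1", "pc-2", "pc-3", "pc-1"]:
    print(mac, port.admit(mac))
```

In this model the third PC trips the limit, and afterward even the first PC, which was working a moment ago, is refused. That is what makes this class of failure look unrelated to its actual cause.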
Hopefully, you've disabled all of the protocols you don't need; even then, there still could be dozens of possible causes of this problem. If you organize your thoughts about components by function, however, it will help you construct the questions and tests that ferret out the problem.
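One way to organize components by function is to keep the questions in a small lookup table and pull out only those that apply to the features you actually run. The sketch below is hypothetical throughout: the first three category names follow the headings in this tip, "Loop prevention" and "QoS" are assumed labels, and the questions paraphrase the examples above rather than forming a complete checklist.

```python
from typing import Dict, List

# Hypothetical checklist keyed by switch function.
CHECKS_BY_FUNCTION: Dict[str, List[str]] = {
    "Addressing and routing": [
        "Are both MAC addresses in the FDB on every switch in the path?",
        "Is the FDB full, so that frames are being flooded?",
    ],
    "Access and security": [
        "Did the PC authenticate on its port?",
        "Are more MACs connected to the port than Port Security permits?",
    ],
    "Port and switch configuration": [
        "Do speed and duplex match on both ends of each link?",
        "Is a link aggregation group misbehaving?",
    ],
    "Loop prevention": [  # assumed label for the spanning tree discussion
        "Is there an Ethernet loop caused by an STP issue?",
    ],
    "QoS": [  # assumed label for the ICMP queueing example
        "Is ICMP in a low-priority queue while regular traffic works?",
    ],
}


def build_plan(enabled: List[str]) -> List[str]:
    """Return the questions for the functions that are actually enabled."""
    plan: List[str] = []
    for function in enabled:
        for question in CHECKS_BY_FUNCTION.get(function, []):
            plan.append(f"[{function}] {question}")
    return plan


for line in build_plan(["Access and security", "Loop prevention"]):
    print(line)
```

Passing only the enabled functions mirrors the advice above: every protocol you have disabled removes a whole group of questions from the plan.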
Tom Lancaster, CCIE# 8829 CNX# 1105, is a consultant with 15 years of experience in the networking industry. He is co-author of several books on networking, most recently CCSP™: Secure PIX and Secure VPN Study Guide, published by Sybex.