I am looking for some troubleshooting suggestions. Our network has been experiencing intermittent connectivity issues. We get a network-wide (LAN) error message on each workstation that says "a network cable is unplugged." Powering off and restarting the network switch seems to correct the problem after about 30-45 minutes. But then the problem repeats itself about 6 hours later. I am concerned that this may be a sign of a dying switch, but I do not know how to confirm that. Any suggestions?
As with many troubleshooting questions, the answer is that it depends. Symptoms like the ones you describe can have several root causes, including a faulty or failing power supply, some other hardware failure, or a dying switch. They can also point to a serious networking issue. For example, I have personally seen this behavior caused by spanning tree problems, where spanning tree loops went unnoticed by the network team. In that case, massive amounts of data were being switched in a continuous loop and the switch simply could not keep up.
All hope is not lost, however. To determine whether the problem is hardware or network related, I would suggest a few key troubleshooting steps. It is easier to eliminate hardware issues first, as they tend to be fault-based. The harder issues are network topology-based, where it might take a certain network load to reproduce the problem. So eliminate the hardware first, then determine whether the problem is related to network engineering.
What type of switch is it? Is it a managed or unmanaged switch?
If the switch supports SNMP or some type of logging, I would evaluate CPU and memory utilization, along with interface utilization, discard, and error rates, for possible root cause information. Managed switches can often identify issues through the vendor's diagnostic tools or through SNMP and syslog. Often, when a switch is about to die, it produces several severity 1 log messages indicating a failure, or it core dumps its internal programs. If your switch doesn't have this capability, then I would recommend you upgrade to a switch that has some logging and manageability features. Managed switches, especially as network demands grow in the environment, will help you identify problems and manage LAN performance much better than a collection of unmanaged switches.
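As a rough illustration of the interface check described above, here is a minimal Python sketch that turns two counter snapshots into error and discard percentages per port. It assumes you have already collected the counters by whatever means your switch offers. The key names are borrowed from the standard IF-MIB objects (ifInUcastPkts, ifInErrors, ifInDiscards), but the function names, the dictionary layout, and the port names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: 32-bit counters that wrap at 2^32


def counter_delta(old, new, modulus=COUNTER32_MODULUS):
    """Difference between two counter readings, tolerating a single wrap."""
    return (new - old) % modulus


def interface_error_rates(before, after):
    """Return error and discard percentages per port between two snapshots.

    Both arguments map a port name to a dict of counter readings
    (hypothetical layout, not a vendor or library format).
    """
    rates = {}
    for port, old in before.items():
        new = after[port]
        packets = counter_delta(old["ifInUcastPkts"], new["ifInUcastPkts"])
        errors = counter_delta(old["ifInErrors"], new["ifInErrors"])
        discards = counter_delta(old["ifInDiscards"], new["ifInDiscards"])
        total = packets + errors + discards
        if total == 0:
            # No inbound traffic in the interval, so report zero rates.
            rates[port] = {"error_pct": 0.0, "discard_pct": 0.0}
        else:
            rates[port] = {
                "error_pct": 100.0 * errors / total,
                "discard_pct": 100.0 * discards / total,
            }
    return rates


if __name__ == "__main__":
    before = {"port1": {"ifInUcastPkts": 1000, "ifInErrors": 0, "ifInDiscards": 0}}
    after = {"port1": {"ifInUcastPkts": 1900, "ifInErrors": 50, "ifInDiscards": 50}}
    print(interface_error_rates(before, after))
```

A port whose error or discard percentage climbs in the hours before an outage is a good place to start looking, whether the cause turns out to be a bad cable, a failing port, or a switch that is being overloaded.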
If the problem is not related to hardware, consider these questions. What type of topology do you have in this environment? How many devices are on it? Is it a large or medium-sized network? Diagnosing spanning tree issues across multiple switches takes a little patience, especially when the network topology is not clearly documented. I would recommend mapping out the switching architecture to identify where loops might occur, then checking whether you actually have those loops. If you need some resources related to this, please let me know and I'll write some up.
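Once the switching architecture is mapped, finding where loops might occur can be done mechanically. The sketch below is only an illustration: it assumes you have written the map down as a list of switch-to-switch links, and it uses a union-find structure to report every link that closes a loop. The function name and switch names are hypothetical.

```python
def find_redundant_links(links):
    """Return the links that close a loop in the mapped topology.

    `links` is a list of (switch_a, switch_b) pairs taken from your own
    network map. A link is reported when both of its ends were already
    connected through links listed earlier.
    """
    parent = {}

    def find(node):
        # Find the representative of the group this switch belongs to.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we go
            node = parent[node]
        return node

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))  # both ends already connected: a loop
        else:
            parent[root_a] = root_b  # merge the two groups
    return redundant


if __name__ == "__main__":
    links = [("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "sw1"), ("sw3", "sw4")]
    print(find_redundant_links(links))
```

Every link this reports is a path that depends on spanning tree to keep it blocked, so those are the ports to check first when you suspect a loop.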
This was first published in March 2007