Troubleshooting network errors is no trivial task, and there are some misconceptions about what causes errors and how to correct them. Learning about the most common errors and their potential causes will help you save time and effort, and ensure that any project you roll out will run reliably. You can find network errors by checking SNMP statistics. This can be accomplished with a network management tool or network analyzer, or by looking at your switches' port statistics.
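When you compare port statistics by hand, what matters is how much each counter grew between two readings, not its absolute value. The following is a minimal sketch of that arithmetic, assuming 32-bit counters that wrap back to zero and two readings you have already collected from a port. The counter names follow the standard interface MIB; the helper names and the sample values are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: the port exposes 32-bit wrapping counters


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Return how far a wrapping counter advanced between two readings."""
    return (current - previous) % modulus


def error_deltas(earlier, later):
    """Compute the per-counter increase between two polls of the same port."""
    return {name: counter_delta(earlier[name], later[name]) for name in earlier}


# Hypothetical readings taken a few minutes apart (not real measurements).
first_poll = {"ifInErrors": 4294967290, "ifInDiscards": 120, "ifInUnknownProtos": 7}
second_poll = {"ifInErrors": 4, "ifInDiscards": 180, "ifInUnknownProtos": 7}

for name, increase in error_deltas(first_poll, second_poll).items():
    print(f"{name}: +{increase}")
```

The modulo step keeps the result correct when a counter rolls over between polls, provided it wrapped no more than once.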
FCS, or Frame Check Sequence, errors are among the more common errors found in a network. Each packet that is transmitted carries a Frame Check Sequence, which lets the receiving device determine whether the packet arrived intact: the receiver recomputes the value from the bits it received and compares the result with the value the sender attached. The FCS is a type of CRC, or Cyclic Redundancy Check. Barring a station powering up or down during a transmission, the most common cause of these errors is noise. Network noise can be caused by cabling that is located too close to noise sources such as lights or heavy machinery. If a cabling installation is particularly faulty (untwisted pairs, improper terminations, field-terminated patch cables and so on), these errors will occur on your network. Poorly manufactured components, or minimally compliant components that are improperly installed, can compound the issue. Cabling segments that are too long can also cause these errors.
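To make the check concrete, here is a minimal sketch of how a receiver could validate a frame. It assumes the check value is the standard CRC-32 that Python's `zlib.crc32` computes, appended to the frame least-significant byte first. The function names and the sample frame are hypothetical, and real network hardware performs this check itself.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with a 4-byte CRC-32 check value appended."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the body and compare it with the trailer."""
    if len(frame) < 4:
        return False
    body, received_fcs = frame[:-4], frame[-4:]
    expected_fcs = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return expected_fcs == received_fcs


# Hypothetical 60-byte frame body; a single flipped bit simulates noise.
good_frame = append_fcs(b"\x00" * 12 + b"\x08\x00" + b"payload".ljust(46, b"."))
noisy_frame = bytes([good_frame[0] ^ 0x01]) + good_frame[1:]

print(fcs_ok(good_frame))
print(fcs_ok(noisy_frame))
```

The receiver never needs a copy of the original packet; a mismatch between the recomputed and received values is enough to count an FCS error.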
Alignment errors occur when a frame does not contain a whole number of bytes (its bit count is not divisible by eight) and also has an FCS error. Cabling issues, as described above, or MAC layer packet formation issues (possibly hardware related) cause these errors. A faulty LAN driver can also cause them, in which case replacing the driver will correct the problem. These errors can also appear together with runt packets, which are packets that are too short. Noise, however, is the most common cause and can generally be corrected by addressing the cabling channel.
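The distinction between an alignment error and a plain FCS error can be sketched in a few lines. This is an illustration only: it assumes the receiver already knows how many bits arrived and whether the FCS matched, and the function name is hypothetical.

```python
def classify_frame(bit_count: int, fcs_matches: bool) -> str:
    """Classify a received frame from its bit count and FCS result."""
    if fcs_matches:
        return "ok"
    if bit_count % 8 != 0:
        # Bad FCS and a partial byte at the end of the frame.
        return "alignment error"
    # Bad FCS, but the frame ends on a byte boundary.
    return "FCS error"


print(classify_frame(512, True))
print(classify_frame(515, False))
print(classify_frame(512, False))
```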
CRC errors are a combination of both alignment and checksum errors, and are recorded whenever the cyclic redundancy check fails. Analyzing the other conditions that exist will help determine the cause. One possible cause is a faulty network interface card (NIC). In general, if a machine is failing to maintain a connection and the errors are consistent and attributable to a single PC, the NIC should be checked or replaced. If the errors are intermittent and the PC drops only occasionally, the errors are more likely caused by something other than the NIC. CRC errors in excess of 1% of network traffic are worth fixing.
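The 1% rule of thumb is easy to apply once you have counter increases for an interval. The sketch below assumes one reading of that rule, namely CRC errors as a share of all frames received on the port in the same interval. The function names and figures are hypothetical.

```python
def crc_error_percent(crc_errors: int, frames_received: int) -> float:
    """Return CRC errors as a percentage of frames received in an interval."""
    if frames_received == 0:
        return 0.0
    return 100.0 * crc_errors / frames_received


def needs_attention(crc_errors: int, frames_received: int,
                    threshold_percent: float = 1.0) -> bool:
    """Flag the port when the error share exceeds the threshold (assumed 1%)."""
    return crc_error_percent(crc_errors, frames_received) > threshold_percent


# Hypothetical interval counts, not real measurements.
print(needs_attention(crc_errors=15, frames_received=1000))
print(needs_attention(crc_errors=5, frames_received=1000))
```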
Collisions and late collisions are two separate anomalies with similar solutions. Collisions occur when more than one device tries to use the network at the same time. A late collision is one that is detected after the first 64 bytes of a frame have already been transmitted. Neither happens in a full duplex network, but many networks today are still running at half duplex. Collisions will occur in half duplex networks, which are shared networks (such as those connected through hubs). As this is older technology, it is not covered in detail here. The best remedy for collisions is to upgrade to a full duplex switched environment.
Discards are also common and are a normal function of network devices. Transmit discards occur when a device's buffer is too full to hold a packet until it can be sent onto the network. Receive discards occur when a packet cannot be delivered to an upper layer protocol because of congestion or another error. When retransmissions are frequent, discards can increase because of the additional traffic load.
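A toy model shows why a full buffer turns into discards. This is a simplified sketch, not how any particular switch is implemented: the class name, the capacity and the traffic pattern are all hypothetical.

```python
from collections import deque


class PortBuffer:
    """Fixed-size transmit queue that counts packets it has to drop."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()
        self.discards = 0

    def enqueue(self, packet) -> bool:
        """Queue a packet, or count a discard if the buffer is full."""
        if len(self.queue) >= self.capacity:
            self.discards += 1
            return False
        self.queue.append(packet)
        return True

    def transmit(self, count: int = 1) -> int:
        """Send up to `count` queued packets and return how many were sent."""
        sent = 0
        while self.queue and sent < count:
            self.queue.popleft()
            sent += 1
        return sent


# A burst of five packets arrives before the port can send anything.
buffer = PortBuffer(capacity=3)
accepted = [buffer.enqueue(n) for n in range(5)]
print(accepted, buffer.discards)
```

Retransmissions make the picture worse because every dropped packet that is sent again adds to the load on a queue that is already full.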
Unknown protocol errors are generally the result of a network device receiving a packet whose protocol or IP port it does not recognize. The reason is usually some anomaly, such as a port number above 1024. These errors can be corrected in a variety of ways, and you will generally rely on the switch or router manufacturer for a solution. It can also be that a card or port is dynamically assigning an IP port that is not recognized, but the solution is the same. Barring a corrupted packet, which can be identified because it exhibits other errors as well, this problem generally resides in the active components. It is a good idea to rule out other errors before making any change to your active equipment.
It is important to note that switch and router manufacturers generally offer both managed and unmanaged devices. If you have an unmanaged switch, or SNMP is not turned on, you will not see these errors without a packet sniffer. Sniffers decode packets and in some cases can decode passwords. The trend is to use a network management tool instead. If you do not feel comfortable analyzing the errors yourself, there are several options available to you. One is to solicit the help of a Certified Infrastructure Auditor. Any audit should cover both the physical layer and the electronics layer.
Carrie Higbie, Global Network Applications Market Manager, The Siemon Company
Carrie has been involved in the computing and networking industries for nearly 20 years. She has worked with manufacturing firms, medical institutions, casinos, healthcare providers, cable and wireless providers and a wide variety of other industries in networking design and implementation, project management and software development for privately held consulting firms and most recently Network and Software Solutions.