Every network has a few links that regularly seem to be overloaded. These links are easy to identify with a network management system's Top 10 report or with a script that collects interface information via the command line interface (CLI). Look for interfaces that have high output discards. Note: Input discards are rare and generally mean the packet-processing path in the device is impaired in some way, so any interface with input discards should be investigated. Also note that Simple Network Management Protocol (SNMP) uses the term discards, while the CLI often uses the term drops. The CLI output looks something like this:
GigabitEthernet1/10 is up, line protocol is up (connected)
Hardware is C6k 1000Mb 802.3, address is 0008.6a2f.18a5
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 116123
5 minute input rate 62675000 bits/sec, 6824 packets/sec
5 minute output rate 14618000 bits/sec, 4944 packets/sec
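If you collect this output with a script, a couple of regular expressions are enough to pull out the drop counters. The sketch below is a minimal example, assuming the text of a single interface in the format shown above; the names SAMPLE and parse_drops and the patterns are hypothetical, not part of any tool. It reads input drops from the third Input queue field (size/max/drops/flushes) and the Total output drops counter, and flags any interface with nonzero input drops.

```python
import re

# Output for one interface, copied from the example above (hypothetical input).
SAMPLE = """\
GigabitEthernet1/10 is up, line protocol is up (connected)
Hardware is C6k 1000Mb 802.3, address is 0008.6a2f.18a5
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 116123
5 minute input rate 62675000 bits/sec, 6824 packets/sec
5 minute output rate 14618000 bits/sec, 4944 packets/sec
"""

# Input queue counters are size/max/drops/flushes; drops is the third field.
INPUT_QUEUE_RE = re.compile(r"Input queue: (\d+)/(\d+)/(\d+)/(\d+)")
OUTPUT_DROPS_RE = re.compile(r"Total output drops: (\d+)")


def parse_drops(text):
    """Return (input_drops, output_drops); a missing counter is reported as 0."""
    iq = INPUT_QUEUE_RE.search(text)
    od = OUTPUT_DROPS_RE.search(text)
    input_drops = int(iq.group(3)) if iq else 0
    output_drops = int(od.group(1)) if od else 0
    return input_drops, output_drops


name = SAMPLE.split()[0]  # interface name is the first token
input_drops, output_drops = parse_drops(SAMPLE)
print(f"{name}: input drops={input_drops}, output drops={output_drops}")
if input_drops > 0:
    print(f"{name}: input drops present, investigate the packet-processing path")
```

Real devices format this output slightly differently across platforms and software releases, so treat the patterns as a starting point to adjust against your own CLI output.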
Calculate discards as a percentage of the total number of packets output on the interface (the full show interfaces output reports this counter as packets output). Any interface that shows more than 0.0001% drops/discards, roughly one packet in a million, should be investigated, because this is the point at which applications that use TCP will be affected. An alternative is to investigate any interface whose drops exceed a fixed threshold, perhaps 10,000 drops per day.
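As a rough sketch of that calculation, the Python below computes the discard percentage and applies both thresholds from the text. The packets-output count is an assumed value, because the excerpt above omits that counter. To show why such a small loss rate matters, it also evaluates the Mathis et al. approximation of single-flow TCP throughput, BW ≈ (MSS / RTT) × (C / √p) with C ≈ √(3/2); the 1460-byte MSS and 10 ms RTT are illustrative assumptions, and the function names are hypothetical.

```python
import math

# Thresholds quoted in the text; tune them for your own network.
LOSS_THRESHOLD_PCT = 0.0001     # investigate above 0.0001% (1 packet in 1,000,000)
DAILY_DROP_THRESHOLD = 10_000   # alternative: investigate above ~10,000 drops/day


def discard_pct(output_drops, packets_output):
    """Output drops as a percentage of packets output on the interface."""
    if packets_output <= 0:
        return 0.0
    return 100.0 * output_drops / packets_output


def needs_investigation(output_drops, packets_output, drops_per_day=None):
    """True if either threshold from the text is exceeded."""
    if discard_pct(output_drops, packets_output) > LOSS_THRESHOLD_PCT:
        return True
    return drops_per_day is not None and drops_per_day > DAILY_DROP_THRESHOLD


def mathis_bps(mss_bytes, rtt_s, loss_fraction, c=math.sqrt(3 / 2)):
    """Mathis et al. approximation of one TCP flow's maximum throughput (bits/s)."""
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_fraction))


# Hypothetical counters: 116,123 drops (from the output above) out of an
# assumed 2 billion packets output since the counters were last cleared.
pct = discard_pct(116_123, 2_000_000_000)
print(f"discards: {pct:.6f}%  investigate: {needs_investigation(116_123, 2_000_000_000)}")

# Throughput ceiling for a 1460-byte MSS on an assumed 10 ms path.
for loss in (1e-6, 1e-4):
    print(f"loss {loss:.0e}: ~{mathis_bps(1460, 0.010, loss) / 1e6:,.0f} Mbps")
```

Because throughput scales with 1/√p in this approximation, raising loss from 0.0001% to 0.01% (a factor of 100) lowers the per-flow ceiling by a factor of 10 on the same path.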
Once a congested interface is identified, there are two ways to handle it: apply quality of service (QoS) or increase the interface speed. We'll look at QoS first, and then at the factors that determine whether QoS can address the congestion or the link speed must be upgraded.
Understanding QoS and its role in solving link congestion
QoS can perform multiple functions:
Prioritization: Forward more important network traffic before less important network traffic. Use prioritization to handle time-sensitive packets like voice or video while delaying traffic that is less sensitive to small delays. Small delays are on the order of a few milliseconds (ms) and should be short enough to prevent packet loss or retransmissions by upper-layer protocols.
Policing: Limit the amount of a specified type of network traffic by dropping excess packets above some threshold. Use policing to prevent a specified type of traffic from hogging the network.
Shaping: Delay short bursts of some network traffic so that it fills gaps in the total volume of traffic sent on an interface, smoothing network utilization. Use shaping to govern traffic like email or file transfers that can tolerate small delays without affecting overall throughput.
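Policing and shaping are commonly built on a token bucket, and a small model makes the difference between them concrete. The sketch below is a generic illustration, not how any particular router implements QoS, and it does not model prioritization. It assumes packets arrive as (arrival_time_s, size_bytes) pairs in time order, that no packet is larger than the bucket, and that shaped packets leave in FIFO order; TokenBucket, police and shape are hypothetical names.

```python
class TokenBucket:
    """Tokens (bytes) accrue at rate_bps / 8 per second, capped at burst_bytes."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0       # bytes per second
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last = 0.0

    def refill(self, now):
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now


def police(packets, rate_bps, burst_bytes):
    """Forward packets that conform to the bucket; drop the excess."""
    tb = TokenBucket(rate_bps, burst_bytes)
    sent, dropped = [], []
    for t, size in packets:
        tb.refill(t)
        if tb.tokens >= size:
            tb.tokens -= size
            sent.append((t, size))
        else:
            dropped.append((t, size))
    return sent, dropped


def shape(packets, rate_bps, burst_bytes):
    """Delay non-conforming packets until enough tokens accrue; nothing is dropped."""
    tb = TokenBucket(rate_bps, burst_bytes)
    out = []
    ready = 0.0  # earliest time the next packet may leave (FIFO)
    for t, size in packets:
        depart = max(t, ready)
        tb.refill(depart)
        if tb.tokens < size:
            # Wait exactly long enough for the missing tokens to accrue.
            depart += (size - tb.tokens) / tb.rate
            tb.refill(depart)
        tb.tokens -= size
        ready = depart
        out.append((depart, size))
    return out


# A burst of ten 1500-byte packets at t=0, limited to 1 Mbps with a 3000-byte bucket.
burst = [(0.0, 1500)] * 10
sent, dropped = police(burst, 1_000_000, 3000)
shaped = shape(burst, 1_000_000, 3000)
print(f"policed: {len(sent)} sent, {len(dropped)} dropped")
print(f"shaped: all {len(shaped)} sent, last departs at {shaped[-1][0] * 1000:.0f} ms")
```

The policer forwards only what fits in the bucket and discards the rest of the burst, while the shaper spreads the same burst over roughly 96 ms. That spreading only helps if the link has idle gaps to absorb it.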
QoS is very effective if link congestion is not too severe or if there is a type of traffic that can be policed (dropped). In one consulting case, our customer had a T3 link (about 45 Mbps) that was congested during the day. We determined that half of the traffic was from video downloads and music-streaming services. It was easy to configure QoS to prioritize business traffic over the multimedia data and to police the video and music to a suitable volume. We also had to increase buffering for the business traffic, because it sent many small packets that overflowed the high-priority queue's buffer pool.
Limit any buffer increase to the minimum needed for the application to work well. Too much buffering creates a condition called buffer bloat, which confuses TCP's windowing algorithm and results in much lower throughput than normal.
Ideally, QoS should be deployed consistently across an enterprise. However, it can also be applied to specific links that have serious problems, such as the T3 link described above.
When must a link be upgraded?
There are times when a link is saturated and QoS is not a good option. Prioritization and policing can't help much when the majority of the network traffic is important, because there is little unimportant traffic to delay or drop. Shaping can't work effectively at high interface utilization, because there aren't gaps in which the delayed packets can be transmitted.
Here's an example of a case where bandwidth had to be increased to relieve link congestion. Two sites were connected via a 1 Gbps metro link with a 2 ms round-trip time. Both sites contained data centers and a large user community. Extensive file sharing took place across the link, and the files were typically very large. None of the traffic could be placed in a low-priority queue where it could be dropped (policed) when congestion occurred. The customer saw that the link had a high number of drops, so it increased the number of buffers to shape the traffic. Unfortunately, it added so many buffers that they held 14 ms of traffic. The outcome? Buffer bloat reduced throughput, and congestion remained a problem.
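The arithmetic behind this example is worth spelling out. The sketch below interprets "14 ms of buffering" as the time a full queue takes to drain at the 1 Gbps line rate (an interpretation, not a figure the customer measured) and, as a simplifying assumption, adds that delay once to the 2 ms base round-trip time, as if the queue sat in only one direction. It compares the buffer with the bandwidth-delay product (BDP), the amount of data that must be in flight to keep the link busy.

```python
# Values from the example above.
LINK_BPS = 1_000_000_000  # 1 Gbps metro link
BASE_RTT_S = 0.002        # 2 ms round-trip time with empty queues
QUEUE_DELAY_S = 0.014     # 14 ms of buffering, read as queueing delay when full

# Bandwidth-delay product: data in flight needed to keep the link busy.
bdp_bytes = LINK_BPS * BASE_RTT_S / 8
# Bytes a buffer holds if it takes 14 ms to drain at line rate.
buffer_bytes = LINK_BPS * QUEUE_DELAY_S / 8
# Round-trip time a TCP sender sees once that buffer has filled.
loaded_rtt_s = BASE_RTT_S + QUEUE_DELAY_S

print(f"BDP:        {bdp_bytes:,.0f} bytes")
print(f"Buffer:     {buffer_bytes:,.0f} bytes ({buffer_bytes / bdp_bytes:.0f}x the BDP)")
print(f"Loaded RTT: {loaded_rtt_s * 1000:.0f} ms ({loaded_rtt_s / BASE_RTT_S:.0f}x the base RTT)")
```

Under these assumptions the buffer holds seven times the BDP, and a full queue makes each round trip eight times longer than the path itself. That is the kind of added latency that buffer bloat introduces, without adding any capacity to the link.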
The congested link yielded another symptom: an extraordinarily high number of TCP retransmissions on clients and servers, visible in the output of the command netstat -s -p tcp (available on Windows and on BSD-derived systems such as macOS; on Linux, netstat -st shows the TCP statistics). Look for large numbers of retransmissions, perhaps on the order of 10,000 to 50,000 per day. Normal TCP operation produces a small number of retransmissions as it finds the point at which the path becomes congested.
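Retransmission counters are cumulative, so a daily rate requires two snapshots. The sketch below assumes the Windows netstat -s -p tcp wording ("Segments Sent" and "Segments Retransmitted" followed by "= value"); other operating systems label these counters differently, so the patterns would need adjusting. The snapshot numbers are made up for illustration, the 10,000-per-day flag comes from the low end of the range in the text, and tcp_counters and retrans_rate are hypothetical names.

```python
import re

# Assumed Windows-style counter lines; adjust the patterns for other systems.
SENT_RE = re.compile(r"Segments Sent\s*=\s*(\d+)")
RETRANS_RE = re.compile(r"Segments Retransmitted\s*=\s*(\d+)")


def tcp_counters(text):
    """Return (segments_sent, segments_retransmitted) from one netstat snapshot."""
    return int(SENT_RE.search(text).group(1)), int(RETRANS_RE.search(text).group(1))


def retrans_rate(before, after, hours_between):
    """Retransmissions per day, and as a % of segments sent, between two snapshots.

    Assumes the counters were not reset (for example, by a reboot) in between.
    """
    sent0, rtx0 = tcp_counters(before)
    sent1, rtx1 = tcp_counters(after)
    retrans = rtx1 - rtx0
    per_day = retrans * 24 / hours_between
    pct = 100.0 * retrans / (sent1 - sent0) if sent1 > sent0 else 0.0
    return per_day, pct


# Hypothetical snapshots taken 12 hours apart (illustrative numbers only).
before = "  Segments Sent = 1000000\n  Segments Retransmitted = 500\n"
after = "  Segments Sent = 5000000\n  Segments Retransmitted = 15500\n"

per_day, pct = retrans_rate(before, after, hours_between=12)
flag = "investigate" if per_day >= 10_000 else "looks normal"
print(f"{per_day:,.0f} retransmissions/day ({pct:.3f}% of segments sent): {flag}")
```

Tracking the retransmission percentage alongside the daily count helps separate a busy host that retransmits a normal small fraction from one whose path is genuinely congested.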
Fortunately, the customer was planning to upgrade the link. It could also have taken other steps to make more effective use of the existing bandwidth, such as reducing the number of remotely mounted disk drives or replicating some data at night, when network use was minimal.
When a link is congested, QoS can handle short bursts or identify and drop low-priority network traffic. However, when most traffic has the same priority and the link is heavily congested, the only recourse is to add bandwidth. It is important to recognize when that point has been reached. Even better, reach that conclusion while you still have time to order your link upgrade.
Editor's note: The author of this feature, Terry Slattery, has more information about TCP performance, including its relationship to the Mathis Equation, how to understand interface errors and an additional discussion of TCP/IP performance factors.