Understanding TCP behavior can help IT engineers pinpoint network problems, such as packet loss, that degrade WAN performance. The tip below explores how the TCP transport can introduce latency into the network and advises what IT professionals can do about it. To read the previous article in this How much bandwidth is enough? tip series, go back to part 5, Rate delay product.
Troublesome TCP behavior
When comparison of end traces showed that physical packet losses were occurring, the network technician examined port statistics on each router interface along the path common to just the slow user locations. Sure enough, one interface was found to be experiencing CRC errors, owing to a defective leased line. Correcting this problem eliminated one of the largest delay components. Because TCP does not know whether packets are being lost to errors or to congestion, the sender simply slows down (creating network latency) regardless of why packets are being lost. Neither TCP nor IP has been given any congestion-notification capability that routers could use to warn end nodes explicitly of link congestion.
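The loss-blind slowdown described above can be sketched in a few lines. This is not a real TCP stack, just an illustration of the classic Reno-style rules; the `MSS` value and function names are assumptions for the example, and the key point is that the loss `cause` parameter is deliberately ignored, exactly as a real sender must ignore it:

```python
# Minimal sketch (not a real TCP implementation) of a loss-blind sender.
# Classic Reno-style rules: every loss is treated as congestion, even when
# the true cause is a CRC error on a defective line.

MSS = 1460  # bytes; a typical Ethernet segment size (assumption for this sketch)

def react_to_loss(cwnd: int, ssthresh: int, cause: str, detected_by: str):
    """Return (cwnd, ssthresh) after a loss, ignoring its physical cause.

    `cause` ('congestion' or 'crc_error') is deliberately unused: the
    sender cannot tell the difference, so it always slows down.
    """
    ssthresh = max(cwnd // 2, 2 * MSS)      # halve the window either way
    if detected_by == "triple_dup_ack":     # fast retransmit / fast recovery
        cwnd = ssthresh
    else:                                   # retransmission timeout: back to slow start
        cwnd = MSS
    return cwnd, ssthresh

cwnd, ssthresh = 16 * MSS, 64 * MSS
cwnd, ssthresh = react_to_loss(cwnd, ssthresh, cause="crc_error",
                               detected_by="timeout")
print(cwnd)  # collapses to a single segment even though nothing was congested
```

Either way the window shrinks; a timeout is simply the more drastic of the two responses.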
We can see the large effect of this TCP behavior in the next graph:
Server-side and station-side traces of the losses show only a few attempts at fast recovery: a thrice-repeated ACK occurs only twice from the node sensing a loss, and the server does not perform a fast retransmit. The upshot is that recovery from a loss always takes about 2.5 seconds, whether one, two or three packets are lost from the server.
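The asymmetry in the trace can be expressed as a simple decision: fast retransmit fires only after a third duplicate ACK, and otherwise the sender sits out a full retransmission timeout. The constants below are assumptions chosen to match the observations in the text (the ~2.5-second recovery, and a nominal round-trip time):

```python
# Hedged sketch of the recovery behavior seen in the traces: fast
# retransmit fires only on the third duplicate ACK; otherwise the sender
# waits out the retransmission timeout (the ~2.5 s observed here).

DUP_ACK_THRESHOLD = 3
RTO = 2.5   # seconds; matches the observed recovery time (assumption)
RTT = 0.05  # seconds; nominal round trip (assumption)

def recovery_delay(dup_acks_received: int) -> float:
    """Estimate the time to recover one lost segment."""
    if dup_acks_received >= DUP_ACK_THRESHOLD:
        return RTT   # fast retransmit: roughly one round trip
    return RTO       # fall back to the timeout

print(recovery_delay(3))  # 0.05 -> the fast path, rarely taken in this trace
print(recovery_delay(1))  # 2.5  -> the timeout, the common case here
```

A fifty-fold difference per loss event is why the same 2.5-second stall appears again and again in the graphs.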
This graph plots the differences between the lines (red = black minus red) in the previous throughput graph and shows (red) how far behind (or ahead of) average throughput the transfer is at any second. This gives better resolution to the flow. The blue markers just indicate where each 21-packet (31 KB) SMB block ends (where the 150-msec delayed-ACK penalty is paid):
- Without delayed ACKs, the blue marks would be much closer together, the red curve much steeper downward, and the graph's length shorter.
- Gaps among the blue markers are caused by packet losses. There are 14 such 1- to 2-second gaps in 160 seconds.
- The downward-sloped pieces of the red curve indicate that throughput is catching up to -- or moving ahead of -- average, after loss recovery.
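A back-of-the-envelope check shows how much the delayed-ACK penalty alone costs over this transfer. The figures (31-KB blocks, 150-msec penalty, 27 KB/sec average, 160-second duration) come from the text; the arithmetic is only an estimate, since it assumes every block boundary pays the full penalty:

```python
# Rough cost of the 150-msec delayed-ACK penalty paid at each 31-KB
# SMB block boundary, using the figures quoted in the text.

BLOCK_BYTES  = 31 * 1024    # one 21-packet SMB block
PENALTY_SEC  = 0.150        # delayed-ACK wait at each block boundary
AVG_RATE_BPS = 27 * 1024    # average throughput from the text
DURATION_SEC = 160

total_bytes = AVG_RATE_BPS * DURATION_SEC
blocks      = total_bytes / BLOCK_BYTES
penalty     = blocks * PENALTY_SEC

print(round(blocks))      # ~139 block boundaries in the transfer
print(round(penalty, 1))  # ~20.9 seconds spent waiting on delayed ACKs
```

Roughly 21 of the 160 seconds go to delayed-ACK waits, which is why eliminating them would visibly shorten the graph.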
The red downward slopes are all about 6,200 Bps faster than the average. So, when there is no packet loss, 27 KB + 6.2 KB = 33.2 KB of data are transferred per second. This is a 23% improvement in throughput. The implication should be clear: if a 0.25% packet loss rate can reduce throughput by 20%, then either TCP should be reconfigured to properly use the newer recovery rules (fast retransmission via repeated ACKs), or it is not the right transport to use, and extreme vigilance must be maintained at every network interface carrying its traffic.
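The throughput claim above is easy to verify from the two rates quoted in the text:

```python
# Sanity check of the loss-impact figures quoted in the text.
loss_free = 33.2   # KB/sec, transfer rate between loss events
observed  = 27.0   # KB/sec, actual average including recovery stalls

reduction   = 1 - observed / loss_free   # throughput lost to recovery
improvement = loss_free / observed - 1   # gain if losses were eliminated

print(f"{reduction:.1%}")    # 18.7% -- roughly the 20% cited
print(f"{improvement:.1%}")  # 23.0% -- the improvement cited
```

The two percentages differ only in their choice of baseline; both describe the same 6.2-KB/sec gap.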
These examples should show how careful an architect, manager and technician must all be to:
- Understand all aspects of candidate physical-network components and links
- Understand all aspects of candidate network protocols
- Understand what tools can be used to expose what network parameters
- Understand what protocol-stack vendors' products provide, in terms of default and adjustable parameters
- And be kind to users.
Remember, be ready for anything, and don't just buy a faster switch or lease a faster line!
About the author:
Alexander B. Cannara, PhD, is an electrical engineer, a software and networking consultant, and an educator. He has 18 years of experience in the computer-networking field, including 11 years in managing, developing and delivering technical training. He is experienced in many computer languages and network protocols and is a member of IEEE, the Computer Society, and the AAAS. Alex lives with his wife and son in Menlo Park, California.