The hard fact is that TCP/IP simply was not designed for today's high-performance applications.
"Best effort" accurately describes typical TCP/IP performance, and for some applications that simply isn't good enough. Yet doing any better is an uphill battle.
TCP, the Transmission Control Protocol, often poses the greatest obstacle. Originally developed in the late 1970s, TCP and version 4 of IP remain largely unchanged. At that time, networks were simpler, capacities were much smaller, and applications were far less demanding (typically file transfer, remote login, and email). Frankly, it is a huge credit to its designers, Robert Kahn and Vinton Cerf, and to its many other contributors that TCP/IPv4 has worked as well as it has.
However, a range of newer technologies has pushed TCP/IP to its limits: Gigabit and 10 Gigabit Ethernet, wireless and 3G, trans-continental and satellite links are all demanding in different ways. In addition, now-commonplace applications like Voice-over-IP, emergency data recovery and data replication, and, of course, the Web itself have placed incredible stress on these fundamental networking protocols.
So why is TCP a problem?
While IP defines "packets" and how they are routed through the network, TCP specifies the terms of delivery of those packets between two hosts, as well as providing mechanisms for recovering from errors and controlling the use of network resources during delivery. One of TCP's key features is congestion avoidance: by constantly monitoring the round-trip times (RTTs) of transmitted and acknowledged packets, and by recording retransmit requests that indicate packet loss, the TCP implementation (or stack) on a host tries to determine whether it is encountering cross-traffic on the network path. If it detects indications of congestion, such as loss or sharply rising RTT, it reduces the load it sends until those indications disappear. Then it slowly ramps back up until the symptoms reappear. Eventually it reaches an equilibrium point that represents its fair share of the available network capacity.
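The back-off-and-probe cycle described above is often summarized as additive-increase/multiplicative-decrease (AIMD). The following is a minimal sketch of that dynamic, not a model of any real stack; the function name, window units, and the choice of which rounds see loss are all illustrative assumptions:

```python
def aimd(rounds, loss_rounds, cwnd=1.0, increase=1.0, decrease=0.5):
    """Sketch the congestion window (in segments) over a series of RTTs.

    cwnd grows by `increase` segments per round trip; on a round where
    loss is detected, it is cut by the factor `decrease`.
    """
    history = []
    for rtt in range(rounds):
        if rtt in loss_rounds:
            cwnd = max(1.0, cwnd * decrease)  # back off on congestion
        else:
            cwnd += increase                   # probe for more bandwidth
        history.append(cwnd)
    return history

# Ten round trips with a single loss event at round 5:
print(aimd(10, {5}))
```

The sawtooth this produces (grow, halve, grow again) is exactly the equilibrium-seeking behavior described above, and it is why a single loss event can be so costly on a large window.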
This tendency toward fair use is, well, nice — and, as we all know, nice guys finish last. So achieving high performance with TCP, particularly under extreme or less-than-ideal conditions, can be very challenging. Classic TCP variants like Tahoe, Reno, and Vegas often do poorly under conditions such as:
- high latencies ( > 100 ms)
- large capacities ( > 100 Mbps)
or a combination of the two. TCP will slow down in response to congestion that isn't really present. Or it will take an exceedingly long time to speed up to full capacity (the so-called slow-start mechanism). Or it will cede bandwidth to other applications that don't require it.
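The interaction of latency and capacity follows directly from TCP's windowing: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is capped at the window size divided by the RTT. A quick back-of-the-envelope calculation (the numbers are illustrative):

```python
def max_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on TCP throughput: at most one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6

# A classic 64 KB window over a 100 ms path caps out around 5 Mbps,
# no matter how fast the underlying link is:
print(max_throughput_mbps(65535, 0.100))
```

This is why an untuned stack on a gigabit trans-continental path can deliver only a tiny fraction of the link's capacity.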
There are a variety of relatively common situations that can lead to significant performance degradations due to TCP, or its interaction with other network behaviors:
- Wireless - due to the inherently lossy nature of the medium, packet loss is common and highly variable. TCP stacks such as Reno incorrectly infer that the loss is congestion and slow down inappropriately. Bad cables, or other forms of malfunctioning media, can induce the same effect.
- Bandwidth/delay product - very high latencies, for example on network paths through satellites ( > 250 ms), aggravate TCP's normal slow-start mechanism, causing it to take a very long time to reach full capacity. Once there, even a single lost packet can set back TCP's progress dramatically, particularly on high-capacity links.
- Gigabit Ethernet - poorly configured window sizes, or insufficient transmit or receive buffers, can restrict TCP unnecessarily or even cause it to fail. Tuning the stack is possible, although manual configuration is not a scalable solution.
- MTU black holes - use of jumbo frames or badly implemented VPNs can create MTU constrictions that fail to generate RFC 1191-compliant error messages. Packets seem to disappear once the TCP window exceeds the hidden path MTU, and slow start and congestion avoidance then interact cyclically with the black hole.
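On the buffer-tuning point above, one concrete knob is the socket buffer size, which bounds the window an application's connection can use. A sketch of sizing those buffers to the bandwidth-delay product follows; the path parameters are hypothetical, and modern kernels with receive-window auto-tuning often make this kind of manual sizing unnecessary:

```python
import socket

def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: the window needed to keep the pipe full."""
    return int(bandwidth_bps * rtt_seconds / 8)

# A hypothetical 1 Gbps path with 50 ms RTT needs ~6 MB in flight:
bdp = bdp_bytes(1_000_000_000, 0.050)
print(bdp)  # 6250000 bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request buffers sized to the BDP. The OS may silently clamp these to
# its configured maximums (e.g. net.core.rmem_max on Linux), which is
# exactly the kind of per-host tuning that doesn't scale.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp)
sock.close()
```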
Of course, network dysfunctions like duplex mismatch can create similar performance problems, but TCP isn't particularly at fault in that case; failed auto-negotiation is usually to blame.
So what to do?
Fortunately, network researchers such as Sally Floyd of the International Computer Science Institute, Matt Mathis of the Pittsburgh Supercomputing Center, and Stanislav Shalunov of Internet2 are actively developing a range of solutions. Some efforts are aimed at expert manual tuning techniques for existing TCP stacks, as well as auto-tuning mechanisms such as Web100. Other projects offer specialized transport protocols for particular contexts, such as TCP Westwood for wireless. The most broadly applicable projects, though, such as FAST and BIC-TCP, are aimed at next-generation versions of TCP.
These new implementations of TCP face challenging constraints. In particular, they are typically required to be "friendly" to legacy TCP traffic - applications employing enhanced stacks should not overrun applications without them - while still offering significantly improved performance. The end result is a TCP stack with all the desirable legacy characteristics, such as true congestion avoidance and fair sharing, but none of the detrimental ones. Stacks like FAST and BIC-TCP show most of the requisite behavior with few negative tradeoffs.
If you are looking for truly outstanding network performance, consider looking closely at your TCP implementation. High performance begins with a clean IP network and an effective TCP stack.