NASA space probes, throughput and error probability

Get a better understanding of bandwidth and network throughput. Part 4 of the How much bandwidth is enough? tip series demonstrates how in an environment such as NASA's space probes, there isn't much tolerance for network latency -- and introduces how errors can slow down network performance. Read part 3 to learn how network path congestion and delayed ACKs affect network performance, or continue below to see how NASA achieves good network performance.

NASA space probes and throughput

Before moving on to more details of how parameters affect throughput, let's consider an extreme example: NASA must communicate with its space probes and landers. While a craft is sailing through space, detailed radio command-response events and computer modeling allow its position to be tracked accurately.

However, this accuracy is gained only because we understand gravity, planets' orbits, and magnetic and other effects, and we're forecasting positions based on that knowledge. We have no real idea where the craft is at time T, only where it was and how it was pointed at time T-t, when it responded to us. History, plus a model's output, is all we really have when talking about where the craft is positioned "now." Here t is the one-way signal delay, which is half the RTT for any command-response signaling event. For a Mars craft, t is about three minutes when the planet is closest to Earth, and longer at other times. For Jupiter, t can approach an hour.
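As a rough check on those figures, the one-way delay t is simply distance divided by the speed of light. The sketch below is illustrative only: the two distances are approximate values assumed here (they are not given in this tip), and the function name is hypothetical.

```python
# One-way signal delay t for a craft at a given distance from Earth.
# The distances below are rough, assumed values for illustration,
# not mission data; real distances change as the planets move.

SPEED_OF_LIGHT_KM_S = 299_792.458  # radio signals travel at the speed of light


def one_way_delay_s(distance_km: float) -> float:
    """Return t, the one-way delay in seconds (half the RTT)."""
    return distance_km / SPEED_OF_LIGHT_KM_S


# Assumed approximate Earth-to-planet distances, in km.
ASSUMED_DISTANCES_KM = {
    "Mars (near closest approach)": 55e6,
    "Jupiter (near farthest)": 968e6,
}

for name, km in ASSUMED_DISTANCES_KM.items():
    t = one_way_delay_s(km)
    print(f"{name}: t = {t / 60:.1f} min, RTT = {2 * t / 60:.1f} min")
```

With these assumed distances, t comes out at roughly three minutes for Mars and a little under an hour for Jupiter.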

It's no wonder that NASA engineers don't drive landers around from here with a video-game display and joystick. They must have an accurate model of what the robot will do for any command they give it, and they have to bundle many commands to be sent at one time so that the robot's tasks can actually be accomplished in our lifetimes. Waiting more than six minutes per command at Mars may be OK for some, but waiting more than two hours for a left turn to be acknowledged on Titan is out of the question. The key in this extreme command-response latency environment is intelligence (navigational, for example) in the far node, plus command-sequence buffering and far-node modeling at our end. Wise design in this kind of situation means we'll still have to wait to see whether the robot has driven over to rock x to sample its surface, but we won't have to issue more than one command-packet burst to get results in one RTT.
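The payoff of sending commands in one burst can be sketched with simple arithmetic. This is a simplified model under stated assumptions, not NASA's actual procedure: transmission time is ignored, every command takes the same assumed execution time, and the function names are hypothetical.

```python
# Compare one-command-per-RTT operation with a single buffered burst.
# Simplifying assumptions: transmission time is negligible next to the
# RTT, and each command takes the same time to execute at the far node.


def one_at_a_time_s(num_commands: int, rtt_s: float, exec_s: float) -> float:
    """Send a command, wait for its acknowledgment, then send the next."""
    return num_commands * (rtt_s + exec_s)


def single_burst_s(num_commands: int, rtt_s: float, exec_s: float) -> float:
    """Send all commands together; the far node runs them back to back."""
    return rtt_s + num_commands * exec_s


rtt = 6 * 60      # six-minute RTT, as in the Mars example
commands = 20     # assumed length of the command sequence
execute = 30      # assumed seconds for the robot to carry out one command

print(f"one at a time: {one_at_a_time_s(commands, rtt, execute) / 60:.0f} min")
print(f"single burst:  {single_burst_s(commands, rtt, execute) / 60:.0f} min")
```

Under these assumptions the burst pays the RTT once instead of 20 times, which is the whole argument for command-sequence buffering.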

Back on Earth, our worst latencies from purely physical causes come from geostationary satellite links: about one-fourth of a second each way, or roughly half a second RTT. In contrast, a leased land line across the U.S. can give command-response times of less than 40 msec. Decades ago, Bell Labs scientists thoroughly researched what humans will tolerate in telephone channels, and a delay of 50 msec was deemed just acceptable. Greater RTTs led to annoying conversations. Large RTTs mean worse things for computer-to-computer exchanges.

Understanding error probability

NASA addresses its throughput needs by massing commands and data, by understanding how the far nodes work, and by knowing how to code data using optimal protocols (not TCP/IP) for successful deep-space transmission. We've skipped over errors so far, but the effect of undetected, uncorrected errors on throughput is horrendous and protocol-dependent. To minimize error surprises, NASA has long used error-correcting codes to protect commands and data that are worth hundreds of millions -- or billions -- of dollars.

Our last example, therefore, includes basic protocol and path properties, such as windows, timers, RTT, MTU and bit/data rate, but also adds error probability, expressed as the bit error rate (BER). It doesn't matter if a network path has a 1 Gbps, uncongested data rate; if it loses 10% of the packets being sent, throughput will suffer.
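To see how a seemingly small BER turns into a large packet loss rate, here is a minimal sketch. It assumes bit errors are independent and that any single bit error causes the whole packet to be discarded; real links often have bursty errors, and the BER and packet size used are illustrative values, not measurements.

```python
# Convert a bit error rate (BER) into a packet loss probability.
# Assumptions: bit errors are independent, and one bad bit is enough
# to make the receiver discard the whole packet (a failed CRC).


def packet_loss_probability(ber: float, packet_bytes: int) -> float:
    """Probability that at least one bit in the packet is corrupted."""
    bits = 8 * packet_bytes
    return 1.0 - (1.0 - ber) ** bits


def raw_delivery_ceiling_bps(link_rate_bps: float, loss: float) -> float:
    """Upper bound on delivered bits per second, before any protocol
    reaction (retransmissions, timeouts, window reductions) is counted."""
    return link_rate_bps * (1.0 - loss)


ber = 1e-5            # assumed bit error rate
mtu_bytes = 1500      # assumed packet size
link_bps = 1e9        # the 1 Gbps path from the text

loss = packet_loss_probability(ber, mtu_bytes)
print(f"packet loss probability: {loss:.1%}")
print(f"delivery ceiling: {raw_delivery_ceiling_bps(link_bps, loss) / 1e6:.0f} Mbps")
```

With these assumed values, one bad bit in every 100,000 is enough to lose about 11% of full-size packets. The ceiling printed is optimistic, because it ignores how the transport protocol reacts to each loss.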

Under such conditions, the amount of application data actually making it through the path per second is at the mercy of the transport protocol chosen (TCP, UDP, SCP, RTP…) and its configurable parameters. If the protocol doesn't recover losses (e.g., UDP, as used for VoIP), then the end application will have to do so, usually much more slowly, or just trundle on. If the transport protocol can't distinguish congestion packet losses (such as router overloads) from physical-error losses (such as CRC errors), then it can do the wrong thing, engaging the wrong recovery algorithm and slowing down throughput tremendously.
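How severe the slowdown can be is suggested by a widely cited approximation for steady-state TCP with loss-based congestion control, often called the Mathis model. It is not taken from this tip, and it is only a rough guide: it treats every loss as a congestion signal, and it becomes increasingly inaccurate at high loss rates, where timeouts dominate. The MSS and RTT values below are assumptions chosen for illustration.

```python
import math

# Mathis-model estimate of sustained TCP throughput under random loss:
#     rate <= (MSS / RTT) * C / sqrt(p),  with C = sqrt(3/2)
# This is an approximation for loss-based TCP, not an exact prediction.


def tcp_throughput_estimate_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Approximate ceiling on TCP throughput in bits per second."""
    if not 0.0 < loss < 1.0:
        raise ValueError("loss must be a probability strictly between 0 and 1")
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss)


mss = 1460     # assumed maximum segment size in bytes
rtt = 0.040    # assumed 40 msec RTT, as for a cross-U.S. land line

for loss in (0.0001, 0.01, 0.10):
    est = tcp_throughput_estimate_bps(mss, rtt, loss)
    print(f"loss {loss:.2%}: about {est / 1e6:.2f} Mbps")
```

Under these assumptions, a 10% loss rate holds a single TCP connection to the order of 1 Mbps, regardless of whether the underlying path runs at 1 Gbps.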

⇒ Continue reading in part 5: Rate delay product.

About the author:
Alexander B. Cannara, PhD, is an electrical engineer, a software and networking consultant, and an educator. He has 18 years of experience in the computer-networking field, including 11 years in managing, developing and delivering technical training. He is experienced in many computer languages and network protocols and is a member of IEEE, the Computer Society, and the AAAS. Alex lives with his wife and son in Menlo Park, California.

This was last published in September 2006