"Artificial intelligence," in the sense of thinking machines, has become an utterly defamed term. This is a direct consequence of its hyperbolic application over the last few decades in the computing sciences. Somehow, as if we had not learned to be a bit more humble, the word "intelligence" is over-hyped yet again. This time, it is spuriously applied to networks and their management.
Certainly there is some good and necessary work happening, carried out by industry leaders and implemented by visionary enterprises and early adopters, but the slightly mad marketing machine is threatening to dishonor those achievements with references to "intelligence in the network."
Time for a reality check
First, the prospect of self-healing and self-provisioning networks appears substantial and beneficial. But let's be realistic -- "autonomic" refers denotatively to the automatic processing of rule sets representing established relationships among system elements. More like monkeys turning cranks than Einstein-in-a-box. In the most popular context, the human autonomic systems such as respiration and blood pressure, the emphasis is on processes that occur below the level of consciousness, often without regard to our preferences or dignity (e.g., fainting).
Certainly, any level of autonomic behavior in networks will look practically like magic compared with what we do today. And such adjectives as "clever," "automated" and maybe even "aware" might apply. But it's not intelligence.
OK, rant against marketing abuses aside, there are some further realities to consider.
Self-healing or self-provisioning demands some major advances in the generation and description of network state. For example, an automated repair mechanism will require unambiguous identification of the source of performance degradation, its location, and the requisite remedial action. Technology which can offer that has not (yet) fully matured.
Consider the dreaded duplex mismatch -- typically a NIC on a workstation connected to a switch port, with each end using a different duplex mode (one using half-duplex, taking turns transmitting and receiving, and the other using full-duplex, receiving and transmitting simultaneously).
This regularly happens on most IP networks where interfaces are set manually instead of trusting auto-negotiation, or where one end is hand-set and the other is left to auto-negotiate, a mix that is used intentionally (but mistakenly). The problem is simple to fix -- but very hard to find. And it is devastating to applications of all sorts.
An automated network management system (NMS) would need to detect the presence of duplex conflict in a network path, localize it to two interfaces, change one or both settings, and then validate the effect. Naturally, it seems appropriate to manage this fault in terms of devices -- inspect all the device interfaces and change them as needed. However, the network path as seen by the application and the logical diagram associated with a device management system can easily be different -- maintaining an NMS is a full-time job (actually, several full-time jobs). Further, many of those devices on a given end-to-end path may be on someone else's network. And finally, validating the effect from the application perspective cannot be done at the device level.
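The four steps just described (detect, localize, change, validate) can be sketched as a toy loop. This is only an illustration under generous assumptions, not a real NMS: the `Interface` and `Link` types, the policy of forcing both ends to full duplex, and the `validate` callback are all hypothetical, and the sketch assumes every interface on the path is visible and writable, which the paragraph above argues is often not true.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Interface:
    name: str
    duplex: str  # operational duplex mode: "half" or "full"


@dataclass
class Link:
    a: Interface  # one end of the link, e.g. a workstation NIC
    b: Interface  # the other end, e.g. a switch port


def find_duplex_mismatch(path: List[Link]) -> Optional[Link]:
    """Detect and localize: return the first link whose two ends disagree."""
    for link in path:
        if link.a.duplex != link.b.duplex:
            return link
    return None


def heal_duplex(path: List[Link], validate: Callable[[], bool]) -> str:
    """Detect, localize, change, then validate the effect."""
    link = find_duplex_mismatch(path)
    if link is None:
        return "no mismatch found"
    # Hypothetical remediation policy: force both ends to full duplex.
    link.a.duplex = link.b.duplex = "full"
    # Validation is delegated to a caller-supplied end-to-end check,
    # because the device itself cannot report the application's view.
    return "healed" if validate() else "changed but not validated"


if __name__ == "__main__":
    path = [
        Link(Interface("ws1:eth0", "half"), Interface("sw1:port7", "full")),
        Link(Interface("sw1:uplink", "full"), Interface("rtr1:ge0", "full")),
    ]
    print(heal_duplex(path, validate=lambda: True))
```

Even in this toy form, the hard parts sit outside the code: building an accurate `path` when some hops belong to someone else, and supplying a `validate` check that reflects what the application actually experiences.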
But let us make the unlikely assumption that a duplex conflict can be handled through device management.
Consider, then, that a duplex conflict is just one of a host of possible network degradations that need remediation:
- Half/full-duplex conflicts
- Poorly performing NIC and drivers
- MTU conflicts
- Unintended bandwidth bottlenecks
- Rate-limiting queues
- Media errors
- Overlong half-duplex
- Artificial congestion
- High latencies
- Broken QoS mechanisms
- Route flapping
- Undetected failover
Some are configuration-related (e.g., duplex- and MTU-conflicts), some are Layer 1 (e.g., wireless loss), others belong to Layer 2 or 3 (e.g., bad NICs) or are specific to end-to-end behavior such as broken QoS, and some are inappropriately applied mechanisms (e.g., CAR-style rate-limiting for real-time applications).
The significance of these various forms of network degradation is routinely quantified in terms of their effects -- their apparent transience, the attendant levels of loss or jitter and, more recently, the impact on particular applications such as VoIP through such measures as MOS. This is a symptomatic approach to troubleshooting that doesn't differentiate one effect from another. That leaves it up to smart engineers to conclusively diagnose the source of the problem -- often after many days, or even weeks, of painfully time-consuming investigation.
Sure, knowing you have a problem is half the battle. But it's only half.
Symptoms such as loss and jitter, and even MOS, are simply too ambiguous for effective diagnosis -- by smart humans or even mildly clever machines. So they won't be sufficient to enable automated remediation mechanisms. They will need the "whole nine yards."
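A small sketch makes the ambiguity concrete. The symptom signatures below are deliberately simplified assumptions chosen for illustration, not measurements, and `SYMPTOMS` and `candidate_causes` are hypothetical names. The point is only that a lookup keyed on coarse symptoms returns several causes at once.

```python
from typing import Dict, FrozenSet, Iterable, List

# Illustrative, simplified signatures for a few of the degradations listed earlier.
SYMPTOMS: Dict[str, FrozenSet[str]] = {
    "duplex conflict": frozenset({"loss"}),
    "media errors": frozenset({"loss"}),
    "rate-limiting queue": frozenset({"loss", "jitter"}),
    "bandwidth bottleneck": frozenset({"loss", "jitter"}),
}


def candidate_causes(observed: Iterable[str]) -> List[str]:
    """Return every cause whose signature equals the observed symptom set."""
    seen = frozenset(observed)
    return sorted(cause for cause, sig in SYMPTOMS.items() if sig == seen)


if __name__ == "__main__":
    print(candidate_causes(["loss"]))
    print(candidate_causes(["loss", "jitter"]))
```

Each query comes back with more than one candidate, so an automated repair mechanism driven by these symptoms alone would have no basis for choosing which remedy to apply.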
Of course, prevention would be ideal. But, consider again the duplex conflict example -- auto-negotiation has so far been part of the duplex-conflict problem, not the solution (although the industry has been making steady progress, and auto-negotiation is now -- anecdotally at least -- more reliable).
What is the alternative?
As Einstein once said:
"The significant problems we have cannot be solved at the same level of thinking with which we created them."
The device-level approach, while still offering an opportunity for improving the situation, will not be the source of the solution. One alternative is the end-to-end characterization of the network path as it pertains to the application and its performance.
Instead of relying solely on coarse-grained packet-level symptoms such as loss, a fully characterized path offers views on application performance that can uniquely identify and localize the source of degradation. That characterization includes the nature of the path, the contributions of the individual elements, the requirements of the application, and an accounting of the current conditions.
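As a rough sketch of what such a characterization might hold, the structure below records the path, each element's contribution, the application's requirements, and current conditions, and then names the largest contributor when a requirement is missed. Every type, field, and threshold here is hypothetical, and the loss arithmetic assumes each element drops packets independently, which real paths need not do.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    name: str
    loss_pct: float    # current loss attributed to this element, in percent
    latency_ms: float  # current latency contributed by this element


@dataclass
class AppRequirements:
    max_loss_pct: float
    max_latency_ms: float


@dataclass
class PathCharacterization:
    hops: List[Hop]                # the path and per-element contributions
    requirements: AppRequirements  # what the application needs

    def end_to_end_latency_ms(self) -> float:
        return sum(h.latency_ms for h in self.hops)

    def end_to_end_loss_pct(self) -> float:
        # Assumes independent loss per hop: delivery probabilities multiply.
        delivered = 1.0
        for h in self.hops:
            delivered *= 1.0 - h.loss_pct / 100.0
        return (1.0 - delivered) * 100.0

    def localize(self) -> Optional[Hop]:
        """If a requirement is missed, return the largest contributor to it."""
        if self.end_to_end_loss_pct() > self.requirements.max_loss_pct:
            return max(self.hops, key=lambda h: h.loss_pct)
        if self.end_to_end_latency_ms() > self.requirements.max_latency_ms:
            return max(self.hops, key=lambda h: h.latency_ms)
        return None


if __name__ == "__main__":
    path = PathCharacterization(
        hops=[
            Hop("ws-nic", 0.0, 0.2),
            Hop("access-switch", 2.0, 0.3),
            Hop("wan", 0.1, 20.0),
        ],
        requirements=AppRequirements(max_loss_pct=1.0, max_latency_ms=150.0),
    )
    culprit = path.localize()
    print(culprit.name if culprit else "requirements met")
```

The difference from a purely symptomatic view is that the end-to-end figures are tied back to individual elements, so the output is a location rather than just a loss percentage.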
New technologies are being developed that contribute to this picture (see "The new network science"). The optimal implementation will probably require a blend of capabilities that includes passive packet monitoring, active testing, application performance modeling, device management, NMS, and effective mechanisms for remediation and provisioning (SNMP needs to go away -- now!).
At the end of the day, it is clear that autonomics and self-healing will require specific details of the network, not symptomatic generalities, for any remedial or provisioning action to be reliably executed. Meanwhile, we wait and rely on a dwindling supply of smart, highly educated engineers and operators to keep our network infrastructures going. And try to pick through the marketing hype for diamonds in the technological rough.
Remember – if it says only "loss and jitter," it just isn't good enough.
Chief Scientist for Apparent Networks, Loki Jorgenson, PhD, has been active in computation, physics and mathematics, scientific visualization, and simulation for over 18 years. Trained in computational physics at Queen's and McGill universities, he has published in areas as diverse as philosophy, graphics, educational technologies, statistical mechanics, logic and number theory. Also, he acts as Adjunct Professor of Mathematics at Simon Fraser University where he co-founded the Center for Experimental and Constructive Mathematics (CECM). He has headed research in numerous academic projects from high-performance computing to digital publishing, working closely with private sector partners and government. At Apparent Networks Inc., Jorgenson leads network research in high performance, wireless, VoIP and other application performance, typically through practical collaboration with academic organizations and other thought leaders such as BCnet, Texas A&M, CANARIE, and Internet2. www.apparentnetworks.com
This was first published in June 2006