
The truth about network diagnostics

We often say "diagnostics" but we don't really mean it -- at least not in the context of networks. In this featured column, Loki Jorgenson explains why the burden of analysis that leads to identifying a network problem usually falls on a network expert.

We often say "diagnostics" but we don't really mean it -- at least not in the context of networks. Today's network technologies are just beginning to give us an indication when something is wrong, but that's hardly diagnostic. That would be like going to the doctor only to hear that you're sick -- but obviously you knew that already! Figuring out what to do about it is the critical next step.

The term "diagnostic" is well defined. According to Merriam-Webster Online:

Main Entry: 1di·ag·nos·tic
Pronunciation: -'näs-tik
Variant(s): also di·ag·nos·ti·cal
Function: adjective
1 a : of, relating to, or used in diagnosis
: using the methods of or yielding a diagnosis
2 : serving to distinguish or identify

And "diagnosis" is defined as "an investigation or analysis of the cause or nature of a condition, situation, or problem." As a noun, "diagnostic" is defined as a "distinguishing mark."

The emphasis in all of these definitions, particularly as routinely applied in medicine, is on identifying a cause based on outward symptoms or indications. A popular example in networking might be the fabled duplex mismatch -- when two connected interfaces are incorrectly set to half- and full-duplex respectively, resulting in devastating performance problems. A good diagnosis might read: "There is a duplex conflict between workstation X and switch Y at port Y15."
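The diagnosis described above can be sketched as a trivial comparison, assuming the duplex setting of each end is already in hand (in practice it would be gathered via SNMP or a tool like ethtool -- which is exactly the hard, human part the column goes on to describe). The interface records here are hypothetical:

```python
# Minimal sketch: flag a duplex mismatch between two link partners.
# Each interface is a dict with 'name' and 'duplex' ('half' or 'full');
# gathering these values from real devices is left to SNMP/ethtool.

def diagnose_duplex(a: dict, b: dict) -> str:
    """Return a diagnosis string for a pair of connected interfaces."""
    if a["duplex"] != b["duplex"]:
        return (f"duplex conflict between {a['name']} ({a['duplex']}) "
                f"and {b['name']} ({b['duplex']})")
    return "no duplex conflict detected"

print(diagnose_duplex(
    {"name": "workstation X", "duplex": "full"},
    {"name": "switch Y port Y15", "duplex": "half"},
))
```

The comparison itself is trivial; the point of the column is that collecting and correlating the two settings is where the expert's time actually goes.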

Unfortunately, the burden of the analysis that identifies the problem usually falls on a network expert. Rarely does network technology provide a concrete, actionable answer or conclusion. An alarm accompanied by some measure of packet loss is more likely. We are left to do the actual troubleshooting. And, as anyone who has had to identify and localize a duplex mismatch knows, it is a painful, time-consuming process.

This reality leads to several inevitable consequences.

First, references to "diagnostic" capabilities are often misleading – they more typically mean that there are indications when things are "not quite right." A red light will go on. An alarm will sound. A trigger will be sent. That's the signal for an engineer to get to work.

Problem management or problem resolution?
It also means that the manager or operator is getting lots of alarms. And that leads to other technologies that refine how alarms are generated -- perhaps against adaptive thresholds or the like -- with sophisticated expert systems that can eliminate redundant alarms. This approach leads to "problem management" instead of "problem resolution." Obviously, if a problem were actually identified, it would only need to be resolved, not managed.
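One way an adaptive threshold can be implemented is sketched below -- a hypothetical illustration, not any particular vendor's method. An alarm fires only when a sample exceeds a running exponentially weighted moving average (EWMA) baseline by a fixed multiple, so the threshold adapts to normal conditions instead of staying static:

```python
# Hypothetical sketch of an adaptive alarm threshold: alarm when the
# current sample exceeds `factor` times an EWMA baseline of past samples.

def ewma_alarms(samples, alpha=0.2, factor=3.0):
    """Return (sample, alarmed) pairs judged against an adaptive baseline."""
    baseline = samples[0]
    results = []
    for x in samples:
        alarmed = x > factor * baseline
        if not alarmed:
            # Only normal samples update the baseline, so a spike
            # does not raise the threshold for later samples.
            baseline = alpha * x + (1 - alpha) * baseline
        results.append((x, alarmed))
    return results

# A steady series with one spike: only the spike alarms.
print(ewma_alarms([1.0, 1.0, 1.0, 10.0, 1.0]))
```

Note that this sketch illustrates the column's point precisely: the output is still just an alarm -- a refined one, but with no cause attached.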

Second, generating all of these assessments and alarms means more data -- not necessarily anything actionable, just more metadata about other data. So not only are there no conclusions, but more data is being generated. Hopefully it is at least well organized.

A recognizable sign of such an approach is the presentation of graphs. Graphs are powerful tools, but they are intended to communicate relationships or patterns within data to humans; they do not offer solutions on their own. A skilled engineer may recognize a behavior associated with a particular problem, and so a graph can help identify the issue -- but it doesn't usually solve it.

Third, the human is now the bottleneck in the troubleshooting process, which therefore scales only with the availability of additional expert resources -- and experts are expensive and in short supply. Hence the typical approach: "manage" problems until skilled people can resolve them.

Support centers as exemplar
For example, support centers are rife with people who are not network experts, dealing with network problems (on customers' networks) with limited information:
         "Have you tried turning it on and off?"
Without experts, these sorts of problems are routinely addressed by recipes, workarounds and guessing; escalated to the overworked experts; or left unresolved.

Most recently, Mean Opinion Score (MOS) has appeared as the latest form of "diagnostic indicator." It may tell us that something is awry, but not what, where, or how to fix it. "MOS is 2.7" tells us only that call quality is unsatisfactory. While it's a step forward, it's not truly diagnostic.
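The limitation is visible in how estimated MOS is typically computed. The ITU-T G.107 E-model condenses many impairments (loss, delay, codec distortion) into a single transmission rating, the R-factor, and then maps R to a MOS estimate with the standard conversion below -- so by construction the score cannot say which impairment, or which device, is responsible:

```python
# Standard E-model R-factor to MOS conversion (ITU-T G.107).

def r_to_mos(r: float) -> float:
    """Map a transmission rating factor R to an estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

# Many different combinations of loss, delay and codec impairment can
# produce the same R, and therefore the same MOS -- the score is lossy.
print(round(r_to_mos(93.2), 2))  # the E-model's default best case, ~4.41
```

Two calls degraded for entirely different reasons can share one MOS value, which is why a score alone can never localize a fault.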

The denotative meaning of "diagnostic" points toward actionable identification. However, the connotative meaning is more like "supports troubleshooting done by humans."

Surprising? No, not really. As I recently pointed out in a column on "network retardation," most network technologies are still at stage 2 of the Gartner IT maturity model. At this level, most processes are "human-based and assisted," otherwise referred to as "pro-active." So it might be reasonable to expect that network diagnostics are still the province of humans.

Just a lot of noise about semantics for nothing then? Certainly not. The denotative meaning of "diagnostic" points at the yet-to-be-realized potential of "self-aware" networks – part of the future that IBM calls "autonomic." And the clear distinction between the denotative and the connotative raises human awareness of what to look for in the emerging technologies.

The bottom line: Choose enabling technologies that arm your best people with enhanced diagnostic capabilities and make your non-experts more effective. Interpret the meaning of marketing phrases appropriately -- make sure your expectations are met. Avoid self-serving approaches to "managing problems" and focus on the solutions that really identify and resolve them. Those will be the ones that will scale into the near-future.

Chief Scientist for Apparent Networks, Loki Jorgenson, PhD, has been active in computation, physics and mathematics, scientific visualization, and simulation for over 18 years. Trained in computational physics at Queen's and McGill universities, he has published in areas as diverse as philosophy, graphics, educational technologies, statistical mechanics, logic and number theory. Also, he acts as Adjunct Professor of Mathematics at Simon Fraser University where he co-founded the Center for Experimental and Constructive Mathematics (CECM). He has headed research in numerous academic projects from high-performance computing to digital publishing, working closely with private sector partners and government. At Apparent Networks Inc., Jorgenson leads network research in high performance, wireless, VoIP and other application performance, typically through practical collaboration with academic organizations and other thought leaders such as BCnet, Texas A&M, CANARIE, and Internet2.
This was last published in January 2006
