
Network fault analysis: Putting the guru in the analyzer

Today's TCP/IP networks require packet-level analyzers for proper troubleshooting, just as older networks did, but these analyzers must be smarter and more efficient than ever.

The Internet was supposed to simplify networking by converging everything on IP for subnets and TCP for transport, but over time, things have become more complex than ever. We developed our own private intranets, moving to private IP addressing, and added NAT routers. Then, privacy became a concern and we added firewalls, and we needed to deal with security issues like intrusion detection, worms, viruses, spam, and even disgruntled employees. In addition, we added new technology and applications, including local switch ports for every user, wireless for mobility, gigabit speeds, and now the latest -- the expectation that our data networks will carry VoIP with the quality and reliability we've come to expect from our phone system.

Much of the new technology and services standardized over the past few years involves new protocols, from the physical up through the application layers. Even old stalwarts like IP and TCP are still evolving as vendors tweak their protocol stacks and the Internet Engineering Task Force (IETF) introduces optimizations like TCP selective acknowledgments and window scaling. With so many devices and protocols in play over broadly distributed networks, the old model of site-based troubleshooting falls short.

Traditionally, companies had a handful of gurus who were specialists in specific protocols or problem sets. They carried around cable testers or hardware-based protocol analyzers, relied on SNMP "red light/green light" tools, and dug through syslogs or performance monitoring information to troubleshoot network faults. But these days, it takes far too long to get answers using these methods, especially when IP networks span the globe and a worm can decimate hundreds of desktops and servers within an hour.

New infrastructure requires new troubleshooting tools

A user is experiencing problems with a particular application, including long wait times for certain transactions. A good place to start is to view the user's application in action using our console, which is connected to the remote expert analysis engine. Because of security concerns, communication with the remote engine must be able to run over existing VPNs or, if no VPN is available, provide encryption based on secure government standards. To minimize bandwidth requirements between the console and remote engine, packets and other data must be compressed and transferred only if the user needs to view or save them.
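
The bandwidth requirement can be illustrated with a minimal Python sketch. It assumes nothing about any real product: `RemoteEngine`, `summary`, `fetch`, and `unpack` are hypothetical names, and the length-prefixed framing is just one simple way to batch packets. Encryption is left to the VPN or another secure channel and is not shown.

```python
import zlib


class RemoteEngine:
    """Hypothetical stand-in for the remote analysis engine."""

    def __init__(self):
        self._buffer = []  # raw captured packets stay on the engine

    def capture(self, packet: bytes) -> None:
        self._buffer.append(packet)

    def summary(self) -> dict:
        # Cheap metadata the console can poll without moving any packets.
        return {"packets": len(self._buffer),
                "bytes": sum(len(p) for p in self._buffer)}

    def fetch(self, start: int, stop: int) -> bytes:
        # Packets cross the link only when requested, and only compressed.
        chunk = self._buffer[start:stop]
        framed = b"".join(len(p).to_bytes(4, "big") + p for p in chunk)
        return zlib.compress(framed)


def unpack(blob: bytes) -> list:
    """Console side: decompress and split the length-prefixed packets."""
    data = zlib.decompress(blob)
    packets, offset = [], 0
    while offset < len(data):
        size = int.from_bytes(data[offset:offset + 4], "big")
        offset += 4
        packets.append(data[offset:offset + size])
        offset += size
    return packets
```

In this sketch the console would call `summary()` routinely and `fetch()` only when the engineer opens or saves a range of packets, so idle monitoring costs almost no bandwidth.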

To facilitate rapid responses, we need the option to have the analysis engine send us a page or an email when a critical event is diagnosed. Further, the remote engine should be able to continuously save buffers of captured packets to its local hard drive, in case we can't examine the data in real time. Such a capture is limited only by the capacity of the hard drive, allowing us to go back hours, or even days to a particular event.
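
A rolling capture with event notification might look like the following Python sketch. It is an illustration under stated assumptions, not a description of any vendor's engine: `RollingCapture` is a hypothetical class, an in-memory deque stands in for the hard drive, and `notify` is whatever callable would send the page or email.

```python
from collections import deque


class RollingCapture:
    """Keeps the most recent packets that fit within a byte budget."""

    def __init__(self, capacity_bytes: int, notify=None):
        self.capacity_bytes = capacity_bytes  # stands in for disk capacity
        self.notify = notify                  # hypothetical pager/email hook
        self._packets = deque()               # (timestamp, payload), oldest first
        self._used = 0

    def add(self, timestamp: float, payload: bytes) -> None:
        self._packets.append((timestamp, payload))
        self._used += len(payload)
        # Evict the oldest packets once the budget is exceeded.
        while self._used > self.capacity_bytes and self._packets:
            _, old = self._packets.popleft()
            self._used -= len(old)

    def since(self, timestamp: float) -> list:
        """Return the payloads captured at or after the given time."""
        return [p for t, p in self._packets if t >= timestamp]

    def report(self, severity: str, message: str) -> None:
        # Only critical diagnoses trigger a notification.
        if severity == "critical" and self.notify is not None:
            self.notify(message)
```

The larger the budget, the further back `since()` can reach, which is the sense in which the capture is limited only by the capacity of the drive.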

At the console, we need to be able to view the remote engine's expert system to see client and server flows (a flow is a conversation or connection between the client and server) and a summary of problems such as "Busy Network or Server," "Slow Response Time," or a very specific diagnosis like "SQL Server Client Error." The console and engine need to work as one, providing an integrated GUI that lets the engineer click on a specific problem (or problems) and have the expert automatically select only the problem packets of interest and switch to a view of the packet buffer located at the remote engine. To simplify diagnosis, we should have an option to easily hide all other extraneous packets and problems so we can focus strictly on the problem at hand.
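
The idea of grouping packets into flows and then hiding everything except one problem can be sketched in a few lines of Python. The `Packet` record, its `problem` tag, and the helper names are assumptions made for illustration; a real expert system would derive the diagnosis from decoded protocol state rather than read it from a field.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    number: int
    src: str
    sport: int
    dst: str
    dport: int
    problem: Optional[str] = None  # diagnosis attached by the expert, if any


def flow_key(pkt: Packet) -> tuple:
    # Sort the two endpoints so both directions map to the same flow.
    return tuple(sorted([(pkt.src, pkt.sport), (pkt.dst, pkt.dport)]))


def group_flows(packets) -> dict:
    """Group packet numbers by client/server flow."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt.number)
    return dict(flows)


def select_problem(packets, problem: str) -> list:
    """Keep only packets tagged with the chosen problem, hiding the rest."""
    return [pkt for pkt in packets if pkt.problem == problem]
```

Clicking a problem in the console would correspond to calling `select_problem` with that diagnosis, leaving a short list of packets to examine instead of the whole buffer.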

Now that we are zeroed in on only the specific packets pertaining to the problem, further examination reveals that the user is experiencing a database deadlock condition, followed tens of seconds later by a problem attempting to roll back because the deadlock was not resolved. These features and requirements allow us to analyze the client/server flow and drill down to the problem in less than 60 seconds.

This level of integration can only be achieved with a system based on a console and remote engine architecture that breaks away from the old model of simply sending display bitmap image updates from an application running on a remote probe. It also requires tools with enough native expertise to recognize packets related to specific problems and provide tools to quickly isolate and diagnose them.

For rapid responses to today's complex problems, we need tools that put the guru in the analyzer.

About the author: Scott Haugdahl, CTO of WildPackets Inc., brings over two decades of experience in network analysis, troubleshooting, and optimization to his new role. In 1996, he founded and presided over the Net3 Group, a WildPackets strategic partner that specialized in developing intelligent tools for the professional network engineer. Net3 Group was acquired by WildPackets in 2000, and Haugdahl then assumed the role of VP of Analysis Technology for the newly combined organization. Prior to that time, Haugdahl consulted to enterprise clients, wrote a regular column for Network Computing Magazine, authored a popular text on network analysis and troubleshooting, designed and taught professional networking classes worldwide, and was one of the original architects and presenters of technical network analysis forums at NetWorld+Interop.

Let's look at a brief case study to examine some of these capabilities in more detail and to see how they might help in today's environment. The focus is on how quickly the system can diagnose faults and assist the user in drilling down to the precise nature of the problem.
This was last published in October 2004
