There are "tools-a-plenty" to help administrators identify when a network is down, and several approaches to reacting to the alarms. But which method is best? The short answer is, none of them. No single method works in every situation. This article explains some of the methodologies used in troubleshooting and where they may fall short as a permanent fix.
There are basically two approaches to troubleshooting: top down and bottom up. Today we'll discuss the top down approach; in my next tip, we'll cover the bottom up approach.
In a top down approach, you begin at the upper layers of the OSI protocol stack. You test the application to be sure it is working, then ping the servers, and so on, working your way down until you reach the bottom of the stack, the physical layer. This approach works best when help desk calls come from multiple users. Physical layer problems rarely affect all users at once, unless, of course, the failed link happens to be the server's only connection. This methodology lets you determine whether the application or server is down, slow, or for some reason not responding to network commands. To be effective, it is generally aided by a tool or network monitoring application that provides trending and actionable data.
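The top down walk can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a monitoring product: each layer check is a hypothetical callable you would supply yourself (an HTTP request for the application, a TCP connect for transport, a ping for the network layer, and so on), and `top_down`, `tcp_port_open`, and the host name `app01` are made up for this example. The walk stops at the first layer that passes, because that suggests everything beneath it is working and the fault lies in the layer just above.

```python
import socket
from typing import Callable, List, Optional, Tuple

# A check returns True when its layer appears healthy.
Check = Callable[[], bool]


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Example transport-layer check: can a TCP handshake complete?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def top_down(checks: List[Tuple[str, Check]]) -> Optional[str]:
    """Run checks ordered from the top of the stack to the bottom.

    Returns None if the topmost check passes. Otherwise returns the name of
    the lowest failing layer that sits directly above a passing one, which
    is where digging should start. If every check fails, the bottom layer
    is returned.
    """
    suspect: Optional[str] = None
    for name, check in checks:
        if check():
            return suspect  # None when the very first (top) check passes
        suspect = name
    return suspect


if __name__ == "__main__":
    # Stubs stand in for real checks, e.g. tcp_port_open("app01", 443)
    # for the transport layer. "app01" is a hypothetical host name.
    checks = [
        ("application", lambda: False),
        ("transport", lambda: False),
        ("network", lambda: True),
        ("physical", lambda: True),
    ]
    print(top_down(checks))  # prints: transport
```

The stub lambdas let the logic be followed without a live network; in practice each would wrap a real test against the affected application or server.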
Actionable data can range from something as simple as a ping that returns "host unreachable" to bit errors and other errors reported via SNMP traps. The real trick, however, is determining the cause of the errors. To do so effectively, follow a methodical troubleshooting plan, and that plan should include more than rebooting a server. If a server keeps going down, something is causing it to do so. It may be a memory leak, processor over-utilization, or some other issue, but a reboot should be considered a bandage, not a solution. So, what exactly is actionable data? It is data that carries enough information to be useful and is clear enough to point to a plan of action.
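To make "bit errors reported via SNMP" concrete, the sketch below turns two polls of interface counters into a statement someone can act on. It assumes you already collect the IF-MIB counters `ifInErrors` and `ifInUcastPkts` with whatever SNMP tool you use; the function names, the dictionary layout, the interface name `Gi0/1`, and the 0.1% default threshold are hypothetical. Because 32-bit SNMP counters wrap, the difference between polls is taken modulo 2^32.

```python
from typing import Dict, Optional

COUNTER32_MODULUS = 2**32  # SNMP Counter32 values wrap back to 0 here


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase of a counter between two polls, tolerating one wrap."""
    return (current - previous) % modulus


def error_ratio(prev: Dict[str, int], curr: Dict[str, int]) -> float:
    """Approximate fraction of inbound packets that arrived with errors.

    Uses errors / (errors + unicast packets); multicast and broadcast
    traffic is ignored to keep the sketch short.
    """
    errors = counter_delta(prev["ifInErrors"], curr["ifInErrors"])
    good = counter_delta(prev["ifInUcastPkts"], curr["ifInUcastPkts"])
    total = errors + good
    return errors / total if total else 0.0


def assess(interface: str, prev: Dict[str, int], curr: Dict[str, int],
           max_ratio: float = 0.001) -> Optional[str]:
    """Return a human-readable alert, or None if the interface looks fine."""
    ratio = error_ratio(prev, curr)
    if ratio > max_ratio:
        return f"{interface}: {ratio:.2%} of inbound packets had errors"
    return None


if __name__ == "__main__":
    before = {"ifInErrors": 10, "ifInUcastPkts": 100_000}
    after = {"ifInErrors": 110, "ifInUcastPkts": 149_900}
    print(assess("Gi0/1", before, after))
    # Gi0/1: 0.20% of inbound packets had errors
```

The output names the interface, the metric, and how far out of range it is, which is the difference between raw data and actionable data.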
Most management packages and monitoring tools let a network administrator set thresholds that flag performance outside an acceptable range. Knowing where to set them for specific issues will take a bit of trial and error. Set them too low and your pager becomes a belt massager; set them too high and you may find yourself unemployed. Blindly accepting the defaults can leave the tools underutilized. Any time you deploy management software, I recommend you cough up the bucks and get trained. Ideally, the training is on site, in your environment, by someone certified in the software package. That way you can eliminate the modules you don't want or need and tune the ones that will give you the best information. Bandwidth-heavy applications and heavily utilized servers will require the most tuning to be of benefit.
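One common way to keep thresholds from turning a pager into a belt massager is to require several consecutive bad samples before alarming, and to clear the alarm only once the value falls well below the trigger point (hysteresis). The class below is a minimal sketch of that idea, assuming periodic samples such as percent CPU or link utilization. `ThresholdAlarm` is hypothetical, not part of any particular management package, and the `high`, `low`, and `trigger_count` values are placeholders you would still tune by trial and error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdAlarm:
    """Alarm with a consecutive-sample requirement and hysteresis.

    Raises only after `trigger_count` samples in a row exceed `high`, and
    clears only when a sample drops below `low`, so one spike does not page
    anyone and a value hovering near the line does not flap.
    """
    high: float
    low: float
    trigger_count: int = 3
    active: bool = False
    streak: int = 0

    def update(self, value: float) -> Optional[str]:
        """Feed one sample; return "RAISE", "CLEAR", or None."""
        if not self.active:
            if value > self.high:
                self.streak += 1
                if self.streak >= self.trigger_count:
                    self.active = True
                    self.streak = 0
                    return "RAISE"
            else:
                self.streak = 0  # the run of bad samples was broken
        elif value < self.low:
            self.active = False
            return "CLEAR"
        return None


if __name__ == "__main__":
    # Hypothetical CPU-utilization samples (percent), one per poll.
    alarm = ThresholdAlarm(high=80.0, low=60.0)
    samples = [85, 90, 70, 85, 88, 91, 75, 65, 55]
    print([alarm.update(s) for s in samples])
    # [None, None, None, None, None, 'RAISE', None, None, 'CLEAR']
```

Raising `trigger_count` or widening the gap between `high` and `low` trades slower notification for fewer false pages, which is exactly the tuning decision described above.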
Another benefit of management software is the ability to query disparate equipment and keep statistics and trends in one reporting tool. In the old days, and still in many environments today, we had to double-click on each switch, using a wide variety of interfaces depending on the server software and active electronics. Not quite a time saver! With a single tool, trending and overall traffic reports can be exported, sorted, and so on. These reports can also be used to justify new equipment and upgrades (a nice side perk). A trending and utilization model also shows you which servers could benefit from multiple network cards. It lets you segment your switches so that traffic is balanced across them, rather than one switch being overutilized while the others sit underutilized. Finally, it helps you see what types of packets are moving where, so that those flows can be optimized as well.
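Trending across switches starts with turning raw octet counters into utilization. Below is a minimal sketch, assuming periodic polls of an octet counter such as IF-MIB `ifInOctets` and a known interface speed in bits per second; the helper names, switch names, and the 20-point tolerance are hypothetical. On fast links a 32-bit counter can wrap within a minute (at 1 Gbit/s, 2^32 octets pass in about 34 seconds), so the 64-bit `ifHCInOctets` is the safer choice there, which is why the modulus is a parameter.

```python
from typing import Dict

COUNTER32_MODULUS = 2**32


def utilization(prev_octets: int, curr_octets: int, interval_s: float,
                speed_bps: int, modulus: int = COUNTER32_MODULUS) -> float:
    """Percent utilization of one direction of a link between two polls.

    The modulo handles at most one counter wrap per interval; pass
    modulus=2**64 when polling a 64-bit counter such as ifHCInOctets.
    """
    octets = (curr_octets - prev_octets) % modulus
    return 100.0 * octets * 8 / (interval_s * speed_bps)


def imbalance(per_switch: Dict[str, float], tolerance: float = 20.0) -> Dict[str, str]:
    """Flag switches whose utilization is more than `tolerance` percentage
    points above ("over") or below ("under") the mean of all switches."""
    mean = sum(per_switch.values()) / len(per_switch)
    flags: Dict[str, str] = {}
    for name, util in per_switch.items():
        if util > mean + tolerance:
            flags[name] = "over"
        elif util < mean - tolerance:
            flags[name] = "under"
    return flags


if __name__ == "__main__":
    # 75,000,000 octets in 60 s on a 100 Mbit/s link is 10% utilization.
    print(utilization(0, 75_000_000, 60, 100_000_000))  # 10.0
    # Hypothetical average uplink utilization per switch, in percent.
    print(imbalance({"sw1": 85.0, "sw2": 30.0, "sw3": 25.0}))
    # {'sw1': 'over', 'sw3': 'under'}
```

Kept over weeks, the same per-switch numbers double as the trend data you would show when justifying an upgrade.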
Stay tuned for my next tip where I'll cover the bottom up approach.
Carrie Higbie, Global Network Applications Market Manager, The Siemon Company
Carrie has been involved in the computing and networking industries for nearly 20 years. She has worked with manufacturing firms, medical institutions, casinos, healthcare providers, cable and wireless providers, and a wide variety of other industries in networking design and implementation, project management, and software development, for privately held consulting firms and, most recently, Network and Software Solutions.
Carrie currently works with The Siemon Company where her responsibilities include providing liaison services to electronic manufacturers to assure that there is harmony between the active electronics and existing and future cabling infrastructures. She participates with the IEEE, TIA and various consortiums for standards acceptance and works to further educate the end user community on the importance of a quality infrastructure. Carrie currently holds an RCDD/LAN Specialist from BICSI, MCNE from Novell and several other certifications.