It has been said, "If you can't measure it, you can't manage it." This familiar adage holds true in most cases,
especially when a company is seeking to improve profit margins, staff performance, or the functioning of its IT infrastructure. When it comes to successful network management, measurement is critical. However, network management often only becomes an issue after disaster has struck. So what can a responsible IT manager do to prevent this?
In general, IT managers understand the value of network analysis tools during a crisis. However, many of them do not realize that most crises could have been avoided had they been monitoring and testing the network continuously. To effectively manage a network, IT managers need to be able to recognize trouble developing before it becomes a user irritation and causes expensive downtime. Many corporate administrators use ineffective analysis tools, or, if they have the right tools, use them in the wrong way -- practicing crisis management rather than network management.
So how do IT managers monitor a network in a preventative, proactive manner rather than merely reacting to problems as they arise? Effectively managing a network means obtaining accurate, objective, and pertinent metrics before making any decisions, especially decisions to spend money. Setting a baseline that defines normal activity on the network, planning for capacity, and monitoring the network continuously to keep complaints and downtime to a minimum are all requirements for understanding a network and managing it proactively.
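As a concrete illustration, a simple statistical baseline can be sketched in a few lines of Python. The hourly utilization figures and the three-sigma alarm ceiling below are illustrative assumptions, not the output of any particular product:

```python
from statistics import mean, stdev

def baseline(samples):
    """Summarize 'normal' from a history of link-utilization readings (percent)."""
    mu, sigma = mean(samples), stdev(samples)
    # A common rule of thumb: flag anything beyond three standard deviations.
    return mu, sigma, mu + 3 * sigma

# Hypothetical hourly utilization readings gathered over a quiet week:
history = [12.0, 14.5, 11.2, 13.8, 12.9, 15.1, 13.3, 12.4]
mu, sigma, ceiling = baseline(history)
print(f"normal is about {mu:.1f}% +/- {sigma:.1f}%; alarm above {ceiling:.1f}%")
```

In practice an analyzer computes this kind of band per link and per time-of-day, but the principle is the same: know what normal looks like before deciding what counts as an anomaly.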
As with any task, the job is far easier with the right tools. Proactive network management is readily attainable with a robust, distributed, multi-topology network analyzer that offers management capabilities such as long-term logging, configurable alarms, and traffic generation.
Effective network management prevents downtime by alerting managers to developing problems and by making the correct solution clear through detailed, accurate, and pertinent analysis, helping them stay in control of the network at all times. IT managers of unmanaged or ineffectively managed networks are always putting out fires: users complain about slow servers and about databases that are continually down for unscheduled troubleshooting and upgrades, and the "solutions" often make problems worse. The best tool to provide all the metrics an IT manager needs is a distributed network analyzer, paired with probes (which collect network traffic) at the core switches and on each link.
An investment in a network analyzer will pay for itself many times over. For example, such tools can immediately inform IT managers when bandwidth usage spikes unusually, threatening network health. A mature analyzer can page an administrator as soon as user-configurable network conditions are detected -- in time to take action before a flurry of complaints arrives over a network that is already down.
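The logic behind such a triggered alarm can be sketched as follows. The `notify` hook, the threshold, and the sample readings are hypothetical stand-ins for whatever paging or e-mail mechanism a real analyzer exposes:

```python
def check_alarms(samples, threshold, notify):
    """Fire a one-shot notification when utilization crosses a threshold.

    `samples` is a list of (time_label, utilization_percent) pairs;
    `notify` stands in for the analyzer's paging/e-mail hook.
    """
    alerted = False
    for t, pct in samples:
        if pct >= threshold and not alerted:
            notify(f"utilization {pct:.0f}% at {t} exceeds {threshold}%")
            alerted = True   # suppress duplicate pages while the spike lasts
        elif pct < threshold:
            alerted = False  # re-arm once traffic subsides
    return alerted

pages = []
check_alarms([("09:00", 22), ("09:05", 81), ("09:10", 85), ("09:15", 30)],
             threshold=75, notify=pages.append)
```

The one-shot re-arm behavior matters in practice: an alarm that pages on every sample above the threshold trains administrators to ignore it.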
For example, consider the following scenario: One company's router usage alarms were triggering all over the network. Users were not yet complaining, but if the trend continued upward at the existing rate, the network would be showing sporadic problems within days, and might be totally overwhelmed within a matter of weeks. Should the company buy faster routers or faster WAN service, or both?
Good decisions require pertinent and accurate information, so the manager used the analyzer to decode the increased traffic. She discovered that the increased bandwidth usage was the result of various file-sharing applications. Looking at the long-term trending logs provided by the analyzer, she could even determine that the file-sharing applications had "spread" from the marketing department, where the first concentrations of suspicious port/protocol traffic were seen.
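The kind of port-based tally the manager relied on can be sketched like this. The flow-record format and the port-to-application table are illustrative assumptions; a real analyzer decodes protocols far more thoroughly than a port lookup:

```python
from collections import Counter

# Hypothetical well-known ports for popular file-sharing protocols.
P2P_PORTS = {6881: "BitTorrent", 6882: "BitTorrent",
             4662: "eDonkey", 1214: "FastTrack"}

def tally_suspicious(flows):
    """Sum bytes per (subnet, application) for flows on known P2P ports.

    Each flow is (src_subnet, dst_port, byte_count) -- the sort of record
    a distributed probe might export.
    """
    usage = Counter()
    for subnet, port, nbytes in flows:
        app = P2P_PORTS.get(port)
        if app:
            usage[(subnet, app)] += nbytes
    return usage

flows = [("10.1.2.0/24", 6881, 48_000_000),  # marketing subnet
         ("10.1.2.0/24", 4662, 12_000_000),
         ("10.1.7.0/24", 443, 5_000_000)]    # ordinary HTTPS, ignored
usage = tally_suspicious(flows)
```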
Most companies do not want to upgrade hardware or services so that employees can trade illegal files over the corporate network. Armed with the analyzer's findings, the manager instructed human resources to update the employee handbook to explicitly forbid file-sharing applications, and used the detailed port information to configure the firewall to block the ports known to carry such traffic.
This scenario demonstrates some benefits of a proactively managed network. First of all, no money was wasted on system downtime, because IT caught the bandwidth spike before users noticed any slowdowns. Secondly, no money was wasted on hardware upgrades or other "solutions" that would not have addressed the real, underlying problem (inappropriate use of the corporate network). In addition, the actual solution (enforcing corporate network usage policies) not only saved the IT department money by making upgrades unnecessary; it saved the entire company money through improved staff productivity. Finally, it eliminated some potentially serious legal liability for the company.
The key point is that even when a problem does reach the stage where users complain, thorough knowledge of the network speeds up resolution, preventing failed fixes and repeated complaints. This knowledge is best gained through use of an enterprise-strength analyzer.
A few months later, the same company ran into another problem. A group of users connected to a particular router intermittently complained of network response times so slow as to make the network unusable. Whenever the manager looked at response times, however, nothing seemed wrong; the protocol analyzer indicated that router utilization was reasonably light (less than 5%) whenever she was actually watching. Examining the long-term log, she noticed that router utilization had spiked recently, at about the same time the helpdesk logged the "network down" user complaints. She hypothesized that the router started to fall behind only after a certain level of bandwidth saturation was reached.
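The correlation she performed between the trending log and the helpdesk timestamps can be sketched as follows. The timestamps, the 50% threshold, and the 15-minute window are all hypothetical values chosen for illustration:

```python
from datetime import datetime, timedelta

def spikes_near_complaints(log, complaints, threshold=50.0,
                           window=timedelta(minutes=15)):
    """Return trending-log entries above `threshold` that fall within
    `window` of a helpdesk complaint.

    `log` is a list of (timestamp, utilization_percent) pairs;
    `complaints` is a list of helpdesk timestamps.
    """
    return [(t, pct) for t, pct in log
            if pct >= threshold and any(abs(t - c) <= window for c in complaints)]

ts = datetime.fromisoformat
log = [(ts("2024-03-04T09:00"), 4.0), (ts("2024-03-04T11:00"), 62.0),
       (ts("2024-03-04T14:00"), 58.0), (ts("2024-03-04T16:00"), 3.5)]
complaints = [ts("2024-03-04T11:05")]
matches = spikes_near_complaints(log, complaints)
```

Only the 11:00 spike survives the filter here: the 14:00 spike clears the threshold but has no complaint near it, which is exactly the kind of distinction that separates a real correlation from a coincidence.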
Her network analyzer let her test the hypothesis. She could do more than just look at snapshots of router traffic; the analyzer could also generate traffic, letting her simulate the stress conditions suspected of causing the router problem. As it turned out, the router's performance started degrading exponentially at about 50% utilization. That utilization level, although still rare for this particular router, was likely to become common once VoIP traffic hit the network -- a rollout scheduled for the next quarter. The IT manager scheduled the router for an upgrade.
Although it may be true that smart people coupled with smart tools can lead to smart business decisions, even very smart people make very bad decisions if they have no information (or worse, incorrect or misleading information). Let's look at another company, one with an ineffectively managed network and similar router issues, for a good example of this.
The router issues this company was experiencing were actually due to one junior manager's covert file-downloading sessions. The company reflexively upgraded the router without finding out what the problem actually was. Since the company was not deploying VoIP or any other bandwidth-hungry technology, the router upgrade was unnecessary, and it came only after repeated, unresolved user complaints had taxed the already overworked administrator's time. Here we see that bad information leads to bad assumptions and bad business decisions. A company will not survive a competitive market for long if it repeats this pattern in any aspect of its business, especially crucial IT infrastructure.
To manage a corporate network with the aim of avoiding user complaints and, more importantly, downtime, IT managers need an analyzer that includes the following measurement and testing capabilities:
- Distributed analyzers overcome the technical hurdles inherent in capturing and decoding traffic from multiple sites. The most complete solutions also handle wireless, WAN, gigabit speeds, and other topologies with seamlessly interoperable probes (data collectors) and consoles (analysis viewers).
- Triggered notifications let IT managers set conditions under which they want an automatic heads-up notification. The best include a rule-based filter engine that lets users program alarms based on flexible criteria -- statistical conditions, particular protocols or particular packets (or even bits within packets), etc.
- Long-term logging (also known as network trending) lets IT managers analyze their network over time. Knowing what is normal lets them see anomalies more clearly -- anomalies that might mean problems in the future.
- Traffic generation capability will help IT managers test out hypotheses during troubleshooting and planning when they need to see the effects of certain levels and kinds of traffic on a particular part of the network.
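As a rough illustration of the last capability, here is a minimal UDP load generator in Python. It is a crude stand-in for a real analyzer's traffic-generation feature, which shapes rate and packet mix far more precisely; the destination, rate, and duration below are arbitrary, and it targets localhost so it is safe to run anywhere:

```python
import socket
import time

def generate_udp_load(dest, rate_pps, duration_s, payload_size=512):
    """Send fixed-size UDP datagrams to `dest` at roughly `rate_pps`
    packets per second for `duration_s` seconds. Returns packets sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * payload_size
    interval = 1.0 / rate_pps
    sent = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        sock.sendto(payload, dest)
        sent += 1
        time.sleep(interval)  # crude pacing; real tools use hardware timing
    sock.close()
    return sent

# Aim at localhost so the sketch stresses nothing but the local stack.
sent = generate_udp_load(("127.0.0.1", 9999), rate_pps=200, duration_s=0.5)
```

Pointing a generator like this at a router under test, while watching the analyzer's utilization and latency figures, is how the hypothesis in the earlier scenario would be confirmed or refuted.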
By using a distributed network analyzer, IT managers can prevent problems before they happen and resolve issues as they arise, keeping users happy and businesses running smoothly and profitably. Armed with such a tool, IT managers can both measure and manage their corporate networks -- and good information leads to good assumptions and good business decisions.
About the author:
Charles Thompson, Senior Systems Engineer for Network Instruments, LLC (www.networkinstruments.com), works with the Network Instruments sales organization to provide technical expertise and in-depth product information to enterprise accounts. Network Instruments is a developer of network management, analysis and troubleshooting solutions. Charles can be reached at 952-932-9899 x234 or