Why you need network monitoring systems and what they can do for you
A collection of articles that takes you from defining technology needs to purchasing options
Of all the network management technologies available to system administrators, network availability monitoring is one of the most fundamental. Available as software or an integrated hardware appliance, network monitoring tools discover, monitor, assess and troubleshoot network devices to ensure not only the health and availability of those devices but also the health of the overall network.
Typical network availability monitoring tools use Simple Network Management Protocol (SNMP) to collect monitoring data from network devices. They use SNMP to initially discover the devices on the network and, in many cases, organize those discovered devices into a topology map that reveals device dependencies to the network manager. This presentation allows the network manager to understand how a problem with one device might affect other connected devices. Depending on the product, an availability monitoring tool may also baseline the behavior and status of network devices and provide alerts and reports on anomalies that indicate a problem.
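To make the dependency idea concrete, here is a minimal sketch of how a topology map could answer the question "if this device fails, what else is affected?" The device names and the dictionary layout are hypothetical; real tools build this map from discovery data and store it in their own formats.

```python
from collections import deque

# Hypothetical topology: each device maps to the devices directly
# downstream of it (the devices that depend on it for connectivity).
TOPOLOGY = {
    "core-router": ["dist-switch-1", "dist-switch-2"],
    "dist-switch-1": ["access-switch-a", "access-switch-b"],
    "dist-switch-2": ["access-switch-c"],
    "access-switch-a": [],
    "access-switch-b": [],
    "access-switch-c": [],
}


def affected_devices(topology, failed):
    """Return every device downstream of `failed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(topology.get(failed, []))
    while queue:
        device = queue.popleft()
        if device in seen:
            continue  # already counted via another path
        seen.add(device)
        order.append(device)
        queue.extend(topology.get(device, []))
    return order


if __name__ == "__main__":
    print(affected_devices(TOPOLOGY, "dist-switch-1"))
```

A monitoring tool could use a traversal like this to suppress redundant alerts for devices that are unreachable only because an upstream device is down.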
Network availability monitoring tools track bandwidth utilization, uptime, availability and response times of network devices, and they provide granular information on the status of individual device components, such as ports, memory, processors and power supplies. Many of these tools even present a dashboard report on the health and status of network devices, using visual cues like green, yellow and red indicators to inform a network manager of the overall health of devices and device components.
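As a rough illustration of how a bandwidth utilization figure and a dashboard color might be derived, the sketch below computes utilization from two samples of an interface octet counter. It assumes a 32-bit counter that wraps at most once between samples; the function names and the 70 and 90 percent thresholds are illustrative choices, not values from any particular product.

```python
COUNTER32_MODULUS = 2**32  # 32-bit interface counters wrap back to zero here


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets transferred between two samples, allowing for a single wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets, curr_octets, interval_s, link_bps):
    """Average link utilization over the polling interval, as a percentage."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * link_bps)


def status_color(percent, warn=70.0, crit=90.0):
    """Map a utilization percentage to a dashboard color (assumed thresholds)."""
    if percent >= crit:
        return "red"
    if percent >= warn:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # 375,000,000 octets in 60 seconds on a 100 Mbps link
    pct = utilization_percent(0, 375_000_000, 60, 100_000_000)
    print(pct, status_color(pct))
```

The modulo in `counter_delta` is what keeps the result correct when the counter rolls over between two polls.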
Network availability monitoring systems send alerts to users in the form of email messages, text messages or event logs that can be forwarded to an event management system.
Some of these systems also have the ability to make changes to network devices via SNMP. This feature is especially useful in the context of a troubleshooting workflow, where network managers can discover and correct a network equipment problem using a single management interface.
SNMP: The foundation of network availability monitoring
Most network availability monitoring systems collect and analyze data from network devices via SNMP, an application-layer network protocol that dates back to 1988. SNMP is used for the exchange of information between networked devices and a network management platform. The most recent version of the protocol, SNMPv3, is most notable for its security enhancements, including mechanisms for the authentication and encryption of SNMP messages and access control to management objects.
SNMP offers a combination of push and pull communication between devices and a management system. On the pull side of the equation, a network monitoring system can poll the network to discover devices and then poll those network devices on a regular basis for status information. An SNMP agent on a network device collects this data constantly and will send information to the network monitoring system upon request. The agent typically doesn't store the device metrics it collects. Instead it is constantly refreshing the data and can forward it upon request.
On the push side, these agents also have the ability to send traps to the monitoring system. A trap is an unsolicited message that an agent sends to a monitoring tool when a defined event occurs on the device, for example, when CPU or bandwidth utilization crosses a certain threshold.
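The threshold behavior can be sketched as follows. This is a simplified model, not agent code: it assumes the agent fires once each time a metric rises above the threshold, rather than on every sample that remains above it. The function name and sample values are hypothetical.

```python
def threshold_traps(samples, threshold):
    """Return (index, value) for each sample that crosses above the threshold.

    A trap fires only on the upward crossing, so a metric that stays high
    for several samples produces a single trap.
    """
    traps = []
    above = False
    for index, value in enumerate(samples):
        if value > threshold and not above:
            traps.append((index, value))
        above = value > threshold
    return traps


if __name__ == "__main__":
    cpu_percent = [40, 85, 90, 60, 95]
    print(threshold_traps(cpu_percent, 80))
```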
A combination of polling and traps enables a network monitoring system to maintain a full view of the network's health. Traps can alert the system to triggered events on the device, while polling can extract data for trending and analysis. Polling also detects when a device has failed. Since a device in a failure state will be unable to send a trap on this event, polling becomes essential in that scenario.
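The sketch below shows one way a monitoring system might combine the two inputs: traps are logged as they arrive, while consecutive missed polls are what reveal a device that has failed silently. The class name, the `max_missed` parameter and the event strings are assumptions made for illustration.

```python
class AvailabilityTracker:
    """Tracks device state from poll results and received traps."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # missed polls before declaring "down"
        self.missed = {}              # device -> consecutive missed polls
        self.events = []              # chronological (device, event) log

    def record_poll(self, device, response):
        """`response` is the polled value, or None if the poll timed out."""
        if response is None:
            self.missed[device] = self.missed.get(device, 0) + 1
            if self.missed[device] == self.max_missed:
                self.events.append((device, "down"))
        else:
            if self.missed.get(device, 0) >= self.max_missed:
                self.events.append((device, "up"))
            self.missed[device] = 0

    def record_trap(self, device, message):
        """Log a trap pushed by the device's agent."""
        self.events.append((device, "trap: " + message))

    def is_down(self, device):
        return self.missed.get(device, 0) >= self.max_missed


if __name__ == "__main__":
    tracker = AvailabilityTracker(max_missed=2)
    tracker.record_trap("sw1", "cpu high")
    tracker.record_poll("sw1", None)
    tracker.record_poll("sw1", None)
    print(tracker.is_down("sw1"), tracker.events)
```

Requiring several missed polls before declaring a device down is a common way to avoid alerting on a single lost packet.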
Automating device configuration changes
SNMP also includes an "inform" message type, a notification that the network monitoring tool acknowledges on receipt. The acknowledgment confirms to the agent that the message was delivered, so the agent does not need to resend it. SNMP also specifies how network monitoring tools can use a "set" message to make changes to a device through its SNMP agent. This capability allows either the network manager or the tool itself (depending on an enterprise's policies on network automation) to make device configuration changes in response to network events. For instance, if a network monitoring tool discovers that a switch is being saturated by network traffic, the tool or an administrator can reconfigure the network so the traffic is distributed across multiple devices.
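The following toy model illustrates the idea behind "get" and "set" operations. It is an in-memory stand-in, not an implementation of the SNMP protocol: the `ToyAgent` class and its exception-based error handling are assumptions for illustration. The OIDs are the standard MIB-2 identifiers for sysDescr, sysName and ifAdminStatus.

```python
SYS_DESCR = "1.3.6.1.2.1.1.1.0"            # sysDescr.0, read-only
SYS_NAME = "1.3.6.1.2.1.1.5.0"             # sysName.0, read-write
IF_ADMIN_STATUS_1 = "1.3.6.1.2.1.2.2.1.7.1"  # ifAdminStatus.1: 1=up, 2=down


class ToyAgent:
    """In-memory stand-in for a device's SNMP agent (illustration only)."""

    def __init__(self):
        # OID -> [current value, writable?]
        self._objects = {
            SYS_DESCR: ["Example switch", False],
            SYS_NAME: ["sw-01", True],
            IF_ADMIN_STATUS_1: [1, True],
        }

    def get(self, oid):
        """Return the current value of a managed object."""
        return self._objects[oid][0]

    def set(self, oid, value):
        """Change a managed object, refusing writes to read-only objects."""
        entry = self._objects[oid]
        if not entry[1]:
            raise PermissionError(oid + " is read-only")
        entry[0] = value


if __name__ == "__main__":
    agent = ToyAgent()
    agent.set(IF_ADMIN_STATUS_1, 2)  # administratively shut down interface 1
    print(agent.get(IF_ADMIN_STATUS_1))
```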
SNMP relies on the concept of a management information base (MIB) to organize the communication of device metrics between an SNMP agent and the network monitoring system. The MIB is a formal, standardized description of the objects that represent the components and status information of a network device. Although the MIB is a standard, most vendors add proprietary extensions to their MIB implementations to account for the unique characteristics of their individual products.
Because each network equipment manufacturer extends the MIB in its own way, network availability monitoring vendors must add support for each individual implementation. Network managers must verify that the network equipment they use is supported by the network availability monitoring tools they choose, or that the required MIB extensions can be added manually.
This extensible model of SNMP MIBs also allows manufacturers of other network-connected technologies to implement SNMP so a network availability monitoring tool can manage them. Storage systems, servers, printers and other devices can run SNMP agents. Additionally, MIBs can be created for IP video cameras, vehicles, industrial equipment, medical equipment and other devices. As long as a network monitoring tool has properly modeled and compiled the MIB for any of these device types, it should be able to report and alert on their status.
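A small sketch can show why a tool needs the right MIB loaded before it can report on a device. The standard entries below are real MIB-2 object identifiers (OIDs); the vendor entry is hypothetical, and both the enterprise number 99999 and the name `exampleFanSpeed` are placeholders. The `resolve` function is an illustrative helper, not part of any SNMP library.

```python
# A few standard MIB-2 objects, keyed by numeric OID.
STANDARD_MIB = {
    "1.3.6.1.2.1.1.1": "sysDescr",
    "1.3.6.1.2.1.1.3": "sysUpTime",
    "1.3.6.1.2.1.1.5": "sysName",
    "1.3.6.1.2.1.2.2.1.8": "ifOperStatus",
    "1.3.6.1.2.1.2.2.1.10": "ifInOctets",
}

# Hypothetical vendor extension under the enterprises subtree (1.3.6.1.4.1).
VENDOR_MIB = {
    "1.3.6.1.4.1.99999.1.1": "exampleFanSpeed",
}


def resolve(oid, mib):
    """Translate a numeric OID to name.instance using the longest known prefix."""
    parts = oid.split(".")
    for cut in range(len(parts), 0, -1):
        prefix = ".".join(parts[:cut])
        if prefix in mib:
            suffix = ".".join(parts[cut:])
            return mib[prefix] + ("." + suffix if suffix else "")
    return oid  # unknown object: fall back to the raw numeric OID


if __name__ == "__main__":
    print(resolve("1.3.6.1.2.1.2.2.1.10.3", STANDARD_MIB))
    print(resolve("1.3.6.1.4.1.99999.1.1.0", STANDARD_MIB))
    print(resolve("1.3.6.1.4.1.99999.1.1.0", {**STANDARD_MIB, **VENDOR_MIB}))
```

Without the vendor definitions, the tool in this sketch can still retrieve the value but can only label it with the raw numeric OID.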
There are several free and open source network availability monitoring tools available today. There are also dozens of commercial vendors, both large and small, that offer their own products. Commercial products are typically more scalable, easier to deploy and maintain, and capable of monitoring a wider variety of network devices.
Network availability monitoring architecture
A typical network availability monitoring tool has multiple components. Some tools have fully integrated architectures, and a network manager will need only to install a single piece of software to get started. Other tools might have several individual components that must be installed and managed separately. These components include a polling engine, a database, an analytics server, a user console and other elements.
The implementation of this architecture will affect many aspects of how network availability monitoring tools scale and operate. Some tools can monitor hundreds of devices, while others can monitor thousands. Some can monitor multiple networks across several locations, while others can only monitor a single site. A network manager also might find variability in terms of how granular the network monitoring tool is. In other words, how often can the tool poll network devices for metrics? Granularity is usually something that users can adjust for themselves, but higher monitoring granularity might affect the performance of certain monitoring tools that are not architected to analyze data inputs at a high rate.
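A back-of-the-envelope calculation shows why granularity affects scale. The sketch below assumes every device is polled for the same number of metrics at a fixed interval that divides evenly into a day; the device and metric counts are made-up inputs, not benchmarks.

```python
SECONDS_PER_DAY = 86_400


def polls_per_second(devices, metrics_per_device, interval_s):
    """Average rate of metric requests the polling engine must sustain."""
    return devices * metrics_per_device / interval_s


def samples_per_day(devices, metrics_per_device, interval_s):
    """Data points the database must store per day (interval divides a day)."""
    return devices * metrics_per_device * (SECONDS_PER_DAY // interval_s)


if __name__ == "__main__":
    for interval in (300, 60):
        print(
            interval,
            polls_per_second(1000, 60, interval),
            samples_per_day(1000, 60, interval),
        )
```

With these inputs, moving from a five-minute to a one-minute interval multiplies both the polling rate and the stored sample count by five.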
Beyond SNMP and simple availability monitoring
Many network monitoring tools will rely on a variety of other methods to collect and exchange data with devices on the network. A tool may use Windows Management Instrumentation (WMI) to monitor and interact with Microsoft-based systems. A tool may also integrate with APIs on a hypervisor management system in order to monitor all aspects of an enterprise's virtual infrastructure.
These monitoring systems will also often interact with various other network protocols. Some tools will use vendor-specific technologies like Cisco's IP SLA and NBAR to collect performance metrics. Others will add collection and analytics engines for flow records like NetFlow, J-Flow, sFlow and IPFIX. These technologies essentially expand the tool's capabilities from purely "network availability monitoring" to the related category of "network performance monitoring." Vendors offer various levels of integration between network availability monitoring and network performance monitoring, giving network managers the ability to correlate and analyze performance and availability. This integration helps network managers with capacity planning, troubleshooting and optimization.
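As a simple illustration of what a flow analytics engine does with its input, the sketch below totals bytes per source address to find the "top talkers." It assumes the flow records have already been received and decoded into tuples; the record layout, addresses and byte counts are hypothetical, and parsing of the actual export formats is out of scope.

```python
from collections import Counter

# Hypothetical, already-decoded flow records: (source IP, destination IP, bytes)
FLOWS = [
    ("10.0.0.5", "10.0.1.9", 1200),
    ("10.0.0.7", "10.0.1.9", 300),
    ("10.0.0.5", "10.0.2.4", 800),
    ("10.0.0.8", "10.0.1.9", 500),
]


def top_talkers(flows, n=2):
    """Return the n source addresses that sent the most bytes, largest first."""
    totals = Counter()
    for src, _dst, nbytes in flows:
        totals[src] += nbytes
    return totals.most_common(n)


if __name__ == "__main__":
    print(top_talkers(FLOWS))
```

Availability data can say that a link is saturated; an aggregation like this one is what identifies which hosts are responsible.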
Many enterprises will integrate network availability monitoring tools into a broader set of infrastructure management tools. For instance, some network managers will integrate the tool with a network configuration management system. This integration helps network managers correlate network availability problems with device configuration errors. Management vendors have long recognized the value of broad integration. They have responded by building large network management suites that combine device metrics from network availability monitoring with performance metrics from flow protocols like NetFlow and from packet-based analysis. Some vendors even combine systems, storage and application management tools with network management tools for an integrated infrastructure management suite.
Network availability monitoring is a foundational element of a network management tool set. At its core, such a tool collects metrics from network devices through SNMP and other protocols and interfaces, then analyzes and reports on them. The tool will help a network manager understand the health and availability of individual network devices and the overall network. It is often the starting point for network operations. Other tools will integrate with and complement network availability monitoring. When thinking about how you want to implement this management system in your environment, think about the types of devices you want to monitor and how you want to integrate that monitoring into your overall management tools and practices.
Editor's note: This article was updated in August 2016.