In constructing a service-level agreement, or SLA, with a service provider, network availability and network reliability are two of the most critical metrics used to gauge a provider's performance. Although the terms are sometimes erroneously used interchangeably, they are distinctly different.
Network availability is the percentage of time the infrastructure is operational during a given time period. In other words, it is uptime divided by total in-service time.
Network reliability tracks how long the infrastructure is functional without interruption. Reliability is measured using two related equations. The first is mean time between failures (MTBF), which is the total time in service divided by the number of failures. The second is failure rate, which is the number of failures divided by total time in service, making it the reciprocal of MTBF.
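The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring tool: the ServiceWindow type and the sample figures (a 720-hour month with 3.6 hours of downtime across four failures) are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    """Hypothetical record of one measurement period."""
    total_hours: float     # total time in service
    downtime_hours: float  # time the network was not operational
    failures: int          # number of distinct failures


def availability(w: ServiceWindow) -> float:
    # Uptime divided by total in-service time
    return (w.total_hours - w.downtime_hours) / w.total_hours


def mtbf(w: ServiceWindow) -> float:
    # Total time in service divided by number of failures
    if w.failures == 0:
        return float("inf")  # no failures observed in this window
    return w.total_hours / w.failures


def failure_rate(w: ServiceWindow) -> float:
    # Number of failures divided by total time in service
    return w.failures / w.total_hours


# Made-up sample data: a 30-day month
window = ServiceWindow(total_hours=720.0, downtime_hours=3.6, failures=4)
print(f"availability: {availability(window):.3%}")
print(f"MTBF: {mtbf(window):.1f} hours")
print(f"failure rate: {failure_rate(window):.5f} failures per hour")
```

Keeping downtime and failure count as separate fields is what lets the two metrics diverge: the same downtime can come from one failure or from many.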
Different ways to measure performance
Network availability provides a good snapshot of infrastructure accessibility by quantifying the percentage of time the network is up and running. However, in most cases, network availability offers only a limited perspective into actual operational performance.
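A network can be highly available, but not particularly reliable. A network that achieves 99.9% availability is still down for nearly nine hours annually. And a network that is available 99% of the time is out of service for more than three days a year. The percentage also says nothing about how that downtime is distributed: one long outage and hundreds of brief interruptions can produce the same availability figure but very different reliability.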
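The downtime hidden behind an availability percentage is simple arithmetic, and working it out makes the gap between availability and reliability easier to see. The sketch below assumes a 365-day year, and the two outage profiles are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def annual_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


for pct in (99.0, 99.9, 99.99):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% available -> {hours:.2f} hours down per year ({hours / 24:.2f} days)")

# Two hypothetical outage profiles with identical total downtime,
# and therefore identical availability
total_downtime = annual_downtime_hours(99.9)
profiles = {"one long outage": 1, "100 short outages": 100}
for name, failures in profiles.items():
    # MTBF as defined earlier: total time in service / number of failures
    mtbf = HOURS_PER_YEAR / failures
    print(f"{name}: {total_downtime:.2f} h down in total, MTBF {mtbf:.1f} h")
```

Both profiles report the same availability, but the second interrupts users roughly every three and a half days, which is the kind of difference MTBF exposes and availability hides.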
Network reliability, on the other hand, spotlights how well the infrastructure runs to support functional processes. A network with a lengthy MTBF or a low failure rate is likely to complete transactions and processes on a consistent basis.
IT managers can track reliability and availability of individual equipment, such as routers and switches, but the best measure of real operational performance is connection uptime: total connection uptime divided by total time in service.
Network managers can drill down and isolate availability and reliability metrics for different segments and paths on the network to uncover configuration inefficiencies and better plan for redundancies.
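As a rough sketch of that drill-down, the same availability and MTBF formulas can be applied per segment to an outage log. The segment names, the log entries and the 720-hour period are all invented for illustration; a real log would come from whatever monitoring system is in use.

```python
from collections import defaultdict

# Hypothetical outage log: (segment name, outage duration in hours)
OUTAGES = [
    ("core-a", 0.5),
    ("branch-1", 2.0),
    ("branch-1", 1.5),
    ("wan-link", 0.25),
    ("branch-1", 0.5),
]
SEGMENTS = ["core-a", "core-b", "branch-1", "wan-link"]
PERIOD_HOURS = 720.0  # assumed measurement period: one 30-day month


def segment_metrics(outages, segments, period_hours):
    """Compute availability, failure count and MTBF for each segment."""
    downtime = defaultdict(float)
    failures = defaultdict(int)
    for segment, duration in outages:
        downtime[segment] += duration
        failures[segment] += 1

    report = {}
    for segment in segments:
        count = failures[segment]
        report[segment] = {
            "availability": (period_hours - downtime[segment]) / period_hours,
            "failures": count,
            # A segment with no recorded failures has no finite MTBF
            "mtbf_hours": period_hours / count if count else float("inf"),
        }
    return report


report = segment_metrics(OUTAGES, SEGMENTS, PERIOD_HOURS)
# List the weakest segments first
for segment, m in sorted(report.items(), key=lambda kv: kv[1]["availability"]):
    print(f"{segment}: {m['availability']:.3%} available, "
          f"{m['failures']} failures, MTBF {m['mtbf_hours']:.0f} h")
```

Sorting by availability puts the segment most in need of attention at the top, which is a reasonable starting point when deciding where redundancy would help most.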
Reactive, proactive monitoring provide additional tools
Two other techniques can also be used to help managers understand real-world operational conditions. The first, reactive monitoring, measures availability and reliability of a production network on an ongoing basis.
The second, proactive monitoring, employs synthetic traffic that is sent across the network. Its transmission is measured by performance tools that can also be used for troubleshooting and to determine optimal performance.
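One very simple form of synthetic traffic is a timed TCP connection attempt. The sketch below uses only Python's standard socket and time modules; the function names are hypothetical, and the demo targets a throwaway listener on the loopback interface so that it runs without touching a real network. Commercial performance tools measure far more than connect time, so treat this as an illustration of the idea only.

```python
import socket
import time


def tcp_connect_probe(host, port, timeout=1.0):
    """Return the TCP connect time in milliseconds, or None if the attempt fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:  # refused, unreachable or timed out
        return None


def summarize(samples):
    """Summarize probe results; None entries count as failed probes."""
    ok = [s for s in samples if s is not None]
    return {
        "sent": len(samples),
        "succeeded": len(ok),
        "success_ratio": len(ok) / len(samples) if samples else 0.0,
        "avg_ms": sum(ok) / len(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


# Demo against a temporary local listener (stand-in for a real endpoint)
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(16)
host, port = listener.getsockname()

samples = [tcp_connect_probe(host, port) for _ in range(5)]
listener.close()
# One more probe after the listener is gone, to show a failed sample
samples.append(tcp_connect_probe(host, port, timeout=0.5))

print(summarize(samples))
```

Run on a schedule against real endpoints, the success ratio gives a proactive view of availability, while the connect times offer a baseline for spotting degradation before users report it.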
Test traffic is also generated to diagnose configuration errors and equipment issues. The data derived from proactive monitoring can be used in other areas as well. For example, prior to deploying a new application, IT can test it on the network to identify any potential issues, so code changes or other adjustments can be made in advance of the rollout.
Finally, proactive monitoring can be used to validate reactive data. This information can be helpful to support SLA metrics and identify where changes should be made to better meet operational goals.