Like a misbehaving child, a network often acts in ways that are a challenge to understand.
Typically, when the network or the applications running on it have performance problems, users are the first to know, and they alert IT. Alternatively, network managers set thresholds that alert staff to potential problems, but that approach isn't foolproof. Set the thresholds too high, and some critical flaws may go unnoticed. Set them too low, and someone dropping a pen in Accounting could sound an alarm. Either way, it's a lose-lose situation.
Pete Gunn, project leader for global network systems at shipping giant UPS, knows all about thresholds and alarms, and he admits there's no way to operate entirely without them. However, UPS is testing a new way to understand what the network is doing: how it's behaving and how that behavior could evolve.
UPS recently put nGenius Analytics from NetScout into a testing environment to better understand network behavior without the use of hard-and-fast thresholds. And, according to Gunn, the tool could very well answer three questions he's constantly asking: "What is running across our network? What are the applications? What is the impact?"
According to Jim Frey, NetScout's marketing vice president, finding and correcting network performance problems is slow, labor-intensive and dependent on a person's troubleshooting experience. Traditional approaches, such as alarms set on a static threshold, require repeated adjustments across a large number of monitored metrics. Often, the alarms are inaccurate or ineffective because their thresholds have been loosened or set too high to keep IT from being overwhelmed with false alerts.
A 2005 survey of IT professionals by Enterprise Management Associates found that only 18% of respondents receive alerts about application problems before a user notifies them.
Frey said nGenius Analytics applies statistical behavior modeling and anomaly detection to application flow data to deliver early warnings of performance problems. Once installed, nGenius Analytics spends a short period automatically learning network and application behavior patterns, then identifies performance anomalies without manual configuration or the guesswork of setting thresholds. Frey added that nGenius Analytics couples alarms with diagnostics to give managers clear information they can act on before users are affected.
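NetScout doesn't disclose its modeling method here, so the following is only a generic sketch of the idea Frey describes, not the product's algorithm. It contrasts a fixed threshold with a simple learned baseline: the detector collects samples during a learning phase, then flags any value more than a few standard deviations from the learned mean (a z-score test). The names `static_alarm`, `BaselineDetector`, `learning_samples` and `z_limit`, and the traffic figures, are hypothetical and chosen purely for illustration.

```python
from statistics import mean, stdev


def static_alarm(value: float, threshold: float) -> bool:
    """Traditional approach: fire whenever a metric exceeds a fixed limit."""
    return value > threshold


class BaselineDetector:
    """Hypothetical learned-baseline detector using a z-score test.

    Illustrative only; this is not NetScout's actual algorithm.
    """

    def __init__(self, learning_samples: int = 20, z_limit: float = 3.0):
        self.learning_samples = learning_samples  # length of the learning phase
        self.z_limit = z_limit  # standard deviations that count as anomalous
        self.history: list[float] = []

    def observe(self, value: float) -> bool:
        """Record a sample; return True if it deviates from the baseline."""
        if len(self.history) < self.learning_samples:
            # Learning phase: collect samples and never raise an alarm.
            self.history.append(value)
            return False
        mu = mean(self.history)
        sigma = stdev(self.history) or 1e-9  # guard against perfectly flat data
        anomalous = abs(value - mu) / sigma > self.z_limit
        if not anomalous:
            # Fold only normal samples into a sliding window so the
            # baseline can follow gradual drift without absorbing spikes.
            self.history.append(value)
            self.history = self.history[-self.learning_samples:]
        return anomalous


# Usage with made-up per-application traffic figures (Mbps).
detector = BaselineDetector()
for sample in [40, 42, 38, 41, 39] * 4:  # learning phase: normal traffic
    detector.observe(sample)

print(detector.observe(43))  # small wobble → False
print(detector.observe(55))  # sudden jump → True
print(static_alarm(55, threshold=80))  # a loose static threshold misses it → False
```

The point of the design is that no one has to pick the number 55 (or 80) in advance: the alarm level comes from the metric's own recent behavior, which is the "guesswork of setting thresholds" the static approach requires.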
Key features of nGenius Analytics include proactive alarms, with contextual diagnostic information; alarm viewing through the nGenius Performance Manager for further investigation; alarm integration with third-party management platforms and event consoles; analysis of flow-based data from NetScout nGenius Probes and Flow Collectors for NetFlow; and monitoring and analysis of virtualized networks such as VLANs, VPNs and MPLS.
According to Frey, the product answers a common question: "How are we going to make performance management and performance data relevant in real time?"
Gunn said that he's already seen some benefits in testing.
"It's a good heads up," he said. "It gives more insight to behavioral changes so you can understand why something happened. It draws your attention without sounding alarms."
Static thresholds are often a challenge because of the flood of false alarms they produce, Gunn added. A tool like nGenius Analytics can reduce that reliance on thresholds.
Seeing how network and application activity deviates from the norm is helpful, according to Gunn. It also lets him and his staff investigate traffic flows and then go back to application groups and developers to point out where things need fine-tuning.
The ability to detect potential problems before they cause major trouble and affect performance is also a bonus, Gunn said. It can catch issues that might cause headaches down the road but could go unnoticed if IT simply waits for an alarm to sound.
"It's very important to us to understand not only where things have been, but where things are going," he said, adding that nGenius Analytics can act as a sort of crystal ball that lets him say, "Hey, I saw this coming. I'm going to prepare for this now."
In the past, UPS monitored the network mainly to see what was driving costs and to support capacity planning and control. Now, Gunn said, better monitoring capabilities give deeper insight into the "underlying drivers or causes of what influences application or network behavior" and "bubble underlying issues to the top."
With tens of thousands of users worldwide, understanding network behavior and patterns is especially valuable, Gunn said.
"Obviously, we want to get out of the reactive business and get into being proactive," he said. "We want to say, 'Hey, there's a pattern that's developing or evolving here.' We can predict the coming impact on capacity and infrastructure."
When UPS started monitoring the network, one thing that opened a lot of eyes was how much email traffic was affecting it. Greater visibility into application performance and usage lets UPS manage email traffic better and adapt as that usage grows.
"When we first set up monitoring our network environment, we were looking at things from a capacity side only," Gunn said. "It got to a point where I said, 'Let's get some insight into it.' It opened our eyes to a lot of things that no one had been aware of before."