The majority of networking professionals spend more than two months a year troubleshooting network performance problems.
Fifty-nine percent of 592 network engineers, IT directors and CIOs surveyed by Network Instruments said they spend more than 25 days a year just replicating problems on the network in order to understand what's going on. And 71% said they spend more than 25 days a year trying to identify the source of the problem they replicated.
According to the State of the Network Global Survey, which Network Instruments released this week, 24% of networking professionals said they spend more than 75 days a year searching for problem sources.
It's hard to say how much is too much when it comes to the number of days networking professionals spend on troubleshooting. Charles Thompson, manager of systems engineering at Network Instruments, said it depends on the organization and how dependent it is on network communications. Network Instruments makes network monitoring and analysis tools.
For example, Thompson said, a company that delivers applications through the Software-as-a-Service (SaaS) model cannot afford to spend much time troubleshooting network problems. Any network issues in that company would impact service-level agreements. The company must resolve such issues quickly -- in minutes, not days.
If network engineers spend so much time trying to diagnose downtime or performance issues, then those problems are typically persisting, Thompson said. This can heavily impact the business.
"Let's say the issue relates around a server that's persistently becoming unreachable or extremely slow for an end user," Thompson said. "As I'm trying to identify the source of that issue, the problem continues. If that takes me four or five hours of trying to identify the source of the problem, that's four or five hours that the end user has had to deal with that issue."
John Ahearn, corporate director of IT for American Career College in Los Angeles, said time spent on troubleshooting can get expensive in multiple ways.
"If you quantify it in dollars, and you figure you're paying an engineer $80,000 a year, [then] maybe a third of his salary [goes to troubleshooting]," he said. "That's just in labor costs. That's not to mention the wasted time and poor performance. Looking at it over a year, [it's] maybe $1 million -- in not wasted labor, but lost opportunities -- because machines don't work correctly and they're too slow and employees are more than willing to just sit there waiting for [an application] to refresh."
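Ahearn's labor figure can be worked through in a few lines. The sketch below is illustrative only: the function name is hypothetical, and the inputs are simply the rough numbers from his quote (an $80,000 salary and about a third of the engineer's time), not survey data.

```python
def troubleshooting_labor_cost(annual_salary: float, share_of_time: float) -> float:
    """Return the part of a salary attributable to troubleshooting.

    share_of_time is a fraction between 0 and 1 (an assumption taken
    from the "maybe a third" estimate in the quote above).
    """
    if not 0.0 <= share_of_time <= 1.0:
        raise ValueError("share_of_time must be between 0 and 1")
    return annual_salary * share_of_time


# Rough inputs from the quote: $80,000 salary, about one third of the time.
cost = troubleshooting_labor_cost(80_000, 1 / 3)
print(f"${cost:,.0f} per engineer per year")  # prints "$26,667 per engineer per year"
```

This covers labor only. The larger $1 million estimate Ahearn mentions reflects lost opportunities and is not derived from this calculation.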
Thompson said organizations that spend time troubleshooting and replicating problems are taking a very reactive approach to network monitoring. These companies, according to Thompson, might have some sort of network analysis tool installed on a laptop that they turn to once a problem occurs. Network troubleshooters will deploy monitoring and analysis tools to the problem area, but they have to hope the problem recurs while they're watching. Otherwise, it remains a mystery.
Bill Ross, CEO and president of the consulting firm Network Performance Systems Inc., said he dealt with this issue of timing when he worked as a senior telecommunications engineer for Los Angeles County several years ago. The county's centralized network services organization, which served 40 agencies -- such as the district attorney's office, the Department of Mental Health and the county's lifeguards -- called on Ross whenever their network experienced difficult performance problems that affected productivity.
"It was really difficult to troubleshoot those issues because there were so many hops in the network and boundaries between organizations," Ross said.
When the county implemented MPLS, executives figured all the bandwidth issues would go away, Ross said. Instead, the Department of Mental Health started complaining of slowness on the network within two months.
"They played with the problem for two months and the [mental health] department became very angry with the central management organization," Ross said. The department sent Ross into the field, where a network probe revealed that employees' use of peer-to-peer and music-sharing applications such as Kazaa and LimeWire was draining 85% of the available bandwidth.
Ross said most organizations have a plethora of tools for solving problems like this quickly, but they don't use them effectively -- often because employee turnover has left the IT department devoid of relevant training.
"They're not actively engaged and that's usually because one regime buys the tools, then people go from the organization and [the company] loses the expertise for those tools," he said. "When I consult with somebody, I try to look at the tools they have and try to narrow it down to what they need and what is easiest to use that will not require the most manpower."
Thompson said organizations need the proper tools to do problem-solving work, and networking professionals have to be able to do what he calls "retrospective network analysis." He said Network Instruments Observer, for instance, will record all network data that goes over the network -- so if a problem occurs, engineers can go back in time and look at what happened.
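The general idea behind this kind of retrospective analysis can be sketched as a bounded buffer that records traffic continuously and is queried by time window after a problem is reported. This is a minimal illustration, not a description of how Observer is implemented; the class names, fields and sample addresses below are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketRecord:
    """Hypothetical summary of one captured packet."""
    timestamp: float  # seconds since the capture started
    src: str
    dst: str
    size_bytes: int


class RetrospectiveBuffer:
    """Keep the most recent packets so a past incident can be reviewed."""

    def __init__(self, max_records: int) -> None:
        # A deque with maxlen silently drops the oldest record when full.
        self._records = deque(maxlen=max_records)

    def record(self, packet: PacketRecord) -> None:
        self._records.append(packet)

    def window(self, start: float, end: float) -> list[PacketRecord]:
        """Return the retained packets with start <= timestamp <= end."""
        return [p for p in self._records if start <= p.timestamp <= end]


# Record ten seconds of made-up traffic, then look back at seconds 3 to 5.
buffer = RetrospectiveBuffer(max_records=1000)
for second in range(10):
    buffer.record(PacketRecord(float(second), "10.0.0.5", "10.0.0.9", 1500))

incident = buffer.window(start=3.0, end=5.0)
print(len(incident))  # prints 3
```

The trade-off in a design like this is storage: the buffer size determines how far back an engineer can look before older traffic is discarded.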
But such tools need to be deployed correctly, Thompson said. "One of the biggest challenges is where to put my monitor in my network. In the past, you put the analyzer in the core network. That worked well," he said, "but what about organizations that are starting to decentralize? What about organizations that are starting to run MPLS and are getting away from hub-and-spoke technologies and are starting to run full network mesh?"
Ahearn said his organization was having persistent voice quality issues with VoIP when he started working there one and a half years ago. He said he resolved that problem using Network Instruments' Observer technology.
"[The college] had spent a lot of money on consultants and the consultants would come in and they wouldn't have the right tools," Ahearn said. "None of them could really fix it. I came in using Network Instruments and I was able to modify their communications systems so they would work and make a change on their routers and switchers and things. Mainly I identified set-up and configuration to allow quality of service of voice traffic to coexist with data traffic."
Ahearn said he's worked for some organizations in which network engineers have spent so long trying to troubleshoot problems that the organization becomes superstitious.
"These people will resort to superstition -- turn this [device] on only at this time -- without knowing exactly what's going on. All the packets inside the network, just millions of packets going all over the place, and [they] resort to recreating the wheel or superstition. They're wasting a lot of time. Don't look at the router wrong or it will stop working."
Let us know what you think about the story; email: Shamus McGillicuddy, News Editor