To explain our methods: we are using Concord Network Health reporting. Concord uses 5-minute polling samples of
We arrived at the current comparison metrics by "guessing" at the threshold at which application performance or response time changes. Since we are a service provider, we wanted to optimize our time and take a 5,000-foot view. This method lets us dig deeper if utilization under normal traffic flow is consistently higher, or move on to other possible causes of poor application performance.
...<38%, as a base, seemed like a safe threshold because of the 10-hour time frame and the fact that the morning hours could be highly utilized while the afternoon hours may be lower. Rather than spending on upgrades, perhaps scheduling changes could help keep costs down. The other percentages just step up from there. The flaw in this thinking shows up if the morning hours do in fact show very high utilization and normal business practices cannot change.
What is important to us is that application response times are reasonable. In my experience there are many causes of poor application performance. Our goal in measuring WAN utilization is to eliminate WAN as a factor in poor app performance, so that we can focus on client, host processing, or application engineering flaws (I know - spoken like a true network person.)
Thanks again for your time.
Mike Schwarz
More from Dr. Jorgenson
Mike - that's much clearer. I am still a little in the dark about the nature of the network applications that you support. It sounds like some or all of it is data transfer, such as for files or Web pages, so I will assume that to be the case for now.

Your methodology makes good sense. For file transfers you should be able to develop reasonable response criteria. It is hard for me to comment on your numbers, though; I would have to get deep into the details with you.
I would make one additional comment, though. I would not look just at averages over long periods (like hours); I would look at the distribution of local maxima and minima. Alternatively, identify local maxima (minima) and measure the percentage of time spent above (below) some selected thresholds.
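As a rough illustration of picking out local maxima and minima, here is a minimal Python sketch. The sample values are made up, and the function names `local_maxima` and `local_minima` are hypothetical, not part of any reporting tool. It assumes the utilization data is a simple list of percentages, one per polling interval, and treats a sample as a local maximum (minimum) when it is strictly greater (less) than both of its neighbours.

```python
def local_maxima(samples):
    """Return (index, value) pairs for samples strictly greater than both neighbours."""
    return [
        (i, samples[i])
        for i in range(1, len(samples) - 1)
        if samples[i - 1] < samples[i] > samples[i + 1]
    ]


def local_minima(samples):
    """Return (index, value) pairs for samples strictly less than both neighbours."""
    return [
        (i, samples[i])
        for i in range(1, len(samples) - 1)
        if samples[i - 1] > samples[i] < samples[i + 1]
    ]


# Hypothetical utilization samples (percent of link capacity), one per interval.
utilization = [20, 35, 80, 40, 25, 30, 90, 85, 30, 15, 45, 20]

peaks = local_maxima(utilization)    # [(2, 80), (6, 90), (10, 45)]
troughs = local_minima(utilization)  # [(4, 25), (9, 15)]
```

The resulting lists of peaks and troughs can then be sorted or bucketed to see how the extremes are distributed, rather than relying on a single long-period average.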
What does that get you? If you pick a reasonable threshold for the maximum and a tolerance/response time for the application, you will be able to establish a metric for the network response – the rate at which it fails to respond at an acceptable level. That measure may be quite different from a global average for the reasons that I mentioned in my first response (burst traffic). It is the rate of occurrence of the instantaneous utilization exceeding an acceptable threshold. Or, in other words, how often an application experienced a WAN link that was not performing within specification.
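To make the difference between a global average and a rate of exceedance concrete, here is a small Python sketch. The sample data and the 70% threshold are assumptions chosen only for illustration, and `exceedance_rate` is a hypothetical helper name. Note also that 5-minute polling samples are themselves averages, so they only approximate instantaneous utilization.

```python
def exceedance_rate(samples, threshold):
    """Fraction of samples whose utilization is strictly above the threshold."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for s in samples if s > threshold) / len(samples)


# Hypothetical utilization samples (percent of link capacity), one per interval.
utilization = [20, 35, 80, 40, 25, 30, 90, 85, 30, 15, 45, 20]
threshold = 70  # assumed acceptable maximum, in percent

average = sum(utilization) / len(utilization)
rate = exceedance_rate(utilization, threshold)

print(f"average utilization: {average:.1f}%")
print(f"time above {threshold}%: {rate:.0%}")
```

With these made-up numbers the average is about 43%, which looks comfortable, yet a quarter of the intervals exceed the threshold. That gap is the kind of thing a long-period average hides.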
Here's where a graph could be really handy.
I trust that you get my meaning and that this helps.
