What are some good management practices for monitoring a network's performance?
"Best Practices" have a short half-life in our industry. The biggest problem I see with today's management techniques and tools is that they grew up in the client/server era and haven't evolved to serve today's realities.
In the old world...
- you owned or controlled most of the networks your key applications depended on
- you could predict where critical traffic would flow
- complex, agent-based systems with long rollouts were approved because the problems client/server created were new, acute and very scary
- you dealt with a dog's breakfast of vendor-specific protocols
In today's world...
- you depend on networks you do not own or control (ISP, ASP, customer, supplier, etc.)
- you can't predict where tomorrow's traffic will flow or what will break next
- there is less (no?) time and money for deployment or maintenance of big, complex network management systems
- it's "IP everything/everywhere"
So what should "best practices" look like?
I'll save the long version for a proper whitepaper (due out June 2003 from www.jaalam.com/wp/). But the short version might go like this:
- Be able to see end-to-end, from the application's viewpoint
- Be able to deploy "just-in-time" network management infrastructure - rapidly, where needed, when needed, on demand
- Be able to see into and through networks that you don't own
- Employ monitoring technologies that provide thorough network awareness on an ongoing basis, not piecemeal views
- Rely less on trend analysis and more on real-time assessment
- Emphasize "effective" over "absolute" - implement management solutions that resolve your most common, most expensive problems most quickly
- Focus on application performance after the fundamental networking performance aspects have been addressed
- Use methodologies and technologies that fit your network and needs, not the other way around
The approach to this might be laid out in two steps:
1. Continuous monitoring of performance (not just availability) as an essential starting point, ideally at the layer 3 or layer 4 demarcation point (at a minimum) so you can quickly separate network performance issues from application ones. This has to be end-to-end along all critical paths, and most others of interest, with constant updating.
2. Rapid response to performance problems that slip through the cracks. That requires a real-time measurement/assessment/problem diagnosis capability that delivers quickly, without pre-deployed infrastructure, and can be used remotely.
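To make the layer 4 demarcation idea concrete, here is a minimal Python sketch, not a finished tool. It times the TCP handshake separately from the full application transaction and uses the two numbers to guess which side a problem is on. The function names (tcp_connect_ms, classify) and the 100 ms and 500 ms limits are illustrative assumptions, not figures from any product or from the whitepaper; tune them to your own paths.

```python
import socket
import time
from typing import Optional


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Time the TCP handshake (layer 4) in milliseconds; None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return None


def classify(connect_ms: Optional[float], total_ms: Optional[float],
             net_limit_ms: float = 100.0, app_limit_ms: float = 500.0) -> str:
    """Attribute a slow or failed transaction to the network or the application.

    connect_ms: handshake time from tcp_connect_ms (None = no connection).
    total_ms:   time for the whole application transaction (None = no reply).
    The limits are assumed thresholds, chosen only for illustration.
    """
    if connect_ms is None:
        return "network: unreachable"
    if connect_ms > net_limit_ms:
        return "network: slow path"
    if total_ms is None:
        return "application: no response"
    if total_ms - connect_ms > app_limit_ms:
        return "application: slow response"
    return "ok"
```

Run on a schedule against each critical path, something like classify(tcp_connect_ms("app.example.com", 443), measured_total_ms) gives a first-cut answer to "network or application?" before anyone opens a trouble ticket. The host name and measured_total_ms here are placeholders; the total would come from whatever application-level probe you already use.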