Event escalation policies

Event errors do not always lead you to the root of the problem. Use escalation policies to save time when troubleshooting.

Network error events can be simple, such as a printer running out of paper or a network connection failing, or more complex, such as a network service failure. Depending on the kind of failure, your log files will start to accumulate multiple, often increasing numbers of events related to it. The first part of an event escalation policy is being able to identify the causal event in a cascading, expanding tree of events in order to reach the root problem. This isn't as easy as it might sound.

Consider the problem of a downed network connection. If the connection was to the print server, it might generate a print error. The printer's fine, but you get a communication error instead. If the printer were simply out of paper, that would be easy to understand, but a communication error is more ambiguous. What's required to handle escalating events is a rule-based set of policies that logically addresses the relationships between events. Thus you might write a rule so that when a connection error occurs on a specific system, events from all other services supplied by that system are ignored (or suppressed) until that problem is isolated.
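
The suppression rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any commercial product works: the Event record, the event kind names ("connection_error", "connection_restored"), and the filter_events function are all hypothetical names chosen for the example. It also assumes events arrive in order and that each one names the system it concerns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    host: str  # system the event concerns (hypothetical field)
    kind: str  # e.g. "connection_error", "print_error" (hypothetical names)


def filter_events(events):
    """Split events into (reported, suppressed) lists.

    Once a connection error is seen for a host, every later event from
    that host is suppressed until a "connection_restored" event arrives.
    """
    down_hosts = set()
    reported, suppressed = [], []
    for event in events:
        if event.kind == "connection_restored":
            # The root problem is cleared; stop suppressing this host.
            down_hosts.discard(event.host)
            reported.append(event)
        elif event.host in down_hosts:
            # Likely a side effect of the known connection failure.
            suppressed.append(event)
        else:
            if event.kind == "connection_error":
                down_hosts.add(event.host)
            reported.append(event)
    return reported, suppressed
```

With this rule, a print error that follows a connection error on the same print server lands in the suppressed list, so only the connection error reaches the support staff. Events from unrelated systems are reported as usual.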

It's not really possible to write all the rules you might want for handling escalating events. A simple rule is pretty easy to implement: for example, an event of a certain type that isn't resolved within 60 minutes is escalated to the next level of support. But for complex event escalation, you'll want to depend on a commercial product's implementation to assist you. You'll find event escalation capabilities or product offerings for the big network frameworks such as HP OpenView, CA Unicenter, and IBM Tivoli. For example, Tivoli's NetView program is used to send fault information to the Tivoli Enterprise Console; it includes discovery, monitoring, query, and drill-down capabilities.
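
The simple 60-minute rule mentioned above might look something like the following Python sketch. The Ticket record, the escalate_if_overdue function, and the three-level support structure (MAX_LEVEL) are assumptions made for illustration; only the 60-minute timeout comes from the text. The sketch also assumes the timer restarts each time an event is escalated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ESCALATION_TIMEOUT = timedelta(minutes=60)  # timeout taken from the rule above
MAX_LEVEL = 3  # hypothetical number of support levels


@dataclass
class Ticket:
    event_type: str
    opened_at: datetime
    level: int = 1  # current support level handling the event
    resolved: bool = False
    last_escalated_at: Optional[datetime] = None


def escalate_if_overdue(ticket, now):
    """Move an unresolved ticket up one support level if it has waited
    at least ESCALATION_TIMEOUT at its current level. Returns True if
    the ticket was escalated."""
    if ticket.resolved or ticket.level >= MAX_LEVEL:
        return False
    waiting_since = ticket.last_escalated_at or ticket.opened_at
    if now - waiting_since >= ESCALATION_TIMEOUT:
        ticket.level += 1
        ticket.last_escalated_at = now
        return True
    return False
```

A scheduler would call escalate_if_overdue periodically for each open ticket. Rules like this are easy because they look at one event in isolation; the hard part, relating events to one another, is where the commercial frameworks come in.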

Because event escalation software can locate problems and their root causes, it can save your staff many hours of hard work, and while it may be expensive, it typically has a strong ROI.

Barrie Sosinsky is president of consulting company Sosinsky and Associates (Medfield, MA). He has written extensively on a variety of computer topics. His company specializes in custom software (database and Web-related), training and technical documentation.

This was last published in February 2005
