Tip

Event escalation policies

Network error events can be as simple as a printer running out of paper or a network connection failing, or as complex as the failure of a network service. Depending on the kind of failure, your log files will start to fill with multiple, and often increasing numbers of, events related to it. The first part of an event escalation policy is the ability to analyze a cascading and expanding tree of events, identify the causal event, and so reach the root problem. This isn't as easy as it might sound.

Consider the problem of a downed network connection. If the connection was to the print server, it might generate a print error. The printer is fine, but you get a communication error instead. If the printer were simply out of paper, that would be easy to understand, but a communications error is more ambiguous. What's required to handle escalating events is a rule-based set of policies that logically addresses the relationships between events. Thus you might write a rule so that when a connection error occurs on a specific system, events from all other services supplied by that system are ignored (or suppressed) until that problem is isolated.
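A suppression rule of this kind can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product works: the `Event` record, its `system` and `kind` fields, the `"connection_error"` label, and the host names are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; field names are assumptions for this sketch."""
    system: str  # system that supplies the affected service
    kind: str    # e.g. "connection_error", "print_error"


def suppress_dependent_events(events):
    """Split events into (reported, suppressed).

    Rule: if a system has a connection error, every other event from
    that system is suppressed until the connection problem is isolated.
    """
    # Systems that currently have an unresolved connection error.
    unreachable = {e.system for e in events if e.kind == "connection_error"}

    reported, suppressed = [], []
    for e in events:
        if e.system in unreachable and e.kind != "connection_error":
            suppressed.append(e)  # probably a symptom, not the cause
        else:
            reported.append(e)    # the root event, or an unrelated one
    return reported, suppressed


if __name__ == "__main__":
    sample = [
        Event("printsrv01", "connection_error"),
        Event("printsrv01", "print_error"),
        Event("filesrv02", "disk_full"),
    ]
    reported, suppressed = suppress_dependent_events(sample)
    print("reported:", reported)
    print("suppressed:", suppressed)
```

The sketch keeps suppressed events in a separate list rather than discarding them, so they can still be reviewed once the connection problem has been isolated.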

It's not really possible to write all the rules you might want to handle escalating events. A simple rule, such as one stating that an event of a certain type that isn't solved within 60 minutes is escalated to the next level of support, is pretty easy to implement. But for complex event escalation you'll want to depend on a commercial product's implementation to assist you in this process. You'll find event escalation capabilities or product offerings for the big network frameworks such as HP OpenView, CA Unicenter, and IBM Tivoli. For example, Tivoli's NetView program is used to send fault information to the Tivoli Enterprise Console; it includes discovery, monitoring, query, and drill-down capabilities.
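By contrast, the simple 60-minute rule mentioned above is easy to sketch yourself. The Python below is a minimal illustration under stated assumptions: the `Ticket` record, its fields, and the `escalate_overdue` function are hypothetical names, and a real help desk or framework product would track far more state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed policy: escalate after 60 minutes without a resolution.
ESCALATION_WINDOW = timedelta(minutes=60)


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are assumptions for this sketch."""
    event_type: str
    level_since: datetime  # when the ticket reached its current support level
    level: int = 1         # 1 = first-line support
    resolved: bool = False


def escalate_overdue(tickets, now, window=ESCALATION_WINDOW):
    """Move unresolved tickets older than `window` to the next support level.

    Returns the list of tickets that were escalated on this pass.
    """
    escalated = []
    for t in tickets:
        if not t.resolved and now - t.level_since >= window:
            t.level += 1
            t.level_since = now  # restart the clock at the new level
            escalated.append(t)
    return escalated


if __name__ == "__main__":
    now = datetime(2005, 2, 1, 12, 0)
    tickets = [
        Ticket("connection_error", now - timedelta(minutes=75)),
        Ticket("print_error", now - timedelta(minutes=10)),
    ]
    for t in escalate_overdue(tickets, now):
        print(f"{t.event_type} escalated to level {t.level}")
```

Resetting `level_since` on each escalation means a ticket gets a fresh 60 minutes at every level, so running the check repeatedly (for example from a scheduled job) does not escalate the same ticket twice in a row.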

Because event escalation software can locate problems and their root causes, it can save your staff many hours of hard work, and while it may be expensive, it typically has a strong ROI.


Barrie Sosinsky is president of consulting company Sosinsky and Associates (Medfield, MA). He has written extensively on a variety of computer topics. His company specializes in custom software (database- and Web-related), training, and technical documentation.


This was first published in February 2005
