The paradox of service level management

Learn how perception affects user service, and what you can do to ensure the best user experience possible.

"The customer is always right."

The clerk at the candy store, the waitress in the diner and the IT support technician will all tell you that when it comes to service, "the customer is always right." Ironically, this maxim is at the heart of the central paradox of service level management. Large enterprises spend a great deal of time and money to try to obtain accurate measurements of end-user service levels with surprisingly little success at capturing the end-user experience.

The paradox is that we already have the most accurate measurement of end-user service levels right in front of our noses but can't do anything with it. By definition, whatever the end user describes as their quality of service is the most accurate measurement available, but it is also the least useful. If an end user complains that they are receiving "dreadful" service, then service is dreadful. We can appreciate that the user is telling the truth, but how does an organization fix "dreadful"? To do so, it needs a way to translate the subjective measurement "dreadful" into a quantifiable and accurate measurement that reflects the end user's experience.

Unfortunately, most of the objective performance measurements that IT departments take today are focused on the performance of the individual components of their infrastructure. They rarely collect data that looks at the whole picture from end to end and correlates directly to the subjective user experience. While it is useful to look at network utilization, server utilization, circuit load, and router performance, it is not sufficient. None of these metrics directly describes the service levels that customers are experiencing. The result is a disconnect between IT, which trusts the numbers that say it is providing good service, and the end users, who believe that they are getting "dreadful" service based on a small number of subjective but accurate experiences.

"Perception is reality."

Why is it that the airlines lose less than .001% of bags checked, yet customer satisfaction surveys report that public perception is that airlines routinely lose checked baggage? The perception of poor service is formed by the user experience. As anyone who has had their luggage lost can attest, the experience is a frustrating one. You stand in front of a conveyor belt watching bag after bag come down until there are only one or two bags left going around and around on the belt. After a while, you conclude that your luggage isn't going to come out so you go to the ticket agent to report your bags as lost.

Nowhere in this process are your expectations being managed. The airlines don't make you feel like they know where your luggage is and that they are on top of the problem, because you are the one who has to tell them that there is a problem. Contrast this with the experience you get when you ship a package with UPS or Federal Express. You can track a package from door to door on the shipper's Web site and find out where your package actually is at any time. While the shipper may not meet your service expectations 100% of the time, it provides high customer satisfaction because it makes you feel like it is always in control of your package and has the ability to correct any problems that may arise.

So why do most large organizations approach IT service level management using the airlines' approach instead of the shipping companies' approach?

The key to improving service levels is accurate and objective measurement.

By applying some basic rules of thumb, IT has effected considerable performance improvements in the past ten years. Broadband and dedicated high-speed links have replaced 56 Kbps lines, and fiber-optic backbones have been installed and coupled with high-end routers backed up by redundant fail-over systems. But while all the "rule of thumb" and brute-force enhancements represent great strides forward that bring reliability up toward 99%, the disconnect between IT and the end users continues to widen.

Now that technology infrastructures are generally reliable, performance problems can be hidden within the averages. Ten thousand transactions with sub-second response time will easily mask two transactions that took 25 seconds to execute. Those two "slow transactions" will be invisible to IT because the "average response time" still looks excellent. Just like the frazzled traveler who must inform the airline that it lost their luggage, the end user is the one who has to inform IT that they experienced poor performance.
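The arithmetic behind that masking effect can be sketched in a few lines of Python. The figures are illustrative assumptions: every fast transaction is taken to be exactly 0.5 seconds, and the 5-second threshold is hypothetical rather than taken from any real service level agreement.

```python
from statistics import mean

# Assumed sample: 10,000 sub-second transactions plus two 25-second ones.
fast = [0.5] * 10_000
slow = [25.0, 25.0]
response_times = fast + slow

# Hypothetical per-transaction threshold, in seconds.
THRESHOLD_S = 5.0

average = mean(response_times)
worst = max(response_times)
slow_count = sum(1 for t in response_times if t > THRESHOLD_S)

print(f"average response time: {average:.3f} s")
print(f"worst response time:   {worst:.1f} s")
print(f"transactions over {THRESHOLD_S:.0f} s: {slow_count}")
```

Under these assumptions the average stays close to half a second, so a report built on it looks healthy, while the per-transaction count still shows the two users who waited 25 seconds.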

Many enterprises track service levels using synthetic transactions ("active measurements"). These measurements are then summarized and checked against service level agreement thresholds for possible violations. So if a software agent checks a business service transaction every five minutes and reports back that the transaction is operating properly, it reassures the IT manager that service levels are being met. Meanwhile, if a user hits a 30-second delay during the five-minute interval between synthetic measurements, that user has a subjectively "dreadful" experience at the exact moment IT's report shows that everything is normal. In this case, IT's measurements wind up being counterproductive.
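A rough simulation shows how easily a probe on a fixed schedule misses a short slowdown. This is a sketch under simplifying assumptions: probes fire instantaneously at exact five-minute marks, the slowdown is a single 30-second window, and the function names are invented for illustration.

```python
def probe_times(interval_s, horizon_s):
    """Seconds at which the synthetic agent runs, starting at time 0."""
    return list(range(0, horizon_s + 1, interval_s))


def probes_hitting(slow_start_s, slow_duration_s, interval_s, horizon_s):
    """Return the probe times that fall inside the slowdown window."""
    slow_end_s = slow_start_s + slow_duration_s
    return [
        t for t in probe_times(interval_s, horizon_s)
        if slow_start_s <= t < slow_end_s
    ]


INTERVAL_S = 300   # probe every five minutes
DURATION_S = 30    # the user's delay lasts 30 seconds
HORIZON_S = 3600   # observe one hour

# A slowdown that starts 12 minutes 40 seconds into the hour.
hits = probes_hitting(760, DURATION_S, INTERVAL_S, HORIZON_S)
print("probes that saw the slowdown:", hits)

# Try every possible start second within one probe interval.
detected = sum(
    1 for start in range(INTERVAL_S)
    if probes_hitting(start, DURATION_S, INTERVAL_S, HORIZON_S)
)
print(f"detected {detected} of {INTERVAL_S} possible start times")
```

In this model the slowdown at 12:40 falls between the probes at 10:00 and 15:00 and goes unseen, and only about one start time in ten would overlap a probe at all.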

How can we close the gap between the end-user view of reality (performance is dreadful) and IT's view of reality (utilization numbers look good)? IT managers need to be able to do two things when they get a user complaint: show users that they can "see" the problem and show that they are taking action to resolve it. Just as UPS tracks every last package that it handles, IT needs to track every last transaction that it provides.

This isn't as daunting as it sounds. It is now technically and financially feasible for even large enterprises to instrument every transaction executed on their infrastructure, 24 hours a day, seven days a week. This is accomplished by a combination of passive monitoring tools and statistical analysis tools tuned to track performance. Typically, a set of passive monitoring appliances is installed to collect data about the operation of every transaction. The resulting metrics collected are then uploaded to a performance database that can cross-reference these measurements by location, equipment, application, business function and, of course, by end user.

Instead of measuring average performance long after the fact, these tools provide granular metrics that make it possible to automatically flag individual transactions that violate service level agreements, generate trouble tickets to kick off corrective action, and even reason out root causes.
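The idea of flagging individual transactions and cross-referencing them can be illustrated with a small Python sketch. Everything here is hypothetical: the record fields, the sample data, and the 3-second threshold stand in for whatever a real monitoring product and service level agreement would define.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One observed transaction (hypothetical record layout)."""
    user: str
    location: str
    application: str
    response_s: float


SLA_THRESHOLD_S = 3.0  # hypothetical service level threshold, in seconds


def find_violations(transactions, threshold_s=SLA_THRESHOLD_S):
    """Return every transaction slower than the threshold."""
    return [t for t in transactions if t.response_s > threshold_s]


def group_by(transactions, attribute):
    """Cross-reference transactions by one attribute, such as 'user'."""
    groups = defaultdict(list)
    for t in transactions:
        groups[getattr(t, attribute)].append(t)
    return dict(groups)


# Made-up sample records.
records = [
    Transaction("alice", "New York", "order-entry", 0.4),
    Transaction("bob", "Chicago", "order-entry", 27.5),
    Transaction("alice", "New York", "billing", 0.6),
    Transaction("bob", "Chicago", "billing", 0.7),
]

violations = find_violations(records)
by_user = group_by(violations, "user")

for user, items in by_user.items():
    for t in items:
        print(f"{user} ({t.location}): {t.application} took {t.response_s} s")
```

Because each record keeps its user, location and application, the same list of violations can be regrouped by any of those attributes, which is the kind of cross-referencing that lets IT identify who was affected before that person calls.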

Providing excellent customer service requires more than just getting the job done. It requires precise knowledge about the execution of that service: knowing instantly about those rare cases where poor service has been provided, knowing who was affected, and knowing how badly. What ultimately causes the negative perception of service is not the instance of bad service itself; it is the feeling customers get when they try to resolve that problem.

IT organizations that provide truly great service know before a customer has a chance to complain that their application is slow and that service (for that customer at least) is "dreadful." In those organizations, IT will call the user to let them know that a problem has been identified and that actions have already been taken to address the problem. By resolving the paradox of service level management, forward-thinking organizations bridge the gap between perception and reality.

About the author:
Bernie Davidovics is the chief executive officer, chief technologist, and founder of SeaNet Technologies, bringing the company over 25 years of experience in information technology management and consulting. Prior to SeaNet Technologies, Mr. Davidovics founded the performance practice for Predictive Systems and served in several consulting positions for companies including JP Morgan, Chase Manhattan Bank, Time Inc., and RJR Nabisco.

Mr. Davidovics has published numerous papers and performs speaking engagements on systems management, performance management and capacity planning. Mr. Davidovics has also been a member of the Computer Measurement Group for 15 years.

This was last published in April 2004
