
Reliability, availability and survivability


A short synopsis

We should all be fully aware of the importance of these attributes. Ideally, any telecommunication or IT system would operate indefinitely without degradation. In practice, this is unachievable even in the very best of circumstances.

How do we define degradation? Often we say that performance has fallen below the point at which system operation is acceptable. A good example is the digital network. Here we have a benchmark: a bit error ratio (BER) of 1 × 10⁻³, the threshold of supervisory signaling. Supervisory signaling tells the associated equipment whether the line is busy or idle. If our BER is poorer than this threshold, say 2 × 10⁻³ or worse, the supervisory signaling is lost and the circuit will drop out.

The International Telecommunication Union (ITU) expands on this benchmark to define unavailable time, and its inverse, available time. Quoting in part from ITU-T Rec. G.821:

            "A period of unavailable time begins when the bit error ratio (BER) in each second is worse than 1 × 10⁻³ for a period of 10 consecutive seconds. These 10 seconds are called severely errored seconds (SES) and are considered unavailable time.

The unavailable time begins from the first severely errored second in the sequence. In addition:

            The period of unavailable time terminates when the BER in each second is better than 1 × 10⁻³ for a period of 10 consecutive seconds. These 10 seconds are considered available time.

Available time starts at the first error-free second in the sequence. Thus unavailability or downtime of a channel begins when the first of 10 consecutive severely errored seconds occurs and ends when the first of 10 consecutive non-SESs occurs. An SES is a second in which the BER is poorer than 1 × 10⁻³."
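The 10-second rule quoted above can be sketched as a small state machine. This is only an illustration of the rule as paraphrased here, not a conformance implementation of G.821. The function name `classify_seconds`, the sample trace, and the assumption that the channel starts in the available state are ours.

```python
from typing import List

SES_THRESHOLD = 1e-3  # BER benchmark from the text
WINDOW = 10           # consecutive seconds needed to change state


def classify_seconds(ber_per_second: List[float]) -> List[bool]:
    """Return one flag per second: True = available, False = unavailable."""
    # A severely errored second (SES) is one whose BER is worse than 1e-3.
    ses = [ber > SES_THRESHOLD for ber in ber_per_second]
    flags = []
    is_available = True  # assumption: the channel starts out available
    for i in range(len(ses)):
        window = ses[i:i + WINDOW]
        full = len(window) == WINDOW
        if is_available and full and all(window):
            # Ten consecutive SESs: unavailable time begins at the first one.
            is_available = False
        elif not is_available and full and not any(window):
            # Ten consecutive non-SESs: available time begins at the first one.
            is_available = True
        flags.append(is_available)
    return flags


# Hypothetical trace: 5 clean seconds, 12 SESs, 4 clean, 1 SES, 15 clean.
trace = [1e-6] * 5 + [2e-3] * 12 + [1e-6] * 4 + [2e-3] + [1e-6] * 15
flags = classify_seconds(trace)
print("unavailable seconds:", flags.count(False))
```

Note that the four clean seconds in the middle of the trace still count as unavailable time, because they are not part of a run of 10 consecutive non-SESs.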

Availability is also defined in the equation below:

            A% = MTBF / (MTBF + MTTR) × 100

            Where MTBF is mean time between failures and MTTR is mean time to repair. Both parameters are measured in hours.

The following example will clarify the use of this important equation. Assume that the MTBF of a fiber optic amplifier is 10,000 hours and the MTTR is 60 minutes or one hour. Calculate the availability of the amplifier.

            A% = 10,000 / (10,000 + 1) × 100 ≈ 0.99990001 × 100 = 99.990001%
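The same arithmetic can be checked in a few lines of Python. The sketch below also converts the result into expected downtime per year, a figure the example above does not state. The function names and the 8,760-hour year (leap years ignored) are our assumptions.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year, leap years ignored


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as a fraction: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected unavailable hours in one year at the given availability."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Fiber optic amplifier from the example: MTBF 10,000 hours, MTTR 1 hour.
a = availability(10_000, 1)
print(f"{a * 100:.6f}%")  # 99.990001%
print(f"{downtime_hours_per_year(10_000, 1) * 60:.1f} minutes of downtime per year")
```

Expressing availability as minutes of downtime per year is often easier to compare against a service target than a string of nines.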

Of the two variables in the equation, MTTR is the more difficult to quantify with any certainty. When there is a failure, a means of localizing the failed part or element must be in place and operable. Ideally we would like to be able to isolate the failure down to the circuit board or even one level below. There should be an automatic readout of this information at the network control center.

Once the defective module or part has been located in the system, is there a spare replacement available? If the failed part is at a remote location, technician travel time must be included in the MTTR value. If the replacement module or part is not available in a local or central storeroom, then it must be ordered from the manufacturer, and the value used for MTTR may skyrocket. Can the capital expenditure program afford to keep on hand a variety of replacement parts that may never be used at all in the life of the system? Further, the cost of a remote status and alarm system is not trivial.
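To see how quickly these components inflate MTTR, the sketch below adds them up for three scenarios and feeds each total into the availability equation. Every hour value here is a hypothetical placeholder chosen for illustration, not a measurement. The `RepairScenario` class and its field names are likewise our own. Only the 10,000-hour MTBF comes from the amplifier example.

```python
from dataclasses import dataclass


@dataclass
class RepairScenario:
    """Hypothetical breakdown of one repair, all values in hours."""
    localize_hours: float    # time to isolate the failed module
    travel_hours: float      # technician travel to the site
    spare_wait_hours: float  # wait for a part that is not in the storeroom
    replace_hours: float     # swap the module and verify operation

    def mttr_hours(self) -> float:
        return (self.localize_hours + self.travel_hours
                + self.spare_wait_hours + self.replace_hours)


def availability_percent(mtbf_hours: float, mttr_hours: float) -> float:
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100.0


MTBF_HOURS = 10_000  # from the amplifier example

# Illustrative scenarios only; the hour values are assumptions.
scenarios = {
    "staffed site, spare on hand": RepairScenario(0.25, 0.0, 0.0, 0.75),
    "remote site, spare on hand": RepairScenario(0.25, 4.0, 0.0, 0.75),
    "remote site, spare on order": RepairScenario(0.25, 4.0, 72.0, 0.75),
}

for name, scenario in scenarios.items():
    mttr = scenario.mttr_hours()
    print(f"{name}: MTTR {mttr:.2f} h, "
          f"availability {availability_percent(MTBF_HOURS, mttr):.4f}%")
```

Under these assumed numbers, waiting on a manufacturer order dominates every other term, which is the trade-off against the cost of stocking spares.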

Other items often missed in availability calculations include technician travel time to the remote location in question, the cost of relaying information on failed cards and modules to the network control center, and the cost of the failure-reporting capability built into the cards themselves. Such failure indications are usually on a "go, no-go" basis, a binary condition: the card is working or it has failed.

An error rate of 1 × 10⁻³, or one error in one thousand bits, may be acceptable for digital telephony, but it is completely unacceptable in this age of computer data communication.
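A rough calculation shows why. Assuming bit errors are independent (a simplification, since real channels often produce errors in bursts), the chance that a data frame arrives with no errors is (1 − BER) raised to the number of bits in the frame. The 1,500-byte frame size and the 1 × 10⁻⁹ comparison value below are our assumptions for illustration.

```python
def error_free_probability(ber: float, frame_bits: int) -> float:
    """Probability that every bit in a frame arrives intact,
    assuming independent bit errors."""
    return (1.0 - ber) ** frame_bits


FRAME_BITS = 1500 * 8  # hypothetical 1,500-byte data frame

p_threshold = error_free_probability(1e-3, FRAME_BITS)  # benchmark BER
p_clean = error_free_probability(1e-9, FRAME_BITS)      # assumed comparison

print(f"BER 1e-3: {p_threshold:.2e} chance a frame arrives error-free")
print(f"BER 1e-9: {p_clean:.6f} chance a frame arrives error-free")
```

At the telephony threshold, almost no frame of this size would arrive intact, so a data protocol would spend nearly all of its time retransmitting.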

About the author:
Roger Freeman has worked in telecommunications since 1946 when he joined the Navy and became an aviation radioman. Later, Roger served as a radio officer in the merchant marine for nearly 10 years. He then held several positions with ITT assigned to their Spanish Standard Electrica subsidiary. He also served the International Telecommunication Union as Regional Planning Expert for Northern Latin America based in Quito, Ecuador. Roger is bilingual. His last employee position was principal engineer with the Raytheon Company, Marlboro, MA where he took early retirement in 1991 to establish Roger Freeman Associates, Independent Consultants in Telecommunications. He has been giving seminars in telecommunication disciplines at the University of Wisconsin, Madison for nearly 20 years. Roger has been writing books on various telecommunication subjects for John Wiley & Sons since 1973. There are seven titles which he keeps current including the two-volume work, Reference Manual for Telecommunication Engineers, now in its 3rd edition. He holds two degrees from NYU.
