Fault management is the component of network management concerned with detecting, isolating and resolving problems. Properly implemented, fault management can keep a network running at an optimum level, provide a measure of fault tolerance and minimize downtime. A set of functions or applications designed specifically for this purpose is called a fault-management platform.
Important functions of fault management include:
- Definition of thresholds for potential failure conditions.
- Constant monitoring of system status and usage levels.
- Continuous scanning for threats such as viruses and Trojans.
- General diagnostics.
- Remote control of system elements including workstations and servers from a single location.
- Alarms that notify administrators and users of impending and actual malfunctions.
- Tracing the locations of potential and actual malfunctions.
- Automatic correction of potential problem-causing conditions.
- Automatic resolution of actual malfunctions.
- Detailed logging of system status and actions taken.
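
As an illustration only, a few of the functions listed above (threshold definition, status monitoring, alarms and logging) can be sketched in a short Python example. This is a minimal sketch, not the design of any particular fault-management platform: the names `Threshold`, `FaultMonitor`, `cpu_percent` and the `server-1` to `server-3` readings are hypothetical, and the threshold values are chosen arbitrarily.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Threshold:
    """Hypothetical limits for one metric (values are assumptions)."""
    warning: float   # level treated as an impending malfunction
    critical: float  # level treated as an actual malfunction


@dataclass
class FaultMonitor:
    """Toy monitor: classifies readings, raises alarms, keeps a log."""
    thresholds: Dict[str, Threshold]
    notify: Callable[[str], None]          # alarm channel, e.g. email or pager
    log: List[str] = field(default_factory=list)

    def classify(self, metric: str, value: float) -> str:
        # Compare the reading against the defined thresholds.
        limit = self.thresholds[metric]
        if value >= limit.critical:
            return "critical"
        if value >= limit.warning:
            return "warning"
        return "ok"

    def check(self, element: str, metric: str, value: float) -> str:
        severity = self.classify(metric, value)
        entry = f"{element} {metric}={value} status={severity}"
        self.log.append(entry)             # every reading is logged
        if severity != "ok":
            self.notify(entry)             # only faults raise an alarm
        return severity


# Alarms are collected in a list here; a real system would notify people.
alarms: List[str] = []
monitor = FaultMonitor(
    thresholds={"cpu_percent": Threshold(warning=80.0, critical=95.0)},
    notify=alarms.append,
)

# Made-up sample readings for three hypothetical servers.
readings = [("server-1", 42.0), ("server-2", 85.0), ("server-3", 97.5)]
statuses = [monitor.check(name, "cpu_percent", value) for name, value in readings]
print(statuses)  # ['ok', 'warning', 'critical']
```

In this sketch the warning level stands in for an impending malfunction and the critical level for an actual one. Tracing, remote control and automatic correction are left out, since they depend heavily on the network being managed.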