Eventually all computer systems fail, even very stable ones such as network servers. Because isolating the cause of a server crash is particularly important, Red Hat has added what it calls its Network Crash Dump utility, NETDUMP, to Red Hat Advanced Server 2.1. NETDUMP provides what Red Hat calls first fault analysis: the ability to correct a problem without having to recreate it or suffer a second crash. Linux provides a signature of a crash by storing the processor state, a stack trace, part of the instruction trace, and any OOPS, BUG, or PANIC messages. This signature often provides the clues needed to ascertain the cause of the problem.
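As a rough illustration of how the message part of a crash signature might be picked out of a captured log, the sketch below scans lines for the words OOPS, BUG, and PANIC. The function name `find_crash_lines` and the pattern are assumptions made for this example and are not part of Red Hat's utility; the exact wording of kernel messages varies by kernel version.

```python
import re

# Hypothetical pattern for this sketch: match the three marker words as
# whole words, in any letter case. Real kernel message wording varies.
CRASH_MARKERS = re.compile(r"\b(oops|bug|panic)\b", re.IGNORECASE)


def find_crash_lines(lines):
    """Return (line_number, marker, text) for each line that mentions a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = CRASH_MARKERS.search(line)
        if match:
            # Normalise the marker to upper case so callers can compare easily.
            hits.append((number, match.group(1).upper(), line.rstrip("\n")))
    return hits
```

Passing the function the lines of a saved log file returns only the lines worth reading first. Because the pattern matches whole words, a line that merely contains "debugging" is not flagged as BUG.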
The network console utility sends a log of all kernel messages and crash signature messages to a network syslog server, which can run on any Linux server. The network console also adds a memory dump of the kernel image to those messages. For failures that are hard to diagnose, such as hardware errors, this memory dump can be valuable in ascertaining the cause of the crash. The rationale for sending the dump over the network, rather than writing it to the more traditional Unix swap volume, is that a local write can in some cases overwrite important file data, or fail to write the memory dump at all, because of a hardware error. The claim, then, is that a network dump is safer and more effective on Linux.
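To make the receiving side concrete, here is a minimal sketch of a listener that accepts log datagrams over UDP and appends them to a file. It assumes the traditional BSD syslog convention, in which each message starts with `<PRI>` and PRI equals facility times 8 plus severity, and the conventional syslog UDP port 514. Whether Red Hat's utility emits exactly this format is an assumption, and the function names are made up for the example.

```python
import socket


def split_priority(datagram: bytes):
    """Split a '<PRI>text' datagram into (facility, severity, text).

    Assumes PRI = facility * 8 + severity. Datagrams without a leading
    <PRI> field come back with None for both numbers.
    """
    text = datagram.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    if text.startswith("<"):
        end = text.find(">")
        if end > 1 and text[1:end].isdigit():
            pri = int(text[1:end])
            return pri // 8, pri % 8, text[end + 1:]
    return None, None, text


def open_listener(host="0.0.0.0", port=514):
    """Bind a UDP socket; port 514 normally requires root privileges."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, logfile_path):
    """Wait for one datagram, append it to the log file, and return its parts."""
    data, (sender, _port) = sock.recvfrom(65535)
    facility, severity, text = split_priority(data)
    with open(logfile_path, "a", encoding="utf-8") as log:
        log.write(f"{sender} facility={facility} severity={severity} {text}\n")
    return sender, facility, severity, text
```

Calling `receive_one` in a loop on the socket returned by `open_listener` gives a bare-bones log collector. Because the log is written on a separate machine, it survives even when the crashed server's own disks cannot be trusted, which is the point the paragraph above makes.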
Some Unix versions store the memory dump to a data file and then have a second known-good
Barrie Sosinsky is president of consulting company Sosinsky and Associates (Medfield, MA). He has written extensively on a variety of computer topics. His company specializes in custom software (database- and Web-related), training, and technical documentation.
This was first published in July 2003