One of the most critical tasks in any data center is monitoring the networking equipment, the servers, and the data center itself. Choosing an effective monitoring solution, and managing the equipment within the data center network on an ongoing basis, both require quite a bit of planning.
Whether administrators are across the hall or miles away from a data center, there needs to be an effective alerting mechanism in place. You can't just assume that someone is going to walk into the data center and notice the console screen that indicates an imminent failure. This is why it is so important to make sure that you have a good network management and monitoring solution in place. Without it, you may never even know about problems until the phones start ringing.
What do you need to monitor?
A lot of planning needs to go into monitoring a data center because there are so many different things that need to be monitored. It's easy to think of data center monitoring as simply keeping tabs on the servers, but there is a lot more to it than that, and even the servers themselves may need more than one monitoring product. For example, Microsoft's System Center Operations Manager does a great job of monitoring Windows servers. Even so, it doesn't really help you if you have servers that are running non-Windows operating systems.
There are other factors that you need to monitor besides server operating systems and applications. For example, it is important to keep tabs on the temperature within the data center. Most servers have a built-in safety mechanism that shuts the server down before damage can occur if its temperature exceeds a certain threshold. That protects the hardware, but the server still goes offline. A good monitoring solution should be able to tell you the data center's ambient temperature, and it should also be able to alert you if the temperature in any given server begins to approach a critical level.
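To make the idea of alerting before a server reaches its shutdown threshold more concrete, here is a minimal Python sketch. The threshold values, the `Reading` type, and the `check_temperatures` function are all hypothetical. Real limits should come from the hardware vendor's specifications, and real readings would come from whatever sensors or management interfaces your monitoring product uses. The key idea is the warning margin: the sketch raises a warning a fixed number of degrees before the critical level rather than waiting for the server to protect itself by shutting down.

```python
from dataclasses import dataclass

# Hypothetical thresholds in degrees Celsius. Real values should come from
# the server vendor's specifications and the facility's design targets.
AMBIENT_WARN_C = 27.0     # room (ambient) temperature that triggers a warning
SERVER_CRITICAL_C = 85.0  # server temperature treated as critical
WARN_MARGIN_C = 10.0      # warn this many degrees before the critical level


@dataclass
class Reading:
    source: str     # "ambient" or a server name
    celsius: float


def check_temperatures(ambient: Reading, servers: list[Reading]) -> list[str]:
    """Return alert messages for readings that are at or near their limits."""
    alerts = []
    if ambient.celsius >= AMBIENT_WARN_C:
        alerts.append(f"WARN ambient {ambient.celsius:.1f}C >= {AMBIENT_WARN_C}C")
    for r in servers:
        if r.celsius >= SERVER_CRITICAL_C:
            alerts.append(f"CRIT {r.source} {r.celsius:.1f}C >= {SERVER_CRITICAL_C}C")
        elif r.celsius >= SERVER_CRITICAL_C - WARN_MARGIN_C:
            # Approaching the critical level: warn before the server's own
            # safety mechanism shuts it down.
            alerts.append(
                f"WARN {r.source} {r.celsius:.1f}C approaching {SERVER_CRITICAL_C}C"
            )
    return alerts


if __name__ == "__main__":
    sample = check_temperatures(
        Reading("ambient", 24.5),
        [Reading("web01", 62.0), Reading("db01", 78.5), Reading("db02", 86.0)],
    )
    for line in sample:
        print(line)
```

With the sample readings, db01 is flagged as approaching the critical level and db02 as critical, while web01 and the room itself stay quiet.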
The same goes for power management. If a power failure occurs, backup batteries will typically keep the servers online for a limited length of time. More elaborate data centers may also rely on backup generators. In any case, you need to be alerted to power failures, and you also need a way of knowing how much reserve power is available at any given time.
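A rough sense of reserve power can be derived from the battery's capacity and the current load. The sketch below assumes a simple linear model (usable watt-hours divided by load in watts) with an assumed 90 percent inverter efficiency. The function names and the 15-minute cutoff are hypothetical. Real batteries deliver less energy at higher loads, so the runtime figure reported by the UPS itself or the vendor's runtime charts should take precedence over an estimate like this.

```python
from typing import Optional


def estimated_runtime_minutes(battery_wh: float, charge_fraction: float,
                              load_w: float,
                              inverter_efficiency: float = 0.9) -> float:
    """Rough linear estimate of battery runtime at the current load.

    Assumes usable energy = capacity * state of charge * inverter efficiency.
    Real batteries deliver less at high loads, so treat this as optimistic.
    """
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    usable_wh = battery_wh * charge_fraction * inverter_efficiency
    return usable_wh / load_w * 60.0


def power_alert(on_battery: bool, runtime_min: float,
                min_runtime_min: float = 15.0) -> Optional[str]:
    """Return an alert while running on battery; None while on utility power."""
    if not on_battery:
        return None
    # Escalate to critical when the remaining runtime drops below the cutoff.
    level = "CRIT" if runtime_min < min_runtime_min else "WARN"
    return f"{level} utility power lost, about {runtime_min:.0f} min of battery left"


if __name__ == "__main__":
    runtime = estimated_runtime_minutes(battery_wh=2000, charge_fraction=1.0, load_w=3000)
    print(power_alert(on_battery=True, runtime_min=runtime))
```

For example, a fully charged 2,000 Wh battery bank carrying a 3,000 W load works out to about 36 minutes under these assumptions (1,800 usable Wh divided by 3,000 W is 0.6 hours).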
A good monitoring solution needs to be able to alert you to issues with server hardware, operating system errors, application errors, networking hardware issues, and environmental issues. This is a tall order, to say the least, and that is a big part of why proper planning is so important. To the best of my knowledge, there is no single monitoring solution that can perform all of these functions. Typically, network architects will need to invest in several monitoring solutions and set them all up to deliver alerts in a uniform way. These alerts might take the form of a text message to an administrator's mobile device, an email message sent to the help desk, or some other type of notification. The important thing is that all of the alerts come to one place.
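Because each monitoring product reports problems in its own vocabulary, the alerts have to be translated into a common format before they can all land in one place. The following Python sketch shows one way to do that. The tool names, severity mappings, and delivery channel names are hypothetical placeholders rather than the behavior of any particular product, and the sketch only decides where an alert should go instead of actually sending a text message or email. Unknown severities are treated as warnings so that an alert from an unmapped tool is never silently dropped.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping from each tool's own severity labels to one shared scale.
SEVERITY_MAP = {
    "os_monitor": {"Error": "critical", "Warning": "warning", "Information": "info"},
    "env_monitor": {"ALARM": "critical", "WARN": "warning", "OK": "info"},
    "network_monitor": {"down": "critical", "degraded": "warning", "up": "info"},
}


@dataclass
class Alert:
    tool: str      # which monitoring product raised the alert
    severity: str  # normalized: "critical", "warning", or "info"
    target: str    # server, switch, UPS, sensor, and so on
    message: str
    raised_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def normalize(tool: str, raw_severity: str, target: str, message: str) -> Alert:
    """Translate a tool-specific alert into the common Alert format."""
    # Unknown tools or labels default to "warning" so nothing is silently dropped.
    severity = SEVERITY_MAP.get(tool, {}).get(raw_severity, "warning")
    return Alert(tool, severity, target, message)


def route(alert: Alert) -> str:
    """Choose a delivery channel; the channel names are placeholders."""
    return "sms-on-call" if alert.severity == "critical" else "helpdesk-email"


class AlertInbox:
    """The single place where alerts from every monitoring tool end up."""

    def __init__(self) -> None:
        self.alerts: list[Alert] = []

    def submit(self, alert: Alert) -> str:
        """Store the alert and return the channel it should be delivered on."""
        self.alerts.append(alert)
        return route(alert)

    def critical(self) -> list[Alert]:
        return [a for a in self.alerts if a.severity == "critical"]


if __name__ == "__main__":
    inbox = AlertInbox()
    print(inbox.submit(normalize("env_monitor", "ALARM", "rack12-inlet", "Inlet temperature high")))
    print(inbox.submit(normalize("os_monitor", "Warning", "web01", "Disk 85% full")))
```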
Virtualization complicates data center network monitoring
As you shop for a monitoring solution, it is important to remember that there are factors, such as virtualization, that can complicate the monitoring process. For instance, there are various monitoring applications on the market that can monitor a server's hardware for signs of a failure. Such an application might look for excessive server temperatures, SMART disk warnings, or even the failure of one of the cooling fans within the server. The problem is that a monitoring solution that doesn't realize it is monitoring a virtual server may miss hardware issues on the underlying host that could affect the server's availability.
The monitoring software should pick up on problems with the host server's hardware. But if the host is at risk, so are any virtual machines running on the host. Therefore, if your organization is going to be making use of virtual machines, you will need a way of differentiating between physical servers and virtual servers and of knowing which virtual machines are running on which host servers. You will also need to have the ability to move the guest machines quickly to a different host server in the event that hardware problems occur.
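Keeping track of which virtual machines are running on which host is essentially a lookup problem. The sketch below assumes a hypothetical placement inventory (in practice this would come from the hypervisor's management tools) and shows how a hardware alert on one host can be translated into the list of guest machines at risk, along with a naive plan for where to move them. The function names are illustrative, and counting capacity in VMs is a deliberate simplification. A real placement decision would also consider CPU, memory, and storage, and the move itself would be carried out with the hypervisor's own migration features.

```python
from collections import defaultdict

# Hypothetical inventory of which guest VM runs on which physical host.
# In practice this would be pulled from the hypervisor's management tools.
placement = {
    "vm-web01": "host-a",
    "vm-web02": "host-b",
    "vm-db01": "host-a",
    "vm-mail01": "host-c",
}


def vms_by_host(placement: dict[str, str]) -> dict[str, list[str]]:
    """Invert the VM -> host mapping into host -> sorted list of VMs."""
    by_host = defaultdict(list)
    for vm, host in placement.items():
        by_host[host].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items()}


def impact_of_host_alert(host: str, placement: dict[str, str]) -> list[str]:
    """List the guest VMs put at risk by a hardware alert on one host."""
    return vms_by_host(placement).get(host, [])


def evacuation_plan(failing_host: str, placement: dict[str, str],
                    healthy_hosts: list[str]) -> dict[str, str]:
    """Assign each at-risk VM to the least-loaded healthy host (by VM count)."""
    load = {h: 0 for h in healthy_hosts if h != failing_host}
    if not load:
        raise ValueError("no healthy hosts available")
    # Count the VMs already running on each healthy host.
    for host in placement.values():
        if host in load:
            load[host] += 1
    plan = {}
    for vm in impact_of_host_alert(failing_host, placement):
        # Ties are broken by host name so the plan is deterministic.
        target = min(load, key=lambda h: (load[h], h))
        plan[vm] = target
        load[target] += 1
    return plan


if __name__ == "__main__":
    print(impact_of_host_alert("host-a", placement))
    print(evacuation_plan("host-a", placement, ["host-b", "host-c"]))
```

With the sample inventory, a hardware alert on host-a flags vm-db01 and vm-web01, and the plan spreads them across host-b and host-c.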
Finally, management and monitoring go hand in hand. Monitoring is no good unless you also have good management capabilities in place. This is especially true in situations in which the staff is located off-site. For instance, what good does it do to have your monitoring software tell you that a critical failure is about to occur if the administrative staff has no way of getting to the ailing server in time to prevent the failure? This is why it is so important to be able to monitor and remotely interact with every server and every major piece of hardware in the data center.
About the author:
Brien M. Posey, MCSE, is a Microsoft Most Valuable Professional for his work with Windows 2000 Server and IIS. Brien has served as CIO for a nationwide chain of hospitals and was once in charge of IT security for Fort Knox. As a freelance technical writer, he has written for Microsoft, CNET, ZDNet, TechTarget, MSD2D, Relevant Technologies and other technology companies. You can visit Brien's personal website at www.brienposey.com.