How can a system administrator monitor a large number of machines and services to proactively address problems before anyone else suffers from them?

The answer is Nagios.

Nagios is an open source network monitoring tool. It is free, powerful and flexible. It can be tricky to learn and implement, but can reduce enormously the amount of time required to keep track of how your organization's IT infrastructure is performing.

I'll cover the usefulness and architecture of Nagios in part one of this two-part column. In part two, I'll offer configuration examples and advice.

To understand the usefulness of Nagios, consider a typical IT infrastructure that one or more system administrators are responsible for. Even a small company may have a number of pieces of hardware with many services and software packages running on them. Larger companies may have hundreds or even thousands of items to keep up and running. Both small and large companies may have decentralized operations, implying a decentralized IT infrastructure, with no ability to physically see many of the machines at all.

Naturally, each piece of hardware will have a unique set of software products running on it. Faced with a multitude of hardware and software to monitor, administrators cannot pay attention to each specific item; the default posture in this kind of situation is to respond to service outages on a reactive basis. Worse, awareness of the problem usually comes only after an end-user complains.

The link for this article located at SearchEnterpriseLinux.com is no longer available.