Monitor remote systems with Nagios
Takeaway: Vincent Danen introduces you to the potential of Nagios for monitoring your remote systems. Get to know this flexible and powerful tool.
Nagios is a fantastic program that allows you to monitor remote systems for availability. Nagios, available from http://www.nagios.org/, is packaged by many Linux vendors, so it is often just an apt-get or urpmi away.
Nagios makes extensive use of configuration files, typically located in /etc/nagios. The main configuration is /etc/nagios/nagios.cfg and, amongst other configuration options, points to other configuration files using cfg_file directives:
cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/hosts.cfg
cfg_file=/etc/nagios/services.cfg
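Before restarting Nagios after editing the main configuration, it can help to confirm that every file named by a cfg_file directive actually exists. The following is a minimal sketch, not part of Nagios itself: check_cfg_files is a hypothetical helper name, and it assumes each directive is written as cfg_file=/path with no surrounding spaces. Nagios' own -v option (nagios -v /etc/nagios/nagios.cfg) performs a much fuller verification.

```shell
# Hypothetical helper: list the targets of all cfg_file directives in a
# main configuration file and report whether each one exists.
check_cfg_files() {
    # $1 = path to the main configuration file (for example nagios.cfg)
    # Returns non-zero if at least one referenced file is missing.
    sed -n 's/^cfg_file=//p' "$1" | {
        missing=0
        while IFS= read -r path; do
            if [ -f "$path" ]; then
                echo "found:   $path"
            else
                echo "missing: $path"
                missing=1
            fi
        done
        # The status of this test becomes the function's return value.
        [ "$missing" -eq 0 ]
    }
}
```

It could be called as check_cfg_files /etc/nagios/nagios.cfg, assuming that is where the vendor installed the main configuration.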
The files above further configure and refine how Nagios works. For instance, the contacts.cfg might contain:
define contact{
    contact_name                    admin
    alias                           admin
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    c,r
    host_notification_options       d,r
    service_notification_commands   notify-by-email
    host_notification_commands      host-notify-by-email
    email                           admin@mysite.com
}
This defines who will receive alerts, what kind of alerts, and when. Here you can see that the admin is available 24 hours a day, seven days a week, and receives critical/recovery notifications for services, as well as down/recovery notifications for hosts.
The hosts.cfg file contains definitions of the hosts being monitored, which look like this:
define host{
    name                    linux-server
    use                     generic-host
    check_period            24x7
    max_check_attempts      10
    check_command           check-host-alive
    notification_period     workhours
    notification_interval   120
    notification_options    d,u,r
    contact_groups          admins
    register                0
}
define host{
    use                     linux-server
    host_name               surtr
    alias                   surtr.mysite.com
    address                 127.0.0.1
}
The first definition is a template (register is set to 0, so Nagios does not treat it as a real host). Other definitions can build upon this template, which prevents needless duplication of information. The second definition is the actual host: it names the template to use (linux-server) and supplies the host name, alias, and IP address. You can define as many hosts and as many templates as you like.
The services.cfg file contains service definitions that are used when monitoring hosts. For instance, here is an entry to check if the POP3 server is available:
define service{
    use                     local-service
    hostgroup_name          remote
    service_description     POP3 Availability
    check_command           check_pop
}
The use directive on the first line names a template to build upon. The hostgroup_name directive specifies which group of hosts this service applies to (the group itself is defined elsewhere, such as in hostgroups.cfg). The check_command directive names the command (plugin) that performs the check.
The hostgroups.cfg file might contain an entry like:
define hostgroup{
    hostgroup_name  remote
    alias           Remote Servers
    members         hades,titan
}
This would be the definition for the remote hostgroup, used in the POP3 check illustrated previously. In this case, two hosts (hades and titan) are defined as being included in this group. You can have any number of host groups, with any number of hosts in them, and hosts can be members of multiple host groups.
Finally, the commands.cfg file would contain the actual commands or plugins to use:
define command{
    command_name    check_pop
    command_line    $USER1$/check_pop -H $HOSTADDRESS$
}
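The $USER1$ and $HOSTADDRESS$ tokens in command_line are macros that Nagios fills in before it runs the check: $USER1$ is conventionally set to the plugin directory in the resource file, and $HOSTADDRESS$ comes from the address directive of the host being checked. The sketch below only imitates that substitution with sed to make the idea concrete. expand_macros is a hypothetical helper, and the directory and IP address in the usage line are illustrative assumptions, not values from a real installation.

```shell
# Illustrative only: mimic how the two macros in the check_pop command
# definition are replaced with real values. Nagios does this internally.
expand_macros() {
    # $1 = command_line template
    # $2 = value to substitute for $USER1$ (plugin directory)
    # $3 = value to substitute for $HOSTADDRESS$ (host address)
    printf '%s\n' "$1" |
        sed -e "s|[\$]USER1[\$]|$2|g" \
            -e "s|[\$]HOSTADDRESS[\$]|$3|g"
}

# Assumed values: plugins in /usr/libexec/nagios, host at 192.0.2.10.
expand_macros '$USER1$/check_pop -H $HOSTADDRESS$' /usr/libexec/nagios 192.0.2.10
```

With those assumed values, the template becomes an ordinary command line that could also be run by hand.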
This defines the check_pop command, used in the previous POP3-checking service as defined in services.cfg. The check_pop program defined here is a plugin, usually available in /usr/libexec/nagios (or wherever the vendor installs the plugins). This is a simple program that returns status information, such as:
# /usr/local/nagios/libexec/check_pop -H hades.mysite.com
POP OK - 0.025 second response time on port 110 [+OK Hello there.]|time=0.024849s;0.000000;0.000000;0.000000;10.000000
Nagios reads the plugin's exit code, together with this output, to determine whether the service is up and running. Because the interface is so simple, you can write your own plugins for Nagios in shell script, Perl, or any other language.
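As a rough illustration of a homegrown plugin, here is a minimal shell sketch. It is a hypothetical check, not one of the stock plugins: check_file_count and its three arguments are made up for this example. It warns when a directory (say, a mail spool) holds too many files, and follows the usual plugin convention of one status line, optional performance data after a pipe character, and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN).

```shell
# Hypothetical plugin sketch: alert when a directory holds too many files.
# Status codes follow the Nagios plugin convention:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
check_file_count() {
    dir=$1
    warn=$2
    crit=$3

    # Missing arguments or a nonexistent directory: report UNKNOWN.
    if [ -z "$dir" ] || [ -z "$warn" ] || [ -z "$crit" ] || [ ! -d "$dir" ]; then
        echo "FILES UNKNOWN - usage: check_file_count <dir> <warn> <crit>"
        return 3
    fi

    # Count regular files directly inside the directory.
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')

    # Human-readable text first, performance data after the pipe.
    perf="files=$count;$warn;$crit"
    if [ "$count" -ge "$crit" ]; then
        echo "FILES CRITICAL - $count files in $dir|$perf"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "FILES WARNING - $count files in $dir|$perf"
        return 1
    fi
    echo "FILES OK - $count files in $dir|$perf"
    return 0
}

# A standalone plugin script would finish with:
#   check_file_count "$@"; exit $?
```

Saved as an executable script in the plugin directory, something like this could be wired up through a define command entry in the same way as check_pop.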
This only scratches the surface of what can be done with Nagios. You can view Nagios reports and trends for hosts through the Web interface, and many existing plugins are available to check host uptime and availability, as well as services like LDAP, SSH, FTP, and more. Nagios can be a little time-consuming to set up, but the end result is worth it, especially if you are in charge of watching even a few different systems and want early warning of actual or potential problems.