UW SSEC Lustre Statistics How-To: Difference between revisions

From OpenSFS Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 23: Line 23:
https://mathias-kettner.de/checkmk_multisite_ldap_integration.html
https://mathias-kettner.de/checkmk_multisite_ldap_integration.html


Additionally, we setup HTTPS for our web access:
Additionally, we setup HTTPS for our web access to OMD:
http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-May/012225.html
http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-May/012225.html


At this point, you can start configuring your monitoring server to monitor hosts! Check_MK has a lot of configuration options, but it's a lot better than managing Nagios configurations by hand.
At this point, you can start configuring your monitoring server to monitor hosts! Check_MK has a lot of configuration options, but it's a lot better than managing Nagios configurations by hand. Fortunately, Check_MK is widely used and well documented.


=== Deploying Agents to Lustre Hosts ===
=== Deploying Agents to Lustre Hosts ===
To operate, the check_mk_agent on hosts runs as an xinetd service with a config file at /etc/xinetd.d/check_mk. That file includes the IP addresses allowed to access the agent. I rebuilt the RPM using rpmrebuild to include our updated IP addresses.
After rebuilding the RPM, push out the RPM to all hosts that will be monitored. We use a custom repository and Puppet for managing our existing software, so adding the RPM to the repo and pushing out via Puppet can be done with a simple module.


=== Writing Local Checks to Run via Agents ===
=== Writing Local Checks to Run via Agents ===

Revision as of 13:14, 5 February 2015

Introduction

This guide will take the user step-by-step through the Lustre Monitoring deployment that the Space Science and Engineering Center uses for monitoring all of its Lustre file systems. The author of this guide is Andrew Wagner ([email protected]).

Building the Lustre Monitoring Deployment

Setting up an OMD Monitoring Server

The first thing that we needed for our new monitoring deployment was a monitoring server. We were already using Check_MK with Nagios on our older monitoring server but the Open Monitoring Distribution nicely ties all of the components together. The distribution is available at http://omdistro.org/ and installs via RPM.

On a newly deployed Centos6 machine, I installed the OMD-1.20 RPM. This takes care of all of the work of installing Nagios, Check_MK, PNP4Nagios, etc.

After installation, I created the new OMD monitoring site:

omd create ssec

This creates a new site that runs its own stack of Apache, Nagios, Check_MK and everything else in the OMD distribution. Now we can start the site:

omd start ssec

You can now nagivate to http://example.fqdn.com/sitename of your server, i.e. http://example.ssec.wisc.edu/ssec and login with the default OMD credentials.

We chose to setup LDAPS authentication versus our Active Directory server to manage authentication. There is a good discussion of how to do this here: https://mathias-kettner.de/checkmk_multisite_ldap_integration.html

Additionally, we setup HTTPS for our web access to OMD: http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-May/012225.html

At this point, you can start configuring your monitoring server to monitor hosts! Check_MK has a lot of configuration options, but it's a lot better than managing Nagios configurations by hand. Fortunately, Check_MK is widely used and well documented.

Deploying Agents to Lustre Hosts

To operate, the check_mk_agent on hosts runs as an xinetd service with a config file at /etc/xinetd.d/check_mk. That file includes the IP addresses allowed to access the agent. I rebuilt the RPM using rpmrebuild to include our updated IP addresses.

After rebuilding the RPM, push out the RPM to all hosts that will be monitored. We use a custom repository and Puppet for managing our existing software, so adding the RPM to the repo and pushing out via Puppet can be done with a simple module.

Writing Local Checks to Run via Agents

Check_MK RRD Graphs

Deploying Graphite/Carbon

Deploying Grafana

Using Graphios to Redirect Lustre Stats to Carbon