UW SSEC Lustre Statistics How-To

From OpenSFS Wiki
Revision as of 11:55, 3 February 2015 by AndrewWagner (talk | contribs)

Introduction

This guide walks step by step through the Lustre monitoring deployment that the Space Science and Engineering Center (SSEC) uses to monitor all of its Lustre file systems. The author of this guide is Andrew Wagner ([email protected]).

Building the Lustre Monitoring Deployment

Setting up an OMD Monitoring Server

The first thing we needed for the new monitoring deployment was a monitoring server. We were already using Check_MK with Nagios on our older monitoring server, but the Open Monitoring Distribution (OMD) nicely ties all of the components together. The distribution is available at http://omdistro.org/ and installs via RPM.

On a newly deployed CentOS 6 machine, I installed the OMD-1.20 RPM. This takes care of all of the work of installing Nagios, Check_MK, PNP4Nagios, etc.

After installation, I created the new OMD monitoring site:

omd create ssec

This creates a new site that runs its own stack of Apache, Nagios, Check_MK and everything else in the OMD distribution. Now we can start the site:

omd start ssec
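
As a rough sketch, the site creation steps above could be wrapped in a small script. The run helper, the setup_site function, and the SITE and DRY_RUN variables are illustrative names and not part of OMD. The final omd status call is an addition here, included only as a quick check that the site's services came up. The script defaults to a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: create and start an OMD site, then show its status.
# Assumptions: SITE defaults to "ssec" (the site name used in this guide);
# DRY_RUN=1 (the default) only prints each command. Set DRY_RUN=0 and run
# as root on a host where the OMD RPM is already installed to execute them.
SITE="${SITE:-ssec}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop at the first failing step (for example, if the site already exists).
setup_site() {
    run omd create "$SITE" || return 1
    run omd start "$SITE" || return 1
    run omd status "$SITE"
}

setup_site
```

Running the script unchanged prints the three omd commands in order. Running it with DRY_RUN=0 executes them against the site named in SITE.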


Deploying Agents to Lustre Hosts

Writing Local Checks to Run via Agents

Check_MK RRD Graphs

Deploying Graphite/Carbon

Deploying Grafana

Using Graphios to Redirect Lustre Stats to Carbon