Lustre Monitoring and Statistics Guide
Revision as of 15:27, 5 November 2014
DRAFT IN PROGRESS
Introduction
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods.
This does not include Lustre log analysis.
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems.
Lustre Versions
This information is based on working mostly with Lustre 2.4 and 2.5.
Reading /proc vs lctl
'cat /proc/fs/lustre/...' vs 'lctl get_param': with newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats, which helps ensure portability. I will use this method in all examples; as a bonus, the syntax can often be a little shorter.
Data Formats
The format of the various statistics files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*; they aren't a standard for Lustre or anything.
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools.
Stats
What I consider "standard" stats files present, for example, each OST or MDT as a multi-line record, followed by just the data.
Example:
obdfilter.scratch-OST0001.stats=
snapshot_time             1409777887.590578 secs.usecs
read_bytes                27846475 samples [bytes] 4096 1048576 14421705314304
write_bytes               16230483 samples [bytes] 1 1048576 14761109479164
get_info                  3735777 samples [reqs]
snapshot_time = when the stats were written
For read_bytes and write_bytes:
- First number = number of times (samples) the OST has handled a read or write
- Second number = the minimum read/write size in bytes
- Third number = the maximum read/write size in bytes
- Fourth number = the sum of all the read/write requests in bytes, i.e. the quantity of data read/written
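As an illustration of working with this layout, here is a minimal sketch (the function name and dictionary shape are my own, not anything Lustre provides) that parses one 'stats' record using only string splitting:

```python
# Sketch of a parser for the 'stats' format described above.
# Assumes the field order shown in the example output.

def parse_stats(text):
    """Return {counter: fields} for one multi-line stats record."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "snapshot_time":
            counters["snapshot_time"] = float(parts[1])
        elif len(parts) >= 4 and parts[2] == "samples":
            entry = {"samples": int(parts[1]), "unit": parts[3].strip("[]")}
            if len(parts) >= 7:   # read/write counters also carry min, max, sum
                entry.update(min=int(parts[4]), max=int(parts[5]), sum=int(parts[6]))
            counters[parts[0]] = entry
    return counters
```

The same loop handles both the byte counters and the request-only counters, since only the field count differs.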
Jobstats
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data.
Example:
obdfilter.scratch-OST0000.job_stats=
job_stats:
- job_id:          56744
  snapshot_time:   1409778251
  read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
  write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }
  setattr: { samples: 0, unit: reqs }
  punch: { samples: 95, unit: reqs }
- job_id: . . . ETC
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? The output appears to be valid YAML, which at least means a standard YAML parser can consume it directly.
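For illustration, here is a stdlib-only sketch (the function name and record shape are my own; a YAML library would also work if the output is indeed valid YAML) that splits a job_stats dump into per-job records:

```python
import re

# Sketch of a parser for the 'jobstats' format: one record per job_id,
# with each counter on a "name: { key: value, ... }" line.

def parse_job_stats(text):
    """Return {job_id: {counter: fields}} for one job_stats dump."""
    jobs = {}
    job = None
    for line in text.splitlines():
        m = re.match(r"-\s*job_id:\s*(\S+)", line.strip())
        if m:
            job = jobs.setdefault(m.group(1), {})
            continue
        m = re.match(r"(\w+):\s*\{(.*)\}", line.strip())
        if m and job is not None:
            fields = dict(re.findall(r"(\w+):\s*([^,\s}]+)", m.group(2)))
            job[m.group(1)] = {k: int(v) if v.isdigit() else v
                               for k, v in fields.items()}
    return jobs
```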
Single
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example:
[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532
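Parsing this format is correspondingly trivial; a minimal sketch (function name is my own):

```python
# Sketch of a parser for the 'single' format: one key=value pair per line.

def parse_single(text):
    """Return {parameter: value} from `lctl get_param` key=value output."""
    out = {}
    for line in text.splitlines():
        key, sep, val = line.partition("=")
        if sep:                      # skip lines without '=' (e.g. the prompt)
            out[key.strip()] = int(val)
    return out
```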
Histogram
Some stats are histograms; these types aren't covered here. Typically they're useful on their own without further parsing(?)
- brw_stats
- extent_stats
Interesting Statistics Files
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats. There is a wealth of client stats too not detailed here. Additions or corrections are welcome.
- Host Type = MDS, OSS, client
- Target = the parameter to pass to "lctl get_param"
- Format = data format discussed above
Host Type | Target | Format | Discussion |
---|---|---|---|
MDS | mdt.*MDT*.num_exports | single | number of exports per MDT - these are clients, including other lustre servers |
MDS | mdt.*.job_stats | jobstats | Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption. |
OSS | obdfilter.*.job_stats | jobstats | the per OST jobstats. |
MDS | mdt.*.md_stats | stats | Overall metadata stats per MDT |
MDS | mdt.*MDT*.exports.*@*.stats | stats | Per-export metadata stats. Exports are clients; this also includes other lustre servers. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses them for example. |
OSS | obdfilter.*.stats | stats | Operations per OST. Read and write data is particularly interesting |
OSS | obdfilter.*OST*.exports.*@*.stats | stats | per-export OSS statistics |
MDS | osd-*.*MDT*.filesfree or filestotal | single | available or total inodes |
MDS | osd-*.*MDT*.kbytesfree or kbytestotal | single | available or total disk space |
OSS | obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal | single | inodes and disk space as in MDS version |
OSS | ldlm.namespaces.filter-*.pool.stats | stats | lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated as 'single'-format files; perhaps this is just a convenience. |
OSS | ldlm.namespaces.filter-*.lock_count | single | lustre distributed lock manager (ldlm) locks |
OSS | ldlm.namespaces.filter-*.pool.granted | single | lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure what the differences are, or what it means when they don't match. |
OSS | ldlm.namespaces.filter-*.pool.grant_rate | single | ldlm lock grant rate aka 'GR' |
OSS | ldlm.namespaces.filter-*.pool.grant_speed | single | ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume. |
Working With the Data
Packages, tools, and techniques for working with Lustre statistics.
Open Source Monitoring Packages
- LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt
- lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop
Commercial Monitoring Packages
- Terascala 'teraos'
- DDN datablarker
- Intel Enterprise Edition for Lustre (includes Intel Manager for Lustre)
Build it Yourself
Here are some basic steps and techniques for working with the Lustre statistics.
- Gather the data on the hosts you are monitoring: deal with the syntax and extract what you want.
- Collect the data centrally - either pull or push it to your server or collection of monitoring servers.
- Process the data - this may be optional or minimal.
- Alert on the data - optional but often useful.
- Present the data - allow for visualization, analysis, etc.
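The gather and collect steps above can be sketched end to end. This is a minimal, hedged example: the `lustre.` metric prefix, the Graphite hostname, and the function names are my own assumptions, and it assumes Graphite's plaintext listener on its default port 2003:

```python
import socket
import subprocess
import time

# Hypothetical Graphite endpoint - adjust for your site.
GRAPHITE_HOST = ("graphite.example.com", 2003)

def to_graphite(raw, prefix="lustre", now=None):
    """Turn 'single'-format key=value output into Graphite plaintext lines."""
    now = int(time.time() if now is None else now)
    lines = []
    for line in raw.splitlines():
        key, sep, val = line.partition("=")
        if sep:
            lines.append("%s.%s %s %d" % (prefix, key.strip(), val.strip(), now))
    return lines

def gather(param="osd-ldiskfs.*OST*.kbytesavail"):
    """Gather: run lctl on the monitored host and format the result."""
    raw = subprocess.check_output(["lctl", "get_param", param], text=True)
    return to_graphite(raw)

def push(lines):
    """Collect: push the formatted metrics to the central Graphite server."""
    with socket.create_connection(GRAPHITE_HOST) as s:
        s.sendall(("\n".join(lines) + "\n").encode())
```

Processing, alerting, and presentation are then left to whatever sits behind the collection point (Graphite, Nagios, Ganglia, etc.).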
Here are details of some solutions tested or in use:
Collectl and Ganglia
Collectl supports Lustre stats. Note there have recently been some changes: Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre
This process is not based on the new plugin versions, but they should work similarly.
- collectl - does the gather by writing to a text file on the host being monitored
- ganglia does the collect via gmond and the python script 'collectl.py', and the present via the ganglia web pages - there is no alerting.
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia
Perl and Graphite
Graphite (http://graphite.readthedocs.org/en/latest/) is a very convenient tool for storing, working with, and rendering graphs of time-series data. It is a fairly quick and easy tool for at least prototyping and even for some production use.
At SSEC we did a quick prototype for MDS and OSS data using perl and simply sending data to a socket.
- Lustrestats.pm - perl module to parse different types of lustre stats
- lustrestats scripts (lost to the sands of time?)
check_mk and Graphite
Instead of directly sending with perl, use check_mk:
- the local agent and pnp4nagios mean a reasonable infrastructure is already there, and alerting is simple
- graphios plugin
- show data timestamp delay
Logstash, python, and Graphite
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html
NCI project
Note: I don't know if this will have source available. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf
References and Links
- Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf
- Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf
- Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf
- Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf
- Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf