UW SSEC Lustre Statistics How-To (OpenSFS Wiki, revision of 2015-04-14 by Scottn)<br />
<hr />
<div>== Introduction ==<br />
This guide will take the user step-by-step through the Lustre Monitoring deployment that the Space Science and Engineering Center uses for monitoring all of its Lustre file systems. The author of this guide is Andrew Wagner (andrew.wagner@ssec.wisc.edu). Where possible, I have linked to our production configuration files for software to give readers a good idea of the possible settings they can or should use for their own setups.<br />
<br />
== Hardware Requirements ==<br />
<br />
Any existing server can be used for a proof-of-concept version of this guide. The requirements for several thousand checks per minute are low - a small VM can easily handle the load.<br />
<br />
Our production server can easily handle ~150k checks per minute and, from a processing/disk I/O perspective, could handle much more. Here are the specs:<br />
<br />
*Dell PowerEdge R515<br />
**2x 8-Core AMD Opteron 4386<br />
**300GB RAID1 15K SAS<br />
**200GB Enterprise SSD<br />
**64GB RAM<br />
<br />
== Software Requirements ==<br />
<br />
*CentOS 6 x86_64<br />
*CentOS 6 EPEL Repository<br />
*Configuration Management System (Puppet, Ansible, Salt, Chef, etc)<br />
<br />
== Building the Lustre Monitoring Deployment ==<br />
<br />
=== Setting up an OMD Monitoring Server ===<br />
<br />
The first thing that we needed for our new monitoring deployment was a monitoring server. We were already using Check_MK with Nagios on our older monitoring server but the Open Monitoring Distribution nicely ties all of the components together. The distribution is available at http://omdistro.org/ and installs via RPM.<br />
<br />
On a newly deployed CentOS 6 machine, I installed the OMD-1.20 RPM. This takes care of all of the work of installing Nagios, Check_MK, PNP4Nagios, etc.<br />
<br />
After installation, I created the new OMD monitoring site:<br />
<br />
<code>omd create ssec</code><br />
<br />
This creates a new site that runs its own stack of Apache, Nagios, Check_MK and everything else in the OMD distribution. Now we can start the site:<br />
<br />
<code>omd start ssec</code><br />
<br />
You can now navigate to http://example.fqdn.com/sitename on your server, e.g. http://example.ssec.wisc.edu/ssec, and log in with the default OMD credentials.<br />
<br />
We chose to set up LDAPS authentication against our Active Directory server. There is a good discussion of how to do this here:<br />
https://mathias-kettner.de/checkmk_multisite_ldap_integration.html<br />
<br />
Additionally, we set up HTTPS for our web access to OMD:<br />
http://lists.mathias-kettner.de/pipermail/checkmk-en/2014-May/012225.html<br />
<br />
At this point, you can start configuring your monitoring server to monitor hosts! Check_MK has a lot of configuration options, but it's a lot better than managing Nagios configurations by hand. Fortunately, Check_MK is widely used and well documented. The Check_MK documentation root is available at http://mathias-kettner.de/checkmk.html. <br />
<br />
=== Deploying Agents to Lustre Hosts ===<br />
<br />
To operate, the Check_MK agent on hosts runs as an xinetd service with a config file at /etc/xinetd.d/check_mk. That file includes the IP addresses allowed to access the agent in the '''only_from''' parameter. The OMD distribution comes with Check_MK agent RPMs. I rebuilt the RPM using rpmrebuild to include our updated IP addresses for our monitoring servers.<br />
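For reference, a minimal /etc/xinetd.d/check_mk along the lines of what the agent RPM ships might look like the following. The port and server path match the Check_MK defaults; the addresses in '''only_from''' are placeholders you would replace with your monitoring servers' IPs:<br />

```
service check_mk
{
        type           = UNLISTED
        port           = 6556
        socket_type    = stream
        protocol       = tcp
        wait           = no
        user           = root
        server         = /usr/bin/check_mk_agent
        only_from      = 127.0.0.1 10.1.2.3
        disable        = no
}
```

After editing the file, reload xinetd (service xinetd reload) so the change takes effect.<br />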
<br />
After rebuilding the RPM, push out the RPM to all hosts that will be monitored. We use a custom repository and Puppet for managing our existing software, so adding the RPM to the repo and pushing out via Puppet can be done with a simple module.<br />
<br />
After deployment, we can verify the agents work by adding them to Check_MK via the GUI or configuration file and inventorying them. This will allow us to monitor a wide array of default metrics such as CPU Load, CPU Utilization, Memory use, and many others.<br />
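As a sketch of that verification step (hostnames are placeholders), from the OMD server you can confirm the agent answers on its default port and then inventory the host from inside the site; cmk -I inventories new services and cmk -O reloads the monitoring core:<br />

```
# Sanity check that the agent answers on the default port 6556:
telnet lustre-oss-1 6556

# As the OMD site user, inventory the new host and activate the changes:
su - ssec
cmk -I lustre-oss-1
cmk -O
```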
<br />
=== Writing Local Checks to Run via Check_MK Agent ===<br />
<br />
Now that the Check_MK agents are deployed to the Lustre servers, we can add Check_MK local agent checks to measure whatever we want. The documentation for local checks is here: http://mathias-kettner.de/checkmk_localchecks.html.<br />
<br />
The output of the check has to contain a Nagios status number, a service name, performance data, and the check output text.<br />
<br />
Check out the examples in the Check_MK documentation for formatting of output. You can use whatever language your server supports to execute the local check. At SSEC, Scott Nolin has implemented several Perl scripts to poll Lustre statistics and output in the Check_MK format. You can read more about the checks here:<br />
http://wiki.opensfs.org/Lustre_Statistics_Guide.<br />
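As an illustrative sketch (not one of our production scripts), a local check that reports total bytes written to an OST could look like the following. The output line follows the local-check format described above: status number, item name, performance data, check output. The sample data here stands in for real output; on a live OSS you would instead capture lctl get_param:<br />

```shell
#!/bin/sh
# Hypothetical Check_MK local check for OST write_bytes.
# On a real OSS, replace the sample below with:
#   stats=$(lctl get_param obdfilter.scratch-OST0001.stats)
stats='obdfilter.scratch-OST0001.stats=
snapshot_time 1409777887.590578 secs.usecs
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164'

# Field 7 of the write_bytes line is the cumulative byte count.
total=$(printf '%s\n' "$stats" | awk '/^write_bytes/ {print $7}')

# Local check output: <status> <name> <perfdata> <output text>
echo "0 Lustre_OST0001_write_bytes bytes=$total OST0001 total bytes written: $total"
```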
<br />
=== Check_MK RRD Graphs ===<br />
<br />
Once you start collecting this performance data, OMD automatically uses PNP4Nagios to create RRD graphs for each collected metric. Check_MK will then display these RRDs in the monitoring interface. This can be useful for small-scale testing where you are only collecting a few tens of metrics. However, a thorough stat collection on large Lustre file systems can yield hundreds or even thousands of individual metrics. Check_MK and PNP4Nagios are thoroughly outclassed when asked to display such a large number of RRD graphs, and they respond poorly to high I/O situations.<br />
<br />
Thus, we turn to the Graphite/Carbon metric storage system.<br />
<br />
=== Deploying Graphite/Carbon ===<br />
<br />
The Graphite/Carbon software package collects metrics and stores them in Whisper database files. Graphite is the web frontend and Carbon is the backend that controls the Whisper database files. Whisper files are similar to RRD files in that they have a defined size and fixed constraints on how the file manages time series data as time passes. However, Whisper has many key improvements, as described here: http://graphite.readthedocs.org/en/latest/whisper.html<br />
<br />
The installation and basic setup of Graphite and Carbon is pretty easy. We used the version of Graphite found in EPEL.<br />
<br />
<code> yum install graphite-web </code><br />
<br />
This installs both Graphite and Carbon. Graphite is a basic web frontend for visualizing data. The web configuration can be found at /etc/httpd/conf.d/graphite-web.conf. While the Graphite frontend works all right, at SSEC we vastly prefer the usability of Grafana. The next section describes how that frontend is deployed and configured. <br />
<br />
There are three Carbon services that need to be set to run on startup:<br />
<br />
*carbon-aggregator<br />
*carbon-cache<br />
*carbon-relay<br />
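On CentOS 6 these are SysV init services, so enabling and starting them looks like the following (service names as shipped by the EPEL packages; verify against what is actually in /etc/init.d on your system):<br />

```
chkconfig carbon-cache on
chkconfig carbon-relay on
chkconfig carbon-aggregator on
service carbon-cache start
service carbon-relay start
service carbon-aggregator start
```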
<br />
The Carbon configuration files can be found at /etc/carbon. Below, I've linked to our settings for the various Carbon configuration files. I don't attest to the correctness of these settings, but if you have no idea where to start, these will at least get you up and running!<br />
<br />
*http://www.ssec.wisc.edu/~andreww/files/carbon.conf<br />
*http://www.ssec.wisc.edu/~andreww/files/storage-aggregation.conf<br />
*http://www.ssec.wisc.edu/~andreww/files/storage-schemas.conf<br />
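To give a concrete idea of what these files contain, here is a hypothetical storage-schemas.conf fragment (the retention values are illustrative, not our production settings). The first section whose pattern matches a metric name decides its retention, so specific patterns go before the catch-all:<br />

```
# /etc/carbon/storage-schemas.conf (illustrative values)
[lustre]
pattern = ^lustre\.
retentions = 60s:7d,5m:30d,1h:2y

[default]
pattern = .*
retentions = 60s:1d
```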
<br />
Once Carbon is running, you can actually use the Graphite/Carbon installation if you don't want to have dashboards and such. Graphite is well documented and you can read more about the software here: http://graphite.readthedocs.org/en/latest/<br />
<br />
==== Graphite Metric Namespace ====<br />
<br />
Creating an appropriate namespace for Graphite metrics is difficult. We went through a dozen iterations at SSEC before arriving at one that is now largely satisfactory. The Graphite namespace refers to how you organize your metrics in the Graphite/Carbon system and how you will access them later.<br />
<br />
Below is an example namespace for an SSEC Lustre OSS in our Delta filesystem:<br />
<br />
<code>lustre.oss.delta.delta-1-21.delta-OST0010.stats.write_bytes</code><br />
<br />
The above example is the namespace for the bytes written to OST0010 on the delta-1-21 server, under the lustre.oss category. You can almost think of these as paths to the Whisper files in which Graphite stores metrics. Each field between the periods is mutable. For example, I could change write_bytes to read_bytes to get that metric for OST0010 on delta-1-21, or change delta-1-21 to delta-3-11 and pick a different OST entirely. Each of those fields can be named anything logical and then accessed with Graphite or Grafana to create graphs of whatever you are interested in visualizing. <br />
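Once you have settled on a namespace, feeding a datapoint to Carbon is just a line of text: Carbon's plaintext listener (port 2003 by default) accepts "path value timestamp" lines. The snippet below builds one for the example metric; the graphite.example.com hostname is a placeholder:<br />

```shell
# Build one datapoint in Carbon's plaintext protocol: "<path> <value> <timestamp>"
path="lustre.oss.delta.delta-1-21.delta-OST0010.stats.write_bytes"
value=14761109479164
ts=$(date +%s)
line="$path $value $ts"
echo "$line"

# To actually send it (requires a reachable Carbon host):
#   printf '%s\n' "$line" | nc graphite.example.com 2003
```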
<br />
Here's another example:<br />
<br />
<code>servers.iris.compute.r720-0-5.mem.buffers</code><br />
<br />
The above namespace links to the memory buffer metrics for a server named r720-0-5 of the compute type in the SSEC HPC cluster Iris. Look at the chart below to get the best idea of how the namespace works:<br />
<br />
{| class="wikitable"<br />
|-<br />
!servers!!iris!!compute!!r720-0-5!!mem!!buffers<br />
|-<br />
| top-level category || cluster name || node type || server hostname || subsystem || metric name<br />
|}<br />
=== Deploying Grafana ===<br />
<br />
Grafana is the dashboard that SSEC prefers to use for data visualization. <br />
<br />
* To read about Grafana, check out this link: http://docs.grafana.org/<br />
* To try a test install of Grafana to get a feel for use, go here: http://play.grafana.org<br />
* To install Grafana, we used the RPM available here: http://docs.grafana.org/installation/rpm/<br />
<br />
Building dashboards via the Grafana GUI is easy, and they become the analyst's tool of choice for understanding data. These dashboards will serve many needs.<br />
<br />
When these types of dashboards are not enough, or the workflow makes them too tedious to build, you can create ''scripted dashboards'' in Grafana. These are JavaScript programs and require some coding, so they are more work to create - but they are potentially very powerful.<br />
<br />
=== Using Graphios to Redirect Lustre Stats to Carbon ===</div>
Lustre Monitoring and Statistics Guide (OpenSFS Wiki, revision of 2015-01-14 by Scottn)<br />
<hr />
<div>== Introduction ==<br />
<br />
This guide is by Scott Nolin (scott.nolin@ssec.wisc.edu), of the University of Wisconsin Space Science and Engineering Center.<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems.<br />
<br />
=== Adding to This Guide ===<br />
<br />
If you have improvements, corrections, or more information to share on this topic please contribute to this page. Ideally this would become a community resource.<br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats, as it ensures portability. I will use this method in all examples; a bonus is that it often has slightly shorter syntax. <br />
<br />
== Data Formats ==<br />
The format of the various statistics files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*; this isn't a Lustre standard or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file includes each OST or MDT as a multi-line record, followed by the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write. <br />
* Second number = the minimum read/write size<br />
* Third number = maximum read/write size<br />
* Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
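This fixed-width format is easy to parse for collection. Here is a minimal Python sketch (the function name and dictionary layout are my own invention, not any standard Lustre tooling) that turns 'lctl get_param obdfilter.*.stats' output into a dictionary:<br />

```python
def parse_stats(text):
    """Parse 'lctl get_param obdfilter.*.stats' output into a dict:
    {target: {counter: {"samples": n, "min": n, "max": n, "sum": n}}}."""
    result, target = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith("="):                 # e.g. obdfilter.scratch-OST0001.stats=
            target = line.rstrip("=")
            result[target] = {}
            continue
        fields = line.split()
        name = fields[0]
        if name == "snapshot_time":
            result[target][name] = float(fields[1])
            continue
        entry = {"samples": int(fields[1])}
        if len(fields) >= 7:                   # min/max/sum only appear for byte counters
            entry.update(min=int(fields[4]), max=int(fields[5]), sum=int(fields[6]))
        result[target][name] = entry
    return result
```

With the example output above, parse_stats(text)['obdfilter.scratch-OST0001.stats']['read_bytes']['sum'] gives the total bytes read.<br />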
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs }<br />
punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above.<br />
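The jobstats records are YAML, so a YAML library will parse them once the 'obdfilter...=' prefix is stripped. Here is a dependency-free Python sketch (function name is illustrative) that handles the flow-mapping lines shown above:<br />

```python
import re

def parse_job_stats(text):
    """Rough parser for 'lctl get_param obdfilter.*.job_stats' output.
    Returns {job_id: {operation: {field: value}}}."""
    jobs, current = {}, None
    for line in text.splitlines():
        line = line.strip().lstrip("- ").strip()
        m = re.match(r"job_id:\s*(\S+)$", line)
        if m:
            current = jobs.setdefault(m.group(1), {})
            continue
        m = re.match(r"snapshot_time:\s*(\d+)", line)
        if m and current is not None:
            current["snapshot_time"] = int(m.group(1))
            continue
        # one or more 'op: { key: value, ... }' mappings may share a line
        for op, body in re.findall(r"(\w+):\s*\{([^}]*)\}", line):
            fields = {}
            for key, val in re.findall(r"(\w+):\s*([^,\s]+)", body):
                fields[key] = int(val) if val.isdigit() else val
            if current is not None:
                current[op] = fields
    return jobs
```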
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
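These 'target=value' lines are trivial to split; a short sketch:<br />

```python
def parse_single(text):
    """Parse 'target=value' lines from lctl get_param into {target: int}."""
    values = {}
    for line in text.splitlines():
        if "=" in line:
            target, _, value = line.partition("=")
            values[target.strip()] = int(value)
    return values
```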
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered here. Typically they're useful on their own without further parsing.<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats; there is a wealth of client stats not detailed here as well. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with Lustre DNE you may have more than one MDT; even if you don't, it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. The exports subdirectory lists client connections by NID. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses them, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats (unsure of all fields' meanings) || lustre distributed lock manager (ldlm) stats. I do not fully understand these stats or the format. It also appears that these same stats are duplicated as individual single-value files. My understanding of these stats comes from http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || number of locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm planned number of granted locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.cancel_rate || single || ldlm lock cancel rate aka 'CR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this together with grant_rate to derive cancel_rate 'CR', or presumably read 'CR' from the stats file directly.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax and extract what you want.<br />
# '''Collect''' the data centrally - either pull or push it to your server or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
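As an illustration of the '''Process''' step, a parsed target name can be mapped to a dotted Graphite-style metric path. This mapping is only a sketch - your site's naming scheme may differ:<br />

```python
def metric_path(host, target, counter, prefix="lustre.oss"):
    """Map a Lustre target plus counter name to a dotted Graphite path,
    e.g. ('oss1', 'obdfilter.scratch-OST0001.stats', 'write_bytes')
    becomes 'lustre.oss.oss1.scratch.OST0001.write_bytes'."""
    # target looks like 'obdfilter.<fsname>-<OSTxxxx>.stats'
    fsname, ost = target.split(".")[1].split("-", 1)
    return ".".join([prefix, host, fsname, ost, counter])
```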
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes, Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and the python script 'collectl.py', and the '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but it is very convenient. It allows for not just ease of creating dashboards, but also encourages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
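Carbon accepts metrics over TCP (port 2003 by default) in its plaintext protocol - one 'path value timestamp' line per metric - which is all the prototype scripts had to emit. A Python equivalent of the send step (host and port here are assumptions for your site):<br />

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """One line of Carbon's plaintext protocol: 'path value timestamp'."""
    return "%s %s %d" % (path, value, timestamp or time.time())

def send_to_carbon(lines, host="localhost", port=2003):
    """Ship formatted metric lines to Carbon over a TCP socket."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(("\n".join(lines) + "\n").encode("ascii"))
```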
<br />
==== check_mk and Graphite ====<br />
<br />
Another option, instead of sending directly with perl, is to use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean a reasonable infrastructure is already in place for alerting and also collecting performance data.<br />
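A check_mk local check is just an executable on the monitored host that prints one line per service in the form '&lt;status&gt; &lt;service name&gt; &lt;perfdata&gt; &lt;output text&gt;'. A minimal Python sketch of emitting Lustre counters that way (the service name and perfdata keys here are made up for illustration; see the linked .cmk scripts for the real ones):<br />

```python
def local_check_line(name, perfdata, status=0, message="OK"):
    """Format one check_mk local-check line; perfdata is rendered as
    'key=value|key=value' so pnp4nagios records each counter."""
    perf = "|".join("%s=%s" % (k, v) for k, v in sorted(perfdata.items()))
    return "%d %s %s %s" % (status, name, perf, message)

# a local check script would print one line per OST, e.g.:
# print(local_check_line("Lustre_OST0000_stats",
#                        {"read_bytes": 14421705314304,
#                         "write_bytes": 14761109479164}))
```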
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when it exists) directly to Carbon, Graphite's data collection tool. With the check_mk method this timestamp is lost, so timestamps are based on when the local agent check runs. This introduces some inaccuracy - a delay of up to your sample interval. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
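The derivative applied in that graph is simply the difference between successive counter samples divided by the elapsed time; in Python:<br />

```python
def counter_rate(prev, curr):
    """Rate of change between two (timestamp, counter) samples,
    e.g. bytes/sec from successive write_bytes sums.  Returns None
    on a counter reset or non-increasing timestamps."""
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0 or v1 < v0:
        return None
    return (v1 - v0) / (t1 - t0)
```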
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
Unsure if the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files is that you have a set size for each metric collected. If your metrics are per-job as opposed to only per-export or per-server, your ''number of metrics'' grows without bound.<br />
<br />
Solutions anyone?<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1390Lustre Monitoring and Statistics Guide2015-01-14T17:03:12Z<p>Scottn: </p>
<hr />
<div>== Introduction ==<br />
<br />
This guide is by Scott Nolin (scott.nolin@ssec.wisc.edu), of the University of Wisconsin Space Science and Engineering Center.<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems.<br />
<br />
=== Adding to This Guide ===<br />
<br />
If you have improvements, corrections, or more information to share on this topic please contribute to this page. Ideally this would become a community resource.<br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write. <br />
* Second number = the minimum read/write size<br />
* Third number = maximum read/write size<br />
* Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above.<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. The exports subdirectory lists client connections by NID. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. My understanding of these stats comes from http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || number of locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted lock<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.cancel_rate || single || ldlm lock cancel rate aka 'CR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes, Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are simply run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. I allows for not just ease of creating dashboards, but also encoruages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Another option is instead of directly sending with perl, use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean a reasonable infrastructure is already in place for alerting and also collecting performance data.<br />
<br />
While collecting via perl allowed us to send the timestamp from the Lustre stats (when they exist) directly to Carbon, Graphite's data collection tool. When using the check_mk method this timestamp is lost, so timestamps are then based on when the local agent check runs. This will introduce some inaccuracy - a delay of up to your sample rate. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
Unsure if the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files are you have a set size for each metric collected. If your metrics are now per-job as opposed to only per-export or per-server, this means your ''number of metrics'' is now growing without bound.<br />
<br />
Solutions anyone?<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1389Lustre Monitoring and Statistics Guide2015-01-14T16:57:06Z<p>Scottn: </p>
<hr />
<div>== Introduction ==<br />
<br />
This guide is by Scott Nolin (scott.nolin@ssec.wisc.edu), of the University of Wisconsin Space Science and Engineering Center.<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems.<br />
<br />
=== Adding to This Guide ===<br />
<br />
If you have improvements, corrections, or more information to share on this topic please contribute to this page. Ideally this would become a community resource.<br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write. <br />
* Second number = the minimum read/write size<br />
* Third number = maximum read/write size<br />
* Fourth number = sum of all the read/write requests in bytes, i.e. the quantity of data read/written.<br />
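As a sketch of parsing this format, here is a minimal python example. It assumes exactly the field layout described above; other counters may need additional handling.<br />

```python
def parse_stats(text):
    """Parse 'stats'-format lctl get_param output into a nested dict."""
    results, target = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith("="):            # e.g. obdfilter.scratch-OST0001.stats=
            target = line[:-1]
            results[target] = {}
        elif line.startswith("snapshot_time"):
            results[target]["snapshot_time"] = float(line.split()[1])
        else:
            # layout: name samples 'samples' [unit] [min max sum]
            p = line.split()
            entry = {"samples": int(p[1]), "unit": p[3].strip("[]")}
            if len(p) >= 7:
                entry.update({"min": int(p[4]), "max": int(p[5]), "sum": int(p[6])})
            results[target][p[0]] = entry
    return results
```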
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT has an entry for each jobid (or procname_uid, depending on how jobstats is configured), followed by the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs }<br />
punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above.<br />
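The jobstats records are YAML, so a YAML parser will work; as a sketch, here is a minimal python parser using only the standard library, assuming the record layout shown above:<br />

```python
import re

JOB_RE = re.compile(r"-\s+job_id:\s+(\S+)")
OP_RE = re.compile(r"(\w+):\s*\{([^}]*)\}")

def parse_job_stats(text):
    """Parse job_stats output into {job_id: {operation: fields}}."""
    jobs, job = {}, None
    for line in text.splitlines():
        m = JOB_RE.search(line)
        if m:
            job = m.group(1)
            jobs[job] = {}
            continue
        if job is None:
            continue
        if line.strip().startswith("snapshot_time"):
            jobs[job]["snapshot_time"] = int(line.split(":", 1)[1])
            continue
        for op in OP_RE.finditer(line):   # a line may hold several records
            fields = {}
            for kv in op.group(2).split(","):
                k, _, v = kv.partition(":")
                v = v.strip()
                fields[k.strip()] = int(v) if v.lstrip("-").isdigit() else v
            jobs[job][op.group(1)] = fields
    return jobs
```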
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
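Parsing this boils down to splitting on '='. A python sketch (it assumes the values are integers, which is true for the space/inode counters shown above but not for every single-value parameter):<br />

```python
def parse_single(text):
    """Parse 'single'-format lctl get_param output (param=value per line)."""
    values = {}
    for line in text.splitlines():
        if "=" in line:
            param, _, value = line.partition("=")
            # assumes numeric values; some single params are strings
            values[param.strip()] = int(value)
    return values
```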
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered in detail here. Typically they seem to be useful on their own, without further parsing. Examples:<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats. There is a wealth of client stats too, not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = the parameter name passed to "lctl get_param"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. The exports subdirectory lists client connections by NID. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses them, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated as individual 'single'-format files. My understanding of these stats comes from http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) lock count<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax and extract what you want.<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
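As a sketch of the '''collect''' step: Carbon, Graphite's collection daemon, accepts a simple plaintext protocol - one "metric.path value timestamp" line per data point, sent over TCP (port 2003 by default). A minimal python sender (the hostname and metric path are placeholders):<br />

```python
import socket
import time

def format_carbon(metrics):
    """Render (path, value, timestamp) tuples as Carbon plaintext lines."""
    return "".join("%s %s %d\n" % (path, value, ts) for path, value, ts in metrics)

def send_to_carbon(metrics, host="graphite.example.com", port=2003):
    """Push metrics to a Carbon cache over a plain TCP socket."""
    payload = format_carbon(metrics).encode()
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(payload)

# Example data point, e.g. a counter taken from obdfilter stats:
sample = [("lustre.scratch.OST0001.read_bytes", 14421705314304, int(time.time()))]
```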
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note that there have recently been some changes: Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new plugin versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and the python script 'collectl.py', and the '''present''' via the ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are simply run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. It not only makes creating dashboards easy, but also encourages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Another option, instead of sending directly with perl, is to use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean that a reasonable infrastructure is already in place for alerting and for collecting performance data.<br />
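For reference, a check_mk local check simply prints one line per service: a status code, a service name, performance data ("metric=value" pairs joined with "|", or "-" if there are none), and status text. A sketch in python of formatting such a line (the service name here is made up):<br />

```python
def local_check_line(service, metrics, status=0, text="OK"):
    """Format a single check_mk local-check output line."""
    perf = "|".join("%s=%s" % (k, v) for k, v in sorted(metrics.items()))
    return "%d %s %s %s" % (status, service, perf or "-", text)

print(local_check_line("lustre_OST0001_stats",
                       {"read_bytes": 14421705314304,
                        "write_bytes": 14761109479164}))
```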
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when one exists) directly to Carbon, Graphite's data collection tool. With the check_mk method this timestamp is lost, so timestamps are based on when the local agent check runs. This introduces some inaccuracy - a delay of up to your sample interval. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
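Most of these statistics are monotonically increasing counters, so a rate (the derivative above) must be computed from successive samples. A sketch, with the usual caveat about counter resets (e.g. after a server restart):<br />

```python
def counter_rates(samples):
    """Turn [(timestamp, counter_value), ...] into per-second rates.

    Returns one rate per consecutive pair of samples; None marks an
    unusable interval (zero/negative time step, or a counter reset)."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or v1 < v0:
            rates.append(None)   # clock went backwards or counter reset
        else:
            rates.append((v1 - v0) / float(dt))
    return rates
```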
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
It is unclear whether the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files is that each metric collected has a fixed size on disk. If your metrics are per-job rather than only per-export or per-server, your ''number of metrics'' grows without bound.<br />
<br />
Solutions anyone?<br />
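One partial workaround (a sketch, not a complete answer): aggregate the per-job data before storing it, e.g. keep only the busiest N jobs per interval and fold the rest into a single 'other' series, which bounds the metric count:<br />

```python
def top_n_jobs(job_totals, n=10):
    """Keep the n largest jobs and fold the remainder into an 'other' bucket."""
    ranked = sorted(job_totals.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:n])
    leftover = sum(v for _, v in ranked[n:])
    if leftover:
        kept["other"] = leftover
    return kept
```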
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>
<hr />
<div>== Introduction ==<br />
<br />
This guide is by Scott Nolin (scott.nolin@ssec.wisc.edu), of the University of Wisconsin Space Science and Engineering Center.<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems.<br />
<br />
=== Adding to This Guide ===<br />
<br />
If you have improvements, corrections, or more information to share on this topic please contribute to this page. Ideally this would become a community resource.<br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write. <br />
* Second number = the minimum read/write size<br />
* Third number = maximum read/write size<br />
* Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above.<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. The exports subdirectory lists client connections by NID. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses it, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that the same stats are duplicated as individual 'single' files. My understanding of these stats comes from http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
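As the last row notes, grant_speed = grant_rate - cancel_rate, so the cancel rate 'CR' can be derived from two of the 'single' values:<br />

```python
def cancel_rate(grant_rate, grant_speed):
    """Derive the ldlm cancel rate ('CR') from grant_rate and grant_speed,
    since grant_speed = grant_rate - cancel_rate."""
    return grant_rate - grant_speed

print(cancel_rate(120, 45))  # -> 75
```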
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax and extract what you want.<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
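The gather and collect steps above can be quite small. Here is a sketch in Python using Graphite's Carbon plaintext protocol ('path value timestamp' lines over TCP, port 2003 by default). The hostname and metric prefix are illustrative, and the lctl call of course only works on a Lustre server:<br />

```python
import socket
import subprocess
import time

def gather(param):
    """Gather: run lctl get_param and yield (metric_path, value) pairs."""
    out = subprocess.check_output(["lctl", "get_param", param], text=True)
    for line in out.splitlines():
        name, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            yield "lustre." + name.strip(), int(value)

def to_plaintext(metrics, timestamp):
    """Render metrics in Carbon's plaintext protocol: '<path> <value> <ts>\\n'."""
    return "".join(f"{path} {value} {timestamp}\n" for path, value in metrics)

def collect(metrics, host="graphite.example.com", port=2003):
    """Collect: push rendered metrics to the Carbon plaintext listener."""
    payload = to_plaintext(metrics, int(time.time()))
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode())

# Run from cron on each monitored server, e.g.:
#   collect(gather("obdfilter.*OST*.kbytesfree"))
```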
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note that there have recently been some changes; Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new plugin versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important; python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are simply run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. It not only makes creating dashboards easy, but also encourages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Another option is, instead of sending directly with perl, to use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean a reasonable infrastructure is already in place for alerting and also collecting performance data.<br />
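A check_mk local check simply prints one line per service: a status code, a service name, perfdata, and a message; the perfdata is what pnp4nagios records and graphios can forward to Carbon. A minimal sketch of formatting such a line - the service and metric names are illustrative, not taken from our production scripts:<br />

```python
def local_check_line(service, metrics, status=0, message="OK"):
    """Format a check_mk local check line:
    '<status> <service> <perfdata> <message>'.
    Multiple perfdata values are separated with '|'."""
    perfdata = "|".join(f"{name}={value}" for name, value in metrics.items())
    return f"{status} {service} {perfdata} {message}"

# One line per OST, with values parsed from 'lctl get_param' output:
print(local_check_line("Lustre_scratch-OST0000", {"kbytesfree": 10563714384}))
```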
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when it exists) directly to Carbon, Graphite's data collection tool. When using the check_mk method this timestamp is lost, so timestamps are instead based on when the local agent check runs. This will introduce some inaccuracy - a delay of up to your sample rate. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
Unsure if the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files is that each metric collected has a set size. If your metrics are now per-job as opposed to only per-export or per-server, your ''number of metrics'' is growing without bound.<br />
<br />
Solutions anyone?<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1379Lustre Monitoring and Statistics Guide2015-01-09T15:09:43Z<p>Scottn: </p>
<hr />
<div>== Introduction ==<br />
<br />
This guide is by Scott Nolin (scott.nolin@ssec.wisc.edu), of the University of Wisconsin Space Science and Engineering Center.<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems.<br />
<br />
=== Adding to This Guide ===<br />
<br />
If you have improvements, corrections, or more information to share on this topic please contribute to this page. Ideally this would become a community resource.<br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats. This is to ensure portability. I will use this method in all examples; a bonus is that the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
The format of the various statistics files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*; this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider "standard" stats files include, for example, each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write. <br />
* Second number = the minimum read/write size<br />
* Third number = maximum read/write size<br />
* Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
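These fields split cleanly on whitespace. A minimal Python sketch of extracting them, assuming the layout described above (counters without all seven fields, such as get_info, are skipped):<br />

```python
def parse_stats(text):
    """Parse a 'stats' block into {counter: {samples, min, max, sum}}.

    Only full seven-field lines like
    'read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304'
    are parsed; snapshot_time and reqs-only counters are skipped.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 7 and parts[2] == "samples":
            counters[parts[0]] = {"samples": int(parts[1]),
                                  "min": int(parts[4]),
                                  "max": int(parts[5]),
                                  "sum": int(parts[6])}
    return counters

block = """snapshot_time 1409777887.590578 secs.usecs
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164"""
print(parse_stats(block)["read_bytes"]["sum"])  # -> 14421705314304
```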
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above.<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered here. Typically they're useful on their own without further parsing.<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats. There is also a wealth of client stats not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = the argument to "lctl get_param"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. The exports subdirectory lists client connections by NID. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses it, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that the same stats are duplicated as individual 'single' files. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax and extract what you want.<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note that there have recently been some changes; Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new plugin versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important; python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are simply run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. It not only makes creating dashboards easy, but also encourages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Another option is, instead of sending directly with perl, to use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean a reasonable infrastructure is already in place for alerting and also collecting performance data.<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when it exists) directly to Carbon, Graphite's data collection tool. When using the check_mk method this timestamp is lost, so timestamps are instead based on when the local agent check runs. This will introduce some inaccuracy - a delay of up to your sample rate. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
Unsure if the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files is that each metric collected has a set size. If your metrics are now per-job as opposed to only per-export or per-server, your ''number of metrics'' is growing without bound.<br />
<br />
Solutions anyone?<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Statistics_Guide&diff=1285Lustre Statistics Guide2014-12-03T21:04:26Z<p>Scottn: Scottn moved page Lustre Statistics Guide to Lustre Monitoring and Statistics Guide</p>
<hr />
<div>#REDIRECT [[Lustre Monitoring and Statistics Guide]]</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1284Lustre Monitoring and Statistics Guide2014-12-03T21:04:25Z<p>Scottn: Scottn moved page Lustre Statistics Guide to Lustre Monitoring and Statistics Guide</p>
<hr />
<div>== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats. This is to ensure portability. I will use this method in all examples; a bonus is that the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
The format of the various statistics files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*; this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider "standard" stats files include, for example, each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write. <br />
* Second number = the minimum read/write size<br />
* Third number = maximum read/write size<br />
* Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above.<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered here. Typically they're useful on their own without further parsing.<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats. There is also a wealth of client stats not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = the argument to "lctl get_param"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. The exports subdirectory lists client connections by NID. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses it, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that the same stats are duplicated as individual 'single' files. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate ('CR'), or presumably read 'CR' directly from the pool stats file.<br />
|}<br />
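Reading any of the "single"-format targets in the table is easy to script. Here is a minimal, hypothetical python sketch; the wrapper assumes `lctl` is on the path and is run as root on a Lustre server, and the parser only keeps integer-valued parameters:<br />

```python
import subprocess

def parse_single(text):
    """Parse 'single'-format lctl output, e.g.
    'osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384',
    into a {target: int} dict."""
    values = {}
    for line in text.splitlines():
        target, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            values[target.strip()] = int(value)
    return values

def get_single(pattern):
    """Run `lctl get_param <pattern>` (root on a Lustre server) and parse it."""
    out = subprocess.check_output(["lctl", "get_param", pattern], text=True)
    return parse_single(out)

# e.g. get_single("obdfilter.*OST*.kbytesfree")
```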
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax and extract what you want.<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
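As an illustration of the '''gather''' step, here is a hedged python sketch that parses the multi-line "stats" format described earlier. The field layout (name, sample count, unit, then optional min/max/sum) is assumed from the read_bytes/write_bytes example:<br />

```python
def parse_stats(text):
    """Parse a Lustre 'stats'-format block into {counter: fields}.
    Counter lines look like:
      read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304
    i.e. name, sample count, 'samples', [unit], then optionally min, max, sum."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # skip target headers, blank lines, and snapshot_time
        if len(fields) < 4 or fields[2] != "samples":
            continue
        entry = {"samples": int(fields[1]), "unit": fields[3].strip("[]")}
        if len(fields) >= 7:
            entry["min"], entry["max"], entry["sum"] = (int(f) for f in fields[4:7])
        stats[fields[0]] = entry
    return stats
```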
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note that there have recently been some changes: Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new plugin versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia - does the '''collect''' via gmond and the python script 'collectl.py', and the '''present''' via the ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but it is very convenient. It not only makes creating dashboards easy, but also encourages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
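The TCP-socket approach is simple because Carbon's plaintext protocol is just one "path value timestamp" line per metric, sent to port 2003 by default. A rough python equivalent of the perl prototype; the host name and metric path here are made up:<br />

```python
import socket
import time

def format_carbon(metrics):
    """Render (path, value, timestamp) tuples as Carbon plaintext lines."""
    return "".join(f"{path} {value} {int(ts)}\n" for path, value, ts in metrics)

def send_to_carbon(metrics, host="graphite.example.com", port=2003):
    """Ship metrics to Carbon's plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(format_carbon(metrics).encode())

# send_to_carbon([("lustre.scratch.OST0001.write_bytes.sum",
#                  14761109479164, time.time())])
```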
<br />
==== check_mk and Graphite ====<br />
<br />
Another option is, instead of sending directly with perl, to use a check_mk local agent check.<br />
<br />
With the local agent and pnp4nagios, a reasonable infrastructure is already in place for alerting and for collecting performance data.<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when one exists) directly to Carbon, Graphite's data collection daemon. With the check_mk method this timestamp is lost, so timestamps are instead based on when the local agent check runs. This introduces some inaccuracy - a delay of up to your sample interval. <br />
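For reference, a check_mk local check simply prints lines of the form "&lt;state&gt; &lt;service name&gt; &lt;perfdata&gt; &lt;status text&gt;". This hypothetical python agent snippet shows metadata counters emitted as perfdata; the service name and counter values are invented, and would really come from parsing 'lctl get_param mdt.*.md_stats':<br />

```python
#!/usr/bin/env python
# Hypothetical check_mk local check. The counters dict stands in for
# values parsed from 'lctl get_param mdt.*.md_stats'.
counters = {"open": 123456, "close": 123001, "getattr": 987654}

# check_mk separates multiple perfdata metrics with '|' on one line
perfdata = "|".join(f"{name}={value}" for name, value in sorted(counters.items()))

# state 0 = OK, followed by service name, perfdata, and status text
print(f"0 Lustre_MDS_md_stats {perfdata} md_stats counters collected")
```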
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with a derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" is from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
It is unclear whether the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files is that each metric collected has a fixed size on disk. If your metrics are per-job, as opposed to only per-export or per-server, your ''number of metrics'' grows without bound.<br />
<br />
Solutions anyone?<br />
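One partial answer, sketched here as an untested idea: map per-job counters onto a fixed set of metric names - the top-N jobs by volume plus a single "other" bucket - so the whisper/RRD file count stays constant no matter how many job_ids appear (the job_id-to-rank mapping would have to be stored elsewhere):<br />

```python
def bound_job_metrics(job_sums, top_n=20):
    """Fold {job_id: counter_sum} into a fixed-size set of metrics:
    rank00..rankNN for the biggest jobs, plus 'other' for the rest."""
    ranked = sorted(job_sums.items(), key=lambda kv: kv[1], reverse=True)
    bounded = {f"rank{i:02d}": v for i, (_, v) in enumerate(ranked[:top_n])}
    bounded["other"] = sum(v for _, v in ranked[top_n:])
    return bounded
```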
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1228Lustre Monitoring and Statistics Guide2014-11-13T20:24:49Z<p>Scottn: corrected "exports" definition as suggested</p>
<hr />
<div>== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write. <br />
* Second number = the minimum read/write size<br />
* Third number = maximum read/write size<br />
* Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above.<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. The exports subdirectory lists client connections by NID. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes, Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are simply run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. I allows for not just ease of creating dashboards, but also encoruages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Another option is instead of directly sending with perl, use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean a reasonable infrastructure is already in place for alerting and also collecting performance data.<br />
<br />
While collecting via perl allowed us to send the timestamp from the Lustre stats (when they exist) directly to Carbon, Graphite's data collection tool. When using the check_mk method this timestamp is lost, so timestamps are then based on when the local agent check runs. This will introduce some inaccuracy - a delay of up to your sample rate. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
Unsure if the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files are you have a set size for each metric collected. If your metrics are now per-job as opposed to only per-export or per-server, this means your ''number of metrics'' is now growing without bound.<br />
<br />
Solutions anyone?<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1227Lustre Monitoring and Statistics Guide2014-11-11T23:39:22Z<p>Scottn: </p>
<hr />
<div>== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write. <br />
* Second number = the minimum read/write size<br />
* Third number = maximum read/write size<br />
* Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above.<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes, Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are simply run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. It allows for not just ease of creating dashboards, but also encourages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
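The perl prototype itself is not shown here, but the Carbon side is simple: the plaintext protocol accepts one "path value timestamp" line per data point over a TCP connection (port 2003 by default). Here is a minimal sketch of the same idea in Python - the host name and metric path are illustrative, not our production values:<br />

```python
import socket

def format_carbon(metrics):
    """Render (path, value, timestamp) tuples as Carbon plaintext lines."""
    return "".join("%s %s %d\n" % (path, value, ts) for path, value, ts in metrics)

def send_to_carbon(metrics, host="graphite.example.com", port=2003):
    """Push metrics to Carbon over a plain TCP socket."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(format_carbon(metrics).encode("ascii"))

# One data point: the write_bytes sum for an OST, stamped with snapshot_time
sample = [("lustre.scratch.OST0001.write_bytes.sum", 14761109479164, 1409777887)]
# send_to_carbon(sample)   # uncomment on a host that can reach your Carbon server
```

Because the protocol carries the timestamp explicitly, this is where the Lustre snapshot_time can be preserved end to end.<br />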
<br />
==== check_mk and Graphite ====<br />
<br />
Another option is, instead of sending directly with perl, to use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean a reasonable infrastructure is already in place for alerting and also collecting performance data.<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when it exists) directly to Carbon, Graphite's data collection daemon. With the check_mk method this timestamp is lost, so timestamps are instead based on when the local agent check runs. This introduces some inaccuracy - a delay of up to one sample interval. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
It is unclear whether the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a Whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files is that each metric collected has a fixed size on disk. If your metrics are per-job, rather than only per-export or per-server, your ''number of metrics'' grows without bound.<br />
<br />
Solutions anyone?<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1226Lustre Monitoring and Statistics Guide2014-11-11T22:47:58Z<p>Scottn: </p>
<hr />
<div>== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats. This is to ensure portability. I will use this method in all examples; as a bonus, the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
The format of the various statistics files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention* - this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file includes, for example, each OST or MDT as a multi-line record, followed by the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write. <br />
* Second number = the minimum read/write size<br />
* Third number = maximum read/write size<br />
* Fourth number = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
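For collection purposes, a record in this format is easy to parse. A minimal Python sketch, assuming the field layout shown above (only min/max/sum-style counters carry the three trailing numbers):<br />

```python
import re

def parse_stats(text):
    """Parse a Lustre 'stats' block into {name: {"samples": n, ...}}."""
    out = {}
    for line in text.splitlines():
        m = re.match(r"(\w+)\s+(\d+) samples \[(\w+)\](.*)", line)
        if not m:
            continue  # skip the header and snapshot_time lines
        name, samples, unit, rest = m.groups()
        entry = {"samples": int(samples), "unit": unit}
        extra = [int(x) for x in rest.split()]
        if len(extra) == 3:  # min, max, sum are present for byte counters
            entry["min"], entry["max"], entry["sum"] = extra
        out[name] = entry
    return out

sample = """obdfilter.scratch-OST0001.stats=
snapshot_time 1409777887.590578 secs.usecs
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164
get_info 3735777 samples [reqs]
"""
stats = parse_stats(sample)
# stats["read_bytes"]["sum"] is the total quantity of data read, in bytes
```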
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above, but with a lot of extra { braces }. The reason is that job_stats output is formatted as YAML, so it can be read with a standard YAML parser.<br />
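The records are (near-)YAML, so a YAML library can usually read them directly, but a dependency-free regex sketch is enough to pull out the per-job read/write byte sums (field names as in the example above):<br />

```python
import re

def parse_job_stats(text):
    """Extract per-job read/write byte sums from a Lustre job_stats block."""
    jobs = {}
    current = None
    for line in text.splitlines():
        m = re.search(r"-\s*job_id:\s*(\S+)", line)
        if m:
            current = m.group(1)  # start of a new per-job record
            jobs[current] = {}
            continue
        m = re.search(r"(read|write):.*sum:\s*(\d+)", line)
        if m and current is not None:
            jobs[current][m.group(1) + "_bytes"] = int(m.group(2))
    return jobs

sample = """obdfilter.scratch-OST0000.job_stats=job_stats:
- job_id:          56744
  snapshot_time:   1409778251
  read:    { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
  write:   { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }
"""
jobs = parse_job_stats(sample)
# jobs["56744"]["read_bytes"] == 17105657856
```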
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
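Parsing this format is just splitting on '='. A sketch:<br />

```python
def parse_single(text):
    """Parse 'param=value' lines from lctl get_param into {param: int}."""
    out = {}
    for line in text.splitlines():
        if "=" in line:
            name, _, value = line.partition("=")
            out[name.strip()] = int(value)
    return out

sample = """osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540
"""
# parse_single(sample)["osd-ldiskfs.scratch-OST0000.kbytesavail"] == 10563714384
```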
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered here. Typically they're useful on their own without further parsing.<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats. There is a wealth of client stats too, not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, and this also includes other lustre servers. The exports are named by interface, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats - "ltop" uses them, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that the same values are duplicated as 'single' format files. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR', or presumably read 'CR' directly from the stats file.<br />
|}<br />
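As the grant_speed row notes, grant_speed = grant_rate - cancel_rate, so the cancel rate can be derived from two 'single' values. For example:<br />

```python
def cancel_rate(grant_rate, grant_speed):
    """Derive the ldlm cancel rate 'CR' from grant_rate and grant_speed."""
    return grant_rate - grant_speed

# e.g. grant_rate=120 locks/s with grant_speed=20 locks/s implies CR=100 locks/s
```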
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN <br />
* Intel Enterprise Edition for Linux <br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Parse the syntax and extract what you want.<br />
# '''Collect''' the data centrally - either pull or push it to your monitoring server or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note that there have recently been some changes: Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia - does the '''collect''' via gmond and the python script 'collectl.py', and the '''present''' via the ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are simply run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. It allows for not just ease of creating dashboards, but also encourages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Another option is, instead of sending directly with perl, to use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean a reasonable infrastructure is already in place for alerting and also collecting performance data.<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when it exists) directly to Carbon, Graphite's data collection daemon. With the check_mk method this timestamp is lost, so timestamps are instead based on when the local agent check runs. This introduces some inaccuracy - a delay of up to one sample interval. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
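The linked .cmk scripts are the real checks; the general shape of a check_mk local check is simple - the agent prints one "&lt;state&gt; &lt;item&gt; &lt;perfdata&gt; &lt;text&gt;" line per service, with perfdata entries joined by '|'. A Python sketch (the service and metric names are illustrative, not taken from our scripts):<br />

```python
def local_check_line(item, perfdata, text, state=0):
    """Format one check_mk local-check line: '<state> <item> <perfdata> <text>'.

    perfdata is a list of (metric, value) pairs; '-' means no perfdata.
    """
    perf = "|".join("%s=%s" % (k, v) for k, v in perfdata) or "-"
    return "%d %s %s %s" % (state, item, perf, text)

# On an OSS the values would come from e.g. 'lctl get_param obdfilter.*.stats';
# the agent simply prints one line per service for check_mk to pick up:
print(local_check_line("Lustre_OST0001_stats",
                       [("read_bytes", 14421705314304),
                        ("write_bytes", 14761109479164)],
                       "OK - counters collected"))
```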
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
It is unclear whether the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a Whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files is that each metric collected has a fixed size on disk. If your metrics are per-job, rather than only per-export or per-server, your ''number of metrics'' grows without bound.<br />
<br />
Solutions anyone?<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1225Lustre Monitoring and Statistics Guide2014-11-11T22:33:17Z<p>Scottn: </p>
<hr />
<div>== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN <br />
* Intel Enterprise Edition for Linux <br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes, Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are simply run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. I allows for not just ease of creating dashboards, but also encoruages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Another option is instead of directly sending with perl, use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean a reasonable infrastructure is already in place for alerting and also collecting performance data.<br />
<br />
While collecting via perl allowed us to send the timestamp from the Lustre stats (when they exist) directly to Carbon, Graphite's data collection tool. When using the check_mk method this timestamp is lost, so timestamps are then based on when the local agent check runs. This will introduce some inaccuracy - a delay of up to your sample rate. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
Unsure if the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files is that each metric collected has a fixed size on disk. If your metrics are per-job as opposed to only per-export or per-server, your ''number of metrics'' is now growing without bound.<br />
<br />
Solutions anyone?<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1224Lustre Monitoring and Statistics Guide2014-11-11T22:29:10Z<p>Scottn: </p>
<hr />
<div>== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats, as it ensures portability. I will use this method in all examples; as a bonus, the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
The format of the various statistics files varies (and I'm not sure there is any reason for this). The format names here are entirely *my invention* - they aren't a Lustre standard or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file presents each OST or MDT as a multi-line record, followed by the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write<br />
Second number = the minimum read/write size<br />
Third number = the maximum read/write size<br />
Fourth number = the sum of all read/write requests in bytes, i.e. the total quantity of data read/written<br />
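A sketch of parsing this stats format with only the Python stdlib (the guide's Lustrestats.pm plays this role in perl; this code is illustrative, not that module). The sample is the obdfilter output shown above:<br />

```python
# Sample 'lctl get_param obdfilter.*.stats' output, from this guide.
STATS = """obdfilter.scratch-OST0001.stats=
snapshot_time             1409777887.590578 secs.usecs
read_bytes                27846475 samples [bytes] 4096 1048576 14421705314304
write_bytes               16230483 samples [bytes] 1 1048576 14761109479164
get_info                  3735777 samples [reqs]
"""

def parse_stats(text):
    """Return {target: {counter: {samples, unit, min, max, sum}}}."""
    out, target = {}, None
    for line in text.splitlines():
        if line.endswith("="):          # new record header, e.g. obdfilter.X.stats=
            target = line[:-1]
            out[target] = {}
            continue
        fields = line.split()
        if not fields or target is None or fields[0] == "snapshot_time":
            continue
        entry = {"samples": int(fields[1]), "unit": fields[3].strip("[]")}
        if len(fields) == 7:            # min / max / sum are present
            entry.update(min=int(fields[4]), max=int(fields[5]), sum=int(fields[6]))
        out[target][fields[0]] = entry
    return out

stats = parse_stats(STATS)
rb = stats["obdfilter.scratch-OST0001.stats"]["read_bytes"]
print(rb["samples"], rb["sum"])   # 27846475 14421705314304
```
In practice you would feed it the live output of 'lctl get_param obdfilter.*.stats' instead of the embedded sample.<br />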
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs }<br />
punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above, but with a lot of extra { bling: }! The reason is that job_stats output is formatted as YAML, so it can be fed to standard parsers.<br />
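Since the job_stats record is YAML, a YAML library parses it directly; with only the Python stdlib you can also pull values out with regexes. A sketch (field names as in the 2.4/2.5 output above; the assumption here is one operation per line with a space after each colon):<br />

```python
import re

# Sample job_stats output in the shape shown above.
JOB_STATS = """job_stats:
- job_id:          56744
  snapshot_time:   1409778251
  read:    { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
  write:   { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }
  punch:   { samples: 95, unit: reqs }
"""

def parse_job_stats(text):
    """Return {job_id: {operation: {field: value}}}."""
    jobs, job = {}, None
    for line in text.splitlines():
        m = re.match(r"-?\s*job_id:\s*(\S+)", line)
        if m:                              # start of a new per-job record
            job = jobs.setdefault(m.group(1), {})
            continue
        m = re.match(r"\s*(\w+):\s*\{(.*)\}", line)
        if m and job is not None:          # an operation line, e.g. read: { ... }
            fields = dict(kv.strip().split(": ") for kv in m.group(2).split(","))
            job[m.group(1)] = {k: (int(v) if v.isdigit() else v)
                               for k, v in fields.items()}
    return jobs

jobs = parse_job_stats(JOB_STATS)
print(jobs["56744"]["read"]["sum"])   # 17105657856
```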
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
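Since the output is one name=value pair per line, it parses trivially. A sketch using the sample output above:<br />

```python
# Output captured from 'lctl get_param osd-ldiskfs.*OST*.kbytesavail' (sample above).
OUTPUT = """osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532"""

# Split each "param=value" line; keys are the full parameter paths.
kbytesavail = {k: int(v) for k, v in
               (line.split("=", 1) for line in OUTPUT.splitlines() if "=" in line)}

# Smallest kbytesavail = the fullest OST.
print(min(kbytesavail.values()))   # 10457322540
```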
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered here. Typically they're useful on their own, without further parsing.<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats; there is a wealth of client stats too, not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with Lustre DNE you may have more than one MDT; even if you don't today, it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, which also includes other lustre servers. The exports are named by interface, which can be unwieldy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but md_stats is still very convenient to have - "ltop" uses it, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting.<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that the same values are duplicated as 'single'-format files. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure what the difference is, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
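As a worked example of the grant_speed row above, cancel_rate ('CR') can be derived by rearranging grant_speed = grant_rate - cancel_rate (the sampled values below are made up for illustration):<br />

```python
def cancel_rate(grant_rate, grant_speed):
    """grant_speed = grant_rate - cancel_rate, so CR = GR - grant_speed."""
    return grant_rate - grant_speed

# Hypothetical sampled values: a negative grant_speed means cancels
# are currently outpacing grants.
print(cancel_rate(grant_rate=120, grant_speed=-30))   # 150
```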
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Lustre (Intel Manager for Lustre)<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on the hosts you are monitoring. Deal with the syntax and extract what you want.<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note that there have recently been some changes - Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important; python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are simply run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for Graphite. It is not required, as Graphite can be used directly, but is very convenient. It not only makes creating dashboards easy, but also encourages rapid interactive analysis of the data. Note that Elasticsearch can be used to store dashboards for Grafana, but is not required.<br />
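The "sent text data via a TCP socket" step is just Carbon's plaintext protocol: one 'metric.path value timestamp' line per data point, pushed to the Carbon cache (TCP port 2003 by default). A minimal Python equivalent of the prototype's perl sender - the host name and metric prefix below are illustrative, not SSEC's:<br />

```python
import socket
import time

def carbon_lines(prefix, counters, timestamp=None):
    """Format integer counters as Carbon plaintext lines: 'path value timestamp'."""
    ts = int(timestamp if timestamp is not None else time.time())
    return ["%s.%s %d %d" % (prefix, name, value, ts)
            for name, value in sorted(counters.items())]

def send_to_carbon(lines, host="graphite.example.com", port=2003):
    """Push the formatted lines to a Carbon cache over TCP."""
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(("\n".join(lines) + "\n").encode())
    finally:
        sock.close()

# Using the snapshot_time from the stats file as the timestamp.
lines = carbon_lines("lustre.scratch.oss1.OST0001",
                     {"read_bytes": 14421705314304}, timestamp=1409777887)
print(lines[0])   # lustre.scratch.oss1.OST0001.read_bytes 14421705314304 1409777887
```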
<br />
==== check_mk and Graphite ====<br />
<br />
Another option is, instead of sending directly with perl, to use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean a reasonable infrastructure is already in place for alerting and also collecting performance data.<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when one exists) directly to Carbon, Graphite's data collection daemon. With the check_mk method this timestamp is lost, so timestamps are instead based on when the local agent check runs. This introduces some inaccuracy - a delay of up to your sample interval. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
Unsure if the source for that plugin is available.<br />
<br />
==== A Note about Jobstats ====<br />
<br />
If using a whisper or RRD-file based solution, jobstats may not be a great fit. The strength of RRD or Whisper files is that each metric collected has a fixed size on disk. If your metrics are per-job as opposed to only per-export or per-server, your ''number of metrics'' is now growing without bound.<br />
<br />
Solutions anyone?<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1223Lustre Monitoring and Statistics Guide2014-11-11T22:21:02Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Linux Managerator<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are basic steps and techniques for working with the Lustre statistics. <br />
<br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes, Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite and Carbon - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts - these are simply run every minute via cron on the servers you monitor. For the SSEC prototype we simply sent text data via a TCP socket. The check_mk scripts in the next section have replaced these original test scripts.<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. I allows for not just ease of creating dashboards, but also encoruages rapid interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but is not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Another option is instead of directly sending with perl, use a check_mk local agent check.<br />
<br />
The local agent and pnp4nagios mean a reasonable infrastructure is already in place for alerting and also collecting performance data.<br />
<br />
While collecting via perl allowed us to send the timestamp from the Lustre stats (when they exist) directly to Carbon, Graphite's data collection tool. When using the check_mk method this timestamp is lost, so timestamps are then based on when the local agent check runs. This will introduce some inaccuracy - a delay of up to your sample rate. <br />
<br />
Collecting via both methods allows you to see this difference. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this was acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. http://www.ssec.wisc.edu/~scottn/files/lustre_stats_mds.cmk http://www.ssec.wisc.edu/~scottn/files/lustre_stats_oss.cmk<br />
* graphios https://github.com/shawn-sterling/graphios - a python script to send your nagios performance data to graphite<br />
* Grafana - http://grafana.org - not required, but convenient for dashboards.<br />
<br />
'''Grafana Lustre Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== Collectd plugin and Graphite ====<br />
<br />
This talk mentions a custom collectd plugin to send stats to graphite:<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
Unsure if the source for that plugin is available.<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1216Lustre Monitoring and Statistics Guide2014-11-06T22:31:40Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods for collecting and working with them.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working primarily with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats, and it ensures portability across versions. I will use this method in all examples; as a bonus, the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
The format of the various statistics files varies (and I'm not sure there is any reason for this). The format names here are entirely *my invention*; they are not any kind of Lustre standard.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file presents each OST or MDT as a multi-line record: a line naming the target, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write<br />
* Second number = the minimum read/write size in bytes<br />
* Third number = the maximum read/write size in bytes<br />
* Fourth number = sum of all the read/write requests in bytes, i.e. the total quantity of data read or written<br />
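To '''parse''' this format, the record and counter lines can be split on whitespace. Here is a minimal python sketch (the function name is my own invention, not part of any Lustre tool) that handles the fields described above:<br />

```python
def parse_stats(text):
    """Parse 'lctl get_param' output in the "stats" format into nested dicts.

    Returns {target: {counter: {"samples": n, ...}}}. Counters with min/max/sum
    (like read_bytes) get those fields too. A sketch, not a complete parser.
    """
    results = {}
    target = None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith("="):                 # e.g. obdfilter.scratch-OST0001.stats=
            target = line[:-1]
            results[target] = {}
        elif target and line and not line.startswith("snapshot_time"):
            parts = line.split()
            entry = {"samples": int(parts[1])}
            if len(parts) >= 7:                # name samples [unit] min max sum
                entry.update(min=int(parts[4]), max=int(parts[5]), sum=int(parts[6]))
            results[target][parts[0]] = entry
    return results
```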
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT has an entry for each jobid (or procname_uid, depending on the jobstats configuration), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above, but with a lot of extra { } markup; the jobstats output is YAML-like. I don't know of a particular reason for the different format.<br />
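Since each record follows a fixed '- job_id:' / 'op: { key: value, ... }' layout, a small hand-rolled parser is enough. This python sketch (names are my own) pulls each job's counters into a dict:<br />

```python
import re

def parse_job_stats(text):
    """Parse one target's job_stats output into {job_id: {op: {field: value}}}.

    A minimal sketch: it relies on the '- job_id:' and 'op: { k: v, ... }'
    layout shown above rather than a full YAML parser.
    """
    jobs = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"- job_id:\s*(\S+)", line)
        if m:
            current = jobs.setdefault(m.group(1), {})
            continue
        m = re.match(r"(\w+):\s*{(.*)}", line)
        if m and current is not None:
            fields = {}
            for kv in m.group(2).split(","):
                k, _, v = kv.partition(":")
                v = v.strip()
                fields[k.strip()] = int(v) if v.isdigit() else v
            current[m.group(1)] = fields
    return jobs
```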
<br />
=== Single ===<br />
<br />
These really boil down to a single number in a file, but "lctl get_param" produces output that is convenient to parse. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
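A quick python sketch (the helper name is my own) for collecting these single-value parameters from 'lctl get_param' output:<br />

```python
def parse_single(text):
    """Parse "single"-format 'lctl get_param' output, one key=value per line,
    into a {parameter: int} dict. Non-numeric or malformed lines are skipped."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip().lstrip("-").isdigit():
            values[key.strip()] = int(value)
    return values
```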
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered here. They are typically useful on their own without further parsing.<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats; there is a wealth of client stats not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, and this also includes other lustre servers. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses them, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that the same values are duplicated as individual "single" files; perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR', or presumably get 'CR' directly from the pool stats file.<br />
|}<br />
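As a tiny worked example of the grant_speed relationship in the last row, cancel_rate can be derived in one line:<br />

```python
def cancel_rate(grant_rate, grant_speed):
    """Derive the ldlm cancel rate ('CR') from grant_rate and grant_speed,
    using grant_speed = grant_rate - cancel_rate as described above."""
    return grant_rate - grant_speed
```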
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Linux Managerator<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are some basic steps and techniques for working with the Lustre statistics. <br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
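As an example of the '''Process''' step, here is a python sketch (my own helper, not from any particular package) that turns two samples of a monotonically increasing counter, such as a read_bytes sum, into a per-second rate of change:<br />

```python
def rate(prev_value, prev_ts, value, ts):
    """Process step: derive a per-second rate from two samples of a
    monotonically increasing counter taken at timestamps prev_ts and ts."""
    if ts <= prev_ts:
        raise ValueError("samples out of order")
    return (value - prev_value) / (ts - prev_ts)
```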
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note that there have recently been some changes; Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new plugin versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and the python script 'collectl.py', and the '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts (lost to the sands of time?) - these are simply run every minute via cron on the servers<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but it is very convenient: it not only makes creating dashboards easy, but also encourages rapid, interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but it is not required.<br />
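Our prototype used perl, but the idea is small enough to sketch in python: format each metric using Carbon's plaintext protocol ('path value timestamp', one per line) and push it to the Carbon listener. Port 2003 is Carbon's default plaintext port; the host name here is a placeholder:<br />

```python
import socket

def format_carbon(metrics):
    """Render (metric_path, value, timestamp) tuples into Carbon's plaintext
    protocol: one 'path value timestamp' line per metric."""
    return "".join("%s %s %d\n" % (path, value, ts) for path, value, ts in metrics)

def send_to_carbon(metrics, host="localhost", port=2003):
    """Sketch: push metrics to a Carbon plaintext listener. The host is a
    placeholder; using the Lustre snapshot_time as the timestamp preserves
    the sample time, as discussed below for the perl version."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(format_carbon(metrics).encode("ascii"))
```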
<br />
==== check_mk and Graphite ====<br />
<br />
Instead of sending directly with perl, check_mk can handle the collection. The check_mk local agent and pnp4nagios mean a reasonable monitoring infrastructure is already in place, and alerting is simple. The graphios plugin then forwards the performance data to Graphite.<br />
<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when one exists!) directly to Carbon, Graphite's data collection tool. When using the check_mk method this timestamp is lost; timestamps are then based on when the local agent check runs. This introduces some inaccuracy: a delay of up to your sample interval. <br />
<br />
Collecting via both methods allows you to see this. This graph shows all the "export" stats summed for each method, with a derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" is from the perl script. Plotting the raw counter data shows very little, but with the derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this is acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. <br />
* graphios<br />
* Grafana - http://grafana.org<br />
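For the check_mk route, the gathered values are emitted as local-check lines of the form '&lt;status&gt; &lt;service&gt; &lt;perfdata&gt; &lt;text&gt;', with multiple perfdata values separated by '|'. A minimal python sketch; the service and metric names are examples, not our production naming:<br />

```python
def local_check_line(service, metrics, text="OK"):
    """Emit one check_mk local-check line: '<state> <service> <perfdata> <text>'.
    State 0 means OK; perfdata entries are name=value pairs joined by '|'."""
    perf = "|".join("%s=%s" % (k, v) for k, v in sorted(metrics.items()))
    return "0 %s %s %s" % (service, perf, text)
```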
<br />
'''Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== NCI project ====<br />
<br />
'''Note: I don't know if this will have source available.'''<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1209Lustre Monitoring and Statistics Guide2014-11-06T19:17:14Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Linux Managerator<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are some basic steps and techniques for working with the Lustre statistics. <br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes, Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts (lost to the sands of time?) - these are simply run every minute via cron on the servers<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. I allows for not just ease of creating dashboards, but also encoruages rapid, interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Instead of directly sending with perl, use check_mk<br />
local agent and pnp4nagios mean a reasonable infrastructure already therealerting simple<br />
graphios plugin<br />
<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when they exist!) directly to Carbon, Graphite's data collection tool. When using the check_mk method this timestamp is lost, timestamps are then based on when the local agent check runs. This will introduce some inaccuracy - a delay of up to your sample rate. <br />
<br />
Collecting via both methods allows you to see this. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this is acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. <br />
* graphios<br />
* Grafana - http://grafana.org<br />
<br />
'''Dashboard Screenshots:'''<br />
<br />
[[File:Meta-oveview.PNG|200px|Metadata for multiple file systems.]] [[File:Fs-dashboard.png|200px|Dashboard for a lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== NCI project ====<br />
<br />
'''Note: I don't know if this will have source available.'''<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1208Lustre Monitoring and Statistics Guide2014-11-06T19:08:00Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that the same values are duplicated as individual single-value files; perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure what the difference is, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate ('CR'), or presumably read 'CR' directly from the pool stats file.<br />
|}<br />
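As a worked example of the grant_speed relation in the last row (the numbers here are made up for illustration; read the real values with lctl get_param):

```python
# grant_speed = grant_rate - cancel_rate, so: cancel_rate = grant_rate - grant_speed
grant_rate = 1200    # locks granted per interval ('GR') - illustrative value
grant_speed = -300   # net speed: negative means cancels currently outpace grants
cancel_rate = grant_rate - grant_speed
print(cancel_rate)   # 1500 ('CR')
```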
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Linux Managerator<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are some basic steps and techniques for working with the Lustre statistics. <br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax and extract what you want.<br />
# '''Collect''' the data centrally - either pull or push it to your monitoring server (or collection of monitoring servers).<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
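The '''gather''' and '''collect''' steps above can be sketched in a few lines of python, formatting single-value params as Graphite plaintext protocol lines ("path value timestamp"). The host, port, and metric prefix here are assumptions for illustration, not from a real setup:

```python
import socket
import subprocess
import time

def gather(param="osd-*.*OST*.kbytesavail"):
    """Gather: run lctl on the monitored host, yield (name, value) pairs."""
    out = subprocess.check_output(["lctl", "get_param", param], text=True)
    for line in out.splitlines():
        if "=" in line:
            name, _, value = line.partition("=")
            yield name.strip(), value.strip()

def format_carbon_lines(metrics, prefix, timestamp):
    """Format metrics in Carbon's plaintext protocol: 'path value timestamp'."""
    return "".join(f"{prefix}.{name} {value} {timestamp}\n"
                   for name, value in metrics)

def send_to_carbon(metrics, host="graphite.example.com", port=2003,
                   prefix="lustre.oss01"):
    """Collect: push the formatted lines to the Carbon plaintext listener."""
    payload = format_carbon_lines(metrics, prefix, int(time.time()))
    with socket.create_connection((host, port)) as s:
        s.sendall(payload.encode())

# On a server, run something like this every minute from cron:
#   send_to_carbon(gather())
print(format_carbon_lines(
    [("osd-ldiskfs.scratch-OST0000.kbytesavail", "10563714384")],
    "lustre.oss01", 1409777887), end="")
```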
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes: Lustre support in collectl is moving to plugins. See http://sourceforge.net/p/collectl/mailman/message/31992463 and https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new plugin versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia - does the '''collect''' via gmond and the python script 'collectl.py', and the '''present''' via ganglia web pages. There is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important; python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts (lost to the sands of time?) - these are simply run every minute via cron on the servers<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but it is very convenient. It allows not only easy creation of dashboards, but also encourages rapid, interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but it is not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Instead of sending data directly with perl, this approach uses check_mk. The check_mk local agent and pnp4nagios mean a reasonable monitoring infrastructure is already in place, which makes alerting simple. The graphios plugin then forwards the performance data on to Graphite.<br />
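A sketch of what a check_mk local check for this could look like: a script in the agent's local-check directory that prints one "&lt;status&gt; &lt;service name&gt; &lt;perfdata&gt; &lt;check output&gt;" line per target. The service naming and the lack of thresholds here are illustrative choices, not our production check:

```python
def local_check_lines(lctl_output):
    """Format 'single'-style lctl output as check_mk local check lines:
    '<status> <service name> <perfdata> <check output>'."""
    lines = []
    for line in lctl_output.splitlines():
        if "=" not in line:
            continue
        name, _, kbytes = line.partition("=")
        target = name.split(".")[1]  # e.g. scratch-OST0000
        # status 0 = OK; a real check would compare against thresholds here
        lines.append(f"0 Lustre_avail_{target} kbytesavail={kbytes} "
                     f"{kbytes} kB available")
    return lines

# On a server the input would come from:
#   subprocess.check_output(["lctl", "get_param", "osd-*.*OST*.kbytesavail"], text=True)
sample = "osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384"
print("\n".join(local_check_lines(sample)))
```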
<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when they exist!) directly to Carbon, Graphite's data collection daemon. With the check_mk method this timestamp is lost; timestamps are instead based on when the local agent check runs. This introduces some inaccuracy - a delay of up to your sample interval. <br />
<br />
Collecting via both methods allows you to see this. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this is acceptable. Sampling much more frequently will of course make the error smaller.<br />
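The derivative applied in that graph just converts the cumulative counters into rates. Conceptually (a sketch, not the actual Graphite function):

```python
def derivative(samples):
    """Turn (timestamp, counter) samples into (timestamp, per-second rate)."""
    return [(t1, (c1 - c0) / (t1 - t0))
            for (t0, c0), (t1, c1) in zip(samples, samples[1:])]

# a counter sampled once per minute
samples = [(0, 1000), (60, 1600), (120, 2800)]
print(derivative(samples))  # [(60, 10.0), (120, 20.0)]
```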
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. <br />
* graphios<br />
* Grafana - http://grafana.org<br />
<br />
Screenshots of a few panels - <br />
<br />
[[File:Meta-overview.PNG|Example view of multiple Lustre metadata servers.]] <br />
<br />
<br />
[[File:Fs-dashboard.png|Example dashboard showing details of a single Lustre file system.]]<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== NCI project ====<br />
<br />
'''Note: I don't know if this will have source available.'''<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Linux Managerator<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are some basic steps and techniques for working with the Lustre statistics. <br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Some recent tools for working with metrics and time series data have made some of the more difficult parts of this task relatively easy, especially graphical presentation.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes, Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly. <br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
<br />
Graphite is a very convenient tool for storing, working with, and rendering graphs of time-series data. At SSEC we did a quick prototype for collecting and sending MDS and OSS data using perl. The choice of perl is not particularly important, python or the tool of your choice is fine.<br />
<br />
Software Used:<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* lustrestats scripts (lost to the sands of time?) - these are simply run every minute via cron on the servers<br />
* Grafana - http://grafana.org - this is a dashboard and graph editor for graphite. It is not required, as graphite can be used directly, but is very convenient. I allows for not just ease of creating dashboards, but also encoruages rapid, interactive analysis of the data. Note that elasticsearch can be used to store dashboards for grafana, but not required.<br />
<br />
==== check_mk and Graphite ====<br />
<br />
Instead of directly sending with perl, use check_mk<br />
local agent and pnp4nagios mean a reasonable infrastructure already therealerting simple<br />
graphios plugin<br />
<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when they exist!) directly to Carbon, Graphite's data collection tool. When using the check_mk method this timestamp is lost, timestamps are then based on when the local agent check runs. This will introduce some inaccuracy - a delay of up to your sample rate. <br />
<br />
Collecting via both methods allows you to see this. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this is acceptable. Sampling much more frequently will of course make the error smaller.<br />
<br />
<br />
* Graphite - http://graphite.readthedocs.org/en/latest/<br />
* Lustrestats.pm - perl module to parse different types of lustre stats, used by lustrestats scripts<br />
* OMD - check_mk, nagios, pnp4nagios<br />
* check_mk local scripts - these are called via check_mk, at whatever rate is desired. <br />
* graphios<br />
* Grafana - http://grafana.org<br />
<br />
Screenshots of a few panels - <br />
<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== NCI project ====<br />
<br />
'''Note: I don't know if this will have source available.'''<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1201Lustre Monitoring and Statistics Guide2014-11-06T15:38:58Z<p>Scottn: /* check_mk and Graphite */</p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<p><br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Linux Managerator<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are some basic steps and techniques for working with the Lustre statistics. <br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes, Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new versions, but they should work similarly.<br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
Graphite (http://graphite.readthedocs.org/en/latest/) is a very convenient tool for storing, working with, and rendering graphs of time-series data. It is a fairly quick and easy tool for at least prototyping and even for some production use. <br />
<br />
<br />
At SSEC we did a quick prototype for MDS and OSS data using perl and simply sending data to a socket.<br />
<br />
<br />
Lustrestats.pm - perl module to parse different types of lustre stats<br />
<br />
lustrestats scripts (lost to the sands of time?)<br />
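The scripts themselves are gone, but the approach is easy to sketch. The following is a minimal Python stand-in (the 'lustre.oss1' metric prefix and the Graphite hostname are placeholders, not our real names): it turns 'stats' output into Carbon plaintext-protocol lines ('path value timestamp') and writes them to Carbon's listener on port 2003, keeping Lustre's own snapshot_time as the timestamp.<br />

```python
import socket
import time

def carbon_lines(stats_text, prefix="lustre.oss1"):
    """Turn 'lctl get_param obdfilter.*.stats' output into Carbon
    plaintext-protocol lines: '<path> <value> <timestamp>'."""
    lines, ts = [], int(time.time())
    target = None
    for line in stats_text.splitlines():
        f = line.split()
        if line.endswith(".stats="):          # e.g. obdfilter.scratch-OST0001.stats=
            target = line.split(".")[1]       # -> scratch-OST0001
        elif f and f[0] == "snapshot_time":
            ts = int(float(f[1]))             # keep Lustre's own timestamp
        elif target and f and f[0] in ("read_bytes", "write_bytes"):
            # fields: name samples 'samples' [unit] min max sum
            lines.append("%s.%s.%s %s %d" % (prefix, target, f[0], f[-1], ts))
    return lines

def send_to_carbon(lines, host="graphite.example.com", port=2003):
    # Carbon's plaintext listener takes newline-separated metric lines.
    with socket.create_connection((host, port)) as s:
        s.sendall(("\n".join(lines) + "\n").encode())
```

In the real scripts this would loop over every OST and run on a timer (cron, or a small daemon).<br />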
<br />
<br />
<br />
==== check_mk and Graphite ====<br />
Instead of sending directly with perl, use check_mk:<br />
* the check_mk local agent and pnp4nagios mean a reasonable monitoring infrastructure is already there, and alerting is simple<br />
* the graphios plugin forwards the performance data to Graphite<br />
<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when they exist!) directly to Carbon, Graphite's data collection tool. With the check_mk method this timestamp is lost; timestamps are then based on when the local agent check runs. This introduces some inaccuracy - a delay of up to one sample interval. <br />
<br />
Collecting via both methods allows you to see this. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this is acceptable. Sampling much more frequently will of course make the error smaller.<br />
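The effect is easy to reproduce with a synthetic counter. Here is one series stamped two ways - with the true snapshot times, and with later, jittered collection times - run through the same derivative that Graphite applies (all numbers invented for illustration):<br />

```python
def rates(samples):
    """samples: list of (timestamp, counter) pairs -> per-interval rates,
    i.e. the derivative applied to the raw counter data."""
    return [(c2 - c1) / (t2 - t1)
            for (t1, c1), (t2, c2) in zip(samples, samples[1:])]

# A 6000-byte burst in the first minute, stamped with Lustre's snapshot_time:
true_ts = [(0, 0), (60, 6000), (120, 6000)]
# The same counters, stamped with when the agent check happened to run
# (a varying delay of up to one sample interval):
agent_ts = [(25, 0), (95, 6000), (150, 6000)]

print(rates(true_ts))   # the burst shows at its true rate (100 bytes/s)
print(rates(agent_ts))  # the same burst is smeared over a longer interval
```

With a varying delay the derived rate is smeared across intervals, which is the divergence visible in the graph above.<br />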
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== NCI project ====<br />
<br />
'''Note: I don't know if this will have source available.'''<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1200Lustre Monitoring and Statistics Guide2014-11-05T22:50:18Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats, and it ensures portability. I will use this method in all examples; as a bonus, the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
The format of the various statistics files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention* - they are not any kind of Lustre standard.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file includes, for example, each OST or MDT as a multi-line record, followed by the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write<br />
* Second number = the minimum read/write size<br />
* Third number = the maximum read/write size<br />
* Fourth number = the sum of all the read/write requests in bytes, i.e. the quantity of data read/written<br />
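Parsing this format is mostly a matter of splitting on whitespace. A minimal Python sketch (the returned field names are my own, matching the description above):<br />

```python
def parse_stats(text):
    """Parse one 'stats'-format record into a dict of counters.
    Counters with min/max/sum (read_bytes, write_bytes) keep all the
    numbers; plain request counters (e.g. get_info) keep samples/unit."""
    rec = {}
    for line in text.splitlines():
        f = line.split()
        if not f or line.endswith("="):       # skip blanks and the 'target.stats=' line
            continue
        if f[0] == "snapshot_time":
            rec["snapshot_time"] = float(f[1])
            continue
        entry = {"samples": int(f[1]), "unit": f[3].strip("[]")}
        if len(f) >= 7:                       # min / max / sum are present
            entry.update(min=int(f[4]), max=int(f[5]), sum=int(f[6]))
        rec[f[0]] = entry
    return rec
```

Running successive outputs through this and differencing the sums gives read/write rates per OST.<br />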
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above, but with a lot of extra: { bling: }. The reason: job_stats output is YAML, so it can be handed straight to a YAML parser.<br />
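Either way it parses easily: hand the record to a YAML library, or pull out just what you need with a couple of regular expressions. A sketch of the regex approach (the function name and shape are my own):<br />

```python
import re

def job_sums(text, op="read"):
    """Extract {job_id: sum} for one operation from a job_stats record.
    A YAML library also works; this avoids the extra dependency."""
    sums, job = {}, None
    for line in text.splitlines():
        m = re.match(r"\s*-\s*job_id:\s*(\S+)", line)
        if m:
            job = m.group(1)                  # start of a new per-job record
            continue
        m = re.match(r"\s*%s:\s*\{.*sum:\s*(\d+)" % re.escape(op), line)
        if m and job is not None:
            sums[job] = int(m.group(1))
    return sums
```

job_sums(record) then maps each job_id to its total bytes read, ready to difference into per-job rates.<br />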
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
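These name=value lines split cleanly on '='; for example, totaling the available space across all OSTs (a sketch):<br />

```python
def total_kbytes(text):
    """Sum the values from 'lctl get_param osd-*.*OST*.kbytesavail'-style output."""
    return sum(int(line.split("=", 1)[1])
               for line in text.splitlines() if "=" in line)
```

The same pattern works for filesfree, lock_count, and the other 'single' parameters.<br />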
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered here. Typically they are useful on their own, without further parsing.<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats. There is also a wealth of client stats not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with Lustre DNE (Distributed Namespace) you may have more than one MDT; even if you don't, it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, which also includes other lustre servers. The exports are named by interface, which can be unwieldy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses them, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated as individual 'single' files; perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, and historical data via mysql. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN 'DirectMon'<br />
* Intel Enterprise Edition for Lustre (Intel Manager for Lustre)<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are some basic steps and techniques for working with the Lustre statistics. <br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note that there have recently been some changes: Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
The process described here predates the plugin versions, but they should work similarly.<br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and python script 'collectl.py' and '''present''' via ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
Graphite (http://graphite.readthedocs.org/en/latest/) is a very convenient tool for storing, working with, and rendering graphs of time-series data. It is a fairly quick and easy tool for at least prototyping and even for some production use. <br />
<br />
<br />
At SSEC we did a quick prototype for MDS and OSS data using perl and simply sending data to a socket.<br />
<br />
<br />
Lustrestats.pm - perl module to parse different types of lustre stats<br />
<br />
lustrestats scripts (lost to the sands of time?)<br />
<br />
<br />
<br />
==== check_mk and Graphite ====<br />
Instead of sending directly with perl, use check_mk:<br />
* the check_mk local agent and pnp4nagios mean a reasonable monitoring infrastructure is already there, and alerting is simple<br />
* the graphios plugin forwards the performance data to Graphite<br />
<br />
<br />
Collecting via perl allowed us to send the timestamp from the Lustre stats (when they exist!) directly to Carbon, Graphite's data collection tool. With the check_mk method this timestamp is lost; timestamps are then based on when the local agent check runs. This introduces some inaccuracy - a delay of up to one sample interval. <br />
<br />
Collecting via both methods allows you to see this. This graph shows all the "export" stats summed for each method, with derivative applied to create a rate of change. "CMK" is the check_mk data and "timestamped" was from the perl script. Plotting the raw counter data of course shows very little, but with this derived data you can see the difference.<br />
<br />
This data was sampled once per minute: <br />
<br />
<br />
[[File:Cmk-perl.PNG|400px]]<br />
<br />
For our uses at SSEC, this wasn't important. Sampling much more frequently will of course make the error smaller.<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== NCI project ====<br />
<br />
'''Note: I don't know if this will have source available.'''<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=File:Cmk-perl.PNG&diff=1198File:Cmk-perl.PNG2014-11-05T22:37:09Z<p>Scottn: Comparison of data collected via perl with Lustre timestamp vs check_mk without timestamp.</p>
<hr />
<div>Comparison of data collected via perl with Lustre timestamp vs check_mk without timestamp.</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1197Lustre Monitoring and Statistics Guide2014-11-05T22:35:03Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats, and it ensures portability. I will use this method in all examples; as a bonus, the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
The format of the various statistics files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention* - they are not any kind of Lustre standard.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file includes, for example, each OST or MDT as a multi-line record, followed by the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write<br />
* Second number = the minimum read/write size<br />
* Third number = the maximum read/write size<br />
* Fourth number = the sum of all the read/write requests in bytes, i.e. the quantity of data read/written<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above, but with a lot of extra: { bling: }. The reason: job_stats output is YAML, so it can be handed straight to a YAML parser.<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
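The name=value form makes aggregation trivial; for example, summing kbytesavail across OSTs (a python sketch over the output above):<br />

```python
def parse_single(text):
    """Turn 'lctl get_param' name=value lines into a {target: int} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values

single_output = """osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532"""

total_kbytes = sum(parse_single(single_output).values())
print(total_kbytes)  # 31606411456
```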
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered in detail here. They are typically useful on their own, without further parsing:<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive - for example, you will notice these are mostly server stats; there is a wealth of client stats, not detailed here, as well. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = the parameter name passed to "lctl get_param"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with Lustre DNE you may have more than one MDT; even if you don't, it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || The per-OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, which also includes other lustre servers. The exports are named by interface, which can be unwieldy - see "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses it, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. The read and write data is particularly interesting.<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. The same values also appear to be duplicated as individual single-value files (granted, grant_rate, etc. below); perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure what the difference is, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate ('CR'), which presumably also appears in the pool stats file.<br />
|}<br />
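One worked example from the last row above: cancel_rate has no single-value file of its own here, but it follows directly from the two that do exist. The sampled numbers below are hypothetical.<br />

```python
# grant_speed = grant_rate - cancel_rate, so cancel_rate can be derived:
grant_rate = 120    # hypothetical sample of ldlm...pool.grant_rate
grant_speed = 45    # hypothetical sample of ldlm...pool.grant_speed
cancel_rate = grant_rate - grant_speed
print(cancel_rate)  # 75
```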
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT - provides 'top' style monitoring of server nodes, plus historical data via MySQL. https://github.com/chaos/lmt<br />
*lltop and xltop - monitoring with batch scheduler integration. Newer Lustre versions with jobstats likely provide similar data very conveniently, but these are still very good for examples of working with monitoring data. https://github.com/jhammond/lltop https://github.com/jhammond/xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Linux Managerator<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are some basic steps and techniques for working with the Lustre statistics. <br />
# '''Gather''' the data on the hosts you are monitoring. Deal with the syntax and extract what you want.<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
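The '''gather''' step can be as simple as shelling out to lctl. A python sketch using only the stdlib; on a host without lctl it falls back to canned output (a hypothetical single OST value) so the parsing path can still be exercised:<br />

```python
import shutil
import subprocess

def get_param(pattern):
    """Run 'lctl get_param <pattern>' and return its text output."""
    return subprocess.check_output(["lctl", "get_param", pattern], text=True)

def gather_single(pattern):
    """Gather single-value parameters as a {target: int} dict."""
    if shutil.which("lctl"):
        raw = get_param(pattern)
    else:  # not a Lustre node - canned output for demonstration only
        raw = "osd-ldiskfs.scratch-OST0000.kbytesfree=10563714384\n"
    return {name.strip(): int(value)
            for name, sep, value in (l.partition("=") for l in raw.splitlines())
            if sep and value.strip().isdigit()}

print(gather_single("osd-*.*OST*.kbytesfree"))
```

From here, '''collect''' is just shipping that dict somewhere central on a timer (cron, a daemon loop, or an agent such as check_mk below).<br />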
<br />
Here are details of some solutions tested or in use:<br />
<br />
==== Collectl and Ganglia ====<br />
<br />
Collectl supports Lustre stats. Note there have recently been some changes; Lustre support in collectl is moving to plugins: http://sourceforge.net/p/collectl/mailman/message/31992463 https://github.com/pcpiela/collectl-lustre<br />
<br />
This process is not based on the new plugin versions, but they should work similarly.<br />
<br />
# collectl - does the '''gather''' by writing to a text file on the host being monitored<br />
# ganglia does the '''collect''' via gmond and the python script 'collectl.py', and the '''present''' via the ganglia web pages - there is no alerting.<br />
<br />
See https://wiki.rocksclusters.org/wiki/index.php/Roy_Dragseth#Integrating_collectl_and_ganglia<br />
<br />
<br />
==== Perl and Graphite ====<br />
<br />
Graphite (http://graphite.readthedocs.org/en/latest/) is a very convenient tool for storing, working with, and rendering graphs of time-series data. It is fairly quick and easy, at least for prototyping and even for some production use. <br />
<br />
<br />
At SSEC we did a quick prototype for MDS and OSS data using perl, simply sending the data to a socket.<br />
<br />
* Lustrestats.pm - perl module to parse the different types of lustre stats<br />
* lustrestats scripts (lost to the sands of time?)<br />
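The core of that prototype was just formatting metrics for Graphite's plaintext protocol and writing them to the carbon socket (port 2003 by default). The same idea sketched in python rather than perl; the host name is a placeholder:<br />

```python
import socket
import time

def graphite_lines(metrics, prefix="lustre", now=None):
    """Format {metric.path: value} pairs as Graphite plaintext lines."""
    stamp = int(now if now is not None else time.time())
    return ["%s.%s %s %d" % (prefix, name, value, stamp)
            for name, value in sorted(metrics.items())]

def send_to_graphite(lines, host="graphite.example.com", port=2003):
    """Push one batch of metric lines to the carbon plaintext socket."""
    payload = ("\n".join(lines) + "\n").encode()
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)

lines = graphite_lines(
    {"oss1.obdfilter.scratch-OST0000.kbytesfree": 10563714384},
    now=1409778251)
print(lines[0])
# lustre.oss1.obdfilter.scratch-OST0000.kbytesfree 10563714384 1409778251
```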
<br />
<br />
<br />
==== check_mk and Graphite ====<br />
Instead of sending directly with perl, use check_mk:<br />
* A check_mk local agent check does the '''gather'''.<br />
* check_mk with pnp4nagios means a reasonable '''collect''' infrastructure is already there, and makes '''alerting''' simple.<br />
* The graphios plugin forwards the Nagios performance data on to Graphite for '''presentation'''.<br />
* One caveat: the data timestamps can show some delay.<br />
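On the agent side, check_mk local checks are just scripts whose output is one line per service: status, service name, perfdata, then text. A hypothetical python sketch (the service naming scheme is my own invention; the agent's local-check directory varies by install):<br />

```python
def local_check_lines(kbytesfree):
    """Emit check_mk local-check lines, one service per OST.

    Input: {target: kbytes free}, e.g. parsed from
    'lctl get_param obdfilter.*OST*.kbytesfree'.
    """
    lines = []
    for target, kb in sorted(kbytesfree.items()):
        ost = target.split(".")[1]  # e.g. scratch-OST0000
        lines.append("0 lustre_%s kbytesfree=%d OK - %d kbytes free"
                     % (ost, kb, kb))
    return lines

example = {"obdfilter.scratch-OST0000.kbytesfree": 10563714384}
print(local_check_lines(example)[0])
# 0 lustre_scratch-OST0000 kbytesfree=10563714384 OK - 10563714384 kbytes free
```

The perfdata field (kbytesfree=...) is what pnp4nagios records and graphios can forward to Graphite.<br />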
<br />
<br />
<br />
<br />
==== Logstash, python, and Graphite ====<br />
<br />
Brock Palen discusses this method: http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
==== NCI project ====<br />
<br />
'''Note: I don't know if this will have source available.'''<br />
http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write<br />
* Second number = the minimum read/write size<br />
* Third number = the maximum read/write size<br />
* Fourth number = the sum of all the read/write requests in bytes, i.e. the quantity of data read/written<br />
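The fixed field layout described above makes this format straightforward to parse. As a rough illustration, here is a stdlib-only Python sketch (my own, not a Lustre-provided tool) that turns 'lctl get_param ...stats' output into a dictionary:<br />

```python
def parse_stats(text):
    """Parse 'lctl get_param ...stats' output into {target: {counter: fields}}.

    Counters with [bytes] units carry min/max/sum; [reqs] counters only samples.
    """
    targets = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith("="):            # e.g. obdfilter.scratch-OST0001.stats=
            current = line[:-1]
            targets[current] = {}
        elif current is not None:
            parts = line.split()
            name = parts[0]
            if name == "snapshot_time":
                targets[current][name] = float(parts[1])
            else:
                rec = {"samples": int(parts[1]), "unit": parts[3].strip("[]")}
                if len(parts) >= 7:       # min / max / sum fields are present
                    rec.update(min=int(parts[4]), max=int(parts[5]), sum=int(parts[6]))
                targets[current][name] = rec
    return targets

sample = """obdfilter.scratch-OST0001.stats=
snapshot_time             1409777887.590578 secs.usecs
read_bytes                27846475 samples [bytes] 4096 1048576 14421705314304
write_bytes               16230483 samples [bytes] 1 1048576 14761109479164
get_info                  3735777 samples [reqs]
"""
stats = parse_stats(sample)
```

In production you would feed this the live output of 'lctl get_param obdfilter.*.stats' rather than a canned string.<br />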
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each job ID (or procname_uid, depending on how jobstats naming is configured), followed by the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above, but with extra '{ }' decoration: the job_stats file is written in YAML format, which makes it convenient to parse with standard YAML tools.<br />
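The '{ }' structure of each operation line is regular enough to pick apart with a small regular expression; a stdlib-only Python sketch of my own (a YAML library would also work), using the field names from the example above:<br />

```python
import re

# Matches lines like: read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
OP_RE = re.compile(r"(\w+):\s*\{([^}]*)\}")

def parse_job(record):
    """Extract per-operation counters from one '- job_id:' block."""
    ops = {}
    for op, body in OP_RE.findall(record):
        fields = {}
        for pair in body.split(","):
            key, value = pair.split(":")
            value = value.strip()
            fields[key.strip()] = int(value) if value.isdigit() else value
        ops[op] = fields
    return ops

record = """- job_id: 56744
  snapshot_time: 1409778251
  read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
  write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }
  setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }
"""
ops = parse_job(record)
```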
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
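Single-value parameters need almost no parsing; just split on '='. For example, a small sketch of my own that totals kbytesavail across all OSTs from the output shown above:<br />

```python
def sum_param(output):
    """Sum the values of name=value lines from 'lctl get_param', skipping blanks."""
    total = 0
    for line in output.splitlines():
        if "=" in line:
            _, value = line.split("=", 1)
            total += int(value)
    return total

output = """osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532
"""
total_kb = sum_param(output)   # free space across all OSTs, in kilobytes
```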
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered here. They are typically useful on their own, without further parsing.<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats; there is a wealth of client stats too, not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients; this also includes other lustre servers. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses them, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that some of the same values are duplicated as 'single'-format files; perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
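Since grant_speed = grant_rate - cancel_rate, the cancel rate ('CR') can be derived from the two single-value files; a trivial sketch (the sampled values are hypothetical, for illustration only):<br />

```python
def cancel_rate(grant_rate, grant_speed):
    """Derive the ldlm cancel rate ('CR') from grant_rate ('GR') and grant_speed,
    using the relation grant_speed = grant_rate - cancel_rate."""
    return grant_rate - grant_speed

# Hypothetical sampled values, for illustration only:
cr = cancel_rate(120, 20)   # 120 grants/s, net speed 20/s -> 100 cancels/s
```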
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT<br />
*lltop and xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Lustre<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are some basic steps for working with the Lustre statistics. <br />
# '''Gather''' the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# '''Collect''' the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# '''Process''' the data - this may be optional or minimal.<br />
# '''Alert''' on the data - optional but often useful.<br />
# '''Present''' the data - allow for visualization, analysis, etc.<br />
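The '''Gather''' and '''Collect''' steps above can be sketched concretely. The fragment below is an illustration only: the Graphite host and the 'lustre' metric prefix are placeholder assumptions, and it assumes Graphite's carbon plaintext protocol on its default port 2003 ("metric value timestamp" per line).<br />

```python
import socket
import subprocess
import time

def gather(param="osd-*.*.kbytesavail"):
    """Step 1, Gather: run lctl on the monitored host and return its raw output."""
    return subprocess.check_output(["lctl", "get_param", param], text=True)

def to_graphite(output, prefix="lustre"):
    """Reformat single-value name=value lines into carbon plaintext messages."""
    now = int(time.time())
    messages = []
    for line in output.splitlines():
        if "=" in line:
            name, value = line.split("=", 1)
            if value.strip().isdigit():   # only numeric single-value params
                messages.append("%s.%s %s %d" % (prefix, name, value.strip(), now))
    return messages

def send(messages, host="graphite.example.com", port=2003):
    """Step 2, Collect: push the metrics to a central carbon daemon."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(("\n".join(messages) + "\n").encode())

# Usage on a Lustre server (requires lctl and a reachable carbon daemon):
#     send(to_graphite(gather()))
```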
<br />
<br />
Assorted notes on tools for these steps:<br />
* collectl / ganglia - note recent changes<br />
* NCI mysterious project<br />
* Analyze, visualize, present data<br />
* logstash as collector - Brock Palen, http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=User:Scottn&diff=1193User:Scottn2014-11-05T20:24:28Z<p>Scottn: </p>
<hr />
<div>Scott Nolin<br />
* Head of Technical Computing Group at the Space Science and Engineering Center, University of Wisconsin.<br />
<br />
<br />
<br />
[[Lustre Statistics Guide]]</div>Scottnhttp://wiki.opensfs.org/index.php?title=User:Scottn&diff=1192User:Scottn2014-11-05T20:24:02Z<p>Scottn: </p>
<hr />
<div>Scott Nolin<br />
* Head of Technical Computing Group at the Space Science and Engineering Center,<br />
University of Wisconsin.<br />
<br />
[[Lustre Statistics Guide]]</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1191Lustre Monitoring and Statistics Guide2014-11-05T20:17:05Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to read these statistics, as it ensures portability. I will use this method in all examples; as a bonus, the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file presents, for example, each OST or MDT as a multi-line record, followed by just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats; there is a wealth of client stats too, not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients; this also includes other lustre servers. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses them, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that some of the same values are duplicated as 'single'-format files; perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Working With the Data ==<br />
<br />
Packages, tools, and techniques for working with Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT<br />
*lltop and xltop<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Linux Managerator<br />
<br />
=== Build it Yourself ===<br />
<br />
Here are some basic steps for working with the Lustre statistics. <br />
# *Gather* the data on hosts you are monitoring. Deal with the syntax, extract what you want<br />
# *Collect* the data centrally - either pull or push it to your server, or collection of monitoring servers.<br />
# *Process* the data - this may be optional or minimal.<br />
# *Alert* on the data - optional but often useful.<br />
# *Present* the data - allow for visualization, analysis, etc.<br />
<br />
collectl / ganglia<br />
-note recent changes<br />
<br />
NCI mysterious project<br />
<br />
Analyze, Visualize, Present data<br />
<br />
logstash as collector - Brock Palen http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1190Lustre Monitoring and Statistics Guide2014-11-05T20:08:19Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics and methods.<br />
<br />
This does not include Lustre log analysis.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to read these statistics, as it ensures portability. I will use this method in all examples; as a bonus, the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file presents, for example, each OST or MDT as a multi-line record, followed by just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats; there is a wealth of client stats too, not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients; this also includes other lustre servers. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses them, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that some of the same values are duplicated as 'single'-format files; perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== Packages, Tools, and Techniques ==<br />
<br />
Ways to collect and analyze Lustre statistics.<br />
<br />
=== Open Source Monitoring Packages ===<br />
<br />
*LMT<br />
*lltop and xltop<br />
*collectl - note the recent changes<br />
<br />
=== Commercial Monitoring Packages ===<br />
<br />
* Terascala 'teraos'<br />
* DDN datablarker<br />
* Intel Enterprise Edition for Linux Managerator<br />
<br />
=== Build it Yourself ===<br />
<br />
Deal with syntax<br />
<br />
Collect / Send data<br />
<br />
Analyze, Visualize, Present data<br />
<br />
logstash as collector - Brock Palen http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html<br />
<br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1189Lustre Monitoring and Statistics Guide2014-11-05T19:56:12Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br><br />
The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with Lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer Lustre versions, 'lctl get_param' is the standard and recommended way to read these statistics, as it ensures portability. I will use this method in all examples; as a bonus, the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for Lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file presents, for example, each OST or MDT as a multi-line record, followed by just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will notice these are mostly server stats; there is a wealth of client stats too, not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients; this also includes other lustre servers. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses them, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that some of the same values are duplicated as individual single-format files; perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate ('CR'), which presumably is also available from the pool stats file.<br />
|}<br />
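Many of the single-format parameters in the table are most useful when combined - for example, kbytesfree and kbytestotal together give the fullness of each target. Below is a hedged Python sketch of that idea; the target names and numbers are invented for illustration, and real input would come from something like 'lctl get_param obdfilter.*OST*.kbytesfree obdfilter.*OST*.kbytestotal':<br />
<br />
```python
def percent_free(single_output):
    """Given combined single-format output containing kbytesfree and
    kbytestotal lines, return {target_name: percent free}."""
    free, total = {}, {}
    for line in single_output.splitlines():
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        # name looks like obdfilter.<target>.kbytesfree
        prefix, _, param = name.rpartition(".")
        target = prefix.split(".")[-1]
        if param == "kbytesfree":
            free[target] = int(value)
        elif param == "kbytestotal":
            total[target] = int(value)
    # only report targets for which both values were seen
    return {t: 100.0 * free[t] / total[t] for t in free if t in total}

# Invented example values, for illustration only.
sample = """\
obdfilter.scratch-OST0000.kbytesfree=250000
obdfilter.scratch-OST0000.kbytestotal=1000000
obdfilter.scratch-OST0001.kbytesfree=500000
obdfilter.scratch-OST0001.kbytestotal=1000000
"""

fullness = percent_free(sample)
```
<br />
The same pattern (group single-format values by target, then combine) works for filesfree/filestotal on the MDS, or for deriving cancel_rate from grant_speed and grant_rate.<br />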
<br />
== Tools and Techniques ==<br />
<br />
<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1188Lustre Monitoring and Statistics Guide2014-11-05T18:25:12Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
== Scripts to Parse Data Formats ==<br />
<br />
Here are some example perl modules to help parse the various data formats. Better, faster, stronger scripts and methods are welcome.<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf<br />
* Christopher Morrone, "LMT Lustre Monitoring Tools", LUG2011. http://cdn.opensfs.org/wp-content/uploads/2012/12/400-430_Chris_Morrone_LMT_v2.pdf<br />
<br />
* https://github.com/jhammond/lltop<br />
* https://github.com/chaos/lmt<br />
* https://github.com/chaos/cerebro<br />
* http://graphite.readthedocs.org/en/latest/<br />
* https://mathias-kettner.de/check_mk<br />
* https://github.com/shawn-sterling/graphios</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1187Lustre Monitoring and Statistics Guide2014-11-05T17:12:31Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
== Scripts to Parse Data Formats ==<br />
<br />
Here are some example perl modules to help parse the various data formats. Better, faster, stronger scripts and methods are welcome.<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}<br />
<br />
== References and Links ==<br />
<br />
<br />
* Daniel Kobras, "Lustre - Finding the Lustre Filesystem Bottleneck", LAD2012. http://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf<br />
* Florent Thery, "Centralized Lustre Monitoring on Bull Platforms", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/11_Florent_Thery_LAD2013-lustre-bull-monitoring.pdf<br />
* Daniel Rodwell and Patrick Fitzhenry, "Fine-Grained File System Monitoring with Lustre Jobstat", LUG2014. http://www.opensfs.org/wp-content/uploads/2014/04/D3_S31_FineGrainedFileSystemMonitoringwithLustreJobstat.pdf <br />
* Gabriele Paciucci and Andrew Uselton, "Monitoring the Lustre* file system to maintain optimal performance", LAD2013. http://www.eofs.eu/fileadmin/lad2013/slides/15_Gabriele_Paciucci_LAD13_Monitoring_05.pdf</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1186Lustre Monitoring and Statistics Guide2014-11-05T16:53:26Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
== Scripts to Parse Data Formats ==<br />
<br />
Here are some example perl modules to help parse the various data formats. Better, faster, stronger scripts and methods are welcome.<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. For example, you will noticed these are mostly server stats. There are a wealth of client stats too not detailed here. Additions or corrections are welcome.<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated a single stats. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure of what the differences are, or what it means when they don't match.<br />
|- | OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR'. Or you can just get 'CR' from the stats file I assume.<br />
|}</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1185Lustre Monitoring and Statistics Guide2014-11-05T16:47:56Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
== Scripts to Parse Data Formats ==<br />
<br />
Here are some example perl modules to help parse the various data formats. Better, faster, stronger scripts and methods are welcome.<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. Additions or corrections are welcome.<br />
<br />
host type, target, format, discussion<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.stats || stats || lustre distributed lock manager (ldlm) stats. I do not fully understand all of these stats. It also appears that these same stats are duplicated as individual 'single' files. Perhaps this is just a convenience.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR' <br />
|}</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1184Lustre Monitoring and Statistics Guide2014-11-05T16:44:13Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer lustre versions, 'lctl get_param' is the standard and recommended way to get these stats. This is to ensure portability. I will use this method in all examples; a bonus is that the syntax is often a little shorter. <br />
<br />
== Data Formats ==<br />
The format of the various statistics files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*; this isn't a standard for lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file includes each OST or MDT as a multi-line record, followed by just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
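For monitoring, the interesting quantity is usually the change between two snapshots rather than the cumulative counter itself. A small sketch of deriving a read rate from the sum field (the second sample's numbers are invented for illustration):<br />

```python
# Sketch: derive a read rate from two successive samples of the counters
# described above. snapshot_time and the read_bytes sum are cumulative;
# the second sample's values below are invented for illustration.
t1, read_sum1 = 1409777887.590578, 14421705314304
t2, read_sum2 = 1409777947.590578, 14421768314304   # ~60 seconds later (invented)

read_rate = (read_sum2 - read_sum1) / (t2 - t1)     # bytes per second
```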
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs }<br />
punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
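Despite the extra punctuation, this format is still easy to parse. A rough Python sketch (my own, with regexes that assume the layout shown above):<br />

```python
import re

# Rough sketch (my own invention, not an official parser): pull per-job
# counters out of job_stats text shaped like the example above.
JOB_RE = re.compile(r"-\s+job_id:\s+(\S+)")
OP_RE = re.compile(r"(\w+):\s+\{\s*samples:\s+(\d+),.*?sum:\s+(\d+)")

def parse_job_stats(text):
    jobs, current = {}, None
    for line in text.splitlines():
        m = JOB_RE.search(line)
        if m:                            # start of a new per-job record
            current = m.group(1)
            jobs[current] = {}
            continue
        for op, samples, total in OP_RE.findall(line):
            if current is not None:
                jobs[current][op] = {"samples": int(samples), "sum": int(total)}
    return jobs

sample_jobstats = """- job_id: 56744
  snapshot_time: 1409778251
  read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
  write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }"""
jobs = parse_job_stats(sample_jobstats)
```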
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
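A minimal sketch of parsing that output, assuming each line is simply name=value:<br />

```python
# Minimal sketch for the "single" format shown above: one name=value per line.
def parse_single(text):
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            name, _, value = line.partition("=")
            values[name] = int(value)
    return values

single_sample = """osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532"""
avail = parse_single(single_sample)
```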
<br />
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered here. Typically they're useful on their own without further parsing.<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
== Scripts to Parse Data Formats ==<br />
<br />
Here are some example perl modules to help parse the various data formats. Better, faster, stronger scripts and methods are welcome.<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. Additions or corrections are welcome.<br />
<br />
host type, target, format, discussion<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients; this also includes other lustre servers. The exports are named by interfaces, which can be unwieldy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats; "ltop" uses them, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.lock_count || single || lustre distributed lock manager (ldlm) locks<br />
|-<br />
| colspan="4" | Tip - if you want to collect all the grant information below, you can use ldlm.*.filter-*.pool.grant*<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.granted || single || lustre distributed lock manager (ldlm) granted locks - normally this matches lock_count. I am not sure what the differences are, or what it means when they don't match.<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_plan || single || ldlm lock planned number of granted locks (see 'glossary' in http://wiki.lustre.org/doxygen/HEAD/api/html/ldlm__pool_8c_source.html)<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_rate || single || ldlm lock grant rate aka 'GR'<br />
|-<br />
| OSS || ldlm.namespaces.filter-*.pool.grant_speed || single || ldlm lock grant speed = grant_rate - cancel_rate. You can use this to derive cancel_rate 'CR' <br />
|}</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1183Lustre Monitoring and Statistics Guide2014-11-05T16:10:50Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
== Scripts to Parse Data Formats ==<br />
<br />
Here are some example perl modules to help parse the various data formats. Better, faster, stronger scripts and methods are welcome.<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. Additions or corrections are welcome.<br />
<br />
host type, target, format, discussion<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|}</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1182Lustre Monitoring and Statistics Guide2014-11-05T16:09:09Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
== Scripts to Parse Data Formats ==<br />
<br />
Here are some example perl modules to help parse the various data formats. Better, faster, stronger scripts and methods are welcome.<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. Additions or corrections are welcome.<br />
<br />
host type, target, format, discussion<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*MDT*.num_exports || single || number of exports per MDT - these are clients, including other lustre servers <br />
| -<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|}</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1181Lustre Monitoring and Statistics Guide2014-11-05T16:07:05Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
== Scripts to Parse Data Formats ==<br />
<br />
Here are some example perl modules to help parse the various data formats. Better, faster, stronger scripts and methods are welcome.<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. Additions or corrections are welcome.<br />
<br />
host type, target, format, discussion<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*@*.stats || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*@*.stats || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-*.*MDT*.filesfree or filestotal || single || available or total inodes<br />
|-<br />
| MDS || osd-*.*MDT*.kbytesfree or kbytestotal || single || available or total disk space<br />
|-<br />
| OSS || obdfilter.*OST*.kbytesfree or kbytestotal, filesfree, filestotal || single || inodes and disk space as in MDS version<br />
|}</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1177Lustre Monitoring and Statistics Guide2014-11-04T23:07:30Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer lustre versions, 'lctl get_pram' is the standard and recommended way to get these stats. This is to insure portability. I will use this method in all examples, a bonus is it can be often be a little shorter syntax. <br />
<br />
== Data Formats ==<br />
Format of the various statistics type files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*, this isn't a standard for lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats files include for example each OST or MDT as a multi-line record, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
First number = number of times (samples) the OST has handled a read or write. <br />
Second number = the minimum read/write size<br />
Third number = maximum read/write size<br />
Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.<br />
<br />
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above. But there's a lot of extra: { bling: }! Why? Just because it got coded that way?<br />
<br />
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre>[COMMAND LINE]# lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
<br />
<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
=== Histogram ===<br />
<br />
Some stats are histograms, these types aren't covered here. Typically they're useful on their own without further parsing(?)<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
== Scripts to Parse Data Formats ==<br />
<br />
Here are some example perl modules to help parse the various data formats. Better, faster, stronger scripts and methods are welcome.<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. Additions or corrections are welcome.<br />
<br />
host type, target, format, discussion<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || the per OST jobstats. <br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*.stats (SCOTT, VERIFY THAT..) || stats || Per-export metadata stats. Exports are clients, this also includes other lustre servers. The exports are named by interfaces, which can be unweildy. See "lltop" for an example of a script that used this data well. The sum of all the export stats should provide the same data as md_stats, but it is still very convenient to have md_stats, "ltop" uses them for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*.stats (VERIFY THAT TOO...) || stats || per-export OSS statistics<br />
|-<br />
| MDS || osd-ldiskfs.*MDT*.filesfree or filestotal || single || *LDISKFS ONLY* available or total inodes<br />
|-<br />
| MDS || osd-ldiskfs.*MDT*.kbytesfree or kbytestotal || single || *LDISKFS ONLY* available or total disk space<br />
|-<br />
| MDS || osd-ldiskfs.*MDT*.kbytestotal || single || *LDISKFS ONLY* total disk space<br />
|}</div>Scottnhttp://wiki.opensfs.org/index.php?title=Lustre_Monitoring_and_Statistics_Guide&diff=1176Lustre Monitoring and Statistics Guide2014-11-04T23:05:11Z<p>Scottn: </p>
<hr />
<div>== DRAFT IN PROGRESS ==<br />
<br />
<br />
== Introduction ==<br />
<br />
There are a variety of useful statistics and counters available on lustre servers and clients. This is an attempt to detail some of these statistics.<br />
<br />
The presumed audience for this is system administrators attempting to better understand and monitor their lustre file systems. <br />
<br />
== Lustre Versions ==<br />
<br />
This information is based on working mostly with lustre 2.4 and 2.5.<br />
<br />
== Reading /proc vs lctl ==<br />
<br />
'cat /proc/fs/lustre...' vs 'lctl get_param'<br />
With newer lustre versions, 'lctl get_param' is the standard and recommended way to get these stats, to ensure portability. I will use this method in all examples; as a bonus, the syntax is often a little shorter. <br />
<br />
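For context, these parameter names map onto the old /proc tree by replacing the dots with slashes. A minimal sketch of that mapping (my own illustrative helper, not part of lustre; newer versions also move some parameters under /sys, so don't rely on the exact path):<br />

```python
# Hypothetical helper: translate an lctl parameter name to its
# legacy /proc path. Illustrative only -- the exact tree varies
# by Lustre version.
def param_to_proc_path(param):
    """e.g. 'obdfilter.scratch-OST0001.stats' ->
    '/proc/fs/lustre/obdfilter/scratch-OST0001/stats'"""
    return "/proc/fs/lustre/" + param.replace(".", "/")

print(param_to_proc_path("obdfilter.scratch-OST0001.stats"))
# -> /proc/fs/lustre/obdfilter/scratch-OST0001/stats
```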
== Data Formats ==<br />
The format of the various statistics files varies (and I'm not sure if there is any reason for this). The format names here are entirely *my invention*; this isn't a standard for lustre or anything.<br />
<br />
It is useful to know the various formats of these files so you can parse the data and collect for use in other tools. <br />
<br />
=== Stats ===<br />
<br />
What I consider a "standard" stats file contains a multi-line record for each OST or MDT, and then just the data. <br />
<br />
Example:<br />
<pre><br />
obdfilter.scratch-OST0001.stats=<br />
snapshot_time 1409777887.590578 secs.usecs<br />
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304<br />
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164<br />
get_info 3735777 samples [reqs]<br />
</pre><br />
<br />
snapshot_time = when the stats were written<br />
<br />
For read_bytes and write_bytes:<br />
* First number = number of times (samples) the OST has handled a read or write<br />
* Second number = minimum read/write size<br />
* Third number = maximum read/write size<br />
* Fourth number = sum of all the read/write requests in bytes, i.e. the quantity of data read/written<br />
<br />
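If you want to collect these values in another tool, a stats record splits cleanly on whitespace. A minimal Python sketch based on the field layout described above (the function name and dict structure are my own invention):<br />

```python
# Sketch: parse one "stats" record as returned by
#   lctl get_param obdfilter.*.stats
# Field layout follows the description above.
def parse_stats(text):
    out = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # skip the "target=" header and the snapshot_time line
        if not parts or parts[0] == "snapshot_time" or "=" in parts[0]:
            continue
        entry = {"samples": int(parts[1])}
        # byte counters also carry min, max and sum fields
        if len(parts) >= 7:
            entry.update(min=int(parts[4]), max=int(parts[5]), sum=int(parts[6]))
        out[parts[0]] = entry
    return out

sample = """obdfilter.scratch-OST0001.stats=
snapshot_time         1409777887.590578 secs.usecs
read_bytes            27846475 samples [bytes] 4096 1048576 14421705314304
write_bytes           16230483 samples [bytes] 1 1048576 14761109479164
get_info              3735777 samples [reqs]
"""
stats = parse_stats(sample)
print(stats["read_bytes"]["sum"])  # total bytes read from this OST
```

Note that in the example above only read_bytes and write_bytes carry the min/max/sum fields; plain request counters like get_info report samples alone.<br />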
=== Jobstats ===<br />
<br />
Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data. <br />
<br />
Example: <br />
<pre><br />
obdfilter.scratch-OST0000.job_stats=job_stats:<br />
- job_id: 56744<br />
snapshot_time: 1409778251<br />
read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }<br />
write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }<br />
setattr: { samples: 0, unit: reqs } punch: { samples: 95, unit: reqs }<br />
- job_id: . . . ETC<br />
</pre><br />
<br />
Notice this is very similar to 'stats' above, but with a lot of extra { } decoration. The upside is that the jobstats output is structured, which makes it friendlier to standard parsers.<br />
<br />
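A similar sketch for jobstats, pulling per-job read/write byte totals out of output like the example above (again a hypothetical helper, not a standard tool):<br />

```python
import re

# Sketch: extract per-job read/write byte sums from job_stats output.
def parse_job_stats(text):
    jobs = {}
    job_id = None
    for line in text.splitlines():
        m = re.match(r"\s*-\s*job_id:\s*(\S+)", line)
        if m:
            job_id = m.group(1)
            jobs[job_id] = {}
            continue
        m = re.match(r"\s*(read|write):.*sum:\s*(\d+)", line)
        if m and job_id is not None:
            jobs[job_id][m.group(1)] = int(m.group(2))
    return jobs

sample = """obdfilter.scratch-OST0000.job_stats=job_stats:
- job_id:          56744
  snapshot_time:   1409778251
  read:    { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
  write:   { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }
"""
print(parse_job_stats(sample)["56744"]["read"])  # bytes read by job 56744
```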
=== Single ===<br />
<br />
These really boil down to just a single number in a file. But if you use "lctl get_param" you get an output that is nice for parsing. For example: <br />
<pre># lctl get_param osd-ldiskfs.*OST*.kbytesavail<br />
osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384<br />
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540<br />
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532<br />
</pre><br />
<br />
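That name=value output splits cleanly on '='. A quick sketch (hypothetical helper) that builds a dict keyed by target, e.g. to find the fullest OST:<br />

```python
# Sketch: parse "single" parameter output into {target: value}.
def parse_single(text):
    out = {}
    for line in text.strip().splitlines():
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        # name looks like "osd-ldiskfs.scratch-OST0000.kbytesavail"
        target = name.split(".")[1]
        out[target] = int(value)
    return out

sample = """osd-ldiskfs.scratch-OST0000.kbytesavail=10563714384
osd-ldiskfs.scratch-OST0001.kbytesavail=10457322540
osd-ldiskfs.scratch-OST0002.kbytesavail=10585374532
"""
free = parse_single(sample)
print(min(free, key=free.get))  # OST with the least space available
```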
=== Histogram ===<br />
<br />
Some stats are histograms; these types aren't covered here. Typically they're useful on their own without further parsing.<br />
<br />
<br />
* brw_stats<br />
* extent_stats<br />
<br />
== Interesting Statistics Files ==<br />
<br />
This is a collection of various stats files that I have found useful. It is *not* complete or exhaustive. Additions or corrections are welcome.<br />
<br />
The columns are: host type, target, format, discussion<br />
<br />
* Host Type = MDS, OSS, client<br />
* Target = "lctl get_param target"<br />
* Format = data format discussed above<br />
<br />
{| class="wikitable"<br />
|-<br />
!Host Type !! Target !! Format !! Discussion<br />
|-<br />
| MDS || mdt.*.job_stats || jobstats || Metadata jobstats. Note that with lustre DNE you may have more than one MDT, so even if you don't it may be wise to design any tools with that assumption.<br />
|-<br />
| OSS || obdfilter.*.job_stats || jobstats || Per-OST jobstats.<br />
|-<br />
| MDS || mdt.*.md_stats || stats || Overall metadata stats per MDT<br />
|-<br />
| MDS || mdt.*MDT*.exports.*.stats (unverified) || stats || Per-export metadata stats. Exports are clients, which also includes other lustre servers. The exports are named by interface, which can be unwieldy. See "lltop" for an example of a script that uses this data well. The sum of all the export stats should provide the same data as md_stats, but md_stats is still very convenient to have; "ltop" uses it, for example.<br />
|-<br />
| OSS || obdfilter.*.stats || stats || Operations per OST. Read and write data is particularly interesting<br />
|-<br />
| OSS || obdfilter.*OST*.exports.*.stats (unverified) || stats || Per-export OSS statistics<br />
|-<br />
| MDS || osd-ldiskfs.*MDT*.filesfree or filestotal || single || *LDISKFS ONLY* available or total inodes<br />
|-<br />
| MDS || osd-ldiskfs.*MDT*.kbytesfree or kbytestotal || single || *LDISKFS ONLY* available or total disk space<br />
|}</div>Scottn