| == DRAFT IN PROGRESS ==
| | This content has moved to the [http://wiki.lustre.org/Lustre_Monitoring_and_Statistics_Guide lustre.org Wiki]. |
| | |
| | |
| == Introduction ==
| |
| | |
| There are a variety of useful statistics and counters available on Lustre servers and clients. This is an attempt to detail some of these statistics.
| |
| | |
| The presumed audience for this is system administrators attempting to better understand and monitor their Lustre file systems.
| |
| | |
| == Lustre Versions ==
| |
| | |
| This information is based mostly on working with Lustre 2.4 and 2.5.
| |
| | |
| == Reading /proc vs lctl ==
| |
| | |
| 'cat /proc/fs/lustre...' vs 'lctl get_param'
| |
| With newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats, which ensures portability. I will use this method in all examples; as a bonus, the syntax is often a little shorter.
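For scripted monitoring, the 'lctl get_param' call can be wrapped in a few lines of Python. This is only a minimal sketch: the function names build_cmd and get_param are hypothetical, and it assumes lctl is on the PATH of a Lustre server or client and that Python 3.7 or later is available.

```python
import subprocess

def build_cmd(pattern):
    """Return the argument list for 'lctl get_param <pattern>'."""
    return ["lctl", "get_param", pattern]

def get_param(pattern):
    """Run 'lctl get_param' and return its output as text.

    Only works on a node where the Lustre utilities are installed.
    Raises subprocess.CalledProcessError if lctl exits with an error.
    """
    result = subprocess.run(
        build_cmd(pattern), capture_output=True, text=True, check=True
    )
    return result.stdout

# Example call on a Lustre OSS (not run here):
#   text = get_param("obdfilter.*.stats")
```

Passing the pattern as a single list element avoids having the shell expand the wildcard before lctl sees it.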
| |
| | |
| == Data Formats ==
| |
| The format of the various statistics files varies (and I'm not sure if there's any reason for this). The format names here are entirely *my invention*; they are not a Lustre standard.
| |
| | |
| === Stats ===
| |
| | |
| What I consider "standard" stats files contain one multi-line record per target (for example, each OST or MDT), and then just the data.
| |
| | |
| Example:
| |
| <pre>
| |
| obdfilter.scratch-OST0001.stats=snapshot_time 1409777887.590578 secs.usecs
| read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304
| write_bytes 16230483 samples [bytes] 1 1048576 14761109479164
| get_info 3735777 samples [reqs]
| |
| </pre>
| |
| | |
| snapshot_time = when the stats were written, as a timestamp in seconds.microseconds
| |
| | |
| For read_bytes and write_bytes:
| |
| First number = number of times (samples) the OST has handled a read or write.
| |
| Second number = the minimum read/write size
| |
| Third number = maximum read/write size
| |
| Fourth = sum of all the read/write requests in bytes, the quantity of data read/written.
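The field layout described above can be turned into a small parser. This is a rough sketch, not an official tool: the function name parse_stats and the dictionary keys are my own choices, and it assumes one counter per line with whitespace-separated fields, as in the example.

```python
SAMPLE = """\
obdfilter.scratch-OST0001.stats=snapshot_time 1409777887.590578 secs.usecs
read_bytes 27846475 samples [bytes] 4096 1048576 14421705314304
write_bytes 16230483 samples [bytes] 1 1048576 14761109479164
get_info 3735777 samples [reqs]
"""

def parse_stats(text):
    """Parse one target's 'stats' output into a dict keyed by counter name."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Drop a leading "param.name=" prefix if lctl printed one.
        if "=" in fields[0]:
            fields[0] = fields[0].split("=", 1)[1]
            if not fields[0]:
                fields = fields[1:]
            if not fields:
                continue
        name = fields[0]
        if name == "snapshot_time":
            stats[name] = float(fields[1])
            continue
        # Layout: name, count, "samples", "[unit]", then optional min, max, sum.
        entry = {"samples": int(fields[1]), "unit": fields[3].strip("[]")}
        if len(fields) >= 7:
            entry["min"] = int(fields[4])
            entry["max"] = int(fields[5])
            entry["sum"] = int(fields[6])
        stats[name] = entry
    return stats

stats = parse_stats(SAMPLE)
# Average read size in bytes is the sum divided by the number of samples.
avg_read = stats["read_bytes"]["sum"] / stats["read_bytes"]["samples"]
```

Because these counters are cumulative, a monitoring script would normally take two snapshots and divide the difference in sum by the difference in snapshot_time to get a rate.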
| |
| | |
| === Jobstats ===
| |
| | |
| Jobstats are slightly more complex multi-line records. Each OST or MDT also has an entry for each jobid (or procname_uid perhaps), and then the data.
| |
| | |
| Example:
| |
| <pre>
| |
| obdfilter.scratch-OST0000.job_stats=job_stats:
| |
| - job_id: 56744
| |
| snapshot_time: 1409778251
| |
| read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
| |
| write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }
| |
| setattr: { samples: 0, unit: reqs }
| |
| punch: { samples: 95, unit: reqs }
| |
| - job_id: . . . ETC
| |
| </pre>
| |
| | |
| Notice this is very similar to 'stats' above: the same samples, unit, min, max and sum values, but each one is labelled and wrapped in { braces }. I don't know why the two formats differ; it may simply be how each was coded.
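One upside of the labelled format is that it is easy to pick apart with a regular expression. The following is a minimal sketch using only the Python standard library: the name parse_job_stats is hypothetical, and it assumes every record starts with a "- job_id:" line and that each counter is a single "name: { key: value, ... }" group.

```python
import re

SAMPLE = """\
obdfilter.scratch-OST0000.job_stats=job_stats:
- job_id: 56744
  snapshot_time: 1409778251
  read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
  write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }
  setattr: { samples: 0, unit: reqs }
  punch: { samples: 95, unit: reqs }
"""

# Matches "name: { ... }"; findall copes with several groups on one line.
FIELD_RE = re.compile(r"(\w+):\s*\{([^}]*)\}")

def parse_job_stats(text):
    """Return {job_id: {counter_name: {key: value}}} for one target."""
    jobs = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("- job_id:"):
            # job_id stays a string, since it may be a procname_uid.
            current = {}
            jobs[line.split(":", 1)[1].strip()] = current
        elif current is None:
            continue  # header line before the first job record
        elif line.startswith("snapshot_time:"):
            current["snapshot_time"] = int(line.split(":", 1)[1])
        else:
            for name, body in FIELD_RE.findall(line):
                counters = {}
                for pair in body.split(","):
                    if ":" not in pair:
                        continue
                    key, value = pair.split(":", 1)
                    value = value.strip()
                    counters[key.strip()] = int(value) if value.isdigit() else value
                current[name] = counters
    return jobs

jobs = parse_job_stats(SAMPLE)
bytes_read = jobs["56744"]["read"]["sum"]
```

Summing the read and write sums for the same job_id across all OSTs gives the total I/O for that job.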
| |
| | |
| === Single ===
| |