Lustre Monitoring and Statistics Guide
DRAFT IN PROGRESS
Introduction
what this is about
Reading /proc vs lctl
'cat /proc/fs/lustre...' vs 'lctl get_param': with newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats. This is to ensure portability. I will use this method in all examples; a bonus is that the syntax is often a little shorter.
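For scripted collection, the same 'lctl get_param' call can be wrapped in a few lines of Python. This is only a minimal sketch: the function name get_param and its runner argument are my own illustrative choices, not part of Lustre, and it assumes lctl is on the PATH of a Lustre server or client.

```python
import subprocess


def get_param(name, runner=subprocess.run):
    """Return the raw text printed by `lctl get_param <name>`.

    `runner` is a hypothetical hook (defaults to subprocess.run) so the
    function can be exercised on a machine without Lustre installed.
    """
    result = runner(
        ["lctl", "get_param", name],
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return result.stdout


# Usage on a Lustre server (parameter name taken from the jobstats example):
# print(get_param("obdfilter.scratch-OST0000.job_stats"))
```

Going through lctl rather than opening a /proc path directly keeps the script independent of where a given Lustre version exposes the file.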
Data Formats
The format of the various statistics files varies (and I'm not sure if there's any reason for this). The format names here are entirely *my invention*; they are not a Lustre standard or anything.
jobstats
Jobstats are multi-line records. Each OST or MDT has an entry for each jobid (or hostname, or however we collect job stats). Example:
obdfilter.scratch-OST0000.job_stats=job_stats:
- job_id: 56744
  snapshot_time: 1409778251
  read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
  write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }
  setattr: { samples: 0, unit: reqs }
  punch: { samples: 95, unit: reqs }
- job_id: .... etc
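A record like this can be turned into Python dictionaries with a small line-oriented parser. The sketch below is an illustration, not an official Lustre tool: parse_job_stats and _convert are names I made up, and it assumes the output arrives one field per line (the multi-line form described above), with each record starting at a "- job_id:" line.

```python
def _convert(value):
    """Turn purely numeric strings into int; leave everything else as str."""
    value = value.strip()
    return int(value) if value.isdigit() else value


def parse_job_stats(text):
    """Parse job_stats text into a list of dicts, one per job_id record.

    Assumes one field per line; lines before the first "- " record
    marker (such as the parameter name header) are skipped.
    """
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("- "):      # start of a new job record
            jobs.append({})
            line = line[2:].strip()
        if not jobs or ":" not in line:
            continue                   # header or blank line
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value.startswith("{") and value.endswith("}"):
            # Inline map such as "{ samples: 95, unit: reqs }"
            fields = {}
            for item in value[1:-1].split(","):
                k, _, v = item.partition(":")
                fields[k.strip()] = _convert(v)
            jobs[-1][key] = fields
        else:
            jobs[-1][key] = _convert(value)
    return jobs


SAMPLE = """\
obdfilter.scratch-OST0000.job_stats=job_stats:
- job_id: 56744
  snapshot_time: 1409778251
  read: { samples: 18722, unit: bytes, min: 4096, max: 1048576, sum: 17105657856 }
  write: { samples: 478, unit: bytes, min: 1238, max: 1048576, sum: 412545938 }
  setattr: { samples: 0, unit: reqs }
  punch: { samples: 95, unit: reqs }
"""

jobs = parse_job_stats(SAMPLE)
```

With the sample above, jobs[0]["read"]["sum"] holds the total bytes read by job 56744 on this OST. Job ids that are not purely numeric (hostnames, for instance) are kept as strings.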