Lustre Monitoring and Statistics Guide

From OpenSFS Wiki
Revision as of 13:56, 4 November 2014 by Scottn

DRAFT IN PROGRESS

Introduction

what this is about


Reading /proc vs lctl

Statistics can be read with either 'cat /proc/fs/lustre...' or 'lctl get_param'. With newer Lustre versions, 'lctl get_param' is the standard and recommended way to get these stats. This is to ensure portability. I will use this method in all examples; as a bonus, the syntax is often a little shorter.
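
To make the relationship between the two methods concrete, here is a minimal Python sketch that turns a /proc path into the dotted parameter name that 'lctl get_param' expects. It assumes the parameter name is simply the path below /proc/fs/lustre/ with '/' replaced by '.'; the helper names proc_path_to_param and lctl_command are my own, not part of Lustre, and the example path is taken from the jobstats example in this guide.

```python
PROC_ROOT = "/proc/fs/lustre/"  # assumed root for the older 'cat' method


def proc_path_to_param(path):
    """Translate a /proc/fs/lustre path into a dotted lctl parameter name.

    Assumption: the name is the path below PROC_ROOT with '/' replaced by '.'.
    """
    if not path.startswith(PROC_ROOT):
        raise ValueError("not a path under " + PROC_ROOT + ": " + path)
    return path[len(PROC_ROOT):].strip("/").replace("/", ".")


def lctl_command(param):
    """Build the argument list for 'lctl get_param <param>' (does not run it)."""
    return ["lctl", "get_param", param]


if __name__ == "__main__":
    param = proc_path_to_param(
        "/proc/fs/lustre/obdfilter/scratch-OST0000/job_stats"
    )
    print(" ".join(lctl_command(param)))
```

The argument list could be handed to subprocess.run on a node where lctl is installed; the sketch deliberately stops short of running it.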

Data Formats

The format of the statistics files varies from one type to another (and I'm not sure whether there's any reason for this). The format names used here are entirely *my invention*; they are not a standard for Lustre or anything else.

jobstats

Jobstats are multi-line records. Each OST or MDT has an entry for each jobid (or hostname, or whatever identifier is used to collect job stats). Example:

obdfilter.scratch-OST0000.job_stats=job_stats:
- job_id:          56744
  snapshot_time:   1409778251
  read:            { samples:       18722, unit: bytes, min:    4096, max: 1048576, sum:     17105657856 }
  write:           { samples:         478, unit: bytes, min:    1238, max: 1048576, sum:       412545938 }
  setattr:         { samples:           0, unit:  reqs }
  punch:           { samples:          95, unit:  reqs }
- job_id: .... etc
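
As a rough illustration of how a record like this can be consumed, here is a minimal Python sketch that parses jobstats text into a list of dictionaries, one per job. It assumes each record starts with a '- job_id:' line and that every other key sits on its own line, either as a plain value or as a '{ key: value, ... }' group. The function name parse_job_stats and the SAMPLE constant are my own, and the sample values are copied from the example above. This is not an official parser, and it only uses the standard library.

```python
SAMPLE = """\
obdfilter.scratch-OST0000.job_stats=job_stats:
- job_id:          56744
  snapshot_time:   1409778251
  read:            { samples:       18722, unit: bytes, min:    4096, max: 1048576, sum:     17105657856 }
  write:           { samples:         478, unit: bytes, min:    1238, max: 1048576, sum:       412545938 }
  setattr:         { samples:           0, unit:  reqs }
  punch:           { samples:          95, unit:  reqs }
"""


def _convert(value):
    """Return an int for all-digit strings, otherwise the stripped string."""
    value = value.strip()
    return int(value) if value.isdigit() else value


def parse_job_stats(text):
    """Parse jobstats text into a list of dicts, one dict per job record."""
    jobs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("- job_id:"):
            # A new record begins; job_id may be numeric or a name.
            current = {"job_id": _convert(line.split(":", 1)[1])}
            jobs.append(current)
            continue
        if current is None or ":" not in line:
            # Skip the parameter-name header and anything unrecognised.
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("{") and value.endswith("}"):
            # Inline group such as { samples: 0, unit: reqs }.
            fields = {}
            for item in value[1:-1].split(","):
                name, _, field_value = item.partition(":")
                fields[name.strip()] = _convert(field_value)
            current[key.strip()] = fields
        else:
            current[key.strip()] = _convert(value)
    return jobs


if __name__ == "__main__":
    for job in parse_job_stats(SAMPLE):
        print(job["job_id"], job["read"]["sum"], job["write"]["sum"])
```

Keeping each operation as its own small dictionary makes it easy to pull out a single counter (for example the read 'sum' in bytes) per job and per OST or MDT.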