BWG File System Monitoring

The task of the BWG File System Monitoring group is to:
 * 1) develop a list of existing parallel file system monitoring tools.
 * 2) identify their capabilities, and any others we think should exist.
 * 3) compare and contrast the tools with each other and against the capabilities we think should exist.

Scenario: You've just deployed a copy of Spider II (http://www.hpcwire.com/hpcwire/2013-08-16/spider_ii_emerges_to_give_ornl_a_big_speed_boost.html) at your site. Now you need to instrument it to a) detect whether a component has failed, b) confirm that you are meeting your target performance numbers, and c) determine what future improvements can be made.


 * 1) Where do you start?
 * 2) What information do you think you should collect?
 * 3) What tools/utilities/commands do you reach for?

The Features List - what to monitor.

The Tools List - how to monitor.

An IO Process Model - to get an idea of the space of variables that can usefully be monitored.

Status-ey Stuff
 * LUG 2013 Report (final ppt).
 * SC13 Report (in development).

Return to Benchmarking Working Group page.