Difference between revisions of "BWG File System Monitoring"


Revision as of 09:19, 20 September 2013

The task of the BWG File System Monitoring group is to:

  1. develop a list of existing parallel filesystem monitoring tools.
  2. identify their capabilities, and any others we think should exist.
  3. compare and contrast the tools with each other and against the capabilities we think should exist.


Scenario:
You've just deployed a copy of Spider II (http://www.hpcwire.com/hpcwire/2013-08-16/spider_ii_emerges_to_give_ornl_a_big_speed_boost.html) at your site. Now you need to instrument it a) to detect if a component has failed, b) to ensure you are meeting your target performance numbers, and c) to determine what future improvements can be made.


  1. Where do you start?
  2. What information do you think you should collect?
  3. What tools/utilities/commands do you reach for?


The Features List - what to monitor.

The Tools List - how to monitor.

An IO Process Model - to get an idea of the space of variables that can usefully be monitored.


Status-ey Stuff


Return to Benchmarking Working Group page.