BWG File System Monitoring

From OpenSFS Wiki

Latest revision as of 11:55, 6 March 2015


Lead: Liam Forbes
Members: Alan Wild, Andrew Uselton, Ben Evans, Cheng Shao, Jeff Garlough, Jeff Layton, Mark Nelson, Nic Henke


The task of the BWG File System Monitoring group is to:

  1. develop a list of existing parallel filesystem monitoring tools.
  2. identify their capabilities, and add any others we think should exist.
  3. compare and contrast the tools with each other and with the capabilities we think should exist.


Scenario:
You've just deployed a copy of Spider II (http://www.hpcwire.com/hpcwire/2013-08-16/spider_ii_emerges_to_give_ornl_a_big_speed_boost.html) at your site. Now you need to instrument it to a) detect whether a component has failed, b) confirm you are meeting your target performance numbers, and c) determine what future improvements can be made.


  1. Where do you start?
  2. What information do you think you should collect?
  3. What tools/utilities/commands do you reach for?


The Features List - what to monitor.

The Tools List - how to monitor.

An IO Process Model - to map out the space of variables that are useful to monitor.


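Status-ey Stuff

  * LUG 2013 Report (final ppt): http://wiki.opensfs.org/images/e/ec/LUG_2013_OpenSFS_BWG_update.pdf
  * SC13 Report (in development).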

Return to Benchmarking Working Group page.