SC14 BWG face-to-face meeting
- Sarp Oral (ORNL)
- Andrew Uselton (Intel)
- Nathan Rutman (Seagate)
- Colin Faber (Seagate)
- Ben Evans (Terascala)
- Aurelien Degremont (CEA)
- Alan Wild (ExxonMobil)
- Andreas Dilger (Intel)
- John Carrier (Intel)
- Bob Kierski (Cray) (on the phone)
Andrew's report to the working group
- Sarp has asked me to assist with the OpenSFS Benchmarking Working Group.
- The OpenSFS board has empowered the BWG to help the open parallel file systems community identify and understand file system performance, in all its complexity. The audience includes novices, who need guidance on how to get started, as well as very sophisticated sites and vendors, who want an agreed-upon methodology for establishing performance results.
- The BWG feels that we've lost a little momentum, and we are examining how we can re-energize our efforts. From the beginning we identified several areas of examination that BWG could pursue. These included:
- hero runs (guidance on methodology)
- application I/O kernels
- metadata performance (guidance on methodology)
- workload characterization
- Over the last several years we've achieved some small success in two of these, and the other three have produced few concrete results in the form of content on the OpenSFS web site. So how can we address this?
- Additional top-down pressure from the chairs?
- Reduce the scope of active efforts, so we pursue only one or two goals instead of five?
- Seek formal management endorsement of active participation by collaborators?
- Disband the working group?
I. The BWG Mission
The goal of the BWG is an educational one. We seek to empower the community to understand and evaluate the performance of scalable file systems. For file system novices and others new to performance evaluation, the first step is knowing what file systems do and how their performance can be measured. For the more advanced members of the community, there is a need for resources that help establish standards and best practices for performance analysis. In the realm of understanding performance there are several activities of interest:
- The first is to establish the nature and construction of file systems generally and of a given file system in particular.
- This will require knowing what sorts of architectural details are important, e.g., how many disk drives, servers, and client nodes there are, as well as the networking and other interconnection details.
- Similar detail about the software running on the system is important.
- There are configuration settings and choices, available only to system staff, that need to be documented. There are also some configuration details that are under the control of the user.
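As an illustration only, the kinds of details listed above could be captured in a single structured record so that every test result carries its context with it. The sketch below is not a BWG-defined format: the class name, the field names, and all of the values are hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class FileSystemDescription:
    """Hypothetical record of a file system's makeup (field names are illustrative)."""
    name: str
    # Hardware architecture
    num_servers: int
    num_drives: int
    num_clients: int
    interconnect: str
    # Software running on the system, e.g. component name -> version string
    software_versions: dict = field(default_factory=dict)
    # Configuration choices available only to system staff
    admin_settings: dict = field(default_factory=dict)
    # Configuration details under the control of the user
    user_settings: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to benchmark output."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; they do not describe any real system.
example = FileSystemDescription(
    name="example-fs",
    num_servers=8,
    num_drives=160,
    num_clients=512,
    interconnect="example-fabric",
    software_versions={"file_system": "x.y", "kernel": "x.y"},
    admin_settings={"example_server_tunable": "value"},
    user_settings={"example_per_file_layout": "value"},
)
print(example.to_json())
```

Keeping staff-controlled and user-controlled settings in separate fields mirrors the distinction drawn above and makes it clear which values a reader of the results could change on their own.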
For evaluating performance there are additional important activities:
- Identify the important metrics for file system performance.
- Identify the tools for measuring file systems to establish the value of a given metric in a given environment.
- There are many dimensions to the space of all possible performance tests, and some tools will have execution parameters that vary what sort of test gets run. Thus, it is important to identify all the possible ways that a given tool can be invoked. For each tool this establishes a parameter space.
- The values for a given metric that result from a series of runs of a given tool on a given file system, with a variety of parameters selected from its parameter space, then define a function on that parameter space. Call this a parameter space survey.
- A theoretical model, based on vendor documentation and/or prior experience, will establish an expectation for the results of a particular parameter space survey. The difference (if any) between the expected and actual results is a subject that would need to be investigated and explained, perhaps leading to an improved (or at least modified) model, on the one hand, or modified settings to improve performance, on the other.
- Conversely, in the absence of an a priori expectation of the results, one can use the results of a survey to construct a model of the file system's behavior.
- Certain selectively chosen points in parameter space can be used to establish standard measures for the file system as a whole. For example, large, sequential I/Os in sufficient volume to negate the effect of cache anywhere in the I/O path can be used to establish the asymptotic limit of performance at scale: the so-called "hero number". With some additional constraints on how such a test is performed, this can be a single representation of the file system's theoretical bandwidth. Such a number is widely used and quoted in the industry, so it is good to understand what the number means, how it is established, what some of the pitfalls of such results are, and how such results can be abused if the tests establishing them are not properly constrained.
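The idea of a parameter space survey, and of comparing it with a model, can be sketched in a few lines. Everything below is an assumption made for illustration: the parameter names and values, the toy bandwidth model, the 20% tolerance, and especially `stand_in_measure`, which only imitates a benchmark run so the example is self-contained. In practice that function would invoke a real benchmark tool and return its reported bandwidth.

```python
from itertools import product

# Hypothetical parameter space for a bulk-data benchmark (illustrative values).
PARAMETER_SPACE = {
    "transfer_size_kib": [64, 1024, 4096],
    "num_clients": [1, 8, 64],
    "access_pattern": ["sequential", "random"],
}


def enumerate_points(space):
    """Yield every combination of parameter values as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def expected_bandwidth(point, per_client_mib_s=500.0, fs_limit_mib_s=10000.0):
    """Toy model (an assumption): clients add bandwidth linearly up to a
    file-system-wide limit, and random access halves the result."""
    bandwidth = min(point["num_clients"] * per_client_mib_s, fs_limit_mib_s)
    if point["access_pattern"] == "random":
        bandwidth *= 0.5
    return bandwidth


def stand_in_measure(point):
    """Stand-in for a real benchmark run; returns synthetic numbers, not data."""
    model = expected_bandwidth(point)
    if point["access_pattern"] == "random" and point["transfer_size_kib"] == 64:
        return model * 0.5  # deliberately far from the model
    return model * 0.9


def survey(space, measure):
    """Evaluate `measure` at every point: returns (point, measured, expected)."""
    return [(p, measure(p), expected_bandwidth(p)) for p in enumerate_points(space)]


def discrepancies(results, tolerance=0.2):
    """Points where measurement and model differ by more than `tolerance`."""
    return [(p, m, e) for p, m, e in results if abs(m - e) > tolerance * e]


def hero_point(results):
    """Pick the large, sequential, many-client corner of the survey."""
    sequential = [r for r in results if r[0]["access_pattern"] == "sequential"]
    return max(sequential,
               key=lambda r: (r[0]["transfer_size_kib"], r[0]["num_clients"]))


results = survey(PARAMETER_SPACE, stand_in_measure)
to_investigate = discrepancies(results)
hero = hero_point(results)
```

The entries in `to_investigate` are the points that would need to be explained, either by revising the model or by changing settings. The sketch does not address cache effects; a real hero run would also have to move enough data to defeat caching along the I/O path.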
Tools for file system testing include:
- Synthetic benchmarks and "real world" applications
- Tools primarily aimed at bulk data transport versus tools for testing metadata performance.
- Tools that monitor and report the behavior of the file system during an experiment (independent of generating the load on the file system).
The BWG has been addressing these topics through its five areas of activity. We have conducted one survey and published its results, and are now sending out a follow-up survey. We published a "Zero to Hero" guide that addresses some of the concerns of a novice embarking on file system benchmarking.
Because of the wide range of our initial efforts, we found that there was often too little energy in any one of them to make a significant contribution. For that reason we have decided to change how the BWG conducts these activities. Henceforth, the BWG bi-weekly conference call will address a specific issue of interest, and the participants in the call will work together to push that one topic along during the call (as opposed to only reporting on the progress of efforts conducted off-line).
Our initial focus will be to construct and explain a list of the elements that contribute to the architecture of a file system, along with the parameters that control the behavior of a synthetic benchmark carrying out experiments on the bulk transport of data. This list will include items, definitions, and explanations. Some effort may also go into relating the various elements to one another.
II. Reporting and Vetting Performance Results
A second topic of discussion was gathering the results of testing activities conducted by the community, presumably by people who have used our resources to assist them in conducting their tests. Gathering such results might be of value to the community for several reasons.
- Well-vetted experimental results from the community can act as guidance to those new to the process of designing and testing file systems. Just having a list of examples of the architectures of some previously implemented file systems could go a long way towards simplifying the design of a new system.
- The vetting process itself can assist ongoing experimental work when those with more experience are in a position to give timely advice to the experimenters.
- Having a demonstrably useful repository of such results can enliven the community and encourage it to contribute even more results, raising enthusiasm about the role and importance of file systems in the design of a whole solution for HPC system procurement.
There is also a significant risk to such an activity. Having such a repository of results could be construed (intentionally or not) as a file system equivalent of a TOP500 list, with all its attendant flaws and concerns. In particular, OpenSFS is keen to keep its role entirely vendor-neutral. A list or other forum that becomes somehow competitive could undermine that appearance of neutrality.
Notes from the community events:
- Cray has some at-scale tests that Cory Spitzer knows about.
- Sarp might be able to identify a few that ORNL would be willing to export.
There are a couple of workshops coming:
- Lustre Developers Gathering at LLNL in January
- Lustre Ecosystem Workshop in Maryland on March 3rd and 4th