Test framework requirements
This page is the collection point for recording requirements for a new test-framework environment. The new environment is intended to support Lustre today as well as into the future with exascale systems, so the entries on this page should capture everything that will be needed to test Lustre for the foreseeable exascale future. The current plan is to design the best possible new test-framework environment and then assess whether a transition from the current framework is possible; we should not hinder the new environment in order to make the transition possible. Please add your requirements to the table below, using child pages where your suggestion is a document or a sizeable entry. We want to capture all the valuable input people have: ideas, language requirements, high-level architectural ideas, and references to other test methods/languages/environments that might be useful.
|CG||Make tests scalable||Historically, test-framework tests tend to address a single client against a single OSS and a single OST, and extra effort is required to create scaling tests. This behaviour is the inverse of what is required: a simple test should scale to 100,000+ clients against 1,000+ servers with no effort on the part of the test writer.
The framework environment must make scalable tests the natural way to write tests. The shortest and simplest test should be scalable, with singular non-scalable tests being the more time-consuming ones to write.|
|NR||Parallel test execution||Related to the above, both the tests and the framework must be able to run tests in parallel on multiple clients. Sadly, many tests today assume they are the only test running (e.g. the obd_fail_loc mechanism is not sophisticated enough to handle parallel tests).|
|CG||Be able to create safe live-system test sets||Tests need to be organised on a one-test-per-file basis. This will lead to easier-to-read code but, more importantly, will enable a set of tests to be created that is guaranteed to be safe for a live system. The problem with running tests by filter is that people make mistakes; if it is possible to assemble an installation of only 'safe' tests then we remove the live ammunition from the situation.|
|NR||Provide user control over safety||Individual administrators should be able to put limitations on what the test system is allowed to do. Some sites may allow recovery tests, some may not: we need flexible control here. I previously suggested the idea of a permissions mask that explicitly grants tests permission to perform various operations:
Every test that performs one of these actions should do so via an API call, which verifies the permissions granted to the testing run before executing the operation.
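A minimal sketch of such a permissions mask, assuming a flag-per-capability design (the capability names and the `require` helper are invented for illustration):

```python
# Illustrative only: a site grants a set of capabilities, and every
# destructive operation is funnelled through one check.
from enum import Flag, auto

class Permission(Flag):
    NONE = 0
    REFORMAT = auto()      # may reformat targets
    FAILOVER = auto()      # may trigger failover/recovery
    POWER_CYCLE = auto()   # may power-cycle nodes

GRANTED = Permission.FAILOVER      # example site policy: failover tests only

def require(perm):
    """Called by the test API before any destructive operation."""
    if not (GRANTED & perm):
        raise PermissionError("test run not granted %s" % perm)

require(Permission.FAILOVER)       # allowed under this policy
try:
    require(Permission.REFORMAT)   # refused: would destroy the filesystem
except PermissionError as e:
    print("blocked:", e)
```

Because every destructive action passes through `require`, an administrator can constrain a whole test run with a single mask.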
|CG||Remove all test relationships||A significant obstacle to fully exploiting the test framework is that tests have interdependencies, with one test being required to run before another if the latter is to succeed.
A knock-on effect is that setup/teardown of test requirements is often done by the first or last test in a sequence.
The requirement is that each test can run successfully without any other test having executed. This has the following knock-on effects:
REQ-1: Identify all dependent tests and create a change-request ticket for each test or set of tests. This work may already have been carried out by Xyratex, in which case the requirement is to create a change-request ticket for the tests found.
REQ-2: Create a general mechanism that enables setup/teardown code to be separated from the tests and called in an efficient way. Submit this mechanism for review before applying it to each set of dependent tests.
REQ-3: Implement the changes described by each ticket from REQ-1, applying the mechanism from REQ-2 to each setup/teardown requirement where applicable. Submit each change for review and landing; during landing, ensure that extra testing is carried out to validate that the modified tests operate correctly.
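One possible shape for the REQ-2 mechanism is an on-demand fixture: each test declares what it needs, the framework builds it on first use and reuses it afterwards, so no test depends on a predecessor having run. The names here (`needs`, the fixture cache) are assumptions for illustration.

```python
# Hypothetical fixture mechanism: setup lives outside the tests, is built
# lazily, and is shared, so every test can run standalone.
_fixtures = {}

def needs(name, build):
    """Build the named fixture on first use; later tests reuse it."""
    if name not in _fixtures:
        _fixtures[name] = build()
    return _fixtures[name]

def test_a():
    d = needs("testdir", lambda: {"path": "/mnt/lustre/d0"})
    return d["path"]

def test_b():
    # Runs correctly even if test_a never ran: the fixture is built on demand.
    d = needs("testdir", lambda: {"path": "/mnt/lustre/d0"})
    return d["path"]

print(test_b())
```

A teardown hook per fixture (run once when the fixture is discarded) would complete the picture; the key property is that ordering no longer matters.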
|CG||Add Permissions/Run-Filter Capability||The ability to run tests based on a class of requirement would allow targeted testing to take place more readily than today's manual selection process using ONLY or EXCEPT, instead filtering by requirements such as file-system-safe, failover-safe, etc.
To make this possible the run_test function must be updated to allow an AND/OR-style mask to be applied. This mask could be a bitwise mask, a keyword, or a key/value-pair match. It should not be a trigger for functionality (i.e. not "if bit x is set then format the file system before the test"); it should simply be a way of preventing a test from running.
Suitable and likely masks/filters are:
• File-system safe. This would indicate that the test can be run on a production file system.
• Failover-unsafe. By default all tests should be failover-safe; this flag allows a test to be explicitly marked as unsafe.
• Metadata (see spec) key:value present.
• Metadata (see spec) key:value not present.
The method used should be extensible so that additional flags can be added with minimal overhead. In the list above, the first two are examples of attributes that might well be hardcoded with the test; the last two are attributes that would be a managed resource appending additional information to the test.
The filter must be applied not by the caller of the test but by the test itself, or by the framework calling the test, so that to override (for example) file-system safe the actual test must be changed.
As each test will need to be modified, a bulk tool for changing all tests to the default state is probably worth developing.
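A sketch of the "filter lives with the test" idea, using a keyword mask (the decorator name `attrs` and the tag names are invented): because the check is attached to the test itself, a caller cannot bypass it without editing the test.

```python
# Hypothetical run filter: a test runs only if its own attributes satisfy
# the requirements set for this run.
REQUIRED = {"fs_safe"}             # this run targets a production filesystem

def attrs(*tags):
    tags = set(tags)
    def wrap(fn):
        def run(*a, **kw):
            if not REQUIRED <= tags:
                return "SKIPPED: missing %s" % sorted(REQUIRED - tags)
            return fn(*a, **kw)
        return run
    return wrap

@attrs("fs_safe")
def test_read_only():
    return "PASS"

@attrs("failover_unsafe")
def test_reformat():
    return "PASS"

print(test_read_only())    # runs: marked fs_safe
print(test_reformat())     # refused: not marked fs_safe
```

The same decorator could carry the metadata key:value filters described above; the essential property is that the mask prevents execution rather than triggering behaviour.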
REQ-1: Design, implement and have reviewed the mechanism for ensuring that the mask/filter is applied correctly for each test in a failsafe manner, allowing, for example, use of the test framework on a live file system.
REQ-2: Develop a bulk-change tool that sets all tests to the failsafe mode by default. This may require no changes, or may require a standard piece of code to be inserted into each test procedure.
REQ-3: Implement the changes described by each ticket from REQ-1, applying the mechanism from REQ-2 where applicable.
REQ-4: Update the obvious candidates for the major masks, such as file-system safe and failover-unsafe; the majority may have to be updated ad hoc moving forwards.
|CG||Metadata Storage & Format||To increase our knowledge of individual tests we need to be able to store structured information about each test; this information is the test metadata.
This data needs to be stored as part of the source code, so that it travels with the source as the source evolves.
This metadata will record attributes of the test that are completely independent of the hardware the test runs on.
Good metadata to store is:
Bad metadata to store is:
What is required is a human-readable and human-writeable format for storing this data.
This integration of the data with the source will make the data an integral and valued asset, which will then be treated with the care and attention it deserves.
The data should be stored in the source code as headers to the functions, in a way that can be read and written by third-party applications. This means applications using the data can wrap a small library, making the data-storage medium invisible to the user.
In future, for example with a new test framework, the method might be something quite different.
An example of this methodology is doxygen, which uses a formatted header to contain metadata about a function. A useful approach to examine is doxygen formatting, with something like a \verbatim or \xmlonly section used to store the test metadata in YAML or XML. If this approach were chosen, doxygen could be used to create a test manual, while the YAML/XML in the \verbatim or \xmlonly section could be used to extract test metadata to enable targeted testing.
The approach needs to be prototyped so that people can review the likely look and feel of the commented section.
A possible header might look like this:
[Need an example here]
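Purely as an illustration of the doxygen-plus-YAML idea (the field names, the `##` comment framing and the metadata keys are assumptions, not an agreed format), a header for a bash test function might look like:

```
## \brief sanity 0a: create and remove a single file
## \verbatim
## test-metadata:
##   id: sanity-0a
##   fs-safe: true
##   failover-unsafe: false
##   requires: [client, mds, oss]
## \endverbatim
test_0a() {
        touch $DIR/$tfile
        rm $DIR/$tfile
}
```

Doxygen would render the \brief text into a test manual, while a small parser could lift the YAML out of the \verbatim block for targeted test selection.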
One issue that needs to be resolved is how this data is managed from a git/code-review perspective. Each change will cause a git update unless some batching/branching takes place, and if the tools are automatic, do we need a full review process? These issues are not without complication and need to be addressed before automation is possible. This review/git issue means that the data must be long-lived and likely to change only when the test itself changes; the data cannot and should not contain transient data like test results.
REQ-1: Design and document a header format that allows the flexible storage of test metadata. If possible include the ability to make use of a tool like doxygen.
REQ-2: Design and document the requirements for a library that allows automated read/write access to this data.
|CG||Metadata Storage Access Library||A general-purpose library is required to read/write the metadata stored as part of the source code. This library should be callable from bash and other major languages such as Python, Perl, Ruby and C/C++; this might mean a simple language layer is required for each caller, but the general design must allow varied callers.
The library should implement, at a minimum, the ability to read, write and query for tests with specific attributes.
The library needs to offer the following capabilities:
Referring back to the Metadata Storage & Format entry, the library will need to be able to cache writes to the metadata for extended periods of time so that updates can be batched and submitted as a single review. This change/review/merge process will need considerable thought, documentation and sign-off before it can be implemented.
REQ-1: Design, document and have reviewed an API that enables read, write and query.
REQ-2: Investigate a widely supported language that allows this functionality to be made available as a library to a broad range of languages.
REQ-3: Investigate, document and have reviewed a caching and update process for changes made to the metadata, i.e. the process that occurs in the case of a write.
REQ-4: Implement and test the metadata library and include as part of the Lustre source tree.
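To show the intended library surface, here is a toy read-and-query sketch. The `# meta: key=value` header format is invented purely for this demo (the real format is whatever the Metadata Storage & Format work settles on), as are the function names.

```python
# Hypothetical access library: parse per-test metadata headers out of
# test source, then query tests by attribute.
import re

def read_meta(src):
    """Collect 'key=value' pairs from '# meta:' lines above each test."""
    out, current = {}, {}
    for line in src.splitlines():
        m = re.match(r"#\s*meta:\s*(\w+)=(\w+)", line)
        if m:
            current[m.group(1)] = m.group(2)
        elif line.startswith("test_"):
            out[line.split("(")[0]] = current
            current = {}
    return out

def query(db, **attrs):
    """Return tests whose metadata matches every given key=value pair."""
    return [t for t, m in db.items()
            if all(m.get(k) == v for k, v in attrs.items())]

SRC = """\
# meta: fs_safe=yes
test_read() {
}
# meta: fs_safe=no
test_format() {
}
"""
db = read_meta(SRC)
print(query(db, fs_safe="yes"))
```

Thin wrappers over such a core (a command-line tool for bash, bindings for Perl/Ruby/C) would keep the storage medium invisible to callers, as the entry above requires.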
|JG||Clearly expressible tests||That is, in looking at the test, it should be pretty obvious regarding "what the test is doing". This is similar to Nathan's "Add test metadata" section.|
|JG||Well-specified system under test||The bullet "well specified system under test" is tied to the previous "introspection" bullet, but I think it is easy to miss gathering a fairly detailed description of the system under test, making comparisons of similar systems more difficult. It should be more comprehensive than just the Lustre-specific information.|
|JG||A survey of existing test frameworks||It's possible that there is a framework that provides 80% of the requirements, making the time to implement the solution much shorter. See the framework comparison: Exists_test_framework_evaluation
NR: I've tried to clarify a distinction between the Lustre Test Framework, described here, and a test automation framework, described there. I've renamed that page to automation framework evaluation and added some description.
|JG||Distributed infrastructure should be evaluated||I think this issue should be a primary consideration, as it will affect everything else that is done with the test framework. I may have the wrong perception, but it seemed as though the use of ssh was almost a given; this should be on the table for evaluation. As well as having a good test framework, it would be beneficial to have a good interactive tool for curiosity-driven research and analysis; ipython might fit that role. Its distributed infrastructure is based on zeromq, so that might be a good distributed layer to consider for the test framework too. For the project I'm working on, I'm looking at robotframework as the base of the test infrastructure and zeromq as the glue for distributed work. To this I would need to add a component that is workload-management aware (e.g. PBS) to put a load on the system from a Cray or a cluster, and a driver to start multiple instances of robotframework, using different tests, in parallel.|
|CG||Communications Framework||Without doubt the new framework environment will use an implementation of MPI, CORBA, Parallel Python, ICE, etc.; any implementation of the framework environment must isolate the tests from that communication layer. This 100% disconnect of the tests from the communication layer means the communication layer can be replaced as need be.|
|CG||Language||The language must be object-oriented, so that all functionality uses an object.action or object.value style of notation. The language used to develop the framework need not be the same as that of the tests within the framework, although it is probably easier if the two are the same.
If a language can be found that allows development within an environment such as Eclipse or TextMate, with tab completion of object members, this would make the framework environment much more accessible to many. Careful choice of language is important because the framework will be a big parallel application in its own right, so the language must allow scaling. Memory and process issues (threads, process pinning) are easier to handle and investigate with C or C++; however, a dynamic language like Ruby or Python is more likely to allow the creation of the easy-to-write, easy-to-read tests that are required. Language choice would also depend on the availability of suitable parallel frameworks like MPI, CORBA or Parallel Python.
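As a feel for the object.action / object.value notation proposed above, here is a toy Python sketch. The `File` class and its methods are invented for illustration; the point is only the reading style a test would have.

```python
# Illustrative only: what object.action (create, unlink) and object.value
# (exists) test notation could look like in a dynamic OO language.
class File:
    def __init__(self, path):
        self.path = path
        self._exists = False

    def create(self):           # object.action
        self._exists = True
        return self             # returning self allows chaining

    def unlink(self):           # object.action
        self._exists = False
        return self

    @property
    def exists(self):           # object.value
        return self._exists

f = File("/mnt/lustre/f0").create()
assert f.exists                 # reads as a plain statement of intent
f.unlink()
print("exists after unlink:", f.exists)
```

An IDE with tab completion would surface `create`, `unlink` and `exists` directly from the object, which is exactly the accessibility argument made above.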
|NR||Inter-version testing||Currently, testing components are distributed among a number of nodes. If the Lustre versions don't match (typical between servers and clients), then testing components may not match. The test framework code should be developed independently and installed independently from any particular Lustre version. The framework should of course be aware of which versions of Lustre it is using and filter tests appropriately.|
|NR||Separate setup from test|| Current test-framework makes assumptions about the configuration and how it is allowed to interact with the configuration. Instead, the framework should be "hands-off" the configuration as much as possible (i.e. except where it is required for an individual test). The framework should be pointed at a Lustre system and gather all the information it needs.
A separate system can be used to create test systems if appropriate for particular users.
Output from both individual tests and from the framework should be easily understandable by both humans and machines. We should standardize on YAML for all test and framework output. Output should contain at least:
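As a purely illustrative sketch (the field names are assumptions, not an agreed schema), a single YAML test record might look like:

```yaml
# One possible per-test output record; every field name is hypothetical.
test: sanity-0a
description: create and remove a single file
status: pass
duration_s: 1.42
client_count: 4
lustre_version: 2.4.0
logs:
  - /var/log/test/sanity-0a.client.log
```

A record in this shape is trivially machine-parseable while remaining readable in a terminal or log file.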
|NR||Framework Self Test||
The framework itself should be built with consistency and correctness unit tests, to verify the correct functionality of the test execution, measurement, and reporting functions. These should be executable independently of any Lustre features, i.e. we can run test-framework-check without any Lustre installed.
|Current conditions of the configuration or filesystem state may prevent tests from executing correctly or usefully. These checks should be performed before trying to start a test, so that the output condition of a test is not "skip" (as is the case today), but rather a note in the test log saying the test was not run due to failure of precondition X. The preconditions should be specified via a special section before each test, perhaps along with the test metadata.|
|NR||Automatic cleanup||The framework should enforce an API that allows for automatic cleanup of any test artifacts:
This should be called automatically when the test ends (for both pass and fail cases).
|NR||Constraint-based testing||To more fully characterize response in unusual circumstances, it would be useful for the framework to be able to control and vary some aspects of the environment. For example, if the framework were able to specify the amount of memory available to the kernel (e.g. when using a virtual machine), then OOM conditions and behavior could be easily tested. Similarly, limiting the number of MDS threads to (num_clients-1), running with a very high network latency, etc. would provide testing of code paths that are not normally tested.|
|NR||Enforceable API||The framework should have a clear, enforceable API for interaction with the filesystem. The API should cover at least the following areas:
- CG = email@example.com
- JG = firstname.lastname@example.org
- NR = email@example.com