MDS PDirOps Implementation wiki version
Introduction
The following milestone completion document applies to Subproject 1.2 – Parallel Directory Operations within the OpenSFS Lustre Development contract SFS-DEV-001, signed 7/30/2011.
Subproject Description
Per the contract, the subproject is described as follows: “This subproject allows multiple RPC service threads to operate on a single directory without contending on a single lock protecting the underlying directory in the ldiskfs file system. Single directory performance is one of the most critical use cases for HPC workloads as many applications create a separate output file for each task in a job, requiring hundreds of thousands of files to be created in a single directory within a short window of time. Currently, both filename lookup and file system-modifying operations such as create and unlink are protected with a single lock for the whole directory.
This subproject will implement a parallel locking mechanism for single ldiskfs directories, allowing multiple threads to do lookup, create, and unlink operations in parallel. In order to avoid performance bottlenecks for very large directories, as the directory size increases, the number of concurrent locks possible on a single directory will also increase.”
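The idea of replacing one directory-wide lock with several finer-grained locks can be illustrated with a small sketch. This is not the ldiskfs implementation: it is a toy in-memory model, and the names StripedDirectory, create_many, and num_buckets are hypothetical. It assumes a fixed number of hash buckets, each with its own lock, so operations on names that hash to different buckets do not contend. It does not model the lock count growing with directory size.

```python
import threading
import zlib


class StripedDirectory:
    """Toy in-memory directory guarded by one lock per hash bucket.

    Hypothetical illustration only; not the ldiskfs data structure.
    """

    def __init__(self, num_buckets=16):
        # One lock and one name table per bucket, instead of a single
        # lock protecting the whole directory.
        self._locks = [threading.Lock() for _ in range(num_buckets)]
        self._buckets = [{} for _ in range(num_buckets)]

    def _index(self, name):
        # Stable hash, so a given name always maps to the same bucket.
        return zlib.crc32(name.encode("utf-8")) % len(self._buckets)

    def create(self, name, inode):
        i = self._index(name)
        with self._locks[i]:
            if name in self._buckets[i]:
                return False  # name already exists
            self._buckets[i][name] = inode
            return True

    def lookup(self, name):
        i = self._index(name)
        with self._locks[i]:
            return self._buckets[i].get(name)

    def unlink(self, name):
        i = self._index(name)
        with self._locks[i]:
            return self._buckets[i].pop(name, None) is not None


def create_many(directory, task_id, count):
    # Each task writes its own output files into the shared directory.
    for n in range(count):
        directory.create(f"task{task_id}.out{n}", inode=(task_id, n))


if __name__ == "__main__":
    d = StripedDirectory(num_buckets=16)
    threads = [
        threading.Thread(target=create_many, args=(d, t, 100))
        for t in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(d.lookup("task3.out42"))  # (3, 42)
```

In this sketch, two threads only wait on each other when their file names hash to the same bucket, which mirrors the file-per-task workload described in the contract text.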
Milestone Completion Criteria
Per the contract, the Implementation milestone is described as follows: “Contractor shall complete implementation and unit testing for the approved solution. Contractor shall regularly report feature development progress including progress metrics at project meetings and engineers shall share interim unit testing results as they are available. OpenSFS at its discretion may request a code review. Completion of the implementation phase shall occur when the agreed to solution has been completed up to and including unit testing and this functionality can be demonstrated on a test cluster. Code Reviews shall include:
- Discussion led by Contractor engineer providing an overview of Lustre source code changes
- Review of any new unit test cases that were developed to test changes”
Location of Subproject Code changes
Complete code is available at:
http://review.whamcloud.com/#change,375
Commit at which code completed Milestone review by Senior and Principal Engineer at:
Subproject Feature Confirmation
Multiple RPC service threads can operate on a single directory without contending on a single lock protecting the underlying directory in the ldiskfs file system.
Results from code runs were presented to the community at the OpenSFS Lustre Pavilion at SC11. This presentation is available from the OpenSFS site and is included in Appendix 1.
The results included in Appendix 2 provide a detailed description of the completion of unit tests and benchmarks.
Conclusion
Implementation has been completed according to the agreed criteria.
Appendix 1