MDS PDirOps Implementation wiki version

From OpenSFS

Introduction

This milestone completion document applies to Subproject 1.2 (Parallel Directory Operations) within the OpenSFS Lustre Development contract SFS-DEV-001, signed 7/30/2011.

Subproject Description

Per the contract, the subproject is described as follows: “This subproject allows multiple RPC service threads to operate on a single directory without contending on a single lock protecting the underlying directory in the ldiskfs file system. Single directory performance is one of the most critical use cases for HPC workloads as many applications create a separate output file for each task in a job, requiring hundreds of thousands of files to be created in a single directory within a short window of time. Currently, both filename lookup and file system-modifying operations such as create and unlink are protected with a single lock for the whole directory.

This subproject will implement a parallel locking mechanism for single ldiskfs directories, allowing multiple threads to do lookup, create, and unlink operations in parallel. In order to avoid performance bottlenecks for very large directories, as the directory size increases, the number of concurrent locks possible on a single directory will also increase.”
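As a rough illustration of the idea in the description above (replacing one whole-directory lock with several finer-grained locks), the following is a minimal sketch in Python. It is not the ldiskfs implementation: the class name `BucketLockedDirectory`, the fixed bucket count, and the CRC32-based name hashing are all assumptions made for this example. The sketch also does not model the number of locks growing with directory size, and in CPython it shows the locking structure only, not a performance gain.

```python
import threading
import zlib


class BucketLockedDirectory:
    """Toy directory whose entries are spread over independently locked buckets.

    Hypothetical illustration only; the real feature locks parts of the
    on-disk ldiskfs directory, not Python dictionaries.
    """

    def __init__(self, num_buckets=16):
        self._buckets = [dict() for _ in range(num_buckets)]
        self._locks = [threading.Lock() for _ in range(num_buckets)]

    def _index(self, name):
        # Deterministic hash, so a given name always maps to the same bucket.
        return zlib.crc32(name.encode("utf-8")) % len(self._buckets)

    def create(self, name, inode):
        """Add an entry; return False if the name already exists."""
        i = self._index(name)
        with self._locks[i]:  # only this bucket is locked, not the directory
            if name in self._buckets[i]:
                return False
            self._buckets[i][name] = inode
            return True

    def lookup(self, name):
        """Return the stored inode value, or None if the name is absent."""
        i = self._index(name)
        with self._locks[i]:
            return self._buckets[i].get(name)

    def unlink(self, name):
        """Remove an entry; return True if it existed."""
        i = self._index(name)
        with self._locks[i]:
            return self._buckets[i].pop(name, None) is not None

    def entry_count(self):
        """Count entries while holding every lock, taken in index order."""
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(len(bucket) for bucket in self._buckets)
        finally:
            for lock in reversed(self._locks):
                lock.release()


def run_demo(num_threads=8, files_per_thread=1000):
    """Simulate tasks that each create their own files in one shared directory."""
    directory = BucketLockedDirectory()

    def task(task_id):
        for n in range(files_per_thread):
            directory.create(f"task{task_id}.out.{n}", (task_id, n))

    threads = [threading.Thread(target=task, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return directory


if __name__ == "__main__":
    print(run_demo().entry_count())
```

Threads that touch names in different buckets never wait on each other, which is the property the subproject description asks for. Operations on names that hash to the same bucket still serialize.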

Milestone Completion Criteria

Per the contract, the Implementation milestone is described as follows: “Contractor shall complete implementation and unit testing for the approved solution. Contractor shall regularly report feature development progress including progress metrics at project meetings and engineers shall share interim unit testing results as they are available. OpenSFS at its discretion may request a code review. Completion of the implementation phase shall occur when the agreed to solution has been completed up to and including unit testing and this functionality can be demonstrated on a test cluster. Code Reviews shall include:

  1. Discussion led by Contractor engineer providing an overview of Lustre source code changes
  2. Review of any new unit test cases that were developed to test changes”

Location of Subproject Code changes

Complete code is available at:

http://review.whamcloud.com/#change,375

The commit at which the code completed Milestone review by the Senior and Principal Engineer is at:

http://git.whamcloud.com/?p=fs%2Flustre-release.git;a=commit;h=19223651ed250966c0445c91dc91a5b9131dec35

Subproject Feature Confirmation

Feature: multiple RPC service threads operate on a single directory without contending on a single lock protecting the underlying directory in the ldiskfs file system.

Results from code runs were presented to the community at the OpenSFS Lustre Pavilion at SC11. This presentation is available from the OpenSFS site and is included in Appendix 1.

The results included in Appendix 2 provide a detailed description of the completion of unit tests and benchmarks.

Conclusion

Implementation has been completed according to the agreed criteria.

Appendix 1