Contract SFS-DEV-004

Overview
The Lustre client implementation for the IO path (called CLIO) is responsible for issuing the RPCs that read and write data on the OSTs. CLIO was reconstructed in Lustre 2.0 for cross-platform portability. The resulting implementation is more complex than current usage requires, which makes the code hard to understand and maintain. This project implements the tasks defined during work for OpenSFS contract SFS-DEV-003 (see below). This work includes:

cl_lock re-factoring
cl_lock is highly complex and difficult to maintain. As a result, enhancements to the client code are time consuming and a significant number of bugs have been traced to cl_lock portions of the code.

ioctl calls implementation
ioctl calls are inconsistently implemented in CLIO. By re-organizing these calls, the removal of the old OBD API becomes possible.

removal of obsolete OBD API call-backs
Remove unused code that misleads and confuses developers who are unfamiliar with Lustre client code.

removal of non-linux interfaces
Remove unused code that misleads and confuses developers who are unfamiliar with Lustre client code.

removal of stripe md access beyond LOV layer
Remove code that does not observe public interfaces as it misleads and confuses developers who are unfamiliar with Lustre client code.

For the contract statement of work, see [[Media:SFS-DEV-004_SOW.pdf|SFS-DEV-004_SOW.pdf]]. The goal of the CLIO Simplification Implementation contract is the implementation in the Lustre source code of the CLIO Simplification Design that resulted from [[Media:CLIOSimplificationDesign_HighLevelDesign.pdf|Project 2]] of Contract SFS-DEV-003.

OpenSFS

 * Sarp Oral - OpenSFS Contract Administrator
 * Christopher Morrone - OpenSFS Technical Representative

Project Approval Committee (PAC)

 * Christopher Morrone - PAC Chair
 * Colin Faber
 * Patrick Farrell
 * Jason Hill
 * James Simmons
 * Cory Spitz

Intel

 * Richard Henwood - Project Manager
 * Andreas Dilger - Consulting Architect
 * Jinshan Xiong - Lead Engineer

Important Dates
The official start date of work is agreed to be October 13, 2014.

The contract lists milestone target dates in weeks relative to the start date. With the start date agreed, here we can just list actual dates to keep things easy to understand.

Meeting Minutes

 * SFS-DEV-004 Minutes 2014-11-06
 * SFS-DEV-004 Minutes 2014-12-12
 * SFS-DEV-004 Minutes 2014-01-28
 * SFS-DEV-004 Minutes 2015-02-26
 * SFS-DEV-004 Minutes 2015-03-26
 * SFS-DEV-004 Minutes 2015-05-28
 * SFS-DEV-004 Minutes 2015-06-25

cl_lock re-factoring (simplified and cache-less) DONE
LU-3259 cl_lock re-factoring
The cl_lock is necessary because it communicates the DLM lock for a specific IO. The current implementation is highly complex. This work will write a simplified cl_lock. The new lock will be cache-less and replace the current implementation.

Removal of liblustre DONE
LU-2675 removal of liblustre

function calls implementation and cleanup obsolete OBD methods DONE
LU-5823 Replace some obsolete obd operations with CLIO ioctl interface
OBD API operations for read, write, setattr, getattr, etc. became obsolete after the MDT, OFD and client restructuring was completed. This work removes these redundant operations. Redundant code remains in CLIO, and interfaces that are not referenced by any module will be targeted for removal.

Remove lov_stripe_md (LSM) direct access beyond LOV layer DONE
LU-5814 encapsulate lov_stripe_md (LSM) to LOV layer
The current CLIO implementation has a good interface to file layout operations. Legacy code still exists that does not use this interface. The code that does not use the file layout interface will be reviewed and targeted for removal or re-design to use the file layout interface.

NOTE: struct obd_info:oi_md cannot be removed now because of inter-dependencies between clean-up patches.

Remove non-linux interfaces DONE
Two parts

Remove some cfs_ prefixed functions. DONE
NOTE:
 * cfs_snprintf is still in use and is not equivalent to snprintf, so it will remain.
 * cfs_hlist* are needed for Linux kernel compatibility and will remain.

Remove ccc_ layer DONE
LU-5971 removal of ccc_ layer
With the removal of liblustre, the ccc_ layer is redundant and complex. The remaining useful functions will be merged into the vfs vm posix layer and the ccc_ layer will be removed.

Test and Fix Phase
For this phase, we will complete the following:


 * 1) Contractor demonstrates the code passing the complement of tests in Contractor's Autotest environment with the code applied to the Lustre Master tree.
 * 2) Contractor demonstrates the code runs successfully at scale (typically completing a 48 hour SWL run on the Hyperion platform at Lawrence Livermore National Laboratory).
 * 3) Contractor executes performance regression testing, identifying and addressing performance regressions related to the development of the revised code. This performance testing will be run on a system with at least 100 clients and will compare results of IOR and mdtest on builds before and after the implementation of the CLIO Simplification HLD. Degradation of more than 5% will be taken as a failure, but small drops will be accepted as within normal variation.

Introduction
The following milestone completion document applies to the CLIO Simplification Project recorded in the OpenSFS Lustre Development contract SFS-DEV-004, agreed September 25, 2014. The CLIO Simplification code is functionally complete and recorded in the Implementation Milestone. Completion of this milestone requires the following tasks to be executed:


 * 1) Contractor demonstrates the code passing the complement of tests in Contractor's Autotest environment with the code applied to the Lustre Master tree.
 * 2) Contractor demonstrates the code runs successfully at scale (typically completing a 48 hour SWL run on the Hyperion platform at the Lawrence Livermore National Laboratory).
 * 3) Contractor executes performance regression testing, identifying and addressing performance regressions related to the development of the revised code. This performance testing will be run on a system with at least 100 clients and will compare results of IOR, mdtest on builds before and after the implementation of the CLIO Simplification High Level Design. Degradation of more than 5% will be taken as a failure, but small drops will be accepted as within normal variation.

NOTE: This task list includes agreed enhancements in item 3. They are: lnet-selftest has been omitted as redundant, and mdtest has been selected as a better alternative to mdsrate.

Overview of CLIO Simplification
CLIO Simplification work was completed as six high-level tasks. These are:
 * cl_lock re-factoring (simplified and cache-less).
 * Liblustre removal.
 * Implement function calls and cleanup obsolete OBD methods.
 * Remove lov_stripe_md (LSM) direct access beyond LOV layer.
 * Remove cfs_ prefixed functions, where appropriate.
 * Remove ccc_ layer.

This work has been completed in the following patches:

Autotest results
The complete series of patches is recorded at http://review.whamcloud.com/13737/ and below. This series (patch set 3) successfully passed Autotest on March 27th. This result is recorded here:
 * review-zfs
 * review-dne-part-1
 * review-dne-part-2
 * review-ldiskfs

NOTE: Since these tests completed, many unrelated patches have landed on master, which has required this patch series to be re-based.

48hr SWL run on Hyperion
SWL completed a 48 hour run on Hyperion on March 12 with no observed issues. A summary of the test is below:

Summary

Start Time: Thu Mar 12 05:59:15 PDT 2015

Job Totals
 * Passed: 14346
 * Failed: 0
 * Terminated: 64
 * Unknown: 0
 * Total: 14410
 * Failure Rate: 0.00%

Run Times
 * Wall Clock Run Time: 2253.22 hrs.
 * Node Run Time: 14018.81 node-hrs.
 * SWL Node Utilization: 145.18%
 * SLURM Node Utilization: n/a
 * Excessive Run-time Variation Job Count: 0
 * Overall Job Coverage: 21.7% (138/636)
 * Passed Job Coverage: 21.5% (137/636)

End Time: Sat Mar 14 06:16:02 PDT 2015

SWL Run Time: 173807 sec. (48.28 hrs.)

Failure Mode Summary
 * Mode: 129, Count: 3431, Description: TBD

Failure Mode Breakdown
 * Test: IO, Mode: 129, Count: 3431, Description: TBD

Report generated on Sat Mar 14 06:50:43 PDT 2015

NOTE: SWL runs continuously. This test run was ended after 48 hours. Jobs that were running when the test run was ended are recorded as terminated in this summary.
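The derived figures in the SWL summary can be cross-checked from its raw counts. The sketch below is illustrative only: the variable names are hypothetical, and it assumes that the failure rate is failed jobs divided by total jobs and that each coverage percentage is the ratio printed in parentheses beside it. SWL's own definitions are not given here.

```python
# Raw values copied from the SWL summary; variable names are hypothetical.
passed, failed, terminated, unknown = 14346, 0, 64, 0
total_jobs = 14410
run_seconds = 173807
coverage_run, coverage_passed, coverage_total = 138, 137, 636

# Assumption: the job categories are exclusive and sum to the total.
totals_consistent = passed + failed + terminated + unknown == total_jobs

# Assumption: failure rate = failed / total; coverage = count / 636.
failure_rate = 100.0 * failed / total_jobs
run_hours = run_seconds / 3600
overall_coverage = 100.0 * coverage_run / coverage_total
passed_coverage = 100.0 * coverage_passed / coverage_total

print(f"totals consistent: {totals_consistent}")
print(f"failure rate: {failure_rate:.2f}%")
print(f"run time: {run_hours:.2f} hrs")
print(f"overall coverage: {overall_coverage:.1f}%")
print(f"passed coverage: {passed_coverage:.1f}%")
```

Under these assumptions the computed values agree with the reported 0.00% failure rate, 48.28 hours, and 21.7% and 21.5% coverage.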

Performance tests
This series of tests is designed to verify that the CLIO Simplification project has not negatively affected the performance of the Lustre filesystem. The tests were executed on Hyperion, using 16 threads for single-client tests and 1600 IOR threads for 100-client tests. The baseline for performance was selected as Lustre 2.6.0. The build with CLIO patches applied was created from http://review.hpdd.intel.com/13318/ (since merged into http://review.hpdd.intel.com/13714 for landing).

Five consecutive tests were run for each metric and the mean of the five runs was computed. This mean is used to calculate the percentage difference against the baseline. The complete result set is recorded in Appendix A. Observed CLIO Simplification performance that is more than 5% slower than the baseline is presented in red.

Guidelines for the reader:
 * CLIO Simplification patches have been landed into Master over the last 9 months. During this time, 988 patches have landed which may or may not be responsible for changes in performance.
 * Variability in performance is commonly observed during tests on Hyperion. It is not unusual to see a 10% variation between consecutive runs of the same code. The figure below illustrates variability in performance over 15 recent consecutive tags. NOTE: two runs of 2.6.90 on different dates (2.6.90.1 and 2.6.90.2 on the figure below) show significant differences in read performance for otherwise identical Lustre versions.



Figure 1: Performance of 15 consecutive tags, including the 2.6.0 and 2.7.0 releases as well as more recent master tags. Read and write bandwidth of a single shared file with 100 clients is recorded. Significant variability can be observed between any two consecutive tags.
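The comparison method described in this section (the mean of five runs, expressed as a percentage of the baseline mean, with a 5% tolerance) can be sketched as follows. This is a minimal illustration, not the script used for the report: the function names and sample values are hypothetical, and it assumes that higher values are better, as with bandwidth or operations per second.

```python
from statistics import mean

TOLERANCE_PCT = 5.0  # degradation of more than 5% is taken as a failure


def percent_of_baseline(baseline_runs, candidate_runs):
    """Return the candidate mean as a percentage of the baseline mean."""
    return 100.0 * mean(candidate_runs) / mean(baseline_runs)


def within_tolerance(baseline_runs, candidate_runs, tolerance_pct=TOLERANCE_PCT):
    """True when the candidate mean is at most tolerance_pct below the baseline mean."""
    return percent_of_baseline(baseline_runs, candidate_runs) >= 100.0 - tolerance_pct


# Hypothetical bandwidth samples in MB/s, five runs each (not measured data).
baseline = [1000.0, 1020.0, 980.0, 1010.0, 990.0]
candidate = [960.0, 970.0, 950.0, 980.0, 940.0]

ratio = percent_of_baseline(baseline, candidate)
print(f"{ratio:.0f}% of baseline, within tolerance: {within_tolerance(baseline, candidate)}")
```

With these made-up samples the candidate mean is 96% of the baseline mean, which falls inside the 5% tolerance. A comparison of means alone does not capture the run-to-run variability noted in the guidelines above.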

IOR, 100 Clients, Single Shared File (SSF)
OBSERVATIONS: Figure 1 shows that the 2.6.0 baseline (far left) has write performance above the typical performance for the 15 recent tags. Because the 2.6.0 baseline value is unusually high, more typical observations appear slow by comparison. Choosing a more common value for 100-client SSF read and write (i.e. tag 2.6.53.1) provides a better baseline value and shows performance within tolerance.

IOR, 100 Clients, File Per Process (FPP)
 * Read performance: 96%
 * Write performance: 102%

OBSERVATIONS: Both observations show CLIO Simplification performance within tolerance.

IOR, Single Client, File Per Process (FPP)
OBSERVATIONS: Write performance was below tolerance during our run on Hyperion on this test. This apparent slow-down was not reproducible over three re-runs on the OpenSFS cluster where between 97% and 150% write performance was observed.

IOR, Single Client, Single Shared File (SSF)
 * Read performance: 101%
 * Write performance: 388%

OBSERVATIONS: Observation of the 2.6.0 baseline showed write performance on Hyperion at 340 MB/s. The CLIO Simplification client performed at 1400 MB/s. Task LU-1669 (out of scope for this contract) is thought to be primarily responsible for the large improvement in write performance between 2.6.0 and the accumulated unrelated CLIO Simplification patches.

mdtest, Single Client
 * Best FPP tree filestat: 115%
 * Best SSF tree dircreate: 103%
 * Worst FPP tree treecreate: 97%
 * Worst SSF tree rm: 97%

OBSERVATIONS: All 32 observations show CLIO Simplification performance within tolerance. Only the best and the worst are included here.

mdtest, 100 Clients
OBSERVATIONS: Out of 32 metrics, only two are observed out of tolerance for mdtest at 100-client scale. The out-of-tolerance observations were not repeated between consecutive runs and are judged to be due to variability occurring at 100-client scale. A significant performance increase is observed on metrics including 'file create' and 'file rm'.

Conclusion
Functional testing and SWL testing of the CLIO Simplification stack were completed successfully. Performance testing is also complete. Of the 49 values presented within the performance testing, four were observed to be more than 5% below the 2.6.0 baseline value. On review, these four observations can be attributed to the challenges of running repeatable tests with low variability at scale. Further tests were run to check whether CLIO Simplification patches introduced these observed regressions, but no evidence was found to support this being the case. We are confident that the CLIO Simplification patches do not introduce regressions and that this phase of the project is complete.

Appendix
A PDF of the report, including detailed results, is included in this document.

Landing Phase
Patches for a simplified CLIO stack were landed over a period of approximately six months with the final outstanding CLIO Simplification patch landed on Mon, 29 Jun 2015. The complete list of patches created by Intel and their commitment into the Lustre Master is recorded below.

During pursuit of the project goals, more than 89KLOC have been removed. This figure represents over 10% of the Lustre code base and the successful execution of this project delivers a simplified code base for future enhancement. With successful completion of this project, this milestone was agreed on 2015-07-07.

004-001 CLIO ioctl's should be functions
CHANGE REQUEST: 004-001 CLIO ioctl's should be functions.

BACKGROUND: Ioctl calls were included in the CLIO Simplification design to replace some obsolete OBD operations. Alternatively, individual functions can replace the OBD operations instead.

CHANGE: Do not implement ioctl calls. Implement functions.

ACTIONS REQUIRED:
 * Ensure none of the current patches are landed.
 * Update the design document with the new design.
 * Update the ticket LU-5823 with new activity.
 * Execute work to complete LU-5823.

004-002 Omit CLIO Demonstration milestone and associated milestone payment
CHANGE REQUEST: 004-002 Omit CLIO Demonstration milestone and associated milestone payment

BACKGROUND: The Demonstration milestone has been rendered redundant by a precisely specified 'Test and Fix' milestone that will execute before Demonstration. In the current plan, 'Test and Fix' (10 weeks) includes specific tests to run (including 48hr SWL, and performance characterization). There was agreement that a useful Demonstration Milestone is exactly defined by 'Test and Fix' Milestone. The plan requires specification of Demonstration during the Implementation phase and no additional work beyond 'Test and Fix' has been identified.

CHANGE: Omit Demonstration Milestone from the plan of record.

ACTIONS REQUIRED:
 * Move directly from 'Test and Fix' milestone to 'Landing' milestone.
 * Communicate new plan to stakeholders.

Lessons Learned
Attendees: James, Jinshan, Doug, Richard, John.

"Variability should be communicated from the raw results through to the validation metrics."

"Schedule wasn't too bad: six weeks behind on demo, four weeks behind on Landing."

"Good headlines for the project: 89KLOC removed."

"Clean-up project was not overwhelming. One-or two regressions. Exposed problems in the test suite. Really need to figure out why they only become visible after time. go back and analyze why this issues aren't immediately."

"Thought it went pretty well."

"Spent time re-basing and avoiding collisions. Topic branches would be useful for something like this in the future."

"CLIO work was separated well from the DNE2 and LFSCK3/4 work areas. This avoided costly collisions during landing"

"Test matrix needed to be made available to assist communication, record configuration, and communicate data."

"Using the OpenSFS wiki as the canonical project tool worked well. It allowed our work to be visible and avoided copying work between wikis."