MDS SMP Node Affinity Implementation wiki version

Milestone Completion for the SMP Node Affinity Subproject on the Single Metadata Server Performance Improvements Project of the SFS-DEV-001 contract.
Revision History

Introduction
The following milestone completion document applies to Subproject 1.1, SMP Node Affinity, of the Single Metadata Server Performance Improvements project within the OpenSFS Lustre Development contract SFS-DEV-001 signed 7/30/2011.

Per the contract, the SMP Node Affinity subproject is described as follows: “This subproject splits the computing cores available on the Metadata Server (MDS) into a configurable number of compute partitions, and binds the Lustre RPC service threads to run within a specified compute partition. This allows the RPC threads to run more efficiently by keeping data structures in cache memory close to the CPU cores on which they are running, and avoids needless contention on the inter-CPU memory subsystem. SMP Node Affinity also allows individual RPC requests to stay local to a specific compute partition, improving overall efficiency throughout the protocol stack as the number of cores increases.”
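The partitioning idea described above can be illustrated with a minimal user-space sketch. This is not the libcfs implementation: the function names `split_into_partitions` and `bind_threads` are hypothetical, and the sketch assumes that cores are grouped contiguously and that service threads are assigned to partitions round-robin, neither of which is stated in the contract text.

```python
from typing import Dict, List


def split_into_partitions(cores: List[int], n_partitions: int) -> List[List[int]]:
    """Split core IDs into n_partitions contiguous, near-equal groups.

    Assumption: contiguous core IDs share cache/memory locality. Real
    topologies may differ, so treat this grouping as illustrative only.
    """
    if n_partitions < 1 or n_partitions > len(cores):
        raise ValueError("n_partitions must be between 1 and the number of cores")
    base, extra = divmod(len(cores), n_partitions)
    partitions: List[List[int]] = []
    start = 0
    for i in range(n_partitions):
        # The first `extra` partitions take one additional core.
        size = base + (1 if i < extra else 0)
        partitions.append(cores[start:start + size])
        start += size
    return partitions


def bind_threads(n_threads: int, partitions: List[List[int]]) -> Dict[int, List[int]]:
    """Map each service thread index to the cores of one partition.

    Assumption: threads are spread over partitions round-robin.
    """
    return {t: partitions[t % len(partitions)] for t in range(n_threads)}


partitions = split_into_partitions(list(range(8)), 2)
affinity = bind_threads(4, partitions)
print(partitions)   # two groups of four cores
print(affinity[1])  # cores thread 1 is allowed to run on
```

In a user-space Linux process, a core set such as `affinity[1]` could be applied with `os.sched_setaffinity`; the Lustre service threads themselves run in the kernel and are bound by other means.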

Per the contract, the Implementation milestone is described as follows: “Contractor shall complete implementation and unit testing for the approved solution. Contractor shall regularly report feature development progress including progress metrics at project meetings and engineers shall share interim unit testing results as they are available. OpenSFS at its discretion may request a code review. Completion of the implementation phase shall occur when the agreed to solution has been completed up to and including unit testing and this functionality can be demonstrated on a test cluster. Code Reviews shall include:


 * 1) Discussion led by Contractor engineer providing an overview of Lustre source code changes.
 * 2) Review of any new unit test cases that were developed to test changes.

Location of Completed Solution
The agreed solution has been completed and is recorded in the following patches:

Demonstrate any new tests that have been developed.

SMP Node Affinity does not require new functional tests as this project is a performance enhancement.
During the course of development, two small changes were made to the existing tests.


 * 1) Force-enable multiple CPU partitions for autotest. By default, libcfs creates multiple CPU partitions only on systems with more than 4 CPU cores. It is preferable to run tests with multiple CPU partitions on all SMP machines, so a patch was developed to always enable multiple CPU partitions on systems with multiple cores.
 * 2) Minor issue fixes. Now that multiple CPU partitions are provided, modifications to the tests were required to work around brittle interactions between autotest and the procfs subsystem.

These changes are recorded as http://review.whamcloud.com/#change,3288
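The default behaviour and the autotest override described in item 1 can be sketched as a small decision function. Only the "more than 4 cores" threshold and the "always enable on multi-core systems" override come from the text above. The function name `partition_count`, the `force_multiple` flag, the sizing constant, and the choice of exactly two partitions on small forced systems are hypothetical and do not reflect the actual libcfs logic or its tunables.

```python
# Hypothetical sizing constant: the document does not state how many
# cores libcfs places in each partition.
ASSUMED_CORES_PER_PARTITION = 4


def partition_count(n_cores: int, force_multiple: bool = False) -> int:
    """Return an illustrative number of CPU partitions for n_cores.

    From the document: multiple partitions are created by default only
    when the system has more than 4 cores; the test patch forces multiple
    partitions on any multi-core system.
    Assumed: the actual counts returned (see constant above).
    """
    if n_cores < 1:
        raise ValueError("n_cores must be positive")
    if n_cores > 4:
        # Ceiling division, with at least 2 so "multiple" always holds.
        return max(2, -(-n_cores // ASSUMED_CORES_PER_PARTITION))
    if force_multiple and n_cores > 1:
        # Forced mode on a small SMP machine: assume two partitions.
        return 2
    return 1


for cores in (1, 2, 4, 8, 16):
    print(cores, partition_count(cores), partition_count(cores, force_multiple=True))
```

The point of the override is visible at 2 and 4 cores: without it a small test VM would exercise only the single-partition code path.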

The completion of these modified tests is recorded as https://maloo.whamcloud.com/test_sessions/076bf58e-ca29-11e1-9192-52540035b04c

A subsequent test on IB, included in Appendix 2, is recorded at https://maloo.whamcloud.com/test_sessions/2912130e-fd4f-11e1-b09c-52540035b04c

Demonstration of SMP Node Affinity functionality.
After the final patch landed, the complete test framework ran to completion, as recorded at:

https://maloo.whamcloud.com/test_sessions/076bf58e-ca29-11e1-9192-52540035b04c

The result detail is recorded in Appendix 1.

Conclusion
Implementation has been completed according to the agreed criteria.

Appendix 1 Autotest results on TCP/IP
Session for group review (fat-intel-3vm6, liang)
Uploaded by: Whamcloud Autotest.
Reason: landing.
12 test sets passed out of 12.
Code review references
gerrit:3288 id: b365fcb82a38761a4c40ff09ed653b7654a77d9e change_no: 3288
jira:LU-1607 id: LU-1607

Test sets


fat-intel-3vm3
Kernel Version: 2.6.32-220.17.1.el6_lustre.g4a711e4.x86_64
Lustre Version: jenkins-arch=x86_64,build_type=server,distro=el6,ib_stack=inkern
OS: GNU/Linux
Networks: tcp
Memsize: 1.96 GB
Lustre Build: http://build.whamcloud.com/job/lustre-reviews/7631
Architecture: x86_64
File System: ldiskfs
Lustre Branch: master
Node Architecture: x86_64
Services: MDS 1
Lustre Revision: b365fcb82a38761a4c40ff09ed653b7654a77d9e
Distribution: CentOS release 6.2
Name: fat-intel-3vm3

fat-intel-3vm4
Kernel Version: 2.6.32-220.17.1.el6_lustre.g4a711e4.x86_64
Lustre Version: jenkins-arch=x86_64,build_type=server,distro=el6,ib_stack=inkern
OS: GNU/Linux
Networks: tcp
Memsize: 1.96 GB
Lustre Build: http://build.whamcloud.com/job/lustre-reviews/7631
Architecture: x86_64
File System: ldiskfs
Lustre Branch: master
Node Architecture: x86_64
Services: OST 6, OST 7, OST 2, OST 3, OST 4, OST 5, OST 1
Lustre Revision: b365fcb82a38761a4c40ff09ed653b7654a77d9e
Distribution: CentOS release 6.2
Name: fat-intel-3vm4

fat-intel-3vm5
Kernel Version: 2.6.18-238.19.1.el5
Lustre Version: jenkins-arch=x86_64,build_type=client,distro=el5,ib_stack=inkern
OS: GNU/Linux
Networks: tcp
Memsize: 1.96 GB
Lustre Build: http://build.whamcloud.com/job/lustre-reviews/7631
Architecture: x86_64
File System: ldiskfs
Lustre Branch: master
Node Architecture: x86_64
Services: Client 2
Lustre Revision: b365fcb82a38761a4c40ff09ed653b7654a77d9e
Distribution: CentOS release 5.8
Name: fat-intel-3vm5

fat-intel-3vm6
Kernel Version: 2.6.18-238.19.1.el5
Lustre Version: jenkins-arch=x86_64,build_type=client,distro=el5,ib_stack=inkern
OS: GNU/Linux
Networks: tcp
Memsize: 1.96 GB
Lustre Build: http://build.whamcloud.com/job/lustre-reviews/7631
Architecture: x86_64
File System: ldiskfs
Lustre Branch: master
Node Architecture: x86_64
Services: Client 2
Lustre Revision: b365fcb82a38761a4c40ff09ed653b7654a77d9e
Distribution: CentOS release 5.8
Name: fat-intel-3vm6

Appendix 2 Autotest results on IB
Session for group review (client-23-ib, liang)

Uploaded by: Whamcloud Autotest.

Reason: landing.

12 test sets passed out of 12.

Code review references
gerrit:381