LWG Minutes 2015-12-16

From OpenSFS

Attendance

Cray: Chris Horn, Cory Spitz, Justin Miller
ORNL: Sarp Oral, James Simmons
Indiana: Ken Rawlings
Intel: Joe Gmitter, Peter Jones, Paul Sathis, Andreas Dilger, Richard Henwood
Seagate: Kalpak Shah
Sandia: Ruth Klundt
Fermilab: Alex Kulyavtsev

Actions

New Actions Captured:

  • Peter to post on the wiki the process for community developers to request access to Jira.

Existing Open Actions:

  • James will do a short write-up on wiki.lustre.org on how to submit changes upstream.

Actions Recently Closed:

Minutes

2.8 Release Status/Update
Peter

  • 2.8 is getting quite close to completion, but is not there yet.
  • Stress testing of DNE under recovery scenarios continues. Testing is holding up well; a few patches for edge-case bugs still remain to be landed for the release.


2.8 Testing Updates

  • ORNL (James): Large-scale test shot on Titan last night, using a client snapshot from about a week ago (servers running 2.5). No major issues, and the test confirmed that LU-7173 did not occur. Two minor issues: lctl replace_nids not working correctly, and a router crash after an IB leaf switch rebooted. James will send out the stack trace in a new ticket. Performance results are still being compiled, but no performance regressions were observed, and IOR showed a small performance increase.


MLX5 Drivers

  • Sarp: Any update on MLX5 driver support? Is this on Intel’s radar?
    • Peter: Yes, this is absolutely on Intel’s radar, but it is orthogonal to the releases themselves. If it is ready before 2.8 goes out, it could be landed provided it poses no risk to overall stability. However, the release should not be held for it.
    • The floor was opened for other opinions, and all were in favor of NOT holding the release for it.
  • James: Tried Power8 nodes this morning and they were functional. Overall, the first round of patches (LU-5783) is functional, but needs to be further solidified.


2.9 Planning
Peter

  • As a reminder, a 2.9 page has been put together containing a matrix of major features and the corresponding assignments for development, reviews, testing, and documentation, to make sure someone is identified for each role.


  • Any updates on assignment of primary reviews for features?
    • Cray (Cory): Asked Henri Doreau to be the lead reviewer for NRS Delay Policy, and he has agreed. The page has been updated to reflect this assignment. Also working with Vitaly to potentially be the lead reviewer for the Lock Ahead feature. There is no confirmation from Vitaly yet, but Cory is hopeful it will come soon. Will also work to identify a second reviewer for Lock Ahead, potentially Frank Zago.
    • DDN is also on the feature list for Server Side Advise and Hinting. This work has been in progress for some time (LU-4931). Please consider reviewing the work and whether anyone is willing to offer up reviewers to help land it in the 2.9 timeframe.


  • Any potential for upstream work completion in the 2.9 timeframe? Any other upstream client effort updates?
    • James: Primary focus has been on syncing up libcfs and LNet; this could be doable. The core client work has a lot of patches that need to be fine-tuned and merged in; this would be more of a challenge.
    • Cory: Ben had some trouble reaching agreement on an appropriate amount of work to put in, and is now working on it again. James reports that he has been looking at and testing Ben’s work from time to time.
    • James: A reminder that he sent a note to the mailing list about testing the upstream client. Andreas and Oleg provided input. Does anyone else have any feedback?


Other Business

  • Discussed the appropriate mechanism for granting Jira access to community developers.
    • The process has been to email Peter directly with the request.
    • Cory stated that all big feature items on the 2.9 page should be assigned directly to those community developers.
    • Peter proposed using the LU project in Jira for access requests, so that requests are traceable and a process is in place if he is unavailable.
  • It was agreed that the next call on 12/30 will be cancelled. If anything urgent arises that needs discussion prior to the next regularly scheduled slot, a one-time meeting can be arranged.


Next meeting will be on 2016-01-13