LWG Minutes 2016-01-13

From OpenSFS Wiki


Cray: Cory Spitz, Justin Miller
ORNL: Sarp Oral, James Simmons
Indiana: Ken Rawlings, Steve Simms
Intel: Joe Gmitter, Peter Jones, Paul Sathis, Andreas Dilger
Seagate: Kalpak Shah
Sandia: Ruth Klundt
Fermilab: Alex Kulyavtsev


New Actions Captured:

  • James to send Peter the list of outstanding cifs/libcfs patches.
  • Peter to provide guidance to the community on the potential timeframe for 2.9 landings.
  • All: consider possible content for LUG developer day, LUG panel discussion, and feedback on the user survey for the next meeting.

Existing Open Actions:

  • James will do a short write-up on wiki.lustre.org on how to submit patches upstream.
    • 1/13 update: James has provided a rough draft; however, only Oleg and Andreas provided any feedback. Peter suggested posting it to the wiki and allowing updates to be made from there.

Actions Recently Closed:


2.8 Release Status/Update

  • The main blocking issues for the release remain around DNE recovery. Progress has slowed over the past few weeks because of the holidays in the US and Russia.
  • One fix in particular (LU-7039) seems essential to land, as we have seen it make a massive difference in our testing. There are some differing opinions on the approach in the patch, and now that the needed resources are back in the office after the holidays, there should be more progress on closing it out.
  • We also uncovered a sporadic ZFS client eviction issue when running load under stress. It took a lot of engineering hours to uncover the root cause, but we have narrowed it down to a single commit in the ZFS baseline that is causing the problem. A bug has been submitted to the upstream ZFS community and we have rolled back the ZFS version on master to avoid the issue.
  • Sarp: How many blockers are there for the release?
    • Peter: There are currently 2 blockers that we know of. We also have a number of other issues marked for 2.8 that are worthwhile to try to resolve while we work through the blockers; however, they will not hold the release in their own right.

2.8 Testing Updates

  • ORNL (James): Have been doing quite a bit of testing, mostly focused on cifs support. The testing has all been good thus far.
    • Peter: What is still outstanding?
      • James: 7 outstanding patches for cifs, 1 for libcfs. Approximately 3 or 4 of the patches are ready to land. James to send this list of patches to Peter.
  • The other area ORNL has been working on is testing of LNet, namely LU-7569, LU-5783, and LU-7101. Also, James has been discussing LU-7650 with Andreas.
    • Discussion on the call covered when the structures were added, which turned out to be back in the 2005 timeframe. There is general agreement that it would be too late in the game to make this change now and that it can be targeted for 2.9.
  • Cory, Kalpak, and Ruth all report no news to share at this time regarding testing.

Release Dates

  • Sarp: What is the projected release date for 2.8?
    • Peter: The most optimistic date would be the end of January; however, it is likely to slip into February. From the Intel perspective, we now have the expertise needed to progress on the critical issues after the conclusion of the US and Russia holidays.

b2_8 Branch Creation

  • Cory: Is there a date set to create the b2_8 branch so things queued up can continue to land to master?
    • Peter: It is more of an issue of the current state of master than establishing a firm date. For example, LU-7039 is quite large and critical, so it would be best to close out on that before the branch.
  • Cory: It would be useful to have some transparency and communication around when the community can expect to be able to land 2.9 content. Is there something we can put together on the wiki or send out to lustre-devel perhaps?
    • Peter: Volunteered for the action and will target to have something out to the community before the end of the week.

Outstanding MLX Patches

  • James: We need further patches to make 2.8 work with MLX4.
  • Peter: It would be risky to land this work now without further review of the patch from the community, given the complexity involved. Therefore, we are inclined to defer this to 2.9 and leave it to those who need it to apply those patches.
  • Sarp: ORNL is fine with carrying a few patches in addition to the 2.8 release for this work.

LU-5050: cpu_npartitions default

  • Peter: We are trying to gather feedback on this from the community and there is some uneasiness about doing the change based upon limited feedback. It may be best to wait for a more thorough vetting of the change from the community.
  • Andreas: The behavior has been present since the 2.2/2.3 timeframe and it seems risky to make a change to the default, with seemingly low demand for the change, this close to the release date. It would be better to have it at the beginning of 2.9 so it can be thoroughly fleshed out in testing over the majority of the 2.9 cycle.
  • James: I have been testing it and it does everything as advertised. We have tried it on a range of nodes and have seen no issues with it.
  • Andreas: Fully agrees that there would be less confusion if we change the default; the concern was just the small sample size of testing. Having tested it out on multiple configurations puts it in a much better position.

2.9 Planning

  • No significant news to report. We will look to start queuing things up for landing to master when the b2_8 branch is created.

LUG Topics

  • Lustre Developer Day @ LUG
    • http://opensfs.org/events/lustre-developer-day/
    • Peter reports that the LWG has been asked to develop the agenda for the developer day (the Monday before LUG). This will be a standing item on the agenda for the next few meetings.
    • Please give this some thought, and if you have any ideas regarding content, please bring them up at the next call.
  • Lustre Community Survey
  • Lustre Panel Discussion at LUG?
    • Any immediate thoughts on potential subjects for a Lustre panel discussion at LUG? Also, please bring any new thoughts to the next call.
      • Cory: It would be nice to hear what the user perspective is regarding what needs to evolve in Lustre to keep it relevant and competitive. What optimizations are they looking for? What would potentially cause them to look elsewhere?
        • Peter: Fully agrees, in fact suggested the same topic.
      • Sarp: Is there value in a comprehensive study or discussion comparing features and functions between GPFS and Lustre to get Lustre caught up wherever it may be lagging? Are vendors willing to share their “TODO” list as well?
        • Peter: The program committee is driving to have this be more user driven, so it would be best to approach it from a user perspective for LUG, perhaps in the context of a panel of users who use both filesystems. The Lustre ecosystem conference talked about this topic from a vendor perspective, and Malcolm from Intel presented a version of this at the last LUG.

Upstream Lustre Client

  • Have been pushing stuff on the master side so utilities can work with the upstream client. Several patches have been pushed up but have not been merged yet.
    • Andreas: Are you seeing this with just Lustre patches or all patches? Is this expected to continue or to improve?
      • James: It has been everything. There have been patches merged in this cycle, but it has been limited. External tree merges have happened, but those are the easiest. There is talk of adding a second person to do merges (Dan Carpenter) in addition to Greg to increase throughput.
    • Andreas: Did LNET cleanup patches make it in?
      • James: No, Dan had some opinions on it and they need some further cleanup.

Ken is serving as the lustre.org coordinator for OpenSFS. A few immediate areas of attention are:

  • Having the old wiki is pretty confusing, as it shows out-of-date content. Ken would like to start a process to retire the old wiki.
  • Work with subject matter experts to review the content on the current wiki for accuracy, both across the broad spectrum and in targeted areas.
  • Add a categorization scheme or hierarchical structure for wiki articles and surface that structure on the front page of the site to make articles easier to find.
  • Restart the lustre.org working group and report back to LWG on activity.
    • Andreas: One issue in removing the old wiki completely is that it does contain quite a bit of useful information. Not everything is obsolete; there is some material that just has not been moved to the new wiki. We need to be more aggressive in that aspect, as nobody has taken the time to do it.
    • Ken: The 3 step plan is: 1 - list each page as potentially out of date, letting people know it hasn’t been validated; 2 - migration of needed information to the new wiki; 3 - shutting down the old wiki site. The review in step 2 is expected to take some considerable time.
    • Andreas: Strongly suggests that someone like Ken generate a list of pages and find volunteers to review whether each page is out of date. This may be the best approach to make reasonable progress on it.
    • Cory: It sounds like a good effort, but it will need a lot of volunteers. It would be worthwhile to send out a note to the community that volunteers are needed.
    • Peter: Encourages Ken to use the lustre.org mailing list, as it has a much larger distribution, and encourages non-coders as well to consider volunteering and contributing.

Other Business

  • None

Next meeting will be on 2016-01-27