Difference between revisions of "LWG Minutes 2015-04-15"

From OpenSFS
== Agenda ==
# Monitoring effort
# Are we moving development platform to lustre.org
# Testing coordination, getting more communication
# Documentation effort, especially on the wiki at lustre.org
# Release cadence

== Introduction ==

We need more people to take projects on and lead them, to become champions.

Maintain a webpage on the wiki: what work is going on and what is left to do.

Regularly report back to the LWG.

== Monitoring effort ==

A lot of people are doing a lot on their own; it would be great to get together and not repeat work.

LLNL has LMT. It is not dead, but we don't have DNE support and the Java client isn't being developed.

Robert: It collects the data for DNE but it doesn't display the data.

Andreas: If the Java client is dead, it should be removed from the project and clearly marked as deprecated. People still use it because they don't know the status.

Chris: We're the main guys driving development; there is not a lot of development from the outside.

We'd be happy to collaborate on any other tools.

Who will be the project lead on this?

Simms: Folks from Wisconsin are willing to offer up their work.

SDSC: Any products they put out we'll deploy, but we're not volunteering to be the lead.

James: What is everyone using now to monitor their sites?

Kluge: Logstash, Kibana

SDSC:

CEA: OpenTSDB, Kibana

Jason Hill: Homegrown scripts tied into Nagios

LMT, homegrown scripts, lltop for job monitoring

Looking to switch to Grafana

Chris: At least consider who at your sites could be a champion for this project.
 

This ties into documentation. We need to understand what is in the proc files.

James: We need to know what to measure.

Chris: Friends at Wisconsin have created a wiki page documenting much of this.

Andreas: The manual has a lot of information. A single page giving a how-to on monitoring would help.

Cory: I think the BWG has made an effort on what you're asking for.

Simms: There may have been an effort, but nothing has been settled upon. So much effort is going into this; OpenSFS needs to get behind one or two tools to help new Lustre users.

Fermi: We should focus efforts on standardizing interfaces so people don't need to reproduce collection efforts.

James: Ultimately we'll be moving to sysfs, which will replace procfs.

Chris: We'll also need to use debugfs, because sysfs is one line per file.

James: debugfs is sequence-file based and could help.

Fermi: We could take an event-model approach like what was done with changelogs, and publish data with thresholds.

James: I agree with you, and sysfs can push uevents for event handling.
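
A minimal sketch of the shared-collection idea discussed above, assuming a multi-line statistics file laid out as `name count samples [unit] min max sum`. The sample text and the helper names `parse_stats` and `over_threshold` are hypothetical illustrations, not an existing Lustre or LMT interface. It parses every counter in one pass and reports those that exceed a site-chosen limit, roughly in the spirit of publishing data with thresholds.

```python
def parse_stats(text):
    """Parse multi-line 'name count samples [unit] min max sum' records.

    The layout is an assumption for illustration; real files may differ.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the timestamp header.
        if len(fields) < 2 or fields[0] == "snapshot_time":
            continue
        entry = {"count": int(fields[1])}
        # Some counters carry "[unit] min max sum" after the sample count.
        if len(fields) >= 7:
            entry["unit"] = fields[3].strip("[]")
            entry["min"], entry["max"], entry["sum"] = (int(f) for f in fields[4:7])
        stats[fields[0]] = entry
    return stats


def over_threshold(stats, limits):
    """Return the sorted names of counters whose count exceeds its limit."""
    return sorted(
        name for name, limit in limits.items()
        if name in stats and stats[name]["count"] > limit
    )


# Hypothetical sample data, not captured from a real system.
SAMPLE = """\
snapshot_time 1429112345.123456 secs.usecs
read_bytes 120 samples [bytes] 4096 1048576 52428800
write_bytes 30 samples [bytes] 4096 1048576 15728640
open 45 samples [reqs]
"""

if __name__ == "__main__":
    parsed = parse_stats(SAMPLE)
    print(parsed["read_bytes"]["sum"])
    print(over_threshold(parsed, {"read_bytes": 100, "open": 50}))
```

A single parser like this, agreed on across sites, is the kind of standardized interface that would keep each site from rewriting its own collection scripts.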
 

SDSC: There are a lot of feature releases and changes; it is hard for new users to find out about and use these features. The wiki is a place to publish standard feature uses.

Chris: Landing collateral needs to include this type of information.

Andreas: The manual gets updated, but there may be a need for a 5-min demo.

Chris: Go back and think about what we can collaborate on.

== Documentation ==

We should start working on a large list of landing collateral for new features, including appropriate documentation.

Oleg: People hate new restrictions.

Chris: Sure, but that is how we make it happen.

Cory: We could set it up so a +1 can review from a publication bot.

SDSC: I just need the steps to get it running; it doesn't have to be exhaustive.

Andreas: There is stuff in the manual; people just don't look at it.

Chris: The manual is almost a lost cause. It's a mishmash, in DocBook. There are real barriers.

Andreas: Updating the manual isn't what people are looking for. Is it a README or a wiki page?

Vitaly: Is this about the manual or about design documents? New features should include design docs.

Andreas: Design docs are on the wiki, but not all PAC leaders publish.

Chris: The design docs are pretty poor anyway, and too high level for administrators.

Andreas: man pages do describe how to do a lot of this. If they don't look in the manual or the man pages, where are they going to look?

Chris: The wiki should be the place to go. Ceph is an example. We should have the same for Lustre. We need editors to create high-level starting pages for people who don't know exactly what they're looking for.

Alex: Developer documentation is lacking.

Chris: True, a lot of documentation is lacking. We need to start somewhere. There is an OpenSFS contract for the protocol documentation.

Alex: These get outdated quickly.

Chris: All new features that change the protocol must update the protocol documentation with suggested changes. Landing criteria. New landing rules.
 

Vitaly: It would be helpful to organize all the existing design documents into a central location.

Chris: The wiki is a great place for that.

Oleg: They should also be in the tree so they match the version you've checked out. Then when patches come in, we'll know they've changed some design and what needs to be updated in the tree.

Peter Jones: Once we have the protocol document, I like the idea of requiring updates. How do you catch every change though?

Chris: Peer review process.

Peter Jones: I was thinking of the gatekeeper's role; compared to the overall volume of changes this will be a small percentage.

Cory: It should be on the inspectors and reviewers.

Oleg: There are certain warning flags that would show the need for protocol documentation.

Chris: The protocol needs to not change with every change. We also need to document the on-disk protocol. If we keep the protocol documentation separate, it will help maintain compatibility.

Andreas: There clearly have been ongoing efforts to maintain protocol compatibility. Some changes update the protocol but don't fundamentally change it. We need to be clear about when a feature was introduced.

Chris: Protocol documentation would make it clear: if this feature flag is set, you must support this.

Simms: We're all over the place. There is a range of documentation here, high to low. IU has an editor, and when you're doing something technical, you just push it to them and they work it into a document or update an existing one.

Fermi: We need to lower the entry barrier and allow developers to publish short notes. A tech editor can then work them into a more formal document.

Simms: It would help to have somebody not in this room who is an editor to create a consistent style and enforce it. Better documentation would increase adoption of Lustre.
 

Chris: Take a step back. We need to focus on how we're going to make progress. People have pet interests: start using the wiki, publish. Who are the main markets for documentation? Starting points for developers, new users, admins. We need a champion who can rally people to create this starting point.

SDSC: We should compare against GPFS and do better than that.

Chris: Google presence is improving, and good information will bubble to the top. We also need to focus on packaging and installation in comparison to GPFS.

Simms: Suggest brainstorming from people in the room.

Fermilab: We need to publish these suggestions to the community so they can contribute too.

Simms: I'm prepared to ask IU to use a slice of time from our editors to help with the initiative. We need ideas on what specifically we can help with.

Andreas: OpenSFS could fund a technical editor to help with this consistent style issue.

Rich: Features are awesome, bug fixes are necessary, but this will have the biggest impact on adoption.

Cory: We brought down the promoter fee, but they're still committed to providing resources. We should ask the board to poll all members to donate a 1/4 FTE; it might be easier to get something done.

Brainstorming exercise about what needs to be done:

Chris: We don't have any money, so we'll have to be creative to get something done.

Chris, CEA, Marc are willing to spend some of their time.

Andreas: Developers are not the people to be updating this. Every university has a "how to use Lustre effectively". It's the consumers that have the most to contribute. At CFS there was a policy of not replying directly to inquiries: update the FAQ and point at that.

James: We get the Mellanox question about once a month.

Simms: I've proposed that somebody follow lustre-discuss, capture knowledge, and add it to an appropriate venue.

Andreas: Are there documentation engines we can leverage here?

Marc: Stack Exchange is openly available.

Chris: This fractures the documentation. Do we want this and the wiki?

Oleg: It would help for specific questions.

John Hammond: We could use the editor for the wiki, and the community would use Stack Exchange.
 

Chris: We need to cut the Operations Manual in half: 1/2 for operations and the other 1/2 for developers. LNET should be promoted to the top.

John Hammond: The manual looks like HTML 1.1, and modernization would help attract people to Lustre. Also reformat it from one big page.

Andreas: It needs to be Google-searchable as well.

Chris: We also need to have versioned releases of the manual.

Andreas: But this requires backporting changes.

Oleg: At the start of some manuals they have a changelog that specifies at what version things changed. A changelog section seems to be very important.

Andreas: There are facilities for this in the manual already.

Chris: Not exactly what people want; they want it up front. It needs to be mainly flat text, but at the beginning explain what the differences are.

Fermilab: There are issues with formatting. Command line examples go over the edge.

Chris: We need responsible people to report bugs and an editor that goes in and makes the changes.

Jason Hill: Sent a note to their technical editors to sit down with a Lustre admin to begin fleshing this out.

Marc: I'm happy to review changes for technical accuracy.

Andreas: The information people want is there; it needs to be other people doing some of these things. This is the wrong audience for this discussion.

Peter Jones: Isn't Richard Friedman a technical writer?

Terri: He's paid by the hour and his allotment is already used.

Cory: Can Richard work on some of the ideas we've listed? Getting the manual indexable.

Nirmal: The log messages are a whole other challenge. The knowledgeable should help with this.

Chris: This would be great for Stack Exchange.

Jason Hill: This also relates to tutorial-style content that Chris and Jason have discussed. We've talked about a tutorial day at LUG or the Lustre Ecosystem workshop.

Simms: Terri had a good idea before. What if we were to have developers talk about subsystems? These are the pieces and parts; videotape it and put it up on a site. Helps new developers. Like a podcast or a "Lustre hour".

SDSC: Large conferences would be a great place to capture content like this.

Fermilab: Everything should at least be in a central location.

Andreas: Do we need to get rid of the source control and review process?

Alexander: It does make it much easier for people to contribute.

SDSC: I think you need both; wiki and manual are separate documents for different purposes.

Chris: We've been slow at updating the manual, but we can make the most progress with the wiki. Work on that first and circle back to the manual.

Simms: We need to identify what documentation
 

Brainstorming:
* Tools
* Howtos
* Log decoder
* Parameters and configuration
* Where to go when something breaks
* Tutorials
* Make it pretty
* FAQ
* LNET
* Disks
* Clients
* Servers
* Best practices
* PDFs in a container that gives details for each subsystem
* Design docs
* Protocol documentation
* doxygen
* Documentation inside the code
* man pages
* Style rubric and coding guidelines

* Quickstart
* Install
* Configuration and Tuning
* Troubleshooting
* Index

Robert Read: readthedocs.org

John Hammond: The code also needs to be cleaner and have more clarity, e.g. better variable names.

Andreas: Instead of external design documents, the code needs to be documented: header and function blocks that clearly articulate what the code does.

== What are we doing with developer resources? ==

Chris: People seem to think it is inevitable that it will move to lustre.org, but I don't necessarily agree. We're in too much of a limbo right now. Do people think we have to move them away?

James: Who is going to run this?

Simms: We need to prioritize what we're going to do, and identify how we're going to do it.

SDSC: People that object to doing it under Intel need to pay for doing this.

Andreas: The Intel system is open, and free for the community to use. This would be a duplication of effort. There is no clear benefit to wholesale changing this.

Chris: There is a vocal element that says the community needs to host this and get away from Intel. Personally I feel this is a duplication of effort, and there are lots of other things to do.

Peter: Why make a bunch of complexity for ourselves? We could easily move the manual and issue tracker to OpenSFS and solve most of the complaints.

Andreas: A git.lustre.org URL redirection is a quick fix to this. downloads.lustre.org

Robert: I suggest moving the manual to GitHub as a first step towards this.

Jason: If there is no pressure from Intel, there is no pressing need to change this. Much larger issues to tackle.

CEA: Gerrit and git would be easiest to move.

Jason Hill: Big hill to get people to change their development workflow.

Peter Jones: Sounds like not this year.

Robert Read: Mirror the git tree on GitHub.

Andreas: I'll donate the lustre GitHub account.

== Other ==

Kyr: Static analysis

Cory: This relates to testing efforts

Chris postponed topic until next LWG meeting

== Action Items ==

* Add CCPCheck automated testing to top of the agenda for next LWG meeting
* Richard - indexable manual, implement some of the documentation ideas, URL redirection

Revision as of 17:06, 23 April 2015

Agenda

  1. Monitoring effort
  2. Are we moving development platform to lustre.org
  3. Testing coordination, getting more communication
  4. Documentation effort, especially on the wiki at lustre.org
  5. Release cadence

Introduction

We need more people to take projects on and lead them, become a champion

Maintain webpage on wiki, what work is going on and what is left to do

Regularly report back to the LWG

Monitoring effort

A lot of people doing a lot on their own, would be great to get together and not repeat

LLNL has LMT - it is not dead, but we don't have DNE support and Javaclient isn't developed

Robert: It collects the data for DNE but it doesn't display the data

Andreas: If the java client is dead, it should be removed from the projectand clearly marked as deprecated. People still use it because they don't know the status

Chris: We're the main guys driving development, not a lot of development from the outside

We'd be happy to collaborate on any other tools

Who will be the project lead on this?

Simms: Folks from Wisconsin are willing to offer up their work

SDSC: Any products they put out we'll deploy, but not volunteering to bethe lead

James: What is every using now to monitor their sites?

Kluge: Logstash, kabanaSDSC: CEA: OpenTSDB, kabana

Jason Hill: homegrown scripts tied into nagios LMT, homegrown scripts, lltop for job monitoringLooking to switch to graphana

Chris: At least consider who at your sites could be a champion for thisproject

This ties into documentation. We need to understand what is in the proc files.

James: We need to know what to measure.

Chris: Friends at Wisconsin have created a wiki page documenting much of this

Andreas: The manual has a lot information. A single page giving a how to on monitoring would help.

Cory: I think what you're asking for, the BWG has made an effort on this.

Simms: They're may have been an effort but nothing has been settled upon. So much effort is going into this, OpenSFS needs to get behind one or two tools to help new Lustre users

Fermi: We should focus efforts to standardize interfaces so people don't need to reproduce collection efforts

James: Ultimately we'll be moving to sysfs which will replace procfs

Chris: We'll also need to use debugfs because sysfs is one line per file

James: debugfs is sequence file based and could help

Fermi: We could take the approach with event model like what was done with changelogs, publish data with thresholds

James: I agree with you, and sysfs can push uevents for event handling.

SDSC: There is a lot of feature releases and change, it is hard for newusers to find out and use these features Wiki is a place to publish standard feature uses

Chris: Landing collateral needs to include this type of information

Andreas: The manual gets updated but there may be a need for a 5-min demo

Chris: Go back and think about what we can collaborate

Documentation

We should start working on a large list of landing collateral for new features, including appropriate documentation

Oleg: People hate new restrictions

Chris: Sure, but that is how we make it happen

Cory: We could set it up so a +1 can review from a publication bot

SDSC: I just need the steps to get it running, it doesn't have to be exhaustive

Andreas: There is stuff in the manual, people just don't look at it

Chris: The manual is almost a lost cause. Mismash, in docbook. There are real barriers

Andreas: Updating the manual isn't what people are looking for, is it a README or a wiki page?

Vitaly: Is this about manual or about design documents? New feature sshould include design docs

Andreas: Design docs are on wiki, but not all PAC leaders publish

Chris: The design docs are pretty poor anyway, and too high level for administrators

Andreas: man pages do describe how to do a lot of this. If they don't look in the manual, or man page, where are they going to look?

Chris: The wiki should be the place to go. Ceph as an example. We should have the same for Lustre. We need editors to create high level starting pages for people who don't know exactly what they're looking for

Alex: Developer documentation is lacking

Chris: True, there is a lot of lacking documentation. We need to start somewhere. There is an OpenSFS contract for the protocol documentation.

Alex: These get outdated quickly.

Chris: All new features that change the protocol must update the protocol documentation with suggested changes. Landing criteria. New landing rules

Vitaly: It would be helpful to organize all the existing design documents into a central location.

Chris: The wiki is a great place for that.

Oleg: They should also be in the tree so they match the version you've checked out. Then when patches come in, we'll know they've changed some design and what needs to be updated in the tree.

Peter Jones: Once we have the protocol document, I like the idea of requiring updates. How do you catch every change though?

Chris: Peer review process

Peter Jones: I was thinking of the gatekeeper's role. Compared to the overall volume of changes this will be a small percentage

Cory: It should be on the inspectors and reviewers

Oleg: There are certain warning flags that would show need for protocol documentation

Chris: Protocol needs to not change with every change. We also need to document the on-disk protocol. If we keep the protocol documentation separate, it will help maintain compatibility

Andreas: There clearly have been ongoing efforts to maintain protocol compatibility. Some changes update the protocol but don't fundamentally change it. Need to be clear about when a feature was introduced.

Chris: Protocol documentation would make it clear if this feature flag is set, you must support this.
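The rule "if this feature flag is set, you must support this" can be illustrated with a small sketch. The flag names and values below are hypothetical and chosen only for the example; they are not Lustre's actual connect flags.

```python
from enum import IntFlag


class Feature(IntFlag):
    # Hypothetical flags for illustration only.
    BASE = 0x1
    LARGE_XATTR = 0x2
    MULTI_TARGET = 0x4


def negotiate(client: Feature, server: Feature) -> Feature:
    """Return the features that both peers advertise."""
    return client & server


def missing_required(peer: Feature, required: Feature) -> Feature:
    """Return the required features that the peer does not advertise."""
    return required & ~peer
```

Protocol documentation would then list, for each flag, what a peer that advertises it is obliged to implement, so a reviewer can tell whether a patch changes those obligations.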

Simms: We're all over the place. There is a range of documentation here, high to low. IU has an editor, and when you're doing something technical, you just push it to them and they work it into a document or update an existing one.

Fermi: We need to lower the entry barrier, and allow developers to publish short notes. A tech editor can then work them into a more formal document.

Simms: It would help to have somebody not in this room who is an editor to create a consistent style and enforce it. Better documentation would increase adoption of Lustre.

Chris: Take a step back. We need to focus on how we're going to make progress. People have pet interests, start using the wiki, publish. Who are the main markets for documentation? Starting points for developers, new users, admins. We need a champion who can rally people to create this starting point.

SDSC: We should compare against GPFS and do better than that.

Chris: Google presence is improving, and good information will bubble to the top. We also need to focus on the packaging and installation in comparison to GPFS

Simms: suggest brainstorming from people in the room

Fermilab: We need to publish these suggestions to the community so they can contribute too

Simms: I'm prepared to ask IU to use a slice of time from our editors to help with the initiative. We need ideas on what specifically we can help with.

Andreas: OpenSFS could be funding a technical editor to help with this consistent style issue.

Rich: Features are awesome, bug fixes are necessary, but this will have the biggest impact on adoption.

Cory: We brought down the promoter fee, but they're still committed to providing resources. We should ask the board to poll all members to donate a 1/4 FTE; it might be easier to get something done.

Brainstorming exercise about what needs to be done

Chris: We don't have any money, so we'll have to be creative to get something done.

Chris, CEA, Marc are willing to spend some of their time

Andreas: Developers are not the people to be updating this. Every university has a "how to use Lustre effectively". It's the consumers that have the most to contribute. At CFS there was a policy of not replying directly to inquiries. Update the FAQ and point at that.

James: We get the Mellanox question about once a month

Simms: I've proposed that somebody follows lustre-discuss, captures knowledge and adds it to an appropriate venue

Andreas: Are there documentation engines we can leverage here?

Marc: Stackexchange is openly available

Chris: This fractures the documentation. Do we want this and the wiki?

Oleg: It would help for specific questions

John Hammond: We could use the editor for the wiki, and the community would use stackexchange

Chris: We need to cut the Operations manual in half: one half for operations and the other half for developers. LNET should be promoted to the top.

John Hammond: The manual looks like HTML 1.1 and modernization would help attract people to Lustre. Also reformat from one big page.

Andreas: It needs to be google searchable as well.

Chris: We also need to have version releases of manual

Andreas: But this requires back-porting changes

Oleg: At the start of some manuals they have a changelog that specifies at what version things changed. Changelog section seems to be very important.

Andreas: There are facilities for this in the manual already.

Chris: Not exactly what people want, they want it up front. Needs to be mainly flat text, but at the beginning explain what the differences are.

Fermilab: There are issues with formatting. Command line examples go over the edge.

Chris: We need responsible people to report bugs and an editor that goes in and makes the changes.

Jason Hill: Sent a note to their technical editors to sit down with a Lustre admin to begin fleshing this out.

Marc: I'm happy to review changes for technical accuracy.

Andreas: The information people want is there; it needs to be other people that are doing some of these things. This is the wrong audience for this discussion.

Peter Jones: Isn't Richard Friedman a technical writer?

Terri: He's paid by the hour and his allotment is already used.

Cory: Can Richard work on some of the ideas we've listed? Getting the manual indexable

Nirmal: The log messages are a whole other challenge. The knowledgeable should help with this.

Chris: This would be great for the stackexchange

Jason Hill: This also relates to tutorial style content that Chris and Jason have discussed. We've talked about having a tutorial day at LUG or the Lustre Ecosystem workshop

Simms: Terri had a good idea before. What if we were to have developers talk about subsystems? These are the pieces parts; videotape it and put it up on a site. Helps new developers. Like a podcast or a "Lustre hour"

SDSC: Large conferences would be a great place to capture content like this

Fermilab: Everything should at least be in a central location.

Andreas: Do we need to get rid of the source control and review process?

Alexander: It does make it much easier for people to contribute.

SDSC: I think you need both, wiki and manual are separate documents for different purposes.

Chris: We've been slow at updating the manual, but we can make the most progress with the wiki. Work on that first and circle back to the manual.

Simms: We need to identify what documentation

Brainstorming:

  • Tools
  • Howtos
  • Log decoder
  • Parameters and configuration
  • Where to go when something breaks
  • Tutorials
  • Make it pretty
  • FAQ
  • LNET
  • Disks
  • Clients
  • Servers
  • Best practices
  • PDFs in a container that gives details for each subsystem
  • Design docs
  • Protocol documentation
  • doxygen
  • Documentation inside the code
  • man pages
  • Style rubric and coding guidelines
  • Quickstart
  • Install
  • Configuration and Tuning
  • Troubleshooting
  • Index

Robert Read: readthedocs.org

John Hammond: The code also needs to be cleaner and have more clarity, e.g. better variable names

Andreas: Instead of external design documents, code needs to be documented. Header and function blocks that clearly articulate what the code does.

What are we doing with developer resources?

Chris: People seem to think it is inevitable it will move to lustre.org but I don't necessarily agree. We're in too much of a limbo right now. Do people think we have to move them away?

James: Who is going to run this?

Simms: Need to prioritize what we're going to do, and identify how we're going to do it

SDSC: People that object to doing it under Intel need to pay for doing this

Andreas: The Intel system is open, and free for the community to use. This would be a duplication of effort. There is no clear benefit to wholesale changing this.

Chris: There is a vocal element that says the community needs to host this and get away from Intel. Personally I feel this is a duplication of effort and there are lots of other things to do.

Peter: Why make a bunch of complexity for ourselves? We could easily move the manual and issue tracker to OpenSFS and solve most of the complaints.

Andreas: A git.lustre.org URL redirection is a quick fix to this. downloads.lustre.org

Robert: I suggest moving the manual to github as a first step towards this.

Jason: If there is no pressure from Intel, there is no pressing need to change this. Much larger issues to tackle.

CEA: Gerrit and git would be easiest to move.

Jason Hill: Big hill to get people to change their development work flow.

Peter Jones: Sounds like not this year.

Robert Read: Mirror the git tree on GitHub.

Andreas: I'll donate the lustre GitHub account

Other

Kyr: Static analysis

Cory: This relates to testing efforts

Chris postponed topic until next LWG meeting

Action Items

  • Add Cppcheck automated testing to top of the agenda for next LWG meeting
  • Richard - indexable manual, implement some of the documentation ideas, URL redirection