LWG Minutes 2015-04-15

From OpenSFS Wiki

Revision as of 16:52, 23 April 2015

== Agenda ==
# Monitoring effort
# Are we moving the development platform to lustre.org?
# Testing coordination, getting more communication
# Documentation effort, especially on the wiki at lustre.org
# Release cadence

We need more people to take projects on and lead them, to become champions:
* Maintain a page on the wiki showing what work is going on and what is left to do
* Regularly report back to the LWG

== Monitoring effort ==
A lot of people are doing a lot on their own; it would be great to get together and not repeat work
LLNL has LMT. It is not dead, but we don't have DNE support and the Java client isn't being developed
Robert: It collects the data for DNE but it doesn't display the data
Andreas: If the Java client is dead, it should be removed from the project and clearly marked as deprecated. People still use it because they don't know the status
Chris: We're the main guys driving development, not a lot of development from the outside
We'd be happy to collaborate on any other tools
Who will be the project lead on this?
Simms: Folks from Wisconsin are willing to offer up their work
SDSC: Any products they put out we'll deploy, but we are not volunteering to be the lead
James: What is everyone using now to monitor their sites?
Kluge: Logstash, Kibana
SDSC:
CEA: OpenTSDB, Kibana
Jason Hill: homegrown scripts tied into Nagios
LMT, homegrown scripts, lltop for job monitoring
Looking to switch to Grafana
Chris: At least consider who at your sites could be a champion for this project
This ties into documentation. We need to understand what is in the proc files.
James: We need to know what to measure.
Chris: Friends at Wisconsin have created a wiki page documenting much of this
Andreas: The manual has a lot of information.
A single page giving a how-to on monitoring would help.
Cory: I think the BWG has made an effort on what you're asking for.
Simms: There may have been an effort but nothing has been settled upon.
So much effort is going into this; OpenSFS needs to get behind one or two tools to help new Lustre users
Fermi: We should focus efforts on standardizing interfaces so people don't need to reproduce collection efforts
James: Ultimately we'll be moving to sysfs, which will replace procfs
Chris: We'll also need to use debugfs because sysfs is one line per file
James: debugfs is sequence file based and could help
Fermi: We could take an event-model approach like what was done with changelogs, and publish data with thresholds
James: I agree with you, and sysfs can push uevents for event handling.
SDSC: There are a lot of feature releases and changes; it is hard for new users to find out about and use these features
The wiki is a place to publish standard feature uses
Chris: Landing collateral needs to include this type of information
Andreas: The manual gets updated but there may be a need for a 5-min demo
Chris: Go back and think about what we can collaborate on

== Documentation ==
We should start working on a large list of landing collateral for new features, including appropriate documentation
Oleg: People hate new restrictions
Chris: Sure, but that is how we make it happen
Cory: We could set it up so a +1 can review from a publication bot
SDSC: I just need the steps to get it running, it doesn't have to be exhaustive
Andreas: There is stuff in the manual, people just don't look at it
Chris: The manual is almost a lost cause. Mishmash, in DocBook. There are real barriers
Andreas: Updating the manual isn't what people are looking for. Is it a README.dne or a wiki page?
Vitaly: Is this about the manual or about design documents?
New features should include design docs
Andreas: Design docs are on the wiki, but not all PAC leaders publish
Chris: The design docs are pretty poor anyway, and too high level for administrators
Andreas: man pages do describe how to do a lot of this. If they don't look in the manual, or man page, where are they going to look?
Chris: The wiki should be the place to go. Ceph as an example. We should have the same for Lustre
We need editors to create high level starting pages for people who don't know exactly what they're looking for
Alex: Developer documentation is lacking
Chris: True, a lot of documentation is lacking. We need to start somewhere. There is an OpenSFS contract for the protocol documentation.
Alex: These get outdated quickly.
Chris: All new features that change the protocol must update the protocol documentation with suggested changes. Landing criteria.
New landing rules.
Vitaly: It would be helpful to organize all the existing design documents into a central location.
Chris: The wiki is a great place for that.
Oleg: They should also be in the tree so they match the version you've checked out. Then when patches come in, we'll know they've changed some design and what needs to be updated in the tree.
Peter Jones: Once we have the protocol document, I like the idea of requiring updates.
How do you catch every change though?
Chris: Peer review process
Peter Jones: I was thinking of the gatekeeper's role; compared to the overall volume of changes this will be a small percentage
Cory: It should be on the inspectors and reviewers
Oleg: There are certain warning flags that would show the need for protocol documentation
Chris: Protocol needs to not change with every change. We also need to document the on-disk protocol
If we keep the protocol documentation separate, it will help maintain compatibility
Andreas: There clearly have been ongoing efforts to maintain protocol compatibility. Some changes update the protocol but don't fundamentally change it.
Need to be clear about when a feature was introduced.
Chris: Protocol documentation would make it clear that if this feature flag is set, you must support this.
Simms: We're all over the place. There is a range of documentation here, high to low. IU has an editor, and when you're doing something technical, you just push it to them and they work it into a document or update an existing one.
Fermi: We need to lower the entry barrier and allow developers to publish short notes. A tech editor can then work them into a more formal document.
Simms: It would help to have somebody not in this room who is an editor to create a consistent style and enforce it.
Better documentation would increase adoption of Lustre.
Chris: Take a step back. We need to focus on how we're going to make progress. People have pet interests; start using the wiki, publish.
Who are the main markets for documentation? Starting points for developers, new users, admins. We need a champion who can rally people to create this starting point.
SDSC: We should compare against GPFS and do better than that.
Chris: Google presence is improving, and good information will bubble to the top
We also need to focus on the packaging and installation in comparison to GPFS
Simms: I suggest brainstorming from people in the room
Fermilab: We need to publish these suggestions to the community so they can contribute too
Simms: I'm prepared to ask IU to use a slice of time from our editors to help with the initiative
We need ideas on what specifically we can help with.
Andreas: OpenSFS could be funding a technical editor to help with this consistent style issue.
Rich: Features are awesome, bug fixes are necessary, but this will have the biggest impact on adoption.
Cory: We brought down the promoter fee, but they're still committed to providing resources.
We should ask the board to poll all members to donate a 1/4 FTE; it might be easier to get something done.
Brainstorming exercise about what needs to be done:
Chris: We don't have any money, so we'll have to be creative to get something done. Chris, CEA, Marc are willing to spend some of their time
Andreas: Developers are not the people to be updating this. Every university has a "how to use Lustre effectively". It's the consumers that have the most to contribute.
At CFS there was a policy of not replying directly to inquiries. Update the FAQ and point at that.
James: We get the Mellanox question about once a month
Simms: I've proposed that somebody follows lustre-discuss, captures knowledge, and adds it to an appropriate venue
Andreas: Are there documentation engines we can leverage here?
Marc: Stack Exchange is openly available
Chris: This fractures the documentation. Do we want this and the wiki?
Oleg: It would help for specific questions
John Hammond: We could use the editor for the wiki, and the community would use Stack Exchange
Chris: We need to cut the Operations manual in half. 1/2 operations and the other 1/2 for developers
LNET should be promoted to the top.
John Hammond: The manual looks like HTML 1.1 and modernization would help attract people to Lustre. Also reformat from one big page.
Andreas: It needs to be Google searchable as well.
Chris: We also need to have versioned releases of the manual
Andreas: But this requires backporting changes
Oleg: At the start of some manuals they have a changelog that specifies at what version things changed
Changelog section seems to be very important.
Andreas: There are facilities for this in the manual already.
Chris: Not exactly what people want, they want it up front.
Needs to be mainly flat text, but at the beginning explain what the differences are.
Fermilab: There are issues with formatting.
Command line examples go over the edge.
Chris: We need responsible people to report bugs and an editor that goes in and makes the changes.
Jason Hill: Sent a note to their technical editors to sit down with a Lustre admin to begin fleshing this out.
Marc: I'm happy to review changes for technical accuracy.
Andreas: The information people want is there; it needs to be other people that are doing some of these things. This is the wrong audience for this discussion.
Peter Jones: Isn't Richard Friedman a technical writer?
Terri: He's paid by the hour and his allotment is already used.
Cory: Can Richard work on some of the ideas we've listed? Getting the manual indexable
Nirmal: The log messages are a whole other challenge. The knowledgeable should help with this.
Chris: This would be great for the Stack Exchange
Jason Hill: This also relates to tutorial style content that Chris and Jason have discussed
We've talked about a tutorial day at LUG or the Lustre Ecosystem workshop
Simms: Terri had a good idea before. What if we were to have developers talk about subsystems? These are the pieces and parts; videotape it and put it up on a site. Helps new developers.
Like a podcast or a "Lustre hour"
SDSC: Large conferences would be a great place to capture content like this
Fermilab: Everything should at least be in a central location.
Andreas: Do we need to get rid of the source control and review process?
Alexander: It does make it much easier for people to contribute.
SDSC: I think you need both; wiki and manual are separate documents for different purposes.
Chris: We've been slow at updating the manual, but we can make the most progress with the wiki. Work on that first and circle back to the manual.
Simms: We need to identify what documentation
Brainstorming:
* Tools
* Howtos
* Log decoder
* Parameters and configuration
* Where to go when something breaks
* Tutorials
* Make it pretty
* FAQ
* LNET
* Disks
* Clients
* Servers
* Best practices
* PDFs in a container that gives details for each subsystem
* Design docs
* Protocol documentation
* Doxygen
* Documentation inside the code
* man pages
* Style rubric and coding guidelines

* Quickstart
* Install
* Configuration and Tuning
* Troubleshooting
* Index

Robert Read: readthedocs.org
John Hammond: The code also needs to be cleaner and have more clarity, e.g. better variable names
Andreas: Instead of external design documents, code needs to be documented. Header and function blocks that clearly articulate what the code does.

== What are we doing with developer resources? ==
Chris: People seem to think it is inevitable it will move to lustre.org but I don't necessarily agree
We're in too much of a limbo right now.
Do people think we have to move them away?
James: Who is going to run this?
Simms: Need to prioritize what we're going to do, and identify how we're going to do it
SDSC: People that object to doing it under Intel need to pay for doing this
Andreas: The Intel system is open, and free for the community to use. This would be a duplication of effort.
There is no clear benefit to wholesale changing this.
Chris: There is a vocal element that says the community needs to host this and get away from Intel.
Personally I feel this is a duplication of effort and there are lots of other things to do.
Peter: Why make a bunch of complexity for ourselves? We could easily move the manual and issue tracker to OpenSFS and solve most of the complaints.
Andreas: A git.lustre.org URL redirection is a quick fix to this.
downloads.lustre.org
Robert: I suggest moving the manual to GitHub as a first step towards this.
Jason: If there is no pressure from Intel, there is no pressing need to change this. Much larger issues to tackle.
CEA: Gerrit and git would be easiest to move.
Jason Hill: Big hill to get people to change their development workflow.
Peter Jones: Sounds like not this year.
Robert Read: Mirror the git tree on GitHub.
Andreas: I'll donate the lustre GitHub account

== Other ==
Kyr: Static analysis
Cory: This relates to testing efforts

Chris postponed topic until next LWG meeting

== Action Items ==
  • Add Cppcheck automated testing to top of the agenda for next LWG meeting
  • Richard - indexable manual, implement some of the documentation ideas, URL redirection