LWG Minutes 2015-04-15

Agenda

 * 1) Monitoring effort
 * 2) Are we moving the development platform to lustre.org?
 * 3) Testing coordination, getting more communication
 * 4) Documentation effort, especially on the wiki at lustre.org
 * 5) Release cadence

Introduction
We need more people to take on projects, lead them, and become champions

Maintain a page on the wiki describing what work is going on and what is left to do

Regularly report back to the LWG

Monitoring effort
A lot of people are doing a lot on their own; it would be great to get together and avoid duplicating effort

LLNL has LMT. It is not dead, but it doesn't have DNE support, and the Java client isn't being developed

Robert: It collects the data for DNE but it doesn't display the data

Andreas: If the Java client is dead, it should be removed from the project and clearly marked as deprecated. People still use it because they don't know its status

Chris: We're the main guys driving development, not a lot of development from the outside

We'd be happy to collaborate on any other tools

Who will be the project lead on this?

Simms: Folks from Wisconsin are willing to offer up their work

SDSC: Any products they put out we'll deploy, but we're not volunteering to be the lead

James: What is everyone using now to monitor their sites?

Kluge: Logstash, Kibana

SDSC:

CEA: OpenTSDB, Kibana

Jason Hill: Homegrown scripts tied into Nagios; LMT, homegrown scripts, and lltop for job monitoring. Looking to switch to Grafana

Chris: At least consider who at your sites could be a champion for this project

This ties into documentation. We need to understand what is in the proc files.

James: We need to know what to measure.

Chris: Friends at Wisconsin have created a wiki page documenting much of this

Andreas: The manual has a lot of information. A single page giving a how-to on monitoring would help.

Cory: I think what you're asking for, the BWG has made an effort on this.

Simms: There may have been an effort, but nothing has been settled upon. So much effort is going into this that OpenSFS needs to get behind one or two tools to help new Lustre users

Fermi: We should focus efforts to standardize interfaces so people don't need to reproduce collection efforts

James: Ultimately we'll be moving to sysfs which will replace procfs

Chris: We'll also need to use debugfs because sysfs is one line per file

James: debugfs is sequence file based and could help

Fermi: We could take an event-model approach, like what was done with changelogs, and publish data with thresholds

James: I agree with you, and sysfs can push uevents for event handling.

SDSC: There are a lot of feature releases and changes, and it is hard for new users to find out about and use these features. The wiki is a place to publish standard feature uses

Chris: Landing collateral needs to include this type of information

Andreas: The manual gets updated but there may be a need for a 5-min demo

Chris: Go back and think about what we can collaborate on

Documentation
We should start working on a large list of landing collateral for new features, including appropriate documentation

Oleg: People hate new restrictions

Chris: Sure, but that is how we make it happen

Cory: We could set it up so a +1 can review from a publication bot

SDSC: I just need the steps to get it running, it doesn't have to be exhaustive

Andreas: There is stuff in the manual, people just don't look at it

Chris: The manual is almost a lost cause. It's a mishmash, in DocBook. There are real barriers

Andreas: Updating the manual isn't what people are looking for, is it a README or a wiki page?

Vitaly: Is this about the manual or about design documents? New features should include design docs

Andreas: Design docs are on wiki, but not all PAC leaders publish

Chris: The design docs are pretty poor anyway, and too high level for administrators

Andreas: man pages do describe how to do a lot of this. If they don't look in the manual, or man page, where are they going to look?

Chris: The wiki should be the place to go. Ceph is an example; we should have the same for Lustre. We need editors to create high-level starting pages for people who don't know exactly what they're looking for

Alex: Developer documentation is lacking

Chris: True, there is a lot of lacking documentation. We need to start somewhere. There is an OpenSFS contract for the protocol documentation.

Alex: These get outdated quickly.

Chris: All new features that change the protocol must update the protocol documentation with suggested changes. This becomes a landing criterion, part of new landing rules

Vitaly: It would be helpful to organize all the existing design documents into a central location.

Chris: the wiki is a great place for that.

Oleg: They should also be in the tree so they match the version you've checked out. Then when patches come in, we'll know they've changed some design and what needs to be updated in the tree.

Peter Jones: Once we have the protocol document, I like the idea of requiring updates. How do you catch every change though?

Chris: Peer review process

Peter Jones: I was thinking of the gatekeeper's role; compared to the overall volume of changes, this will be a small percentage

Cory: It should be on the inspectors and reviewers

Oleg: There are certain warning flags that would show need for protocol documentation

Chris: The protocol shouldn't need to change with every change. We also need to document the on-disk protocol. If we keep the protocol documentation separate, it will help maintain compatibility

Andreas: There have clearly been ongoing efforts to maintain protocol compatibility. Some changes update the protocol but don't fundamentally change it. We need to be clear about when a feature was introduced.

Chris: Protocol documentation would make it clear that if this feature flag is set, you must support this.

Simms: We're all over the place. There is a range of documentation here, from high level to low level. IU has an editor, and when you're doing something technical, you just push it to them and they work it into a document or update an existing one.

Fermi: We need to lower the entry barrier and allow developers to publish short notes. A tech editor can then work them into a more formal document.

Simms: It would help to have one or more people who are not in this room, who are editors, to create a consistent style and enforce it. Better documentation would increase adoption of Lustre.

Chris: Take a step back. We need to focus on how we're going to make progress. People have pet interests; start using the wiki and publish. Who are the main markets for documentation? Starting points for developers, new users, and admins. We need a champion who can rally people to create this starting point.

SDSC: We should compare against GPFS and do better than that.

Chris: Google presence is improving, and good information will bubble to the top. We also need to focus on packaging and installation in comparison to GPFS

Simms: I suggest brainstorming with the people in the room

Fermilab: We need to publish these suggestions to the community so they can contribute too

Simms: I'm prepared to ask IU to use a slice of time from our editors to help with the initiative. We need ideas on what specifically we can help with.

Andreas: OpenSFS could be funding a technical editor to help with this consistent style issue.

Rich: Features are awesome, bug fixes are necessary, but this will have the biggest impact on adoption.

Cory: We brought down the promoter fee, but they're still committed to providing resources. If we ask the board to poll all members to donate a 1/4 FTE, it might be easier to get something done.

Brainstorming exercise about what needs to be done

Chris: We don't have any money, so we'll have to be creative to get something done.

Chris, CEA, and Marc are willing to spend some of their time

Andreas: Developers are not the people to be updating this. Every university has a "how to use Lustre effectively" It's the consumers that have the most to contribute. At CFS there was a policy of not replying directly to inquiries. Update FAQ and point at that.

James: We get the Mellanox question about once a month

Simms: I've proposed that somebody follow lustre-discuss, capture knowledge, and add it to an appropriate venue

Andreas: Are there documentation engines we can leverage here?

Marc: Stack Exchange is openly available

Chris: This fractures the documentation. Do we want this and the wiki?

Oleg: It would help for specific questions

John Hammond: We could use the editor for the wiki, and the community would use Stack Exchange

Chris: We need to cut the Operations Manual in half: one half for operations and the other half for developers. LNET should be promoted to the top.

John Hammond: The manual looks like HTML 1.1 and modernization would help attract people to Lustre. Also reformat from one big page.

Andreas: It needs to be google searchable as well.

Chris: We also need to have versioned releases of the manual

Andreas: But this requires backporting changes

Oleg: At the start of some manuals they have a changelog that specifies at what version things changed. Changelog section seems to be very important.

Andreas: There are facilities for this in the manual already.

Chris: Not exactly what people want, they want it up front. Needs to be mainly flat text, but at the beginning explain what the differences are.

Fermilab: There are issues with formatting. Command line examples go over the edge.

Chris: We need responsible people to report bugs and an editor that goes in and makes the changes.

Jason Hill: Sent a note to their technical editors to sit down with a Lustre admin to begin fleshing this out.

Marc: I'm happy to review changes for technical accuracy.

Andreas: The information people want is there; it needs to be other people doing some of these things. This is the wrong audience for this discussion.

Peter Jones: Isn't Richard Friedman a technical writer?

Terri: He's paid by the hour and his allotment is already used.

Cory: Can Richard work on some of the ideas we've listed, such as getting the manual indexable?

Nirmal: The log messages are a whole other challenge. The knowledgeable should help with this.

Chris: This would be great for Stack Exchange

Jason Hill: This also relates to the tutorial-style content that Chris and Jason have discussed. We've talked about having a tutorial day at LUG or the Lustre Ecosystem workshop

Simms: Terri had a good idea before. What if we were to have developers talk about subsystems? These are the piece parts; videotape it and put it up on a site. This helps new developers. Like a podcast or a "Lustre hour"

SDSC: Large conferences would be a great place to capture content like this

Fermilab: Everything should at least be in a central location.

Andreas: Do we need to get rid of the source control and review process?

Alexander: It does make it much easier for people to contribute.

SDSC: I think you need both, wiki and manual are separate documents for different purposes.

Chris: We've been slow at updating the manual, but we can make the most progress with the wiki. Work on that first and circle back to the manual.

Simms: We need to identify what documentation we need

Brainstorming:
 * Tools
 * Howtos
 * Log decoder
 * Parameters and configuration
 * Where to go when something breaks
 * Tutorials
 * Make it pretty
 * FAQ
 * LNET
 * Disks
 * Clients
 * Servers
 * Best practices
 * PDFs in a container that gives details for each subsystem
 * Design docs
 * Protocol documentation
 * doxygen
 * Documentation inside the code
 * man pages
 * Style rubric and coding guidelines
 * Quickstart
 * Install
 * Configuration and Tuning
 * Troubleshooting
 * Index

Robert Read: readthedocs.org

John Hammond: The code also needs to be cleaner and clearer, e.g. better variable names

Andreas: Instead of external design documents, code needs to be documented. Header and function blocks that clearly articulate what the code does.

What are we doing with developer resources?
Chris: People seem to think it is inevitable that it will move to lustre.org, but I don't necessarily agree. We're in too much of a limbo right now. Do people think we have to move them away?

James: Who is going to run this?

Simms: Need to prioritize what we're going to do, and identify how we're going to do it

SDSC: People who object to doing it under Intel need to pay for doing this

Andreas: The Intel system is open and free for the community to use. This would be a duplication of effort. There is no clear benefit to wholesale changing this.

Chris: There is a vocal element that says the community needs to host this and get away from Intel. Personally I feel this is a duplication of effort and lots of other things to do.

Peter: Why make a bunch of complexity for ourselves? We could easily move the manual and issue tracker to OpenSFS and solve most of the complaints.

Andreas: A git.lustre.org URL redirection is a quick fix for this; likewise downloads.lustre.org

Robert: I suggest moving the manual to GitHub as a first step towards this.

Jason: If there is no pressure from Intel, there is no pressing need to change this. There are much larger issues to tackle.

CEA: Gerrit and git would be easiest to move.

Jason Hill: Big hill to get people to change their development work flow.

Peter Jones: Sounds like not this year.

Robert Read: Mirror the git tree on GitHub.

Andreas: I'll donate the lustre GitHub account

Other
Kyr: Static analysis

Cory: This relates to testing efforts

Chris postponed topic until next LWG meeting

Action Items

 * Add Cppcheck automated testing to the top of the agenda for the next LWG meeting
 * Richard - indexable manual, implement some of the documentation ideas, URL redirection