Lustre GSSAPI/Kerberos Repair
One of our goals of Contract_SFS-DEV-002 is to provide lightweight, shared-key authentication for sites that may have trouble finding the resources or political will to maintain their own Kerberos infrastructure. The natural choice, we thought, was to plug in these features using the existing GSSAPI abstraction, thereby interfering with as little other code as possible, including the existing Kerberos support.
We quickly ran into (and fixed) build issues with the GSSAPI and Kerberos code. We also corrected some inappropriate direct dependencies on Kerberos (where the GSSAPI abstraction wasn't used). In talking with others at the LAD conference last month, though, we confirmed nagging fears of more issues lurking under the surface. I want to find and fix these bugs so we can move on with our GSSAPI-dependent implementation. I suspect others here want working Kerberos support.
I've written a "null" (pass-through) GSSAPI mechanism whose sole purpose is to help us test the GSSAPI code paths. I'm currently working on getting Lustre up and running with GSSAPI enabled, keyring support enabled, but using only the null mechanism to pass traffic through with no authentication or encryption.
Perhaps now is a good time for others on the list to share what experiences they've had with this code and where they've run into issues. Then maybe we can identify bugs, file them, and get to work on fixing them.
So far, a few Kerberos/GSS patches have been submitted to the Lustre Gerrit repository:
- LU-4113 gss: uncatched error in gss_svc_upcall (fixed in Lustre 2.6.0)
- LU-4085 build: gss/krb5 is disabled (fixed in Lustre 2.6.0)
- LU-4012 gss: upcall fails due to removed calls (fixed in Lustre 2.6.0)
- LU-3778 GSS doesn't know about proxy subsystems (open)
- LU-6020 Bugfixes for GSS/Kerberos (open)
If you are interested to contribute to Lustre Kerberos, please see:
Note that while the majority of the Lustre code is tested for every patch that is submitted (8h or so per patch) the Kerberos code is NOT currently in the automated environment. It would be great if people with an understanding of Kerberos could look at the test scripts in lustre/tests/sanity-sec.sh and lustre/tests/sanity-gss.sh to see what needs to be done to get them working. Having those tests working would ensure that the Lustre Kerberos code would avoid breakage in the future.