BWG FSM IO Process Model

From OpenSFS



(Paraphrase of discussion with Bevin Brett, December 2014)

There are two ways people approach understanding the performance of a new (to them) file system. Someone who is an expert in the domain often starts with a pretty detailed mental model of the system and will do a few experiments to calibrate that model. You see that a lot, and it can be good for a shoot-from-the-hip situation.

The other way of approaching a new system is more black-box. In that case you run a series of experiments, and only as you get the data on performance throughout the space of possible experiments do you then develop a model to explain what you see. If you aren't a super expert in the particular hardware and software then this is pretty much what you have to do.
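As a rough illustration of the black-box approach, the sketch below sweeps one dimension of the experiment space (the size of each write() call) and records the observed rate for each point. The function names `measure_write_rate` and `sweep`, the chosen sizes, and the use of a temporary directory are assumptions made for this example, not part of the original discussion. A real study would also vary node count, file count, access pattern, and so on.

```python
import os
import tempfile
import time


def measure_write_rate(path, write_size, total_bytes):
    """Write about total_bytes to path in write_size chunks.

    Returns the observed rate in bytes per second as seen by the caller.
    No fsync is issued, so on most systems this largely measures how fast
    the OS accepts data into its buffers, not how fast the disk absorbs it.
    """
    chunk = b"\0" * write_size
    n_calls = max(1, total_bytes // write_size)
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(n_calls):
            written += os.write(fd, chunk)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    # Guard against a zero elapsed time on very small runs.
    return written / max(elapsed, 1e-9)


def sweep(write_sizes, total_bytes, directory=None):
    """Run one measurement per write size; return {write_size: bytes_per_second}."""
    results = {}
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        for size in write_sizes:
            path = os.path.join(tmp, f"probe_{size}.dat")
            results[size] = measure_write_rate(path, size, total_bytes)
    return results


if __name__ == "__main__":
    # Hypothetical sweep: 4 KiB to 4 MiB writes, 64 MiB per data point.
    for size, rate in sweep([4096, 65536, 1 << 20, 4 << 20], 64 << 20).items():
        print(f"{size:>8} B writes: {rate / 1e6:10.1f} MB/s")
```

Passing `directory=` pointing at the file system under study runs the probes there instead of in the default temporary location. Only after collecting results like these across many dimensions would one try to build a model that explains them.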

I've done these experiments enough that I have some expectations, but my main interest is in finding problems and issues. That means I want to look closely enough that I will find ways the system doesn't fit my preconceived notions. Needless to say, the black-box approach is much more time-consuming.

The mental image of file system performance follows the behavior of an application running on a set of compute nodes. The application has periods when it is not conducting I/O (it is computing, waiting at a barrier, or whatever) and periods when it does I/O. During the actual processing of a write() system call, for example, we can guess that the node will work on that I/O as fast as it can, but the I/O may consist of a long sequence of write() calls with other activity mixed in. The node will probably just copy the data from each write() into a memory buffer, and that will go fast. The OS will then drain those buffers to disk at a somewhat slower rate. As long as the buffers never entirely fill, the I/O looks very fast. As soon as the OS has to block the application while buffers drain, the I/O appears to slow down to disk (or controller or network) speed. Finally, there can be multiple layers of such buffer cache behavior, so performance can depend intimately on the number, size, frequency, and timing of the many write() calls on the many nodes, as well as on the details of the system construction. The same picture applies to reads in place of writes, with one difference: the buffer cache can't in general hide disk and network read latency, although it can serve reads from previously cached writes and it can read ahead.
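The single-buffer part of this picture can be sketched as a toy simulation. This is a minimal model under stated assumptions, not a description of any real file system: there is one buffer of fixed capacity, it drains at a constant rate, copies into it proceed at a constant (faster) rate, and the application blocks only when the buffer has no room. The function name `simulate_writes` and all parameter names and values are hypothetical.

```python
def simulate_writes(n_writes, write_size, buffer_capacity,
                    drain_rate, copy_rate, think_time=0.0):
    """Return the latency (seconds) the application sees for each write().

    Sizes are in bytes and rates in bytes per second. think_time is the
    non-I/O time the application spends before each call.
    """
    if write_size > buffer_capacity:
        raise ValueError("this simple model needs write_size <= buffer_capacity")
    buffered = 0.0  # bytes waiting in the buffer to be drained
    latencies = []
    for _ in range(n_writes):
        # The buffer drains in the background while the application computes.
        buffered = max(0.0, buffered - drain_rate * think_time)
        # If the write does not fit, block until enough has drained.
        shortfall = buffered + write_size - buffer_capacity
        wait = max(0.0, shortfall) / drain_rate
        buffered = max(0.0, buffered - drain_rate * wait)
        # Copy into memory; the buffer keeps draining during the copy.
        copy = write_size / copy_rate
        buffered = max(0.0, buffered - drain_rate * copy) + write_size
        latencies.append(wait + copy)
    return latencies


if __name__ == "__main__":
    # Hypothetical numbers: 1 MB writes, 8 MB buffer,
    # 100 MB/s drain rate, 1 GB/s memory copy rate.
    for i, t in enumerate(simulate_writes(20, 1e6, 8e6, 100e6, 1e9), start=1):
        print(f"write {i:2d}: {t * 1e3:6.2f} ms")
```

With back-to-back writes, the early calls cost only the memory copy, and once the buffer fills each call settles at about `write_size / drain_rate`, which is the drop to device speed described above. Giving the application enough `think_time` between calls keeps the buffer from ever filling, so every write looks fast.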

Thus one's mental model looks like a very vague and highly incorrect queuing network whose properties you want to discover. If you have some reason to think that your mental model is very close to correct, then you may do only a few experiments to confirm that. Of course, that means that in many cases you won't be able to see if and where it's wrong :)