Zero to Hero

Starting from scratch
You are a first-time user of high-speed storage, and possibly of HPC in general, and you want to see what it can do. Big numbers would be great: you've been told that your new system can move truly staggering amounts of data at blistering speeds, and you want to see it happen.

This is a tale about locating and eliminating bottlenecks, some of which you may not have encountered, or expected, before.

First attempts
Well, the way you'd test normal storage is to copy a file from one server on your network to another. So you try it: you copy an ISO image, or some other large file you've got sitting around, to your completely unused parallel filesystem. You're expecting big things, but watching it go, you're underwhelmed. It's nothing like what you've been promised, maybe off by several orders of magnitude! What could be wrong?

Well, maybe it's threads, so you copy the same file several times in parallel. That should speed things up, and it might. But you're still stuck with some pretty bad performance. Where do you go from here?
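The parallel copy described above can be sketched as a small shell function. This is a minimal illustration, not a benchmark tool: the function name parallel_copy, the copy.N file names, and the arguments are all hypothetical, and because cp can return once data is in the client's page cache, the reported rate may flatter the storage.

```shell
# Hypothetical helper: copy one source file N times in parallel and
# report a rough aggregate rate. Names and paths are illustrative only.
parallel_copy() {
    local src="$1" dest_dir="$2" copies="$3"
    local size start end elapsed i

    size=$(wc -c < "$src")                 # bytes in the source file
    start=$(date +%s)

    for i in $(seq 1 "$copies"); do
        cp "$src" "$dest_dir/copy.$i" &    # one background cp per copy
    done
    wait                                   # block until every cp has exited

    end=$(date +%s)
    elapsed=$(( end - start ))
    [ "$elapsed" -gt 0 ] || elapsed=1      # avoid dividing by zero on tiny files

    echo "copies=$copies bytes=$(( size * copies )) seconds=$elapsed MBps=$(( size * copies / elapsed / 1000000 ))"
}
```

For example, parallel_copy /tmp/image.iso /mnt/filesystem/testdir 4 would launch four simultaneous copies, assuming both paths exist on your client.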

dd
The dd command is something you've heard of, and maybe used on occasion for testing disk arrays, so you might as well give it a shot. Maybe reading from your local drive was causing some of the problems, and dd can take its input from /dev/zero instead. So you run something like:

dd if=/dev/zero of=/mnt/filesystem/testdir/file bs=4k count=1M

The result is better than your previous attempts, but still nowhere near what you should get.

More threads? Can't hurt. You see some improvement as you add threads, but the returns diminish, and you're still not seeing the numbers you were promised. You're pretty sure you've run out of network bandwidth on the client. If you've worked with networked filesystems before, you may have run into this already.
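Running several dd writers at once on a single client might look like the sketch below. The function name parallel_dd and the ddfile.N naming are assumptions made for illustration; the block size is given in plain bytes so the script can total the data written.

```shell
# Hypothetical helper: start several dd writers at once on one client,
# each writing zeros to its own file, then report a rough aggregate rate.
parallel_dd() {
    local target_dir="$1" writers="$2" bs_bytes="$3" count="$4"
    local start end elapsed total i

    start=$(date +%s)
    for i in $(seq 1 "$writers"); do
        # dd prints its statistics on stderr; discard them here
        dd if=/dev/zero of="$target_dir/ddfile.$i" \
           bs="$bs_bytes" count="$count" 2>/dev/null &
    done
    wait                                   # block until every dd has exited
    end=$(date +%s)

    elapsed=$(( end - start ))
    [ "$elapsed" -gt 0 ] || elapsed=1      # avoid dividing by zero on short runs
    total=$(( writers * bs_bytes * count ))
    echo "writers=$writers bytes=$total seconds=$elapsed MBps=$(( total / elapsed / 1000000 ))"
}
```

Calling parallel_dd /mnt/filesystem/testdir 8 4096 1048576 would start eight writers of 4 GiB each. Repeating the run with 1, 2, 4, and 8 writers is one way to see where the returns start to flatten on a single client.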

More clients
Time to add another client to the mix. Taking the scripts and configuration from your single-client dd runs, you expand out to 2 clients. You get roughly double the performance. Now we're getting somewhere.

So you start adding clients: 4, then 8, then more, watching the numbers creep up. However, when you started this whole thing you weren't really planning to run dozens, hundreds, or even thousands of threads across many, many different clients. Things are starting to get messy.
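A hand-rolled multi-client run tends to end up as something like the following sketch, which also hints at why it gets messy. It assumes passwordless ssh to every client and the same mount point everywhere; the function name multi_client_dd, the REMOTE_SHELL variable, and the host names are hypothetical.

```shell
# Command used to reach each client. Defaults to ssh and assumes
# passwordless logins; override it to use a different launcher.
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

# Hypothetical helper: start one dd writer on every listed client at once.
multi_client_dd() {
    local target_dir="$1" bs="$2" count="$3"
    shift 3                                # remaining arguments are host names
    local host

    for host in "$@"; do
        # \$(hostname) is expanded on the remote side, so each client
        # writes to its own file.
        "$REMOTE_SHELL" "$host" \
            "dd if=/dev/zero of=$target_dir/ddfile.\$(hostname) bs=$bs count=$count" &
    done
    wait                                   # block until every client has finished
}
```

For example, multi_client_dd /mnt/filesystem/testdir 4k 1M client1 client2 would run one writer on each of two clients. Collecting and summing the per-client results is left out here, and that bookkeeping is exactly the part that grows unmanageable as the client count rises.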

Someone else has got to have done this before, right? Maybe tools exist that can help sort all of this out.

Clustered runs
Hopefully you've got your compute cluster up and running, and you can run jobs on it. Time to go hunting for tools. IOR, IOzone, and xdd are the tools that the Hero Run task group uses (which is why you're here). So you grab one, find the basic scripts to run it on your cluster, and start firing away: tweaking variables in the scripts, recording numbers, creeping upwards in performance, and hopefully reaching your goal.
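As a starting point for the kind of basic script mentioned above, here is a sketch that assembles an IOR command line. It is an assumption-laden example rather than the task group's own script: it presumes Open MPI's mpirun and an ior binary on your PATH, and the function name ior_command, the rank count, the hostfile, and the sizes are placeholders to tune.

```shell
# Hypothetical helper: print an mpirun + IOR command line for a
# file-per-process write and read test. It prints rather than runs the
# command so you can inspect it or paste it into a job script.
ior_command() {
    local ranks="$1" hostfile="$2" testfile="$3"

    # -a POSIX  use the POSIX I/O API
    # -w -r     write the data, then read it back
    # -F        one file per MPI task
    # -e        fsync after the POSIX write phase
    # -C        reorder tasks for the read, to limit client cache effects
    # -b / -t   bytes written per task / size of each transfer
    # -i        number of repetitions
    echo "mpirun -np $ranks --hostfile $hostfile ior -a POSIX -w -r -F -e -C -b 4g -t 1m -i 3 -o $testfile"
}
```

Running ior_command 64 hosts.txt /mnt/filesystem/testdir/ior.dat prints a 64-rank command. The block size, transfer size, and rank count are the variables you would most likely tweak from run to run.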