Zero to Hero
Starting from scratch
HPC systems provide value through their ability to run big jobs faster than other systems. It is natural to look for performance metrics; users need to know if your system can run their job, managers want to know if they received good value from their vendor, and so on.
When a system is procured, the vendor often provides peak performance numbers. These are based on the theoretical maximum for each component. Storage benchmarks can often run at 80% or more of the theoretical peak, but good results are almost never achieved on the first attempt. What follows is a guide to improving benchmark results until you reach the number you are after.
A normal test of read and write rates begins with a program that creates random data and writes it to a file. The new file is then read by the processor and sent to /dev/null.
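A minimal sketch of such a test, using the standard dd utility instead of a custom program. The TESTDIR default, the file name, and the 16 MiB size are placeholders chosen for illustration, and conv=fsync assumes GNU dd. On a real run, point TESTDIR at the filesystem under test and use a file much larger than client memory, otherwise the read phase may be served from cache rather than from storage.

```shell
#!/bin/sh
# Hypothetical defaults: override TESTDIR to point at the filesystem under test.
TESTDIR="${TESTDIR:-/tmp}"
TESTFILE="$TESTDIR/hero_rw_test.$$"
SIZE_MIB=16   # deliberately tiny; real runs should exceed client RAM

# Write phase: random data into a new file; conv=fsync (GNU dd) flushes
# the data to storage before dd reports its timing.
write_phase() {
    dd if=/dev/urandom of="$TESTFILE" bs=1048576 count="$SIZE_MIB" conv=fsync
}

# Read phase: read the file back and discard it.
read_phase() {
    dd if="$TESTFILE" of=/dev/null bs=1048576
}

write_phase
WRITTEN_BYTES=$(( $(wc -c < "$TESTFILE") ))
read_phase
rm -f "$TESTFILE"
```

dd prints the elapsed time and transfer rate for each phase on stderr, which is the number to compare against your expectations.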
The first time you try this, expect to be underwhelmed. It's far from the peak that you expected. What could be wrong? Where is the bottleneck? Is the problem with IO parameters such as the block size, or is there a problem with parallelism?
Start over. Restrict your test to one processor and one IO device, and determine the optimal parameters for this configuration. Then see if another processor helps. See if you can double the speed when you are using two IO devices. Find out if one processor can drive two IO devices at twice the speed of one device.
If you have done this sort of thing before, you are probably familiar with the dd command. If not, it’s time to go to the man page for dd and check it out. Then run something like:
dd if=/dev/zero of=/mnt/filesystem/testdir/file bs=4k count=1M
Compare the dd results with those from your own test program, if you started with one. Vary the block size. You should see improvement as you start to add threads, but the returns are probably diminishing. Make sure your network isn’t the bottleneck; if you've worked with networked filesystems before, you may have run into this.
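Varying the block size by hand gets tedious, so here is a rough sketch of a sweep that keeps the total amount of data constant while the block size changes. The block sizes, the 64 MiB total, and the TESTDIR default are arbitrary illustrative choices, and conv=fsync assumes GNU dd.

```shell
#!/bin/sh
# Hypothetical defaults: override TESTDIR to point at the filesystem under test.
TESTDIR="${TESTDIR:-/tmp}"
TOTAL_BYTES=$((64 * 1024 * 1024))   # small on purpose; scale up for real runs

# Number of blocks needed to write TOTAL_BYTES at a given block size (bytes).
blocks_for() {
    echo $(( TOTAL_BYTES / $1 ))
}

for bs in 4096 65536 1048576 4194304; do
    f="$TESTDIR/hero_bs_test.$$"
    printf 'bs=%s bytes: ' "$bs"
    # dd reports its statistics on stderr; keep only the summary line.
    dd if=/dev/zero of="$f" bs="$bs" count="$(blocks_for "$bs")" conv=fsync 2>&1 | tail -n 1
    rm -f "$f"
done
```

Holding the total size fixed means the runs differ only in the number and size of the IO requests, which makes the rates directly comparable.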
Time to add another client to the mix. Taking the scripts and configuration from the single-client dd run, you expand them out to 2 clients. You get roughly double the performance. Now we're getting somewhere.
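One possible way to expand a single-client command to several clients is to start it on every client at once and wait for all of them to finish. This is only a sketch: it assumes passwordless ssh to each client, and the REMOTE_SHELL variable and run_clients function are names made up for this example.

```shell
#!/bin/sh
# Assumption: any command that accepts "host command" works here; ssh is the default.
REMOTE_SHELL="${REMOTE_SHELL:-ssh}"

# usage: run_clients COMMAND HOST [HOST ...]
# Starts COMMAND on every host in the background, then waits for all of them.
run_clients() {
    cmd=$1
    shift
    for host in "$@"; do
        $REMOTE_SHELL "$host" "$cmd" &
    done
    wait
}
```

With two clients it might be called as follows (client1 and client2 are hypothetical hostnames). The single quotes defer $(hostname) to the remote side, so each client writes its own file:

    run_clients 'dd if=/dev/zero of=/mnt/filesystem/testdir/file.$(hostname) bs=1M count=4096' client1 client2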
So you start adding clients: 4, then 8, then ..., watching the numbers creep up. However, when you started this whole thing, you weren't really thinking of running dozens, hundreds, or even thousands of threads across many, many different clients. Things are starting to get messy.
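As the client count grows, it helps to reduce each run to two numbers: the aggregate rate and how close it is to ideal linear scaling. A small sketch of that arithmetic follows; scaling_report is a name invented for this example, and it expects one per-client rate per line on stdin with the single-client rate as its argument.

```shell
#!/bin/sh
# usage: scaling_report SINGLE_CLIENT_RATE < per_client_rates.txt
# Sums the per-client rates and compares the total with N times the single-client rate.
scaling_report() {
    awk -v single="$1" '
        { total += $1; n++ }
        END {
            if (n == 0 || single <= 0) exit 1
            printf "clients=%d aggregate=%.1f efficiency=%.0f%%\n",
                   n, total, 100 * total / (n * single)
        }'
}
```

For instance, with made-up rates of 480 and 470 MB/s on two clients and 500 MB/s measured on one client alone:

    printf '480\n470\n' | scaling_report 500
    clients=2 aggregate=950.0 efficiency=95%

Efficiency that drops as clients are added suggests a shared bottleneck somewhere beyond the clients.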
Someone else has got to have done this before, right? Maybe tools exist that can help sort all of this out.
Hopefully you've got your compute cluster up and running, and you can run jobs on it. Time to go hunting for tools to use. IOR, IOzone and xdd are the tools that the Hero Run task group uses (which is why you're here). So you grab one, find the basic scripts to run it on your cluster, and start firing away: tweaking variables in the scripts, recording numbers, creeping upwards in performance, and hopefully reaching your goal.
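As a starting point, a basic IOR run might look something like the command built below. This is a sketch under assumptions: it presumes IOR is installed and launched with mpirun (your cluster may use a different launcher), and the process count, block size, transfer size, and file path are placeholders to tune. The ior_cmd helper is made up for this example and only prints the command so you can inspect it before running it.

```shell
#!/bin/sh
# usage: ior_cmd NUM_PROCESSES TESTFILE
# Prints an IOR command line using these options:
#   -a POSIX  use the POSIX IO API
#   -w -r     run a write phase, then a read phase
#   -F        one file per process
#   -e        fsync after the write phase
#   -b 1g     amount of data each process writes
#   -t 1m     size of each individual transfer
#   -o FILE   path of the test file
ior_cmd() {
    printf 'mpirun -np %s ior -a POSIX -w -r -F -e -b 1g -t 1m -o %s\n' "$1" "$2"
}

ior_cmd 16 /mnt/filesystem/testdir/ior_file
```

The transfer size (-t) plays the same role as the dd block size, so the values that worked best in the single-client dd runs are a reasonable place to start.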