6. Common Questions about the Engine


6.86 Should I use striping?

On 7th September 2001 ahamm@sanderson.net.au (Andrew Hamm) wrote:-

I would imagine striping is a good, easy way to exploit a disk of 4-8GB from a few years ago, but as we charge headlong into the world of 18, 40 and 70GB disks, it's just starting to get ridiculous. In fact, until Informix gets around the 2GB limit per chunk, we'll all start to suffer from relatively poor performance.

A typical SCSI disk of today can handle between 5 and 10 parallel threads for maximum throughput. Here's a recent analysis of a disk on an L-1000 series HP:

/dev/vg00/ronline00 :  1 concurrent read threads        400  KB/sec.
/dev/vg00/ronline00 :  2 concurrent read threads        444  KB/sec.
/dev/vg00/ronline00 :  3 concurrent read threads        600  KB/sec.
/dev/vg00/ronline00 :  4 concurrent read threads        664  KB/sec.
/dev/vg00/ronline00 :  5 concurrent read threads        765  KB/sec.
/dev/vg00/ronline00 :  6 concurrent read threads        774  KB/sec.
/dev/vg00/ronline00 :  7 concurrent read threads        851  KB/sec.
/dev/vg00/ronline00 :  8 concurrent read threads        888  KB/sec.
/dev/vg00/ronline00 :  9 concurrent read threads        875  KB/sec.
/dev/vg00/ronline00 :  10 concurrent read threads       400  KB/sec.

The write performance curve does the same thing. See how the speed of the disk slumps when you have too much activity, but TOTAL throughput is maximised at 7-9 threads? Other disks tested on other boxes are a little more extreme - for example, the 1-thread performance is approx 300 KB/sec, rising to 1200 and then collapsing to 300 again at about 10 threads.
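To put a number on where that peak sits, here is a small Python sketch that takes the read figures quoted above for /dev/vg00/ronline00 and reports the sweet spot. The function names and the 5% tolerance are my own choices for illustration; only the figures come from the test.

```python
# Read throughput (KB/sec) by number of concurrent read threads,
# copied from the /dev/vg00/ronline00 figures quoted above.
READ_KB_PER_SEC = {
    1: 400, 2: 444, 3: 600, 4: 664, 5: 765,
    6: 774, 7: 851, 8: 888, 9: 875, 10: 400,
}

def best_thread_count(curve):
    """Return (threads, throughput) at the point of maximum total throughput."""
    threads = max(curve, key=curve.get)
    return threads, curve[threads]

def near_peak(curve, tolerance=0.05):
    """Thread counts whose throughput is within `tolerance` of the peak."""
    _, peak = best_thread_count(curve)
    return sorted(t for t, kb in curve.items() if kb >= peak * (1 - tolerance))

if __name__ == "__main__":
    threads, kb = best_thread_count(READ_KB_PER_SEC)
    print(f"peak: {kb} KB/sec at {threads} threads")
    print("within 5% of peak:", near_peak(READ_KB_PER_SEC))
```

On these figures the peak is at 8 threads, and 7, 8 and 9 threads are all within 5% of it.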

SO: let's consider a 4-way stripe built on 4 18GB disks. Since the max Informix chunk size is 2GB, each disk will contribute 0.5GB to every chunk. That implies each disk ends up holding a slice of 36 different chunks (18 / 0.5 = 36). That means each disk will be forced to handle 36 threads on a busy system.
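The arithmetic is easy to check for other disk sizes and stripe widths. This is only a back-of-envelope sketch in Python; the function name is made up for illustration, and it assumes each chunk is spread evenly across every disk in the stripe.

```python
def stripe_slices_per_disk(disk_gb, chunk_gb, stripe_width):
    """How many chunk slices land on each disk of a stripe set.

    Each chunk is spread evenly over `stripe_width` disks, so every disk
    holds chunk_gb / stripe_width of each chunk.
    """
    slice_gb = chunk_gb / stripe_width
    return int(disk_gb // slice_gb)

if __name__ == "__main__":
    # 4-way stripe, 18 GB disks, 2 GB chunk limit, as in the text.
    print("striped:  ", stripe_slices_per_disk(18, 2, 4), "slices per disk")
    # Same disk with no striping: whole 2 GB chunks only.
    print("unstriped:", stripe_slices_per_disk(18, 2, 1), "chunks per disk")
```

With a stripe width of 1 (no striping) the same 18GB disk holds 9 whole chunks, which happens to sit right in the sweet spot of the curve.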

So where in the performance curve do you think the disk will be then? I assure you, an old test up to 25 threads did NOT see a fresh rise in the curve.

Let's say you are on disks where this worst-case performance is 300 KB/sec, and best is 1200 at 9 threads. By the magic of numbers, you'll get a 4x performance increase from your disks if you allocate 9 2GB chunks on each disk without striping. Each disk will give you total throughput of 1200 instead of 300. That's 4 times faster in anyone's language.
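Spelled out as a tiny Python sketch, using only the two figures quoted above. The constant and function names are illustrative, and it assumes the four disks behave independently of each other, which real controllers may not honour.

```python
# Figures quoted in the text for the "more extreme" disks (KB/sec).
BEST_KB_PER_SEC = 1200    # around 9 concurrent threads per disk
WORST_KB_PER_SEC = 300    # overloaded, e.g. 36 busy stripe slices per disk

DISKS = 4

def total_throughput(per_disk_kb, disks=DISKS):
    """Aggregate throughput, assuming the disks behave independently."""
    return per_disk_kb * disks

striped = total_throughput(WORST_KB_PER_SEC)     # 36 slices per disk
unstriped = total_throughput(BEST_KB_PER_SEC)    # 9 whole chunks per disk
speedup = unstriped / striped

if __name__ == "__main__":
    print(f"striped:   {striped} KB/sec")
    print(f"unstriped: {unstriped} KB/sec")
    print(f"speedup:   {speedup:.1f}x")
```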

As your disk gets bigger you have to either choose not to use all of it, or try to set up a few chunks for large, lazy tables wot rarely get accessed. With this kind of allocation, you get some spare chunks that help to consume space on a 24GB or higher disk. But with our system, no more than approx 15% of tables can be farmed off to lazy chunks. The rest will be busy.

If you deliberately reduce your AIOs and cleaners to 9 (roughly), then you'll reduce the traffic on each disk down to 9 threads (optimum performance), but guess what? You'll have to wait to access or flush the other 27 chunk stripes. There's no free lunch...

A stripe-set might give you more throughput when you are the only writer, but in a real engine, you won't have only one thread. I can see a 4-way stripe exploiting up to 5GB disks effectively without having to think too much about spreading your tables intelligently. But until we escape the 2GB chunk limit, striping is going to be less effective unless it is performed by hot hardware which is capable of hiding the engine threads behind masses of cache memory and intelligent algorithms. I've seen recently that DG Clariion arrays have a practically flat performance curve, which was extremely interesting. But that kind of hardware costs more than a small business can afford. If you can afford it, great.

All of this theory hinges on effective distribution of your tables so that the traffic is evenly spread across the chunks. If you are sitting there monitoring your own system then you've got the time to watch onstat -g iof and sysmaster:sysptprof and shift the tables around. I've had to go in cold to sites and reload in a day, so it's taken me several attempts to get happy with my algorithm.
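The algorithm itself isn't described here, so the following is NOT it. It is just one common greedy heuristic (busiest table first, into whichever chunk currently has the least traffic) sketched in Python to show the general idea. The table names and I/O counts are invented, and the sketch ignores the 2GB space limit per chunk, which a real layout would also have to respect.

```python
import heapq

def spread_tables(table_io, n_chunks):
    """Assign tables to chunks so that I/O per chunk is roughly even.

    table_io: dict of table name -> observed I/O count (for example page
              reads plus page writes gathered from your own monitoring).
    Returns one (total_io, chunk_index, [table names]) tuple per chunk.
    """
    # Min-heap keyed on current load; the unique chunk index breaks ties.
    heap = [(0, i, []) for i in range(n_chunks)]
    heapq.heapify(heap)
    # Place the busiest tables first, each into the least-loaded chunk.
    for name, io in sorted(table_io.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx, names = heapq.heappop(heap)
        names.append(name)
        heapq.heappush(heap, (load + io, idx, names))
    return sorted(heap, key=lambda entry: entry[1])

if __name__ == "__main__":
    # Hypothetical tables and I/O counts, purely for illustration.
    sample = {"orders": 900, "stock": 700, "customer": 400,
              "invoice": 300, "audit": 100}
    for load, idx, names in spread_tables(sample, 2):
        print(f"chunk {idx}: {load} I/Os  {names}")
```

Feed it real counts from your own monitoring and treat the result as a starting point to adjust by hand, not a finished layout.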

With this allocation strategy on multiple disks with optimum number of chunks, I've had great success squeezing plenty more performance from systems. Sites with checkpoints hitting 15 seconds or worse, or sites which were forced to use ridiculously low LRU percentages to combat checkpoint times have ended up with checkpoints of just 2-3 seconds max, and no foreground or LRU writes. You gotta be happy with that.

Don't forget, LRU writes steal bandwidth from queries, just as foreground writes steal bandwidth from queries AND updates. So if you can get away from LRU writes then surely you'll have better query performance, not including the overall 4x or greater improvement this practice can give you.

Of course, this theory against striping needs backing up with hard tests. There may be a ghost in the machine which affects the results. Next time an HP with multiple disks passes beneath my fingers I'll do some damage. But until then, as John Cleese once said: A-hemm - my theory of the dinosaur - accc-hem...

Any refutation backed up with hard figures will be very welcome.