In earlier coverage pitting ZFS against Linux kernel RAID, some readers were concerned that we had missed a few tricks for mdraid tuning. In particular, Louwrentius wanted us to retest mdadm with bitmaps disabled, and targetnovember thought that perhaps XFS might outperform ext4.
Write intent bitmaps are an mdraid feature that allows disks that have dropped out of and re-entered the array to resync rather than rebuild from scratch. The "age" of the bitmap on the returning disk is used to determine which data has been written in its absence, so the disk can be updated with only the new data.
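If you'd like to experiment with the feature yourself, the bitmap can be removed from, or added back to, an existing array on the fly. The array name below is a placeholder:

    # Drop the internal write-intent bitmap from an existing array
    mdadm --grow --bitmap=none /dev/md0
    # Put it back later if you change your mind
    mdadm --grow --bitmap=internal /dev/md0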
XFS and ext4 are simply two different filesystems. Ext4 is the default root filesystem on most distributions, and XFS is an enterprise heavy-hitter most commonly seen in arrays in the hundreds or even thousands of tebibytes. We tested both this time, with bitmap support disabled.
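Formatting the finished array with either filesystem is a one-liner; /dev/md0 again stands in for whatever your array device happens to be:

    # ext4, the desktop-distribution default
    mkfs.ext4 /dev/md0
    # XFS; -f overwrites any existing filesystem signature
    mkfs.xfs -f /dev/md0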
Running the entire panoply of tests we used in earlier articles isn't trivial—the full suite, which tests a wide range of topologies, blocksizes, process numbers and I/O types, takes around 18 hours to complete. But we found the time to run some tests against the heavyweight topologies—that is to say, the ones with all eight disks active.
A note on today's results
The framework we used for the ZFS testing automatically destroys, builds, formats, and mounts arrays, as well as running the actual tests. Our original mdadm tests were run individually and manually. To make sure we had the best apples-to-apples comparison possible, we adapted the framework to work with mdadm.
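Each pass of the adapted framework boils down to a short sequence of shell commands, roughly like the sketch below. The device names, mountpoint, and exact mdadm arguments are illustrative stand-ins, not lifted from our actual scripts:

    # Tear down any leftover array and scrub its metadata
    mdadm --stop /dev/md0
    mdadm --zero-superblock /dev/sd[b-i]
    # Build an eight-disk RAID6 with the write-intent bitmap disabled
    mdadm --create /dev/md0 --level=6 --raid-devices=8 --bitmap=none /dev/sd[b-i]
    # Format and mount it, then hand off to fio for the actual tests
    mkfs.ext4 /dev/md0
    mount /dev/md0 /mnt/test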
During this adaptation, we discovered a problem with our 4KiB asynchronous write test. For ZFS, we used --numjobs=8 --iodepth=8 --size=512M. This creates eight separate files of 512MiB apiece, one for each of the eight separate fio processes to work with. Unfortunately, that file size is just small enough for mdraid to decide to commit the entire test in a single sequential batch, rather than actually doing 4GiB worth of random writes.
In order to get mdadm to cooperate, we needed to adjust upwards until we reached --size=2G, at which point mdadm's write throughput plummeted to less than 20 percent of its "burst" throughput when using smaller files. Unfortunately, this also extends the 4KiB asynchronous write test duration enormously, and even fio's --time_based option doesn't help, since in the first few hundred milliseconds, mdraid has already accepted the entire workload into its write buffer.
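For the curious, the 4KiB asynchronous write workload looks roughly like the following. Only the --numjobs, --iodepth, and --size values come from our actual configuration; the job name, ioengine, target directory, and closing fsync are reasonable stand-ins:

    # Original geometry used for the ZFS runs: eight 512MiB files
    fio --name=4k-async-write --ioengine=libaio --rw=randwrite --bs=4k \
        --numjobs=8 --iodepth=8 --size=512M --directory=/mnt/test --end_fsync=1
    # Enlarged geometry needed to keep mdraid from swallowing the test whole
    fio --name=4k-async-write --ioengine=libaio --rw=randwrite --bs=4k \
        --numjobs=8 --iodepth=8 --size=2G --directory=/mnt/test --end_fsync=1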
Since our test results would otherwise be from slightly different fio configurations, we ran new tests for both ZFS and mdraid with default bitmaps enabled, in addition to the new --bitmap none and XFS filesystem tests.
RAIDz2 vs mdraid6
Although we're only testing eight-disk-wide configurations today, we're covering both striped parity and striped mirror topologies. First, we'll compare our parity options: ZFS RAIDz2 and Linux mdraid6.
Blocksize 1MiB
When we created a new eight-disk mdraid6 array with bitmap support disabled, our asynchronous writes sped up significantly, but the extra 27.9-percent bump still didn't bring mdraid6 anywhere within shouting distance of the ZFS defaults, much less the properly tuned recordsize=1M result.
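That tuning amounts to a single property change on the pool or dataset. The pool name and device names below are placeholders:

    # Create an eight-disk RAIDz2 pool, then raise the recordsize to match a large-block workload
    zpool create tank raidz2 sdb sdc sdd sde sdf sdg sdh sdi
    zfs set recordsize=1M tank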
Both reads and synchronous writes were unaffected by bitmap support or the lack thereof. RAIDz2 writes are more than double the speed of mdraid6 writes even with the bitmap disabled, while mdraid6 reads are a little less than double the speed of RAIDz2 reads.
Although we only tested XFS with the bitmap disabled, it still lagged behind ext4 in every 1MiB test.
Blocksize 4KiB
Small random operations are any conventional RAID6's nightmare. They're not RAIDz2's ideal scenario either, but RAIDz2's ability to avoid being trapped in a read-modify-write cycle gives it a 6:1 write performance advantage over mdraid6. Mdraid6 fares much better on random reads, where it enjoys a 2:1 advantage.
In these small block tests, XFS held its own with ext4—and even slightly outperformed it on 4KiB asynchronous writes. None of these changes—filesystem or bitmap support—made much impact on mdraid6's 4KiB performance overall.
ZFS Mirrors vs mdraid10
Administrators who need maximum performance should leave the parity arrays behind and move to mirrors. On the mdraid side, mdraid10 outperforms mdraid6 in every performance metric we test—and a ZFS pool of mirrors similarly outperforms mdraid10 in every metric tested.
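For those following along at home, the two mirror topologies can be built along these lines; the pool name, array name, and device names are again placeholders:

    # A ZFS pool of four two-disk mirror vdevs
    zpool create tank mirror sdb sdc mirror sdd sde mirror sdf sdg mirror sdh sdi
    # The roughly equivalent eight-disk mdraid10, again with the bitmap disabled
    mdadm --create /dev/md0 --level=10 --raid-devices=8 --bitmap=none /dev/sd[b-i]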