zfs: copies=n is not a substitute for device redundancy!
Published by Jim Salter // May 2nd, 2016
I’ve been seeing a lot of misinformation flying around the web lately about the ZFS dataset-level property copies=n. To be clear: dangerous misinformation. So dangerous that I’m going to go ahead and give you the punchline in the title of this post and in its first paragraph: copies=n does not give you device fault tolerance!
Why does copies=n exist, then? It’s a sort of (extremely) poor cousin to real redundancy, there to give you a better chance of surviving data corruption. Let’s say you have a laptop, you’ve set copies=3 on some extremely critical work-related datasets, and the drive goes absolutely bonkers and starts throwing tens of thousands of checksum errors. Since there’s only one disk in the laptop, ZFS can’t ordinarily correct those checksum errors, only detect them… except, maybe and hopefully, on that critical dataset, because each of its blocks was written multiple times. If a given block was written three times and any single copy of it reads back matching its checksum, that block will be served up to you intact.
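If you want that protection on your own single-disk machine, here’s a minimal sketch of how you’d enable it (the pool and dataset names are hypothetical, and note that copies=n only applies to blocks written after the property is set):

# hypothetical dataset; substitute your own
zfs set copies=3 rpool/home/critical

# confirm it took
zfs get copies rpool/home/critical

# caveat: only blocks written AFTER this point get three copies;
# existing data keeps the copy count it was written with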
So far, so good. The problem is that I am seeing people advocating scenarios like “oh, I’ll just add five disks as single-disk vdevs to a pool, then make sure to set copies=2, and that way even if I lose a disk I still have all the data.” No, no, and no. But don’t take my word for it: let’s demonstrate.
First, let’s set up a test pool using virtual disks.
root@banshee:/tmp# qemu-img create -f qcow2 0.qcow2 10G ; qemu-img create -f qcow2 1.qcow2 10G
Formatting '0.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off
Formatting '1.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off
root@banshee:/tmp# qemu-nbd -c /dev/nbd0 /tmp/0.qcow2 ; qemu-nbd -c /dev/nbd1 /tmp/1.qcow2
root@banshee:/tmp# zpool create test /dev/nbd0 /dev/nbd1
root@banshee:/tmp# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  nbd0      ONLINE       0     0     0
	  nbd1      ONLINE       0     0     0

errors: No known data errors
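One quick note if you’re playing along at home: qemu-nbd needs the nbd kernel module loaded before it can attach anything, so if /dev/nbd0 doesn’t exist yet, something like this should sort you out (module parameters may vary by distro):

# load the network block device module so qemu-nbd has devices to attach to
modprobe nbd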
Now let’s set copies=2, and then create a couple of files in our pool.
root@banshee:/tmp# zfs set copies=2 test
root@banshee:/tmp# dd if=/dev/urandom bs=4M count=1 of=/test/test1
1+0 records in
1+0 records out
4194304 bytes (4.2 MB) copied, 0.310805 s, 13.5 MB/s
root@banshee:/tmp# dd if=/dev/urandom bs=4M count=1 of=/test/test2
1+0 records in
1+0 records out
4194304 bytes (4.2 MB) copied, 0.285544 s, 14.7 MB/s
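If you want to double-check that the property actually took before trusting the test, zfs get will show you (this check isn’t part of the original run above):

zfs get copies test    # VALUE should read 2, SOURCE local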
Let’s confirm that copies=2 is working.
We should see about 8MB of data on each of our virtual disks – one copy of each of our two 4MB test files per disk.
root@banshee:/tmp# ls -lh *.qcow2
-rw-r--r-- 1 root root 9.7M May  2 13:56 0.qcow2
-rw-r--r-- 1 root root 9.8M May  2 13:56 1.qcow2
Yep, we’re good – we’ve written a copy of each of our two 4MB files to each virtual disk.
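You can see the same thing from inside ZFS, without poking at the backing files – zpool list with -v breaks allocation out per vdev (the exact numbers will vary a little with pool metadata overhead):

zpool list -v test    # ALLOC should total roughly 16M: two 4M files, two copies each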
Now fail out a disk:
root@banshee:/tmp# zpool export test
root@banshee:/tmp# qemu-nbd -d /dev/nbd1
/dev/nbd1 disconnected
Will a pool with copies=2 and one missing disk import?
root@banshee:/tmp# zpool import
   pool: test
     id: 15144803977964153230
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
	devices and try again.
    see: http://zfsonlinux.org/msg/ZFS-8000-6X
 config:

	test         UNAVAIL  missing device
	  nbd0       ONLINE

	Additional devices are known to be part of this pool, though their
	exact configuration cannot be determined.
That’s a resounding “no”.
Can we force an import?
root@banshee:/tmp# zpool import -f test
cannot import 'test': one or more devices is currently unavailable
No – your data is gone.
Please let this be a lesson!
No, copies=n is not a substitute for redundancy or parity, and yes, losing any vdev does lose the pool.
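If you actually want to survive a failed device, give the pool redundancy at the vdev level instead. A minimal sketch using the same two virtual disks from the demo above – a mirror for two disks, raidz for three or more (the three-disk device names are hypothetical):

# two-disk mirror: either disk can die and the pool keeps all its data
zpool create test mirror /dev/nbd0 /dev/nbd1

# with three or more disks, single-parity raidz also survives one failure
# zpool create test raidz /dev/nbd0 /dev/nbd1 /dev/nbd2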