zfs: copies=n is not a substitute for device redundancy!
Published by Jim Salter // May 2nd, 2016
I’ve been seeing a lot of misinformation flying around the web lately about the ZFS dataset-level property copies=n. To be clear: dangerous misinformation. So dangerous that I’m going to go ahead and give you the punchline in the title of this post and in its first paragraph: copies=n does not give you device fault tolerance!
Why does copies=n exist, then? It’s a sort of (extremely) poor cousin to real redundancy, there to give you a better chance of surviving data corruption. Let’s say you have a laptop, you’ve set copies=3 on some extremely critical work-related datasets, and the drive goes absolutely bonkers and starts throwing tens of thousands of checksum errors. Since there’s only one disk in the laptop, ZFS can’t ordinarily correct those checksum errors, only detect them… except, maybe and hopefully, on that critical dataset, because each of its blocks was written multiple times. If a given block was written three times and any single copy of it reads back matching its checksum, that block will be served up to you intact.
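If you want that protection on your own single-disk machine, here’s a minimal sketch of how you’d enable it (the pool and dataset names are hypothetical, and note that copies=n only applies to blocks written after the property is set):

# hypothetical dataset; substitute your own
zfs set copies=3 rpool/home/critical

# confirm it took
zfs get copies rpool/home/critical

# caveat: only blocks written AFTER this point get three copies;
# existing data keeps the copy count it was written with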
So far, so good. The problem is that I am seeing people advocating scenarios like “oh, I’ll just add five disks as single-disk vdevs to a pool, then make sure to set copies=2, and that way even if I lose a disk I still have all the data.” No, no, and no. But don’t take my word for it: let’s demonstrate.
First, let’s set up a test pool using virtual disks.
root@banshee:/tmp# qemu-img create -f qcow2 0.qcow2 10G ; qemu-img create -f qcow2 1.qcow2 10G
Formatting '0.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off
Formatting '1.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off
root@banshee:/tmp# qemu-nbd -c /dev/nbd0 /tmp/0.qcow2 ; qemu-nbd -c /dev/nbd1 /tmp/1.qcow2
root@banshee:/tmp# zpool create test /dev/nbd0 /dev/nbd1
root@banshee:/tmp# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  nbd0      ONLINE       0     0     0
	  nbd1      ONLINE       0     0     0

errors: No known data errors
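One quick note if you’re playing along at home: qemu-nbd needs the nbd kernel module loaded before it can attach anything, so if /dev/nbd0 doesn’t exist yet, something like this should sort you out (module parameters may vary by distro):

# load the network block device module so qemu-nbd has devices to attach to
modprobe nbd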
Now let’s set copies=2, and then create a couple of files in our pool.
root@banshee:/tmp# zfs set copies=2 test
root@banshee:/tmp# dd if=/dev/urandom bs=4M count=1 of=/test/test1
1+0 records in
1+0 records out
4194304 bytes (4.2 MB) copied, 0.310805 s, 13.5 MB/s
root@banshee:/tmp# dd if=/dev/urandom bs=4M count=1 of=/test/test2
1+0 records in
1+0 records out
4194304 bytes (4.2 MB) copied, 0.285544 s, 14.7 MB/s
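If you want to double-check that the property actually took before trusting the test, zfs get will show you (this check isn’t part of the original run above):

zfs get copies test    # VALUE should read 2, SOURCE local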
Let’s confirm that copies=2 is working.
We should see about 8MB of data on each of our virtual disks – one copy of each of our two 4MB test files per disk.
root@banshee:/tmp# ls -lh *.qcow2
-rw-r--r-- 1 root root 9.7M May  2 13:56 0.qcow2
-rw-r--r-- 1 root root 9.8M May  2 13:56 1.qcow2
Yep, we’re good – we’ve written a copy of each of our two 4MB files to each virtual disk.
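You can see the same thing from inside ZFS, without poking at the backing files – zpool list with -v breaks allocation out per vdev (the exact numbers will vary a little with pool metadata overhead):

zpool list -v test    # ALLOC should total roughly 16M: two 4M files, two copies each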
Now fail out a disk:
root@banshee:/tmp# zpool export test
root@banshee:/tmp# qemu-nbd -d /dev/nbd1
/dev/nbd1 disconnected
Will a pool with copies=2 and one missing disk import?
root@banshee:/tmp# zpool import
   pool: test
     id: 15144803977964153230
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
	devices and try again.
    see: http://zfsonlinux.org/msg/ZFS-8000-6X
 config:

	test         UNAVAIL  missing device
	  nbd0       ONLINE

	Additional devices are known to be part of this pool, though their
	exact configuration cannot be determined.
That’s a resounding “no”.
Can we force an import?
root@banshee:/tmp# zpool import -f test
cannot import 'test': one or more devices is currently unavailable
No – your data is gone.
Please let this be a lesson!
No, copies=n is not a substitute for redundancy or parity, and yes, losing any vdev does lose the pool.
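If you actually want to survive a failed device, give the pool redundancy at the vdev level instead. A minimal sketch using the same two virtual disks from the demo above – a mirror for two disks, raidz for three or more (the three-disk device names are hypothetical):

# two-disk mirror: either disk can die and the pool keeps all its data
zpool create test mirror /dev/nbd0 /dev/nbd1

# with three or more disks, single-parity raidz also survives one failure
# zpool create test raidz /dev/nbd0 /dev/nbd1 /dev/nbd2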