PSA: snapshots are better than ZVOLs.
Published by Jim Salter // June 16th, 2016
A lot of people new to ZFS, and even a lot of people not-so-new to ZFS, like to wax ecstatic about ZVOLs. But they never seem to mention the very real pitfalls ZVOLs present.
What’s a ZVOL?
Well, if you know what LVM is, a ZVOL is like an LV, but for ZFS. If you don’t know what LVM is, you can think of a ZVOL as, basically, a dynamically allocated “raw partition” inside ZFS. Unlike a normal dataset, a ZVOL doesn’t have a filesystem of its own, and you access it by a raw device name, like /dev/zvol/poolname/zvolname. This looks ideal for those use cases where you want to nest a legacy filesystem underneath ZFS – for example, virtual machine images. Once you have the ZVOL, you have a raw block storage device to interact with – think mkfs.ext4 /dev/zvol/poolname/zvolname, for example – but you still get all the normal ZFS features along with it, like data integrity, compression, snapshots, and so forth. Plus you don’t have to mess with a loopback device, so that should be higher performance, right? What’s not to love?
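For anyone who hasn't built one, here is a rough sketch of the two steps just described: carve out the ZVOL, then format the block device it exposes. The helper names (run, make_zvol_ext4) and the DRY_RUN switch are made up for illustration; by default the script only prints the commands, so it is safe to paste.

```shell
#!/bin/sh
# Sketch only: create a ZVOL and put a legacy filesystem on it.
# run, make_zvol_ext4, and DRY_RUN are hypothetical names for this example.
# Nothing touches the pool unless you export DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # In dry-run mode, print the command; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

make_zvol_ext4() {
    # $1 = pool name, $2 = zvol name, $3 = volume size (e.g. 15G)
    run zfs create -V "$3" "$1/$2"       # allocate the ZVOL
    run mkfs.ext4 "/dev/zvol/$1/$2"      # format the raw block device
}
```

Calling make_zvol_ext4 target zvol 15G in dry-run mode prints the zfs create and mkfs.ext4 commands it would have run.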
ZVOLs perform better, though, right?
AFAICT, the increased performance is pretty much a lie. I’ve benchmarked ZVOLs pretty extensively against raw disk partitions, raw LVs, raw files, and even .qcow2 files and there really isn’t much of a performance difference to be seen. A partially-allocated ZVOL isn’t going to perform any better than a partially-allocated .qcow2 file, and a fully-allocated ZVOL isn’t going to perform any better than a fully-allocated .qcow2 file. (Raw disk partitions or LVs don’t really get any significant boost, either.)
Let’s talk about snapshots.
If snapshots aren’t one of the biggest reasons you’re using ZFS, they should be. And ZVOLs and snapshots together are really, really tricky and weird. If you have a dataset that’s occupying 85% of your pool, you can snapshot that dataset any time you like. If you have a ZVOL, created the default way, that’s occupying 85% of your pool, you cannot snapshot it, period. This is one of those things that both tyros and vets tend to immediately balk at – I must be misunderstanding something, right? Surely it doesn’t work that way? Afraid it does.
Ooh, is it demo-in-a-VM-time again?! =)
root@xenial:~# zfs create target/dataset -o compress=off -o quota=15G
root@xenial:~# pv < /dev/zero > /target/dataset/0.bin
15GiB 0:01:13 [10.3MiB/s] [ <=> ]
pv: write failed: Disk quota exceeded
root@xenial:~# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
target          15.3G  3.93G    19K  /target
target/dataset  15.0G      0  15.0G  /target/dataset
root@xenial:~# zfs snapshot target/dataset@1
root@xenial:~#
Above, we created a dataset on a 20G pool, dumped 15G of data into it, and snapshotted the dataset. No surprises here; this is exactly what we expect.
But what happens when we try the same thing with a ZVOL?
root@xenial:~# zfs create target/zvol -V 15G -o compress=off
root@xenial:~# pv < /dev/zero > /dev/zvol/target/zvol
15GiB 0:03:22 [57.3MiB/s] [========================================> ] 99% ETA 0:00:00
pv: write failed: No space left on device
root@xenial:~# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
target       15.8G  3.46G    19K  /target
target/zvol  15.5G  3.90G  15.0G  -
root@xenial:~# zfs snapshot target/zvol@1
cannot create snapshot 'target/zvol@1': out of space
Despite the pool still showing several gigabytes of AVAIL (3.46G for target, 3.90G for target/zvol), we can’t snapshot the ZVOL. A ZVOL created like this one carries a space reservation meant to guarantee that every block in it can always be rewritten, and a snapshot pins the blocks that are already there. So if you don’t have at least as much free space in the pool as the REFER of the ZVOL, you can’t snapshot it. For our little baby demonstration here, that means we’d need 15G free to snapshot our 15G ZVOL. In a real-world situation with VM images, this could easily be a case where you can’t snapshot your 15TB VM image without 15TB of free space available – where if you’d stuck with standard datasets, you’d be able to snapshot that same 15TB VM even with just a few hundred megabytes of AVAIL at your disposal.
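If you want to know ahead of time whether a ZVOL is in this situation, the rule of thumb above is easy to script. This is a sketch, not exact ZFS space accounting: it just compares the pool's available property against the ZVOL's referenced property, in bytes. The function names (snapshot_shortfall, check_zvol) are made up for this example; zfs get -Hp -o value is the standard way to read a property as a bare number.

```shell
#!/bin/sh
# Sketch of the article's rule of thumb: a snapshot of a fully reserved ZVOL
# needs pool AVAIL >= ZVOL REFER. Function names here are hypothetical.

snapshot_shortfall() {
    # $1 = pool AVAIL in bytes, $2 = ZVOL REFER in bytes
    # Prints how many more free bytes the rule of thumb asks for (0 if enough).
    if [ "$1" -ge "$2" ]; then
        echo 0
    else
        echo $(( $2 - $1 ))
    fi
}

check_zvol() {
    # $1 = pool name (e.g. target), $2 = ZVOL name (e.g. target/zvol)
    # -H drops headers, -p prints exact byte counts, -o value prints only the value.
    avail=$(zfs get -Hp -o value available "$1")
    refer=$(zfs get -Hp -o value referenced "$2")
    short=$(snapshot_shortfall "$avail" "$refer")
    if [ "$short" -eq 0 ]; then
        echo "$2: enough free space by the rule of thumb"
    else
        echo "$2: short by $short bytes"
    fi
}
```

On the demo pool, check_zvol target target/zvol would be the call to make. With roughly 3.46G of AVAIL against 15.0G of REFER, it should report a shortfall of well over 11G.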
TL;DR:
Think long and hard before you implement ZVOLs. Then, you know… don’t.