Warning:  This is more of a rambling brain-dump than a carefully revised and edited post, or even a coherent one.

 

For a long time now, I've been intrigued by two useful features of FreeBSD:  nanoBSD and ZFS. 

I like the nanoBSD concept of managing servers as deployable read-only images on ping-pong partitions.  That would allow me to build and test an OS/application image and then push it over the network to a remote server without having to go through the mergemaster hassle.  It also provides for an instant failsafe in case the upgrade fails.

I've been using ZFS on production servers for about two years now.  I like that it can tolerate unexpected power interruptions and that I can snapshot file systems before attempting risky modifications.  The ability to back out of bad upgrades can spare me long weekends and trips to a remote site.

 

Over the years and in the back of my mind, I've been searching for a way to meld the characteristics of nanoBSD and ZFS to create an extremely reliable server able to tolerate far less than ideal environments and also easy to maintain remotely (including OS upgrades).  In the worst case, I'd like to have the option of walking somebody through booting a recovery image over the phone so I can finish the job remotely.

The problem with ZFS by itself for purely remote administration manifests when it comes time for an OS upgrade.  Snapshots make it possible to roll back, but you still need a reboot to single-user mode and other risky business in order to complete the upgrade itself.  It also takes a considerable amount of time (I like to build from source to apply custom build and configuration options).

The problem with nanoBSD for a regular server is that /var is a memory disk and nothing stored there will persist across a reboot.  Gone will be the logs, databases, mail, and other important things.  There are some simple ways to work around this, but I want complete separation of the OS media and the data/configuration media.  I'd really like to have a read-only OS image on one device with ping-pong binary upgrades, and all the configuration and data on a raidz pool that's backed up to another machine using dirvish or rsnapshot.

 

I'm not the first to attempt some sort of "bigger nanoBSD" or "nanoBSD + ZFS" evolution:

http://www.freebsdnews.net/2010/05/21/minimizing-downtime-nanobsd-zfs-jails/

http://www.psconsult.nl/talks/NLLGG-BSDdag-Servers/paper.pdf

http://lists.freebsd.org/pipermail/freebsd-fs/2007-April/003135.html

http://blog.freenas.org/2010/08/ixsystems-freenas-snapshot.html

I think it's cleaner to separate the configuration and data from the OS and installed software.  Data and configuration can be on a raidz pool while the operating system and applications are static and could reside on a separate disk, compact flash, USB drive, CD-ROM, etc.  Should the OS media fail, I want the system to automagically revive itself once the OS drive is replaced, without reconfiguration or restoration from backup.

I've considered the idea of updating the applications separately from the OS (something like GoboLinux), but I think that is more trouble than it's worth.  Libraries, applications, and OS are all tied together, and it makes more sense to rev a single image where libraries, applications, and OS are compatible than to maintain applications and OS separately and track a compatibility matrix.

 

My plan is to start with nanoBSD and add a ZFS pool with file systems mounted at /cfg and /var.  I'll make /usr/local/etc a symlink to a directory like /etc/local so I can store the /usr/local/etc config data in /cfg along with the /etc data.
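
A minimal sketch of that layout, assuming a pool named tank with datasets tank/cfg and tank/var (all placeholder names, nothing here is decided yet).  It only prints the commands unless RUN=1 is set, since creating a dataset with mountpoint=/var on a live system would mount over the existing /var.

```sh
#!/bin/sh
# Sketch only. The pool name "tank" and the dataset names are placeholders.
POOL="${POOL:-tank}"

# Print each command; execute it only when RUN=1 is set.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

create_layout() {
    # One dataset per mount point, so /cfg and /var can be snapshotted
    # and rolled back independently of each other.
    run zfs create -o mountpoint=/cfg "$POOL/cfg"
    run zfs create -o mountpoint=/var "$POOL/var"
}

create_layout
```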

If the ZFS pool is unavailable, the system will fall back to a writable memory disk for /var and a default recovery configuration on the OS image's /cfg partition.
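
The fallback could look roughly like the boot-time sketch below.  The pool name tank, the 32m memory disk size, and the function names are assumptions for illustration, and it presumes the OS image's fstab has an entry for /cfg, as a stock nanoBSD image does.  Commands are only printed unless RUN=1 is set.

```sh
#!/bin/sh
# Sketch only. "tank", the 32m size, and the function names are placeholders.
POOL="${POOL:-tank}"

# Print each command; execute it only when RUN=1 is set.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Succeeds only if the pool is currently imported.
pool_available() {
    zpool list -H -o name "$POOL" >/dev/null 2>&1
}

setup_var_and_cfg() {
    if pool_available; then
        # Importing a pool normally mounts its datasets; make sure of it.
        run zfs mount -a
    else
        # Fallback: memory-backed /var, plus the recovery configuration
        # kept on the OS image's own /cfg partition (mounted via fstab).
        run mdmfs -s 32m md /var
        run mount /cfg
    fi
}

setup_var_and_cfg
```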

The upgrade procedure will be modified to create ZFS snapshots of /cfg and /var so that they can be updated along with the OS, but also rolled back in case of failure.
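
As a rough sketch of the snapshot and rollback steps, assuming the datasets are tank/cfg and tank/var and that snapshots are tagged by date (all hypothetical choices).  Commands are only printed unless RUN=1 is set.

```sh
#!/bin/sh
# Sketch only. "tank", the dataset names, and the tag format are placeholders.
POOL="${POOL:-tank}"
DATASETS="cfg var"

# Print each command; execute it only when RUN=1 is set.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Take one snapshot per dataset, all sharing the same tag.
snapshot_before_upgrade() {
    tag="$1"
    for ds in $DATASETS; do
        run zfs snapshot "$POOL/$ds@$tag"
    done
}

# Return every dataset to the state captured under the given tag.
rollback_after_failure() {
    tag="$1"
    for ds in $DATASETS; do
        # -r also destroys any snapshots taken after the named one.
        run zfs rollback -r "$POOL/$ds@$tag"
    done
}

snapshot_before_upgrade "pre-upgrade-$(date +%Y%m%d)"
```

If the new image misbehaves, booting the old image and calling rollback_after_failure with the same tag would put /cfg and /var back in step with it.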

 

I think FreeNAS 8 comes very close to doing what I intend to do, but its focus is heavily on being a NAS appliance and not as much on being a remotely managed general-purpose server.  I'll probably pare down the FreeNAS nanoBSD config file as a starting point and add in my particular customizations.

 

Some particular issues I will need to consider:

  • GEOM labeling of the OS drive so it can always be found (tunefs -L).  Has this already been added in not-so-recent versions of nanoBSD?
  • zpool naming scheme and ZFS mount point scheme
  • Handling of zpool.cache and/or importing zpools after swapping OS drives or upgrading

It turns out we don't need zpool.cache at all. It is just as safe to import the zpools we know about on boot. Rather than play games to keep this file happy, we should just do the import always. Email with pjd@ confirms that unless we're booting from ZFS, this file is completely optional and won't really buy us anything since we have another database of zpool data.

We should modify the zpool commands to operate on /etc/zpool.cache and not worry about it from there. This property can be set early in boot so we don't need to modify our backend, I believe. I'll investigate and add the appropriate early boot commands (as well as adding the zpool imports to ix-fstab). This will be more robust anyway and a lot easier to implement than the mount -uw / dancing.
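
One way to read the two paragraphs above as an early-boot step: import every pool on a known list and point it at /etc/zpool.cache as it comes in.  That "this property" means the pool's cachefile property is an assumption here, as are the KNOWN_POOLS variable and the pool name tank.  Commands are only printed unless RUN=1 is set.

```sh
#!/bin/sh
# Sketch only. KNOWN_POOLS and "tank" are placeholders; treating
# "this property" as the zpool cachefile property is an assumption.
KNOWN_POOLS="${KNOWN_POOLS:-tank}"
CACHEFILE="${CACHEFILE:-/etc/zpool.cache}"

# Print each command; execute it only when RUN=1 is set.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

import_known_pools() {
    for pool in $KNOWN_POOLS; do
        # Skip pools that are already imported.
        if zpool list -H -o name "$pool" >/dev/null 2>&1; then
            continue
        fi
        # Import by name rather than relying on an existing cache file,
        # and record the pool in the writable cache location.
        run zpool import -o "cachefile=$CACHEFILE" "$pool"
    done
}

import_known_pools
```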

 

And that's about it for my research and brainstorming.  I'm going to take a couple of stabs at actually implementing these ideas and hopefully post some successful results soon.