I’ve got a couple of Linux machines at home that need a transparent
backup solution. Transparent as in - backups should happen all the time,
without asking, without notification, without any interaction. Ideally
nothing should run on the client either, just to avoid issues with
system updates. Making it all server-side also lets me back up my
girlfriend’s machine without explaining the setup - which is great when
even periodic upgrades are a pain to enforce.
Another couple of requirements I had were:
- backing up over the local network (I don’t have enough bandwidth to
back up remotely all the time)
- saving storage space where files don’t change
- able to run on a small device (I’ve got a TonidoPlug ready)
- directory-based rather than fs-based (I’ve got a number of VMs I
really don’t want to waste time backing up)
Unfortunately there aren’t many good, ready solutions and there aren’t
many choices in general either. Most solutions fall into one of the
following groups: trivial copy, archiving solutions, rsync-like and
full-blown systems.
To be honest, maybe a big system like bacula can do what I need… but
with it having more components than I have local hosts, I didn’t want to
invest that much time in learning the configuration.
At small companies where I set up backups, I was happy with DAR. Full
backups split into volumes of a given size and stored on removable disks
weekly, with daily diffs stored locally - perfect. Here it won’t work -
I can’t saturate the network or disk I/O every couple of hours, and I
really don’t want to deal with replacing removable disks. Some more
intelligent solution is required.
Enter rsnapshot
Rsnapshot is an interesting mix of systems that understand backup
schedules and a minimal backup utility.
Rsnapshot can copy files from a remote host using rsync (and ssh for
example). Then, it knows how to hard-link files that didn’t change to
avoid wasting both bandwidth and storage. It also knows how to rotate
backup directories when new backups come in and how to promote backups
between levels.
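To make the hard-link trick a bit more concrete, here is a small simulation in a scratch directory. It doesn't call rsnapshot or rsync at all: the file names are made up, `cp -al` (GNU cp) stands in for the link step, and a temp file plus `mv` stands in for the way rsync replaces a changed file. Treat it as a rough sketch of the idea, not as what rsnapshot literally executes.

```shell
#!/bin/sh
set -eu
root=$(mktemp -d)

# First snapshot: two files as they were copied from the client.
mkdir "$root/hourly.0"
echo "never changes" > "$root/hourly.0/notes.txt"
echo "version 1"     > "$root/hourly.0/todo.txt"

# Before the next run, keep the old state as hourly.1. With -l, cp creates
# hard links instead of copying data, so this costs almost no space.
cp -al "$root/hourly.0" "$root/hourly.1"

# The next sync only touches files that changed. A changed file is written
# as a new file and renamed into place, which breaks the link for that
# one file and leaves the old content untouched in hourly.1.
echo "version 2" > "$root/hourly.0/.todo.txt.tmp"
mv "$root/hourly.0/.todo.txt.tmp" "$root/hourly.0/todo.txt"

# -ef is true when both names point at the same inode.
if [ "$root/hourly.0/notes.txt" -ef "$root/hourly.1/notes.txt" ]; then
    echo "notes.txt: one copy on disk, visible in both snapshots"
fi
if ! [ "$root/hourly.0/todo.txt" -ef "$root/hourly.1/todo.txt" ]; then
    echo "todo.txt: separate copy in each snapshot"
fi
```

The unchanged file exists once on disk but shows up in both snapshot directories, which is why every snapshot looks like a full backup while only the changes take up space.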
Scheduling and promotion are handled in a very simple way too. Since
rsnapshot needs to be run from some scheduler like cron, it takes the
name of the backup level as a parameter. This can be for example “daily”,
“weekly”, “purple”, or whatever you want to call it. What’s important is
the order in the rsnapshot config file - only the first level listed
gets copied from the source; each of the others is promoted locally from
the level listed just above it.
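The promotion between levels can be modelled with plain directories. The sketch below is a toy version of the rotation, assuming the `level.N` directory naming rsnapshot uses; the `rotate` and `promote` helpers are hypothetical names of mine, not rsnapshot commands.

```shell
#!/bin/sh
set -eu
root=$(mktemp -d)

# rotate LEVEL KEEP: drop the oldest snapshot of a level and shift the
# remaining ones up by one, leaving slot 0 free.
rotate() {
    level=$1
    keep=$2
    rm -rf "$root/$level.$((keep - 1))"
    i=$((keep - 2))
    while [ "$i" -ge 0 ]; do
        if [ -d "$root/$level.$i" ]; then
            mv "$root/$level.$i" "$root/$level.$((i + 1))"
        fi
        i=$((i - 1))
    done
}

# promote FROM FROM_KEEP TO TO_KEEP: make room in the higher level, then
# move the oldest snapshot of the lower level into its slot 0.
promote() {
    from=$1
    from_keep=$2
    to=$3
    to_keep=$4
    rotate "$to" "$to_keep"
    oldest="$root/$from.$((from_keep - 1))"
    if [ -d "$oldest" ]; then
        mv "$oldest" "$root/$to.0"
    fi
}

# Five existing hourly snapshots, each tagged with its original number.
for n in 0 1 2 3 4; do
    mkdir "$root/hourly.$n"
    echo "$n" > "$root/hourly.$n/marker"
done

promote hourly 5 daily 7   # what a "daily" run does: hourly.4 -> daily.0
rotate hourly 5            # what the next "hourly" run does before syncing
ls "$root"
```

Note that a higher level never talks to the client. It only picks up whatever the level below was about to throw away, so if the first level never runs, nothing ever reaches the others.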
This is pretty much exactly what was needed. For the remote copy, ssh
can be configured to use a password-less key to authenticate on the
remote hosts. In case it’s possible to access the backup server from the
internet, making sure that key can only run rsync is a good idea too.
For paranoid people, this can also be locked down with AppArmor or
SELinux to deny anything apart from read-only access.
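One way to limit the key to rsync is an ssh forced command on the client. The wrapper below is only a sketch: the script path `/usr/local/bin/rsync-only` is a name I made up, and the check relies on the fact that a pull over ssh starts the remote side as `rsync --server --sender ...`. Adjust it to whatever your rsync actually sends before trusting it.

```shell
#!/bin/sh
# Hypothetical wrapper, saved on the client as /usr/local/bin/rsync-only
# and attached to the backup key in ~/.ssh/authorized_keys like this:
#   command="/usr/local/bin/rsync-only",no-pty,no-port-forwarding ssh-rsa AAAA...

# is_allowed COMMAND: succeed only for a read-only rsync pull.
is_allowed() {
    case "$1" in
        # Refuse anything containing shell metacharacters outright.
        *';'*|*'&'*|*'|'*|*'$'*|*'`'*|*'<'*|*'>'*) return 1 ;;
        # "--sender" means this side only sends files, it never writes.
        "rsync --server --sender "*) return 0 ;;
        *) return 1 ;;
    esac
}

# sshd puts the command requested by the other side in this variable.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    if is_allowed "$SSH_ORIGINAL_COMMAND"; then
        set -f   # no glob expansion when the command is split into words
        exec $SSH_ORIGINAL_COMMAND
    fi
    echo "rejected: $SSH_ORIGINAL_COMMAND" >&2
    exit 1
fi
```

With this in place the key is useless for an interactive login or for pushing files to the client, which limits the damage if the backup server is ever compromised.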
Sample configuration
So what does this look like in practice?
The server which collects the backups has rsnapshot and rsync installed.
It also has a password-less ssh key created: just run ssh-keygen and
append the resulting id_rsa.pub to ~/.ssh/authorized_keys on the client
machine.
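For reference, the key setup looks roughly like this. The sketch generates the key pair in a scratch directory so that nothing real gets overwritten; on the backup server you would normally let ssh-keygen use root's `~/.ssh` instead. The user and host in the commented commands are the ones from my setup.

```shell
#!/bin/sh
# Scratch location for the key pair; normally this would be /root/.ssh.
keydir=$(mktemp -d)

# -N "" sets an empty passphrase, so cron can use the key unattended.
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa"

# Install the public key on the client (asks for the password once):
#   ssh-copy-id -i "$keydir/id_rsa.pub" viraptor@walker.local
# Then check that a non-interactive login works:
#   ssh -i "$keydir/id_rsa" -o BatchMode=yes viraptor@walker.local true
```

If the BatchMode login fails, rsnapshot would fail the same way from cron, so it's worth checking before going any further.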
It also has a separate partition available for each backup. In my case
they’re mounted under /media/disk1partX.
Now, each machine which needs backing up requires a configuration file.
I put them all in my plug’s root directory:
/root/rsnapshot_configs/xxx. The file for each machine looks almost the
same. For example, host walker has /root/rsnapshot_configs/walker:
# Fields must be separated by tabs, not spaces.
config_version	1.2
snapshot_root	/media/disk1part2/rsnapshot
cmd_rsync	/usr/bin/rsync
cmd_ssh	/usr/bin/ssh
cmd_du	/usr/bin/du
exclude_file	/root/rsnapshot_configs/walker.exclude
retain	hourly	5
retain	daily	7
retain	weekly	5
backup	viraptor@walker.local:/home/viraptor	viraptor
This configuration says that each snapshot under
/media/disk1part2/rsnapshot gets a viraptor directory holding a copy of
/home/viraptor (minus some excludes like .cache, .steam, etc.) - 5
hourly ones, then 7 daily ones, then 5 weekly ones. All files which have
not changed will be hard-linked and no space will be wasted on exact
duplicates. The backup is taken from the host walker.local (thanks
Avahi!).
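The exclude file itself is just a list of rsync patterns, one per line. A minimal sketch of what it could contain, using only the two directories mentioned above (your list will be longer); the `print_excludes` helper is just a convenient way to show the content:

```shell
#!/bin/sh
# Print an example exclude list. A pattern without a leading slash matches
# at any depth, and the trailing slash limits it to directories.
print_excludes() {
    cat <<'EOF'
.cache/
.steam/
EOF
}

print_excludes

# On the backup server this would be saved and checked with:
#   print_excludes > /root/rsnapshot_configs/walker.exclude
#   rsnapshot -c /root/rsnapshot_configs/walker configtest
#   rsnapshot -c /root/rsnapshot_configs/walker -t hourly
```

The `configtest` action checks the syntax of the config, and `-t` prints the commands an hourly run would execute without running them, which is a cheap way to confirm the excludes are being passed to rsync.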
But that’s only the definition of what… what about when? Crontab
entries are what actually trigger the rsnapshot actions and they live
in /etc/cron.d. The content is:
55 */2 * * * root (/bin/ping -c1 walker.local >/dev/null 2>/dev/null) && /usr/bin/rsnapshot -c /root/rsnapshot_configs/walker hourly
20 3 * * * root /usr/bin/rsnapshot -c /root/rsnapshot_configs/walker daily
20 4 * * 1 root /usr/bin/rsnapshot -c /root/rsnapshot_configs/walker weekly
This means that every 2 hours, if the host is on (it most likely isn’t),
rsnapshot will take another hourly copy. Then every day at 3:20 in the
morning it will promote the oldest hourly snapshot to daily, and once a
week, on Monday at 4:20, it will promote the oldest daily to weekly.
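If the one-liner in the crontab gets too long for your taste, the "only when the host is up" check can live in a small wrapper instead. This is a sketch with a made-up script name and function; it behaves like the ping test in the hourly entry.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. /root/rsnapshot_configs/run_if_up.sh
# Usage: run_if_up.sh HOST COMMAND [ARGS...]

# run_if_up HOST COMMAND...: run the command only if HOST answers a ping.
run_if_up() {
    host=$1
    shift
    if ping -c 1 "$host" >/dev/null 2>&1; then
        "$@"
    else
        return 0   # host is off: nothing to do, nothing for cron to report
    fi
}

# When called as a script with arguments, act on them.
if [ "$#" -gt 0 ]; then
    run_if_up "$@"
fi

# The hourly crontab entry would then become:
#   55 */2 * * * root /root/rsnapshot_configs/run_if_up.sh walker.local /usr/bin/rsnapshot -c /root/rsnapshot_configs/walker hourly
```

Only the hourly level needs the check, because it's the only one that talks to the client. Daily and weekly just move directories around on the server.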
Summary
All simple and easy. There’s no specific recovery procedure involved,
because all backups are available as normal files on the server. It’s
actually easy to browse the old entries in case only one or two files
need recovering. And in case of serious issues - just rsync everything
back from the last snapshot. So far I’m very happy with it.
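As an illustration of how little is involved in a recovery, the sketch below builds a fake snapshot tree in a scratch directory and pulls one old file out of it. It assumes the usual rsnapshot layout where the full source path is recreated under the backup's directory in each snapshot; the `list_versions` helper and the file names are made up.

```shell
#!/bin/sh
set -eu

# Scratch tree standing in for the real snapshot_root.
root=$(mktemp -d)
for snap in hourly.0 hourly.1 daily.0; do
    mkdir -p "$root/$snap/viraptor/home/viraptor"
done
echo "new" > "$root/hourly.0/viraptor/home/viraptor/notes.txt"
echo "old" > "$root/hourly.1/viraptor/home/viraptor/notes.txt"
# daily.0 predates the file, so it has no copy of it.

# list_versions ROOT PATH: show every snapshot that contains PATH.
list_versions() {
    for copy in "$1"/*/"$2"; do
        if [ -e "$copy" ]; then
            ls -l "$copy"
        fi
    done
}
list_versions "$root" viraptor/home/viraptor/notes.txt

# Recovering a single old version is a plain copy.
restored=$(mktemp -d)
cp -p "$root/hourly.1/viraptor/home/viraptor/notes.txt" "$restored/notes.txt"

# A full restore on the real server would be something like:
#   rsync -a /media/disk1part2/rsnapshot/hourly.0/viraptor/home/viraptor/ \
#         viraptor@walker.local:/home/viraptor/
```

Note the trailing slashes in the rsync command: they make rsync copy the content of the directory rather than create another viraptor directory inside the target.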
One thing to watch out for is the exclusions file in case you don’t have
a massive hard drive available. If you happen to install Steam with some
games you may start getting emails about failed backups… but there are
only so many copies of Portal or TF2 that you’ll need in your life - you
probably don’t need them in the backups unless you have lots of space.
VM images which change all the time are also very bad candidates for
backing up with rsnapshot. Every modification there will need a
separate, complete copy.
With a “normal” developer-style usage the old copies take up almost no
extra space. Currently, my last copy is 22GB, but all previous snapshots
and modifications add only 2GB to that.
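Numbers like these come from du, which counts a hard-linked file only once per run. The sketch below shows the effect on a scratch directory with made-up file names; on the real server, `rsnapshot -c /root/rsnapshot_configs/walker du` reports the same kind of breakdown for the actual snapshots.

```shell
#!/bin/sh
set -eu
root=$(mktemp -d)

# 1 MiB of random data stands in for the unchanged bulk of a home directory.
mkdir "$root/hourly.0"
dd if=/dev/urandom of="$root/hourly.0/big.bin" bs=1024 count=1024 2>/dev/null

# An older snapshot that shares the big file through a hard link and
# holds one small file of its own.
cp -al "$root/hourly.0" "$root/hourly.1"
echo "small change" > "$root/hourly.1/old-notes.txt"

# Flush pending writes so du sees the final block counts.
sync

# The first directory is charged for the shared file, the second one only
# for what it adds on top.
du -sk "$root/hourly.0" "$root/hourly.1"
```

Running du on a single old snapshot directory by itself will report the full size, so always list the snapshots together when checking how much space the history really costs.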