Constructing a (Linux) home backup system

I’ve got a couple of Linux machines at home that need a transparent backup solution. Transparent as in: backups should happen all the time, without asking, without notifications, without any interaction. Ideally nothing should run on the client either, just to avoid issues with system updates. Making it all server-side also lets me back up my girlfriend’s machine without explaining the setup - which is great when even periodic upgrades are a pain to enforce.

Another couple of requirements I had were:

  • backing up over the local network (I don’t have enough bandwidth to back up remotely all the time)
  • saving storage space where files don’t change
  • running on a small device (I’ve got a TonidoPlug ready)
  • directory-based rather than filesystem-based (I’ve got a number of VMs I really don’t want to waste time backing up)

Unfortunately there aren’t many good, ready-made solutions, and not many choices in general either. Most of them fall into one of the following groups: trivial copy scripts, archiving solutions, rsync-like tools, and full-blown backup systems.

To be honest, maybe a big system like Bacula can do what I need… but with it having more components than I have local hosts, I didn’t want to invest that much time in learning its configuration.

At the small companies where I set up backups, I was happy with DAR: full backups split into volumes of a given size and stored on removable disks weekly, with daily diffs stored locally - perfect. That won’t work here. I can’t saturate the network or disk I/O every couple of hours, and I really don’t want to deal with swapping removable disks. Something more intelligent is required.

Enter rsnapshot

Rsnapshot is an interesting middle ground between the big systems that understand backup schedules and a minimal backup utility. It can copy files from a remote host using rsync (over ssh, for example). It knows how to hard-link files that didn’t change, which avoids wasting both bandwidth and storage. It also knows how to rotate backup directories when new backups come in, and how to promote backups between levels.

The schedule and promotion are handled in a very simple way too. Since rsnapshot needs to be run from a scheduler like cron, it takes the name of the backup level as a parameter. This can be for example “daily”, “weekly”, “purple”, or whatever you want to call it. What’s important is the order in the rsnapshot config file - only the first entry gets copied from the source; the others get promoted from the level “above” them locally.
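For example, with retain levels like the ones below (the names are arbitrary - only their order in the file matters), running “rsnapshot hourly” pulls a fresh copy from the source, while “rsnapshot daily” and “rsnapshot weekly” only shuffle existing snapshots around:

# “rsnapshot hourly” rsyncs a fresh copy from the source into hourly.0
retain    hourly    5
# “rsnapshot daily” promotes the oldest hourly snapshot to daily.0
retain    daily    7
# “rsnapshot weekly” promotes the oldest daily snapshot to weekly.0
retain    weekly    5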

This is pretty much exactly what I needed. For the remote copy, ssh can be configured with a password-less key to authenticate on the remote hosts. In case the backup server is reachable from the internet, making sure that key can only run rsync is a good idea too. For paranoid people, it can also be locked down with AppArmor or SELinux to deny anything apart from read-only access.
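One way to enforce the rsync-only rule (a sketch - rrsync ships with rsync, though its installed path varies by distro, and the key material here is obviously a placeholder) is to pin the server’s key to a read-only rrsync invocation in the client’s authorized_keys:

# ~/.ssh/authorized_keys on the client, all on one line
command="/usr/bin/rrsync -ro /home/viraptor",no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty ssh-rsa AAAA… root@backupserver

With that in place the key can still read everything under /home/viraptor, but can’t write anything or get a shell.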

Sample configuration

So what does this look like in practice?

The server which collects the backups has rsnapshot and rsync installed. It also has a password-less ssh key: just run ssh-keygen and copy the resulting id_rsa.pub into each client machine’s ~/.ssh/authorized_keys.
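Concretely, that’s just the following (assuming the default key path and an empty passphrase; ssh-copy-id does the appending for you, and viraptor@walker.local is the client from the example below):

# on the backup server, as the user that will run rsnapshot
ssh-keygen -t rsa
# append the new public key to the account being backed up
ssh-copy-id viraptor@walker.local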

It also has a separate partition available for each backup. In my case they’re mounted under /media/disk1partX.

Now, each machine which needs backing up gets its own configuration file. I put them all in my plug’s root directory: /root/rsnapshot_configs/xxx. Each config looks almost the same. For example, host walker has /root/rsnapshot_configs/walker:

config_version    1.2
snapshot_root    /media/disk1part2/rsnapshot

cmd_rsync    /usr/bin/rsync
cmd_ssh    /usr/bin/ssh
cmd_du    /usr/bin/du
exclude_file    /root/rsnapshot_configs/walker.exclude

retain    hourly    5
retain    daily    7
retain    weekly    5

backup    viraptor@walker.local:/home/viraptor    viraptor

This configuration keeps copies of /home/viraptor (minus some excludes like .cache, .steam, etc.) under /media/disk1part2/rsnapshot, in a viraptor subdirectory of each snapshot - 5 hourly ones, then 7 daily ones, then 5 weekly ones. All files which have not changed are hard-linked, so no space is wasted on exact duplicates. The backup is taken from the host walker.local (thanks, Avahi!).
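After it has been running for a while, the snapshot root ends up looking roughly like this (a sketch of the layout rsnapshot maintains - each numbered directory holds a complete tree, with unchanged files hard-linked between them):

/media/disk1part2/rsnapshot/
    hourly.0/viraptor/    <- the newest copy, freshly rsynced
    hourly.1/viraptor/
    …
    hourly.4/viraptor/    <- the next daily run promotes this one to daily.0
    daily.0/viraptor/
    …
    weekly.4/viraptor/    <- the oldest snapshot kept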

But that’s only the definition of what… what about when? Crontab entries are what actually trigger the rsnapshot runs, and they live in /etc/cron.d. The content is:

55 */2 * * * root /bin/ping -c1 walker.local >/dev/null 2>&1 && /usr/bin/rsnapshot -c /root/rsnapshot_configs/walker hourly
20 3 * * * root /usr/bin/rsnapshot -c /root/rsnapshot_configs/walker daily
20 4 * * 1 root /usr/bin/rsnapshot -c /root/rsnapshot_configs/walker weekly

This means that every 2 hours, if the host is on (it most likely isn’t), rsnapshot will take another hourly copy. Then every morning it will promote the oldest hourly snapshot to daily, and once a week it will promote the oldest daily to weekly.

Summary

All simple and easy. There’s no special recovery procedure involved, because all backups are available as normal files on the server. It’s actually easy to browse the old snapshots in case only one or two files need recovering. And in case of serious issues - just rsync everything back from the last snapshot. So far I’m very happy with it.
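A full restore is just the backup in reverse (a sketch, using the paths from the config above; note that if the key was locked down to read-only rsync earlier, you’d have to copy from the client side instead):

rsync -av /media/disk1part2/rsnapshot/hourly.0/viraptor/ viraptor@walker.local:/home/viraptor/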

One thing to watch out for is the exclusions file, in case you don’t have a massive hard drive available. If you happen to install Steam with some games, you may start getting emails about failed backups… and there are only so many copies of Portal or TF2 that you’ll need in your life - you probably don’t want them in the backups unless you have lots of space. VM images which change all the time are also very bad candidates for backing up with rsnapshot: hard-linking only helps with files that don’t change at all, so every modification needs another complete copy of the image.
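My walker.exclude looks something like this (a sketch - the file is handed to rsync via --exclude-from, so normal rsync pattern rules apply):

.cache/
.steam/
.local/share/Trash/
VirtualBox VMs/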

With a “normal” developer-style usage, the size of the old copies is close to none. Currently my latest copy is 22GB, but all the previous snapshots and modifications add only 2GB to that.
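Checking those numbers is what the cmd_du line in the config is for - rsnapshot has a built-in du command that reports how much space the snapshots take:

/usr/bin/rsnapshot -c /root/rsnapshot_configs/walker du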

