Text

Here’s a list of things that currently fail when trying to run vagrant under Arch. Hopefully I hit most of the keywords from errors and you’re reading this because you ran into one of those.

Vagrant is not an official Arch package.

It’s in AUR though - https://aur.archlinux.org/packages/vagrant/ (vote for it!) and `pacaur -S vagrant` will happily install it.

Wrong guest additions version

Arch will most likely install virtualbox with a different API version than the guest additions in the box you downloaded from the internet. This is easily fixable by either creating your own compatible box, or adding a plugin via:

vagrant plugin install vagrant-vbguest

This will make sure that after every new vm is started, the guest additions version will be checked. If it doesn’t match or isn’t installed, guest additions will be downloaded and forced to comply.

If you provide the iso, the additions will be installed from the CD. No, the internet doesn’t know how to make sure the guest additions CD is mounted automatically. No, you can’t access the “Install guest additions” menu unless you use the gui for the vm. No, it doesn’t make sense.

Reading from shared folders hangs

If reading from shared folders causes freezes and stracing results in just a looped call to `getdents()`, you have incompatible guest additions. Problem described here. Solution - see above.

Rebooting the VM results in a freeze

If you don’t update guest additions, shared folders don’t work. If you do, the host will freeze during the bootup after the first reset. Yes it does. That’s all.

There’s a kvm module for vagrant!

Some sanity can be hopefully restored by using KVM/libvirt instead of vbox. There are two modules - vagrant-kvm and vagrant-libvirt. Both will fail to install.

How to install vagrant-kvm

Trying to simply run `vagrant plugin install vagrant-kvm` will result in:

Installing the 'vagrant-kvm' plugin. This can take a few minutes...
...in `run': ERROR: Failed to build gem native extension. (Gem::Installer::ExtensionBuildError)

The log in `mkmf.log` will tell you that:

/usr/..../libvirt.so: undefined reference to `curl_global_init@CURL_OPENSSL_4'

(and many other functions too). The problem is described here and all you need to do is move the following files out of `/opt/vagrant/embedded/lib/`: `libcurl.so`, `libcurl.so.4`, `libcurl.so.4….` (whichever version you have at the moment). Finally move `/opt/vagrant/embedded/lib/pkgconfig/libcurl.pc` away too. Now `vagrant-kvm` will install successfully and you can move the files back into their original place.
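
For reference, the whole dance looks roughly like this (a sketch only - the temporary directory is arbitrary and the exact library version suffixes will differ on your system, hence the wildcard):

mkdir /tmp/vagrant-libcurl
sudo mv /opt/vagrant/embedded/lib/libcurl.so* /tmp/vagrant-libcurl/
sudo mv /opt/vagrant/embedded/lib/pkgconfig/libcurl.pc /tmp/vagrant-libcurl/
vagrant plugin install vagrant-kvm
sudo mv /tmp/vagrant-libcurl/libcurl.so* /opt/vagrant/embedded/lib/
sudo mv /tmp/vagrant-libcurl/libcurl.pc /opt/vagrant/embedded/lib/pkgconfig/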

Vagrant-kvm doesn’t work because “authentication failed: polkit”

Of course it won’t. Arch simply installs libvirt and leaves you holding the baby. Read the Arch wiki and set up the polkit rules. You also have to create the libvirt group, even though the wiki lists it only as an alternative way.
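
What that boils down to is roughly the following sketch (based on the Arch wiki’s libvirt example - the rule file name and “youruser” are placeholders):

sudo groupadd libvirt
sudo gpasswd -a youruser libvirt
sudo tee /etc/polkit-1/rules.d/50-libvirt.rules <<'EOF'
polkit.addRule(function(action, subject) {
    if (action.id == "org.libvirt.unix.manage" &&
        subject.isInGroup("libvirt")) {
        return polkit.Result.YES;
    }
});
EOF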

Vagrant-kvm doesn’t work because “Failed to connect socket to …”

Restart your libvirt using:

sudo systemctl restart libvirtd

Vagrant-kvm doesn’t work because “Could not create file: Permission denied”

At some point when you weren’t looking, something changed the owner of `~/.vagrant.d/tmp/storage-pool` to root. That means the process spawning the vm is not running as you anymore and most likely you removed the ‘rx’ rights from “others” on your home directory. I don’t have a good solution for this one. You’ll have to give ‘x’ rights on your home directory either to everyone (`chmod o+x ~`), or just to the `users` and `libvirt` groups using ACLs.
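
A rough sketch of the cleanup, assuming you go the ACL route for the last part:

sudo chown -R "$USER" ~/.vagrant.d/tmp/storage-pool
chmod o+x ~                # the blunt option
setfacl -m g:users:x ~     # or the targeted one, per group
setfacl -m g:libvirt:x ~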

Vagrant-kvm doesn’t work because “VagrantPlugins::ProviderKvm::Action::NFS”

You’re right, it doesn’t. You have to repackage vagrant-kvm from master branch, because the fix is not released yet (as of March 2014). Read the upstream issue and follow the gem repackaging instructions.

Text

There are many ways to turn on the unattended upgrades in Ubuntu. Creating files in /etc/apt/apt.conf.d, reconfiguring the package by hand, reinstalling after debconf, etc.

Here’s a simple way to do it with salt without breaking the Ubuntu / debconf integration:


unattended-upgrades:
  debconf.set:
    - data:
        'unattended-upgrades/enable_auto_updates':
          type: boolean
          value: "true"
  cmd.wait:
    - name: "dpkg-reconfigure unattended-upgrades"
    - watch:
      - debconf: unattended-upgrades
    - env:
      - DEBIAN_FRONTEND: noninteractive
      - DEBCONF_NONINTERACTIVE_SEEN: "true"
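
Not part of the state itself, but a quick sanity check that the value actually stuck (assuming the usual debconf-show output format):

sudo debconf-show unattended-upgrades
# should include a line like: unattended-upgrades/enable_auto_updates: true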

Text

Take any long-term data archives you keep - backups, copies of logs, code repositories, audit entries, etc. Are they append-only? I don’t mean from the perspective of the account owner. Of course the operator is able to do whatever he wants with those files, including deleting them.

But what about your servers? I’ve seen many backup manuals, or even published scripts, which pretty much require you to set up the S3 account and that’s it. They’ll manage the files inside and make sure everything is updated as needed. But there’s a problem with that strategy - what if someone else gains control of your server? What can they do to your archives?

Can they, in one go, wipe both your current database and all the available backups using credentials stored locally? Or all of the logs from this and other services? Can they wipe the code repositories using your deployment keys? Can they read all your database backups with every past and present account still in there? Guessing from many tutorials and articles, there are lots of production services out there where the answer is “yes” to all. Sometimes it’s not even an issue with the admins not trying hard enough - sometimes services themselves are really not prepared for protection from that kind of behaviour. Many providers simply don’t offer any other ways of accessing their accounts apart from a single user+key combination giving global access to everything.

There are ways to provide some level of protection even without any support from the service provider, but it’s much easier if the access control is already built-in. For example when uploading data into S3, everything that’s needed for good protection is already provided. IAM gives you the possibility to create an account per service, or even per host. User permissions allow you to give only as much access as needed for your use case. That means your remote backup script shouldn’t need any more privileges than PutObject on some specific object (in case you’re using versioned buckets), or on a bucket (without versioning). The second case requires that you also assign unpredictable names (for example random suffixes) to the files, so that they cannot be destroyed by overwriting.

Here’s an example of a user policy for S3 bucket without versioning:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:PutObject"
      ],
      "Sid": "Stmt1375779806000",
      "Resource": [
        "arn:aws:s3:::blah-database-backups/*"
      ],
      "Effect": "Allow"
    }
  ]
}
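
To go with a policy like that on an unversioned bucket, the uploading side needs to generate the unpredictable names mentioned earlier. A hedged sketch using the aws CLI (the file name and key layout are just an illustration):

SUFFIX=$(openssl rand -hex 16)
aws s3 cp /var/backups/db.dump.gz "s3://blah-database-backups/db-$(date +%Y%m%d)-${SUFFIX}.dump.gz"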

But that’s not all. Even if you have a single account, there’s still a way to limit the potential damage to some extent. Instead of using the same account for upload and long-term storage of the files, get two accounts. Upload to the first one and keep it around only for that purpose. For long-term storage, set up a machine which is completely separated from the users (maybe even rejects all incoming connections) and make sure all files are moved to the second account as soon as possible.

That kind of setup is not perfect and still allows the attacker to replace files that have not been moved, or download files which wouldn’t normally be accessible from the local filesystem. But the time window and the amount of data that may be impacted are much smaller. There’s also still the possibility to encrypt the data locally using a public key (which can be safely stored on the server) in order to protect the information.

So, if I got access to one of your servers and its backup system… what would I be able to achieve?

Text

There’s an interesting side effect to the way Debian-like systems handle the database configuration for various packages. Instead of each package setting the connection parameters on its own, they reuse the abstraction provided by the package ‘dbconfig-common’. This is all fine in theory, but unfortunately dbconfig doesn’t seem to behave very well with values pre-set before installation.

It should check the debconf setting called “<package>/internal/skip-preseed”, skip all the setup and leave the configuration up to other tools (configuration management!). But that’s not what happens on the first install. Take the package “phpmyadmin” for example - it depends on “dbconfig-common” and I didn’t want to use this way of configuration. But here’s the catch: setting “phpmyadmin/internal/skip-preseed” to “true” didn’t work. The package got installed with a randomly generated password.

There’s a way to work around this by splitting the installation into steps though. First install “dbconfig-common”, then set the values, then install the package. The way it works in salt (for example) is:

dbconfig-common:
  pkg.installed

phpmyadmin:
  pkg:
    - installed
    - require:
      - debconf: phpmyadmin

  debconf.set:
    - require:
      - pkg: dbconfig-common
    - data:
        'phpmyadmin/internal/skip-preseed':
          type: boolean
          value: true

Then it all works again. Hopefully also for packages other than phpmyadmin.

Text

I’ve got a couple of Linux machines that need a transparent backup solution at home. Transparent as in - they should happen all the time, without asking, without notification, without any interaction. Ideally it shouldn’t run on the client either, just to avoid issues with system updates. Making it all server-side also allows me to back up my girlfriend’s machine without explaining the setup - which is great when even periodic upgrades are a pain to enforce.

Another couple of requirements I had were:

  • backing up in local network (I don’t have enough bandwidth to backup remotely all the time)
  • saving storage space where files don’t change
  • able to run on a small device (I’ve got a TonidoPlug ready)
  • directory-based rather than fs-based (I’ve got a number of VMs I really don’t want to waste time backing up)

Unfortunately there aren’t many good, ready solutions and there aren’t many choices in general either. Most solutions fall into one of the following groups: trivial copy, archiving solutions, rsync-like and full-blown systems.

To be honest, maybe a big system like bacula can do what I need… but with it having more components than I have local hosts, I didn’t want to invest that much time in learning the configuration.

At small companies where I set up backups, I was happy with DAR. Full backups split into volumes of a given size and stored on removable disks weekly, with daily diffs stored locally - perfect. Here it won’t work - I can’t saturate the network or disk io every couple of hours. I really don’t want to deal with replacing removable disks. Some more intelligent solution is required.

Enter rsnapshot

Rsnapshot is an interesting mix between systems that understand backup schedules and a minimal backup utility. Rsnapshot can copy files from a remote host using rsync (and ssh for example). Then, it knows how to hard-link files that didn’t change to avoid wasting both bandwidth and storage. It also knows how to rotate backup directories when new backups come in and how to promote backups between levels.

The schedule and promotion are handled in a very simple way too. Since rsnapshot needs to be run from some scheduler like cron, it takes the name of the backup as a parameter. This can be for example “daily”, “weekly”, “purple”, or whatever you want to call it. What’s important is the order in the rsnapshot config file - only the first entry will get copied from the source, the others will get promoted from the “upper” level locally.

This is pretty much exactly what was needed. For the remote copy, ssh can be configured to use a password-less key to authenticate on the remote hosts. In case it’s possible to access the backup server from the internet, making sure it can run only rsync is a good idea too. For paranoid people, this can be also locked down with apparmor or selinux to deny anything apart from read-only access.
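
One way to do the rsync-only lock-down is the rrsync helper script that ships with rsync (its install path varies between distributions, so treat this client-side ~/.ssh/authorized_keys entry as a sketch):

command="/usr/bin/rrsync -ro /home/viraptor",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-rsa AAAA...the-backup-key...

Keep in mind rrsync makes remote paths relative to the directory given, so the backup line in the rsnapshot config may need adjusting to match.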

Sample configuration

So what does this look like in practice?

The server which collects the backups has rsnapshot and rsync installed. It also has a password-less ssh key created: just run ssh-keygen and copy the resulting id_rsa.pub into the client machine’s ~/.ssh/authorized_keys.
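
In shell terms that’s roughly (user and host names match the example config below):

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub viraptor@walker.local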

It also has a separate partition available for each backup. In my case they’re mounted under /media/disk1partX.

Now, each machine which needs backing up requires a configuration file. I put them all in my plug’s root directory: /root/rsnapshot_configs/xxx. Each machine looks almost the same. For example host walker has /root/rsnapshot_configs/walker:

config_version    1.2
snapshot_root    /media/disk1part2/rsnapshot

cmd_rsync    /usr/bin/rsync
cmd_ssh    /usr/bin/ssh
cmd_du    /usr/bin/du
exclude_file    /root/rsnapshot_configs/walker.exclude

retain    hourly    5
retain    daily    7
retain    weekly    5

backup    viraptor@walker.local:/home/viraptor    viraptor

This configuration says that /media/disk1part2/rsnapshot/viraptor will keep copies of /home/viraptor (minus some excludes like .cache, .steam, etc.) - 5 hourly ones, then 7 daily ones, then 5 weekly ones. All files which have not changed will be hard-linked and no space will be wasted on exact duplicates. The backup is taken from the host walker.local (thanks Avahi!).

But that’s only the definition of what… what about when? Crontab entries are what actually trigger the rsnapshot actions and they live in /etc/cron.d. The content is:

55 */2 * * * root (/bin/ping -c1 walker.local >/dev/null 2>/dev/null) && /usr/bin/rsnapshot -c /root/rsnapshot_configs/walker hourly
20 3 * * * root /usr/bin/rsnapshot -c /root/rsnapshot_configs/walker daily
20 4 * * 1 root /usr/bin/rsnapshot -c /root/rsnapshot_configs/walker weekly

This means that every 2 hours if the host is on (it most likely isn’t), rsnapshot will take another hourly copy. Then every day in the morning it will promote an old hourly snapshot to daily and once a week it will promote daily to weekly.

Summary

All simple and easy. There’s no specific recovery procedure involved, because all backups are available as normal files on the server. It’s actually easy to browse the old entries in case only one or two files need recovering. And in case of serious issues - just rsync everything back from the last snapshot. So far I’m very happy with it.

One thing to watch out for is the exclusions file in case you don’t have a massive hard drive available. If you happen to install Steam with some games you may start getting emails about failed backups… but there are only so many copies of Portal or TF2 that you’ll need in your life - you probably don’t need them in the backups unless you have lots of space. VM images which change all the time are also very bad candidates for backing up with rsnapshot. Every modification there will need a separate, complete copy.

With a “normal” developer-style usage the size of old copies is close to none. Currently, my last copy is 22GB, but all previous snapshots and modifications add only 2GB to that.

Text

Just a popular Stack Overflow question & answer which was deleted as not constructive even with a 10+ score and thousands of views (by a mod, so I can’t vote to revert - thanks!). Keep in mind that it’s from 2009 and hasn’t been reviewed since.

Question:

I am searching for a tool that tests SIP calls. A platform that makes a call from SIP device A to SIP device B and reports results…

Any idea? A simulation platform would be ideal.

My answer:

There are many solutions. Some more broken than others. Here’s a quick summary of what I’ve found while looking for a base for a proper automated testing solution.

SIPp

It’s ok if you want only a single dialog at a time. What doesn’t work here are complex scenarios where you need to synchronise 2 call legs, do registration, call and presence in the same scenario. If you go this way, you’ll end up running multiple sipp scenarios for each conversation element separately. Sipp also doesn’t scale at all for media transfers. Even though it’s multithreaded, something stops it from running concurrently - if you look at htop for example, you’ll see that sipp never crosses the 100% line. At around 50 media calls it starts to cut audio and take all the CPU of the machine.

It can sometimes lose track of what’s happening - packets which don’t even belong to the call can result in a failed test. It’s got some silly bugs too, like case-sensitive comparison of the headers.

SIPr/sipper

Ruby-based solution where you have to write your own scenarios in Ruby. It’s got its own SIP stack and lots of tests. While it’s generally good and handles a lot of complex scenarios nicely, its design is terrible in my opinion. Bugs are hard to track down and after a week I had >10 patches that I needed just to make it do basic stuff. Later I learned that some of the scenarios should have been written in a slightly different way, but SIPr developers were not really responsive and it took a lot of time to find out about it. There was no good documentation either. Synchronising actions of many agents is a hard problem, since they’d rather use an event-based, but still single-threaded approach… it just makes you concentrate too much on “what order can this happen in and do I handle it correctly” and worrying about sync/async actions, rather than writing the actual test.

WinSIP

Commercial solution. Never tested it properly since the basic functionality is missing from the evaluation version and it’s hard to spend that much money on something you’re not sure works…

SipUnit

Java-based solution reusing Jain-SIP stack. It can do almost any scenario and is fairly good. It tries to make everything non-blocking / action based leading to a similar situation SIPr has, but in this case it’s trivial to make it parallel / threaded so it’s much easier to deal with. It has its own share of bugs, so not everything works well in the vanilla package, but most of the stuff is patchable. The developers seem to be busy with other projects, so it’s not updated for a long time. If you need transfers, presence, dialog-info, custom messages, RTP handling, etc. - you’ll have to write your own modifications to support them. It is not good for performance testing though - each session on the test client side is rather heavy compared to the typical server side processing. That means you’ll most likely need a number of hosts doing the testing per one server/proxy.

If you’re a Java-hater like me, it can be used in a simple way from Jython, JRuby or any other JVM language.

In the end, I chose SIPunit as the least broken/evil/unusable solution. It is by no means perfect, but… it works in most cases. If I was doing the project once again with all this knowledge, I’d probably reuse SIPp configurations and try to write my own solution that uses proper threading - but I guesstimate it’s at least a ½ year project for one person, to make it good enough for production.

Text

If you follow the latest versions of… everything and tried to install flashcache you probably noticed that none of the current guides are correct regarding how to install it. Or they are mostly correct but with some bits missing. So here’s an attempt to do a refreshed guide. I’m using kernel version 3.7.10 and mkinitcpio version 0.13.0 (this actually matters, the interface for adding hooks and modules has changed).

Some of the guide is likely to be Arch-specific. I don’t know how much, so please watch out if you’re using another system. I’m going to explain why things are done the way they are, so you can replicate them under other circumstances.

Why flashcache?

First, what do I want to achieve? I’m setting up a system which has a large spinning disk (300GB) and a rather small SSD (16GB). Why such a weird combination? Lenovo allowed me to add a free 16GB SSD drive to the laptop configuration - couldn’t say no ;) The small disk is not useful for a filesystem on its own, but if all disk writes/reads were cached on it before writing them back to the platters, it should give my system a huge performance gain without a huge money loss. Flashcache can achieve exactly that. It was written by people working for Facebook to speed up their databases, but it works just as well for many other usage scenarios.

Why not other modules like bcache or something else dm-based? Because flashcache does not require kernel modifications. It’s just a module and a set of utilities. You get a new kernel and they “just work” again - no source patching required. I’m excited about the efforts for making bcache part of the kernel and for the new dm cache target coming in 3.9, but for now flashcache is what’s available in the easiest way.

I’m going to set up two SSD partitions because I want to cache two real partitions. There has to be a persistent 1:1 mapping between the cache and real storage for flashcache to work. One of the partitions is home (/home), the other is the root (/).

Preparation

Take backups, make sure you have a bootable installer of your system, make sure you really want to try this. Any mistake can cost you all the contents of your harddrive or break your grub configuration, so that you’ll need an alternative method of accessing your system. Also some of your “data has been written” guarantees are going to disappear. You’ve been warned.

Building the modules and tools

First we need the source. Make sure your git is installed and clone the flashcache repository: https://github.com/facebook/flashcache
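
Something along these lines:

git clone https://github.com/facebook/flashcache.git
cd flashcache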

Then build it, specifying the path where the kernel source is located - in case you’re in the middle of a version upgrade, this is the version you’re compiling for, not the one you’re using now:

make KERNEL_TREE=/usr/src/linux-3.7.10-1-ARCH KERNEL_SOURCE_VERSION=3.7.10-1-ARCH
sudo make KERNEL_TREE=/usr/src/linux-3.7.10-1-ARCH KERNEL_SOURCE_VERSION=3.7.10-1-ARCH install

There should be no surprises at all until now. The above should install a couple of things - the module and 4 utilities:

/usr/lib/modules/<version>/extra/flashcache/flashcache.ko
/sbin/flashcache_load
/sbin/flashcache_create
/sbin/flashcache_destroy
/sbin/flashcache_setioctl

The module is the most interesting bit at the moment, but to load the cache properly at boot time, we’ll need to put those binaries on the ramdisk.

Configuring ramdisk

The Arch system creates the ramdisk using mkinitcpio (which is a successor to initramfs (which is a successor to initrd)) - you can read some more about it at the Ubuntu wiki for example. The way this works is via hooks configured in /etc/mkinitcpio.conf. When the new kernel gets created, all hooks from that file are run in the defined order to build up the contents of what ends up in /boot/initramfs-linux.img (unless you changed the default).

The runtime scripts live in /usr/lib/initcpio/hooks while the ramdisk building elements live in /usr/lib/initcpio/install. Now the interesting part starts: first let’s place all needed bits into the ramdisk, by creating install hook /usr/lib/initcpio/install/flashcache :

# vim: set ft=sh:

build ()
{
    add_module "dm-mod"
    add_module "flashcache"

    add_dir "/dev/mapper"
    add_binary "/usr/sbin/dmsetup"
    add_binary "/sbin/flashcache_create"
    add_binary "/sbin/flashcache_load"
    add_binary "/sbin/flashcache_destroy"
    add_file "/lib/udev/rules.d/10-dm.rules"
    add_file "/lib/udev/rules.d/13-dm-disk.rules"
    add_file "/lib/udev/rules.d/95-dm-notify.rules"
    add_file "/lib/udev/rules.d/11-dm-lvm.rules"

    add_runscript
}

help ()
{
cat<<HELPEOF
  This hook loads the necessary modules for a flash drive as a cache device for your root device.
HELPEOF
}

This will add the required modules (dm-mod and flashcache), make sure the mapper directory is ready, install the tools and add some useful udev disk discovery rules. The same rules are included in the lvm2 hook (I assume you’re using it anyway), so there is an overlap, but this will not cause any conflicts.

The last line of the build function makes sure that the script with runtime hooks will be included too. That’s the file which needs to ensure everything is loaded at boot time. It should contain function run_hook which runs after the modules are loaded, but before the filesystems are mounted, which is a perfect time for additional device setup. It looks like this and goes into /usr/lib/initcpio/hooks/flashcache:

#!/usr/bin/ash

run_hook ()
{
    if [ ! -e "/dev/mapper/control" ]; then
        /bin/mknod "/dev/mapper/control" c $(cat /sys/class/misc/device-mapper/dev | sed 's|:| |')
    fi

    [ "${quiet}" = "y" ] && LVMQUIET=">/dev/null"

    msg "Activating cache volumes..."
    oIFS="${IFS}"
    IFS=","
    for disk in ${flashcache_volumes} ; do
        eval /usr/sbin/flashcache_load "${disk}" $LVMQUIET
    done
    IFS="${oIFS}"
}

# vim:set ft=sh:

Why the crazy splitting and where does flashcache_volumes come from? It’s done so that the values are not hardcoded and adding a volume doesn’t require rebuilding initramfs. Each variable set as kernel boot parameter is visible in the hook script, so adding a flashcache_volumes=/dev/sdb1,/dev/sdb2 will activate both of those volumes. I just add that to the GRUB_CMDLINE_LINUX_DEFAULT variable in /etc/default/grub.

The volumes loaded here - sdb1 and sdb2 - are in my case the partitions on the SSD drive, but you may need to change those to match your environment.

Additionally, if you’re attempting to have your root filesystem handled by flashcache, you’ll need two more parameters. One is of course root=/dev/mapper/cached_system and the second is lvmwait=/dev/mapper/cached_system to make sure the device is available before the system starts booting.
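
Put together, the relevant line in /etc/default/grub could end up looking roughly like this (device and mapper names are from my setup - substitute your own):

GRUB_CMDLINE_LINUX_DEFAULT="quiet flashcache_volumes=/dev/sdb1,/dev/sdb2 root=/dev/mapper/cached_system lvmwait=/dev/mapper/cached_system"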

At this point regenerating the initramfs (sudo mkinitcpio -p linux) should work and print out something about included flashcache. For example:

==> Building image from preset: 'default'
  -> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux.img
==> Starting build: 3.7.10-1-ARCH
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [autodetect]
  -> Running build hook: [modconf]
  -> Running build hook: [block]
  -> Running build hook: [lvm2]
  -> Running build hook: [flashcache]
  -> Running build hook: [filesystems]
  -> Running build hook: [keyboard]
  -> Running build hook: [fsck]
==> Generating module dependencies
==> Creating gzip initcpio image: /boot/initramfs-linux.img
==> Image generation successful

Finale - fs preparation and reboot

To actually create the initial caching filesystem you’ll have to prepare the SSD drive. Assuming it’s already split into partitions - each one for buffering data from a corresponding real partition - you have to run the flashcache_create app. The details of how to run it and the available modes are described in the flashcache-sa-guide.txt file in the repository, but the simplest example (in my case creating the root partition cache) is:

flashcache_create -p back cached_system /dev/sdb1 /dev/sda2

which creates a devmapper device called cached_system with fast cache on /dev/sdb1 and backing storage on /dev/sda2.
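
The home partition gets the same treatment - in this sketch /dev/sda3 stands in for whatever your real /home partition is and cached_home is just a name I picked:

flashcache_create -p back cached_home /dev/sdb2 /dev/sda3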

Now adjust your /etc/fstab to point at the caching devices where necessary, install grub to include the new parameters and reboot. If things went well you’ll be running from the cache instead of directly from the spinning disk.
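
As a hedged example, the relevant fstab entries might end up looking like this (filesystem types and options are assumptions, the mapper names match the ones above):

/dev/mapper/cached_system   /       ext4   defaults,noatime   0   1
/dev/mapper/cached_home     /home   ext4   defaults,noatime   0   2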

Was it worth the work?

Learning about initramfs and configuring it by hand - of course - it was lots of fun and I got a ramdisk failing to boot the system only 3 times in the process…

Configuring flashcache - OH YES! It’s a night and day difference. You can check the stats of your cache device by running dmsetup status devicename. In my case after a couple of days of browsing, watching movies, hacking on python and haskell code, I get 92% cache hits on read and 58% on write on the root filesystem. On home it’s 97% and 91% respectively. Each partition is 50GB HDD with 8GB SDD cache. Since the cache persists across reboots, startup times have also dropped from ~5 minutes to around a minute in total.

I worked on SSD-only machines before and honestly can’t tell the difference between them and one with flashcache during standard usage. The only time when you’re likely to notice a delay is when loading a new, uncached program and the disk has to spin up for reading.

Good luck with your setup.

Text

Some time ago I ran into a production issue where the init process (upstart) stopped behaving properly. Specifically, instead of spawning new processes, it deadlocked in a transitional state. To be precise, the init process itself was responsive, but the critical services were stuck in one of the pre- or post- states, never actually restarting. What’s worse, upstart doesn’t allow forcing a state transition and trying to manually create and send DBus events didn’t help either. That meant the sane options we were left with were:

  • restart the host (not desirable at all in that scenario)
  • start the process manually and hope auto-respawn will not be needed.

Of course there are also some insane options. Why not cheat like in the old times and just PEEK and POKE the process in the right places? The solution used at the time involved a very ugly script driving gdb which probably summoned satan in some edge cases. But the edge cases were not hit and the majority of hosts recovered without issues (if you overwrite the memory of your init process, you should expect at least a small percentage of segfaults). After some time, however, I wanted to recreate the experiment in a cleaner way and see what interfaces are available if I had a legitimate use for doing something similar again.

The goal is the same - given an upstart job name, change its goal and status fields to arbitrary values, without killing the init process. First some context however:

Why is peek/poke harder these days?

In the good old times, when software size was measured in kilobytes and each byte was quite expensive, dynamic allocation was very rare. Whatever could be static, was static. Whatever couldn’t be, was most likely pooled in a specific region and there was a preset number of “things” the program could handle. That means your lives counter or some other important value was most likely always at the exact same address every time. That’s not the case anymore unfortunately. Almost everything needs to handle an arbitrary number of “things” these days and that means dynamic allocation.

It’s also trivial to allocate new memory regions and the OS takes care of things like making sure the memory looks like one continuous space to your app, while in reality it can be all over the place. The practical implication is that anything we’ll need to search for in the upstart process will be malloc’d somewhere in the heap area. We also need to know where the heap happens to be at that specific time.

Ways of direct access to a process.

On Linux there are a couple of ways to access memory of a foreign process. The easiest two are reading directly from /proc/(pid)/mem and using the ptrace library. The ptrace request ids are actually called PTRACE_PEEKDATA and PTRACE_POKEDATA which should make their purpose quite clear. There’s a lot of information about them in man pages if you want more details, but let’s move on to some real action.

Where to read from is another interesting question. Apart from dynamic allocation we’ve got virtual memory these days and additional memory-shifting concepts like ASLR. The up-to-date, valid information about where to look for data will exist under /proc/(pid)/maps for each running application. For the init process (PID 1), it looks something like this:

......
7fae2b2b7000-7fae2b2b9000 rw-p 00023000 fd:01 2860       /lib/x86_64-linux-gnu/ld-2.15.so
7fae2b2b9000-7fae2b2df000 r-xp 00000000 fd:01 4259       /sbin/init (deleted)
7fae2b4de000-7fae2b4e0000 r--p 00025000 fd:01 4259       /sbin/init (deleted)
7fae2b4e0000-7fae2b4e1000 rw-p 00027000 fd:01 4259       /sbin/init (deleted)
7fae2cf09000-7fae2cfd0000 rw-p 00000000 00:00 0          [heap]
7fffc146b000-7fffc148c000 rw-p 00000000 00:00 0          [stack]
7fffc1599000-7fffc159a000 r-xp 00000000 00:00 0          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]

As previously noted, all of the interesting / long-lived data will be found in the heap which is annotated with a fake path “[heap]”. All of the ranges listed in the maps file are available. Others will give an error on access. Process memory acts like a stricter version of a sparse file in this case.

Nice ways of direct access

Both ptrace and memory-file interfaces are quite low-level, so instead of writing lots of C code, I’m going to use some Python instead. Fortunately there’s an existing ptrace wrapper on pypi and even though it looks abandoned, it still works very well. The interface allows easy “stop and attach” operation as well as exposes some interesting functions for address range reading and writing. Allow me to do some blog-literate programming here. The ptrace interface allows for easy attaching to a chosen PID (1 in this case):

import ptrace.cpu_info
import ptrace.ctypes_tools
import ptrace.debugger

def get_init_process():
    d = ptrace.debugger.PtraceDebugger()
    proc = d.addProcess(1, False)
    return proc

Now down to the details… After a quick glance at init/job.h from upstart source code, it looks like we’re interested in two values from struct Job - goal and state. Both have a range of values described at the top of the file. Counting from the beginning of the struct, they’re at offset 5*(native pointer length), because NihList consists of two pointers only.

PTR_SIZE=ptrace.cpu_info.CPU_WORD_SIZE
JOB_CLASS_NAME_OFFSET = PTR_SIZE*2
JOB_CLASS_PATH_OFFSET = PTR_SIZE*3
JOB_NAME_OFFSET = PTR_SIZE*2
JOB_JOB_CLASS_OFFSET = PTR_SIZE*3
JOB_PATH_OFFSET = PTR_SIZE*4
JOB_GOAL_OFFSET = PTR_SIZE*5

But struct Job is not something we can find easily. Let’s say the upstart job to fix is called “rsyslog”. This string is in the heap, but not pointed to from the Job structure. That part initially consisted of some guesswork and upstream code browsing which I’m not going to reproduce here, but the result is that the bytes “rsyslog” (or “rsyslog\0” to be precise) exist in the structure JobClass in init/job_class.h. Actually… there and in 18 other places. That means on the current system I can find 19 places which contain that name terminated by a zero byte, and the next step is to figure out which of those occurrences can be traced back to the job itself.

def get_heap(proc):
    return [m for m in proc.readMappings() if m.pathname == '[heap]'][0]

def find_refs_to(mem, bytestr):
    return [addr for addr in mem.search(bytestr)]

With such a low number of hits we can just check each of them and see how viable each one is.

Tracking references

So how to find out if each of the guesses is correct? By checking if the surrounding values and pointers make sense. In this case the JobClass has a path field which according to the comments is a string containing the DBus path for the job. As noted previously, those fields have a known offset from the start of the structure. Let’s write something generic then that will browse through given addresses and check if the memory referencing them looks like it could be a known object:

def flatten(stream):
    result = []
    for collection in stream:
        result.extend(collection)
    return result

def places_referring_to(mem, search_value):
    needle = ptrace.ctypes_tools.word2bytes(search_value)
    return find_refs_to(mem, needle)

def find_object_references(proc, heap, values, offset, verifier):
    refs = flatten(places_referring_to(heap, value) for value in values)
    return [ref-offset for ref in refs if verifier(proc, ref-offset)]

Now some functions that can actually judge whether some location looks like a Job or a JobClass by extracting expected strings:

def deref_string(proc, addr):
    s_addr = proc.readWord(addr)
    try:
        return proc.readCString(s_addr, 100)[0]
    except ptrace.debugger.process_error.ProcessError:
        return None

def looks_like_job_class(proc, addr):
    s = deref_string(proc, addr+JOB_CLASS_PATH_OFFSET)
    return s is not None and s.startswith('/com/ubuntu/Upstart/jobs/')

def looks_like_job(proc, addr):
    s = deref_string(proc, addr+JOB_PATH_OFFSET)
    return s is not None and s.startswith('/com/ubuntu/Upstart/jobs/')

And that’s it. There could be a lot more sanity checking going on, but after a quick check it appears to be unnecessary. A quick run results in only one pointer which actually does show a valid Job structure.

The reference chain we’re looking for is: string (name of the process) -> that is used in a JobClass -> that is used in a Job. To wrap it all up into an actual script:

import sys
import ctypes

# job_goals / job_states are lookup tables translating the enum values from
# upstart's init/job.h into their names (JOB_START, JOB_RUNNING, ...);
# the job name to fix is taken from the command line (see the invocation below)
process_to_fix = sys.argv[1]

proc = get_init_process()
heap = get_heap(proc)
process_strings = find_refs_to(heap, process_to_fix)
job_classes = find_object_references(proc, heap, process_strings,
    JOB_CLASS_NAME_OFFSET, looks_like_job_class)
jobs = find_object_references(proc, heap, job_classes,
    JOB_JOB_CLASS_OFFSET, looks_like_job)

for job in jobs:
    print "job found at 0x%016x" % job
    goal, state = proc.readStruct(job+JOB_GOAL_OFFSET,
        ctypes.c_int32*2)[:]
    print "goal", job_goals[goal]
    print "state", job_states[state]

Does it all work?

Yes, of course it does! And pretty reliably actually:

sudo ./search_init.py rsyslog
job found at 0x00007fae2cf95ca0
goal JOB_START
state JOB_RUNNING

After finding the right address it’s only a matter of proc.writeBytes() to force the change of the goal and state.

Unfortunately there’s nothing stopping the system from being in a state where this change really shouldn’t happen. For example right before the value is read, or while it’s being copied and some code path still holds the old reference, or… Basically changing memory which you don’t have complete control over is not safe. Ever. Around 1% of machines had problems with init going crazy afterwards, but those could be just rebooted then. But as a hack that allows you to fix a critical issue, it’s worth remembering that it’s not rocket science.

And finally: thanks to Victor Stinner for writing some really useful Python tools.

Text

I started playing some flash-based MMORPG for fun lately. The limited options available to the characters in RPGs are not as entertaining as programming, so this didn’t last long; but it definitely gave me an idea… Can I get the event stream and decode it without knowing anything about the game’s design and of course without any source available? Now that’s an interesting quest!

Warning: if you see anything silly about ABC or Flash, that might be because I learned everything I know about it during this project. Any corrections welcome.

Overview of communication

Before going into the details of communication I had to figure out what’s happening in general - what connections are started, how much data is sent, what does the encapsulation look like, etc. That task is pretty trivial with help of Wireshark. I set it to capture everything, connected, entered the game for a moment and left. So what could I learn from this short record?

There were three communication channels:

  • An unencrypted HTTP GET request which returned one word (number) - probably something related to a version check or cache invalidation
  • An encrypted HTTP request which contained (thanks to SSL MITM and own CA injection) the first login and some details about the available game characters
  • Then some time later, a single TCP connection constantly streaming loads of small portions of data

Everything looks pretty obvious, but… the long TCP connection seems to contain just garbage. That usually means that the data is either compressed or encrypted. The second one being more likely since that game would aim for minimum delays and small update packets. After checking the usual options - can ‘file’ identify the stream, does it start with any standard header, can standard tools unpack it, does it contain any readable strings - I found that the only answer was “no”. Encryption it is then…

Going deeper

Finding out more details about the encryption was pretty easy. I disassembled the code in the .swf and noticed that even though the main code was obfuscated, the libraries weren’t. This only required a quick grep for “cipher” and I knew the library was from crypto.hurlant.com. After a moment it was obvious which function to look for (Crypto.getCipher). This was very useful, because at that point I knew both what functions were used and what arguments they take. Calls to that function were present in a couple of places as expected and what was left was to figure out where the key comes from. That part was a bit harder.

Flash decompilers are very poor quality

I tried to find some decompiler for Flash to find the source of the key generation. Browsing through the ABC was a bit too hard (even though the code itself is quite simple) and I could not know how far the answer was. Maybe the key was generated in a trivial function, maybe it was split over many stages…

I thought: why not decompile everything to AS source code, surely it can’t be that tricky. Unfortunately it was - I got 3 different results from the software I could get my hands on: Complete crash, crash only on some modules, refusing to decompile. To be honest, I did not expect hexrays quality but something that does the minimum would be great. The partial results I got were completely silly and quite surprising. It seems that the easily available flash decompilers don’t even try to recreate what the code does - they just switch instructions into AS statements. If your ABC says “inclocal_i …; declocal_i …” that’s exactly what you’ll see done on a temporary variable… which turns out to be unused afterwards.

At this point I did what any other insane person would do - started writing my own decompiler. It’s not that hard really. Actually you can grab the last compiler you wrote (everyone wrote one, right?) and reuse large parts of it. Operations get split into blocks linked to other blocks, you name every stack position and local variable with a unique name, convert the code into SSA form, remove dead blocks, do peep-hole optimisations to strip silly obfuscation code, detect which loops can be converted into while-s/for-s and spit out the code… At stage 3 or 4 I noticed that the whole idea is silly and although the project was working nicely (probably at this point giving better results than some commercial solutions I tried before, even if not all opcodes were implemented), this was just wasting time. I had one simple task and this tool would take too long to complete even if it was limited to just ABC cleanup and propagating argument names into variables.

A better way

Since I didn’t really care where the password comes from, but only what it is, I decided to do something else… print out the key itself. Apparently there’s this thing called the Flash debug player and you can use it to see the output of all “trace(…)” calls - awesome! Ah… and you need a 32-bit Windows system for it, otherwise it’s not going to work - that was a bit painful, but virtualbox solved this problem quite well.

What I needed to do was to inject a bit of custom code into the existing .swf, run it under the debug version of Flash and collect the result. Injecting the code seems pretty hard if you want to do it yourself. There are loads of projects which will disassemble the .swf file, but almost none which can reassemble it again. Fortunately Apparat does this in quite a simple way - by providing an API written in Scala. It provides a complete, magical framework for modifying the code and the only thing you need to provide is a filter to choose where the modification should be applied and a new template to expand in that place.

Locating the needed part was pretty easy. It looked something like this:

getlocal 1
getlocal 2
getlocal 3
call_property getCipher, 3

This means just “load 3 local variables on the stack and call getCipher”. Of course the parameters were known from the library source:

getCipher(name:String, key:ByteArray, pad:IPad=null):ICipher

I was interested in the key and the cipher. The pad was not needed after all - because the cipher turned out to be “rc4”. The way it works (simplified) is similar to a combination of a known seed (the key), a pseudo-random number generator and the plaintext xor’ed with the PRNG’s output. Very simple design and there are lots of libraries available to verify the result.

Because only two arguments were interesting, it meant inserting the call to “trace” somewhere before the call to the library can give me the needed data. The only thing to remember is that the stack needs to be returned to exactly the same state as before, otherwise the rest of the code would fail. None of the local variables can be overwritten either. This is a work for a spy: get the key, convert to string, print out, put everything back in its place. Here’s the full filter for Apparat, printing out the second parameter:

private lazy val traceCipherCall = {
  (GetLocal(1) ~ GetLocal(2) ~ GetLocal(3) ~ BytecodeChains.partial {
    case originalCall @ CallProperty(name, 3) if name == getCipherQName => originalCall
  }) ^^ {
    case (GetLocal(1) ~ GetLocal(2) ~ GetLocal(3) ~ x) => List[AbstractOp](
      FindPropStrict(traceQName),
      GetLocal(2),
      ConvertString(),
      CallPropVoid(traceQName, 1),
      GetLocal(1),
      GetLocal(2),
      GetLocal(3),
      x)
    case _ => error("internal fail")
  }
}

This is fairly simple - match 3 getlocal-s followed by a call to “getCipher”. This sequence is pretty specific, so it matched only in 2 places - exactly where it was needed. Now I needed to get the output. One installation of windows + debug flash + tons of crap later, I discovered that… the application detects whether it’s running in a debugging environment and changes its configuration to use the test server instead.

Removing debugger detection

Two google searches later I found that the most likely way of debugger detection is checking the “isDebugger” variable. Unfortunately the check wasn’t done in a simple if/else way. The code pushed the “isDebugger” string to some other function which then saved the results of a number of checks. I really didn’t want to get into the details of how that happens. The easiest alternative was to use some other flag which was guaranteed to be false. Luckily, “avHardwareDisable” turned out to be a good candidate. The modification was quickly applied… and the app went into testing mode again. Something else was missing - the most trivial fix was not enough.

The second simplest thing to try was to look for the string describing the testing server and browse the code around it for some condition checking. And it was there! Some function was doing what looked like a dns lookup (still not sure if that was the case) and comparing the result to a known value - apparently running in a debugger influenced the result somehow. Since adding the “testing” string depended on the result of this function, it was a good enough candidate for patching. Fortunately the following sequence:

  convert_string
  equals
  getlocal_2
  if_true "L2"
  not
"L2:"
  return_value

is not that popular. What happens here is: two strings get compared first, then the result gets flipped if local_2 is not set, then the result is returned. It’s basically returning the result of string comparison xor local_2 - probably it went through bytecode obfuscation. The good way to hardcode the result without messing up the stack was changing the beginning to:

pop
push_false
push_true

And… success! On the next try the application connected to a production server and printed out the key it used. Not only that, but the key was successfully used later on to read and write any of the events sent between the client and the server. But that’s a topic for another post in the future.

Lessons if you want to protect your communication:

  • obfuscate the object files after including the libraries, not before - especially if the libraries are open-source
  • do a random key exchange instead of hardcoding your passphrase
  • add CA verification (is it possible in flash? not sure)
  • disassemble your own binaries to find issues
  • … all of the above will be worked around and your app will be hacked anyway, get used to the idea :)
