
Do you have a recommendation on decent backup practice? Thanks.


The most important - and most forgotten/ignored - part of a good backup practice is actually checking that you can restore the backup successfully. OK, it's slightly less important than backing up at all in the first place, but only slightly.

You won't believe just how many organisations have an automated script for a daily/weekly/monthly rolling backup of their database, but no procedure to restore the previous day's database and check that the backup actually went OK!

At the most basic level, it's just another automated script on a spare box that restores the latest backup and checks it comes back OK. But people don't even bother with that...
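
Something like this is all it takes (a rough sketch assuming PostgreSQL - the paths, dump naming and the "users" table are made up):

  #!/bin/sh
  set -e
  # Restore the newest dump into a scratch database on the spare box.
  LATEST=$(ls -t /backups/db-*.sql.gz | head -n 1)
  dropdb --if-exists restore_test
  createdb restore_test
  gunzip -c "$LATEST" | psql -q restore_test
  # Sanity check: a key table should actually contain rows.
  COUNT=$(psql -tAc "SELECT count(*) FROM users;" restore_test)
  if [ "$COUNT" -gt 0 ]; then
    echo "restore OK ($COUNT rows in users)"
  else
    echo "restore FAILED" >&2
    exit 1
  fi

Wire the exit status into whatever alerting you already have and you're most of the way there.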


Word to that. I was contracting once and asked about the backups. Someone was diligently changing the tapes on a daily basis, but the backup had actually been failing for months. How we laughed...


On small websites, I use git to track /etc and /var/www (where the webapp code is installed). Five years ago I would have recommended rsyncing the entire server daily, but nowadays it's a lot easier to snapshot an EC2 instance daily. For the database, just use RDS, which handles point-in-time snapshots for you.
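
For anyone who hasn't done it, both halves are a couple of lines (rough sketch - the instance ID is a placeholder, and etckeeper automates the /etc part properly):

  # Track /etc in git by hand (etckeeper automates this):
  cd /etc && git init && git add -A && git commit -m "initial snapshot"

  # Nightly snapshot script, called from cron -- creates an AMI of the instance:
  aws ec2 create-image \
    --instance-id i-0123456789abcdef0 \
    --name "web-$(date +%F)" \
    --no-reboot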

If you're a developer with little Linux experience, don't try to play Sunday-morning sysadmin with fancy bash scripts at your startup. It takes a lot more time and resources to keep a system in good shape than you can foresee.


This is the reason for idempotent server configuration via Ansible, Chef, Salt, Puppet, or, even better, NixOS. Backing up a deployed application is kind of silly when the tagged releases are sitting in your source control and you can, at the press of a button, have an entire server built and configured from scratch that is identical to the one you need. Then store that configuration in source control too.
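
The "press of a button" really is about this much (hypothetical repo, inventory and playbook names):

  # Rebuild a server entirely from what's in source control:
  git clone git@example.com:ops/infrastructure.git
  cd infrastructure
  ansible-playbook -i inventory/production site.yml --limit web01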

Backing up configuration files and what-not is also pretty silly and indicative of bad devops practices.

The database should be versioned and maintained with a migration tool, so the only part of the database that needs backing up is the data it holds - live replication servers that can act as background backup slaves are perfect for this.

The only things you should be backing up on your servers are the latest log-file rotations, user-generated data in the database, and any critical files that were generated or uploaded by your users (though technically you should be storing those in S3 with duplicates in another availability zone).

One other thought: people usually don't think to encrypt their backups. Duplicity + GPG + S3 is what I use right now and is easy enough to set up with a bit of fiddling. Once Tarsnap becomes a bit more professional I'll be moving all of my server backups to it.
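
For reference, the Duplicity side is roughly this (the key ID and bucket name are made up, and the S3 URL scheme differs between duplicity versions):

  # Encrypted incremental backup to S3, with a full backup forced weekly:
  duplicity --encrypt-key 0xDEADBEEF --full-if-older-than 7D \
    /var/www s3+http://example-backup-bucket/www

  # Prune old chains so the bucket doesn't grow forever:
  duplicity remove-all-but-n-full 4 --force s3+http://example-backup-bucket/www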


> So the only feature of the database that needs backing up should be the data it holds - live replication servers that can act as background backup slaves are perfect for this.

Replication won't help if a "drop table users;" happens - the drop just replicates to the slave too. You need point-in-time recovery.
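
If you're on MySQL with binary logging enabled, recovery looks roughly like this (timestamps and file names are illustrative - start at the time of the last full dump, stop just before the DROP):

  # Restore the last full dump, then replay the binlog up to just before the bad statement.
  mysql < full_backup.sql
  mysqlbinlog --start-datetime="2014-03-01 02:00:00" \
              --stop-datetime="2014-03-01 09:59:00" \
              /var/log/mysql/mysql-bin.000042 | mysql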

You may also want to back up log files and uploaded images/files.

The backups should be stored at your main service provider and also at a different unrelated place, so if AWS or whoever deletes your account, your backups aren't toast.

You also want an automated system for monitoring that your backups are happening and that you can restore from them.
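
Even a dumb freshness check beats nothing (paths and the alert address are made up; stat -c is the GNU variant):

  # Alert if the newest backup is missing or older than ~25 hours.
  LATEST=$(ls -t /backups/db-*.sql.gz 2>/dev/null | head -n 1)
  if [ -z "$LATEST" ] || [ $(( $(date +%s) - $(stat -c %Y "$LATEST") )) -gt 90000 ]; then
    echo "backup missing or stale: ${LATEST:-none}" | mail -s "BACKUP ALERT" ops@example.com
  fi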

If you're encrypting backups, don't forget to back up your encryption keys.
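
That part is one command (key ID is made up - keep the exported file somewhere that doesn't depend on the backups it decrypts, e.g. printed out or on offline media):

  gpg --export-secret-keys --armor 0xDEADBEEF > backup-key.asc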


That's what I meant by "background backup slave". The slave can handle point-in-time backups, which is better than doing dumps or hot backups on the primary DB server.

Good point re: multiple providers.



