Backup Scripts

This repository contains the scripts that control the backup of the various services that I run. The scripts mostly use rsync to copy files onto a RAID array, and then sync those files with an Amazon S3 bucket and a tape backup.

The scripts that drive the backup process are found in the scripts directory:

scripts/backup.sh                     The main driver script
scripts/backup-docker.sh              Backup remote docker volumes
scripts/backup-git.sh                 Backup git repositories (as bare repos)
scripts/backup-output-s3.sh           Write backup archives to an S3 bucket
scripts/backup-output-tape.sh         Write backup archives to a tape drive
scripts/backup-output-tape-remote.sh  Write backup archives to a remote tape drive
scripts/backup-cycle-logs.sh          Rotate 'backup.log' log files

To use the scripts, first create a backup.env file. The backup.sh driver script expects to find backup.env in the repo directory, which it identifies as the parent of the directory in which backup.sh resides. This file should define the environment variables that the backup scripts require. The fundamental environment variables are as follows:

Variable        Description
BACKUP_DIR      The backup directory
BACKUP_OUTPUTS  Comma-separated list of outputs

The BACKUP_DIR directory is where all the backup files will be created: the incremental backups and the produced archives. This directory will be created if it does not already exist.
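As a rough illustration, a backup.env might look like the following. The path and the chosen outputs are placeholder values, and this assumes the file holds plain shell variable assignments; whether the scripts also expect `export` is not stated here.

```shell
# backup.env (example values only; adjust for your own setup)

# Where the incremental backups and archives are created (placeholder path)
BACKUP_DIR=/mnt/raid/backup

# Outputs that the archives are written to, in order
BACKUP_OUTPUTS=s3,tape
```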

The BACKUP_DIR will also contain two other files:

  1. The backup.log, recording the output of the last backup. Previous backup.log files will be saved to backup.log.1, backup.log.2, and so on, until backup.log.10.
  2. The backup.index file, which contains a list of all the archives created by the backup process. These are the files that are copied by the backup-output-*.sh scripts to the various outputs. The backup.index file itself is also written to the backup output(s).

The BACKUP_OUTPUTS environment variable contains a comma-separated list of outputs. These correspond to the backup-output-*.sh scripts. So if BACKUP_OUTPUTS is s3,tape, then the backup-output-s3.sh script and then the backup-output-tape.sh script will be run.
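The mapping from BACKUP_OUTPUTS to script names can be sketched as below. The `output_scripts` helper is hypothetical and only illustrates the naming rule; it is not necessarily how backup.sh does it.

```shell
# Print the output script name for each entry of a comma-separated list.
# output_scripts is a hypothetical helper, not part of this repository.
# The body runs in a subshell so the IFS and globbing changes stay local.
output_scripts() (
  IFS=,
  set -f   # do not glob-expand the entries
  for output in $1; do
    printf 'backup-output-%s.sh\n' "$output"
  done
)

output_scripts "s3,tape"
```

The names are printed in the order given, which matches the order in which the scripts are described as running.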

Logging

All standard output and standard error of the backup.sh script and any of its child processes will be written to the backup.log file in the BACKUP_DIR. Any previous backup.log file will be rotated to backup.log.1, with older logs shifted up as far as backup.log.10.

Because the backup.log file is written to during the entire backup process, it is not written to an output by any of the backup-output-*.sh scripts.
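The rotation scheme described above can be sketched as follows. This is a minimal illustration, not the contents of backup-cycle-logs.sh: the `rotate_logs` name is hypothetical, and it renames files with mv, whereas the real script may copy them.

```shell
# Shift backup.log -> backup.log.1 -> ... -> backup.log.10 in a directory.
# rotate_logs is a hypothetical helper; the limit of 10 comes from the text.
rotate_logs() (
  dir=$1
  max=${2:-10}
  i=$max
  # Work from the oldest slot down, so each move overwrites the older log.
  while [ "$i" -gt 1 ]; do
    prev=$((i - 1))
    if [ -f "$dir/backup.log.$prev" ]; then
      mv -f "$dir/backup.log.$prev" "$dir/backup.log.$i"
    fi
    i=$prev
  done
  if [ -f "$dir/backup.log" ]; then
    mv -f "$dir/backup.log" "$dir/backup.log.1"
  fi
)
```

A driver script could call `rotate_logs "$BACKUP_DIR"` and then redirect its own output with something like `exec >"$BACKUP_DIR/backup.log" 2>&1`, so that child processes inherit the redirection.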

Backup Sources

This section documents the different sources that can be backed up by these scripts. Each source is controlled by a set of input files that are found in the repo directory, that is, the parent of the directory in which the backup.sh script lives (not the directory from which the script is invoked).

For example, if you are in /home/foo and this repository is in /home/foo/backup, you may be invoking the backup.sh script as follows:

$ ~/backup/scripts/backup.sh

In this case, the repo directory (the parent of the scripts directory) is ~/backup, so you will want to create the backup.env file and the various source list files (e.g. git.list and *.docker.list files) in the ~/backup directory. The .gitignore is configured to exclude these configuration and list files.
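One common way for a script to find that directory, regardless of where it is invoked from, is sketched below. The `repo_dir_for` helper is a hypothetical name and this is not necessarily how backup.sh does it.

```shell
# Print the repo directory for a given path to backup.sh: the parent of
# the directory that contains the script. repo_dir_for is a hypothetical
# helper; the subshell body keeps the cd from affecting the caller.
repo_dir_for() (
  script_dir=$(dirname -- "$1")
  CDPATH= cd -- "$script_dir/.." && pwd
)
```

Inside a script this would typically be used as `repo_dir=$(repo_dir_for "$0")`, after which the configuration could be looked up as `"$repo_dir/backup.env"`.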

Backup Git Repositories

To back up git repositories, create a git.list file. Each line of the file should be a repository URL of the sort expected by git-clone. Blank lines and lines that start with a # are ignored.
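The same blank-line and comment rule applies to every list file in this README. A minimal sketch of such a reader is shown below; `read_list` is a hypothetical helper, and it treats only completely empty lines as blank.

```shell
# Print the meaningful lines of a list file, skipping empty lines and
# lines that start with '#'. read_list is a hypothetical helper.
read_list() {
  # The '|| [ -n "$line" ]' keeps a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;
    esac
    printf '%s\n' "$line"
  done < "$1"
}
```

For example, `read_list git.list` would print one repository URL per line, ready to be consumed by a `while read` loop.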

The backup-git.sh script will clone a bare repository for each non-blank, non-comment line in the git.list file. These repositories go in the BACKUP_DIR/git/ directory. Each repository URL is split into an organisation and a repository name, using the / in the URL as a separator, and these serve as nested sub-directories of BACKUP_DIR/git/. Each repository is then archived into a .tar file in the BACKUP_DIR/git/ directory, named after the organisation and repository, separated by a period.

For example, consider a git.list file with the following contents:

https://github.com/linux/linux
https://git.blakerain.com/BlakeRain/backup

This will result in two bare repos being created and two archives of those repos:

BACKUP_DIR/git/linux/linux/
BACKUP_DIR/git/BlakeRain/backup/
BACKUP_DIR/git/linux.linux.tar
BACKUP_DIR/git/BlakeRain.backup.tar

Those last two files, the .tar archives, will be written to the backup.index file. They are what will be copied to the output(s) specified in the BACKUP_OUTPUTS environment variable by the backup-output-*.sh scripts.
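The path derivation in this example can be sketched with shell parameter expansion. The `git_backup_paths` helper is hypothetical, and it assumes the organisation and repository are simply the last two components of the URL; how the real script treats a trailing .git suffix is not covered here.

```shell
# Print the bare repo directory and the archive path for a repository URL.
# git_backup_paths is a hypothetical helper: $1 is the URL, $2 the backup dir.
git_backup_paths() (
  url=${1%/}            # tolerate a trailing slash
  repo=${url##*/}       # last component, e.g. "backup"
  rest=${url%/*}        # everything before the last /
  org=${rest##*[/:]}    # component before that, e.g. "BlakeRain"
  printf '%s\n' "$2/git/$org/$repo/" "$2/git/$org.$repo.tar"
)

git_backup_paths https://git.blakerain.com/BlakeRain/backup /mnt/raid/backup
```

A bare clone into the first path would normally be made with `git clone --bare <url> <dir>`, and the second path is where the resulting .tar would be written.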

Backup Docker Volumes

To back up Docker volumes, create a hostname.docker.list file, where hostname is replaced with the name of the host on which the Docker volumes are located. Multiple .docker.list files can be specified. Each .docker.list file contains a list of the Docker volume names to back up from that host. Blank lines and lines that start with a # are ignored.

The backup-docker.sh script will use rsync to copy from the /var/lib/docker/volumes directory on the host for each volume named in the .docker.list file. The _data directory found within each volume directory contains the files that will be copied. After copying with rsync, the script will create a .tar of the Docker volume. Each archive will be named after the host and the volume, separated by a period.

For example, consider a remote.docker.list file with the following contents:

minecraft_data
minecraft_mods

When the backup-docker.sh script runs, it will rsync the following remote paths into the BACKUP_DIR/docker directory:

me@remote:/var/lib/docker/volumes/minecraft_data/_data/ ->
  BACKUP_DIR/docker/remote/minecraft_data/

me@remote:/var/lib/docker/volumes/minecraft_mods/_data/ ->
  BACKUP_DIR/docker/remote/minecraft_mods/

Here me will be replaced with the current hostname of the machine performing the backup. You will want to create a user on the remote host with the same name as the hostname of the backup machine and do the usual SSH key shuffle.

The Docker volumes given above will be archived into two files:

BACKUP_DIR/docker/remote.minecraft_data.tar
BACKUP_DIR/docker/remote.minecraft_mods.tar

These last two archive files are what will be written to the backup.index file. They are what will be copied to the output(s) specified in the BACKUP_OUTPUTS environment variable by the backup-output-*.sh scripts.
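The source, destination, and archive names used in this example can be derived as sketched below. The `docker_volume_plan` helper is hypothetical; it follows the names shown in the example and defaults the remote user to the local hostname, as described above.

```shell
# Print the rsync source, rsync destination, and archive path for one
# Docker volume. docker_volume_plan is a hypothetical helper:
#   $1 host, $2 volume, $3 backup dir, $4 remote user (default: hostname)
docker_volume_plan() (
  host=$1
  volume=$2
  backup_dir=$3
  user=${4:-$(hostname)}
  printf '%s\n' \
    "$user@$host:/var/lib/docker/volumes/$volume/_data/" \
    "$backup_dir/docker/$host/$volume/" \
    "$backup_dir/docker/$host.$volume.tar"
)
```

The host itself would come from the list file name, for example `${name%.docker.list}` turns remote.docker.list into remote.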

Backup File Paths

To back up file paths, create a group.paths.list file, where group is a name used to group these backup files. Multiple .paths.list files can be specified. Each .paths.list file contains a list of backup path specifiers. Blank lines and lines that start with a # are ignored.

A backup path specifier has the following syntax:

backup-path ::= mode ":" /.*$/

Here mode is the mode of retrieval for the backup paths. Currently the only supported mode is rsync. In rsync mode, the remainder of the line following the : is the source argument to rsync. For example, to back up the log files in /var/log on the machine host as the user user, you can specify the following line:

rsync: user@host:/var/log/*.log
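Splitting a specifier at its first colon can be sketched as follows. The `parse_spec` helper is hypothetical, and trimming the spaces after the colon is an assumption based on the example line above.

```shell
# Split "mode: source" at the first colon and print the mode and the
# source on separate lines. parse_spec is a hypothetical helper; only
# the rsync mode is accepted, as in the text.
parse_spec() (
  spec=$1
  case $spec in
    *:*) ;;
    *) echo "invalid specifier: $spec" >&2; exit 1 ;;
  esac
  mode=${spec%%:*}                # text before the first colon
  src=${spec#*:}                  # text after the first colon
  src=${src#"${src%%[! ]*}"}      # drop leading spaces (assumption)
  case $mode in
    rsync) printf '%s\n' "$mode" "$src" ;;
    *) echo "unsupported mode: $mode" >&2; exit 1 ;;
  esac
)
```

Note that the source keeps its own colon and its `*` wildcard intact, so it should stay quoted when handed to rsync.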

The backup-path.sh script will parse the lines of a .paths.list file and instruct rsync to copy the files from the source with the options -a, -u, and -v. These options have the following effect:

  • -a enables archive mode, which will perform recursion and preserve symbolic links, file permissions, ownership, and timestamps.
  • -u will skip files that are newer on the receiver, which means that previous backups will simply be updated rather than a complete backup being taken. This is consistent with the other backup scripts in this repo.
  • -v is used so that you can see the files that are being copied, and they will be written to the backup log.

The destination of the files will be in the paths/group directory under the BACKUP_DIR, where group is the same name given in group.paths.list. For example, if you had a file called logs.paths.list, all the rsync copies will be to the BACKUP_DIR/paths/logs/ directory.

The group is also used when creating the final tar file. In the previous example the list file called logs.paths.list will be archived into BACKUP_DIR/paths/logs.tar, and this is what will be added to the backup.index file.
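Putting the pieces together, the group naming and the copy-then-archive step might look like the sketch below. Both helpers are hypothetical, and the exact tar invocation is an assumption; only the rsync options and the resulting paths come from the description above.

```shell
# Print the group name, rsync destination, and archive path for a list
# file. paths_plan is a hypothetical helper: $1 list file, $2 backup dir.
paths_plan() (
  name=${1##*/}                # strip any leading directories
  group=${name%.paths.list}    # logs.paths.list -> logs
  printf '%s\n' "$group" "$2/paths/$group/" "$2/paths/$group.tar"
)

# Copy one rsync source into the group directory, then archive the group.
# backup_group is also hypothetical: $1 group, $2 source, $3 backup dir.
backup_group() {
  mkdir -p "$3/paths/$1"
  rsync -a -u -v "$2" "$3/paths/$1/"
  tar -cf "$3/paths/$1.tar" -C "$3/paths" "$1"
}
```

For instance, `backup_group logs 'user@host:/var/log/*.log' "$BACKUP_DIR"` would update BACKUP_DIR/paths/logs/ and then rewrite BACKUP_DIR/paths/logs.tar.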