Backup Scripts
This repository contains the scripts that control the backup of the various services
that I run. These scripts mostly just use rsync
to copy files onto a RAID array, and then sync
those files with an Amazon S3 bucket and a tape backup.
There are a number of scripts that drive the backup process. These are found in the scripts
directory.
scripts/backup.sh The main driver script
scripts/backup-docker.sh Backup remote docker volumes
scripts/backup-git.sh Backup git repositories (as bare repos)
scripts/backup-output-s3.sh Write backup archives to an S3 bucket
scripts/backup-output-tape.sh Write backup archives to a tape drive
scripts/backup-output-tape-remote.sh Write backup archives to a remote tape drive
scripts/backup-cycle-logs.sh Rotate 'backup.log' log files
To use the scripts you'll first want to create a backup.env
file. The backup.sh
driver script
expects to find the backup.env
file in the repo directory, identified by the script as the parent
of the directory in which the backup.sh
script resides. This file should contain the environment
variables that the backup scripts require. The fundamental environment variables are as follows:
Variable | Description |
---|---|
BACKUP_DIR | The backup directory |
BACKUP_OUTPUTS | Comma-separated list of outputs |
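As an illustration, a minimal backup.env might look like the following. This is only a sketch: it assumes the file is read as plain shell variable assignments, and the directory path is a hypothetical placeholder.

```shell
# Hypothetical backup.env, assuming the driver reads it as shell assignments.
# Placeholder path; the directory is created if it does not already exist.
BACKUP_DIR=/mnt/raid/backup
# Comma-separated list of outputs.
BACKUP_OUTPUTS=s3,tape
```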
The BACKUP_DIR
directory is where all the backup files will be created: the incremental backups
and the produced archives. This directory will be created if it does not already exist.
The BACKUP_DIR
will also contain two other files:
- The backup.log file, recording the output of the last backup. Previous backup.log files will be saved to backup.log.1, backup.log.2, and so on, until backup.log.10.
- The backup.index file, which contains a list of all the archives created by the backup process. These are the files that are copied by the backup-output-*.sh scripts to the various outputs. This backup.index file is stored to the backup output(s).
The BACKUP_OUTPUTS
environment variable contains a comma-separated list of outputs. These
correspond to the backup-output-*.sh
scripts. So if BACKUP_OUTPUTS
is s3,tape
, then the
backup-output-s3.sh
script and then the backup-output-tape.sh
script will be run.
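The mapping from BACKUP_OUTPUTS entries to output scripts can be sketched as follows. The function name output_scripts is hypothetical and this is not necessarily how backup.sh does it; the sketch only shows the naming rule described above.

```shell
# Print the output script name for each entry of a comma-separated list,
# in the order given. The function name is illustrative, not from the repo.
output_scripts() {
    # Turn commas into newlines, then emit one script name per non-empty entry.
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r output; do
        [ -n "$output" ] && printf 'backup-output-%s.sh\n' "$output"
    done
    return 0
}
```

Calling output_scripts "s3,tape" prints backup-output-s3.sh and then backup-output-tape.sh, one per line.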
Logging
All standard output and standard error of the backup.sh
script and any of its child processes will
be written to the backup.log
file in the BACKUP_DIR
. Any previous backup.log
file will be
copied to backup.log.1
and rotated up to backup.log.10
.
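A rotation of this kind could be sketched as below. The function name rotate_backup_log is hypothetical, and the actual backup-cycle-logs.sh script may work differently; the sketch only follows the numbering scheme described above.

```shell
# Rotate the logs in directory $1: backup.log.9 -> backup.log.10, ...,
# backup.log.1 -> backup.log.2, then copy backup.log to backup.log.1.
# An existing backup.log.10 is overwritten, so at most ten old logs remain.
rotate_backup_log() {
    log="$1/backup.log"
    n=9
    while [ "$n" -ge 1 ]; do
        if [ -f "$log.$n" ]; then
            mv -f "$log.$n" "$log.$((n + 1))"
        fi
        n=$((n - 1))
    done
    if [ -f "$log" ]; then
        cp -p "$log" "$log.1"
    fi
}
```

Shifting from the highest number downwards means no file is overwritten before it has been moved.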
Because the backup.log
file is written to during the entire backup process, it is not written to
an output by any of the backup-output-*.sh
scripts.
Backup Sources
This section documents the different sources that can be backed up by these scripts. Each source
is controlled by a set of input files found in the repository directory, that is, the parent of the
directory containing the backup.sh script (not the directory from which the script is invoked).
For example, if you are in /home/foo
and this repository is in /home/foo/backup
, you may be
invoking the backup.sh
script as follows:
$ ~/backup/scripts/backup.sh
In this case, the parent directory of the script is ~/backup
, so you will want to create the
backup.env
file and the various sources files (e.g. git.list
and .docker.list
files) in the
~/backup
directory. The .gitignore
is configured to exclude these configuration and list files.
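One way a script could compute that directory is sketched below. The function name repo_dir_for is hypothetical, and the real backup.sh may locate the directory differently.

```shell
# Print the repository directory for a given script path: the parent of the
# directory that contains the script. The function name is illustrative.
repo_dir_for() {
    script_dir=$(dirname "$1")
    # Resolve "<script dir>/.." to an absolute path in a subshell,
    # so the caller's working directory is left unchanged.
    (cd "$script_dir/.." && pwd)
}
```

Inside a script this would typically be called as repo_dir_for "$0", which gives the same result regardless of the directory the script is invoked from.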
Backup Git Repositories
To back up git repositories, create a git.list
file. Each line of the file should be a repository
URL of the sort expected by git-clone
. Blank lines, or lines that start with a #
are ignored.
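The filtering rule can be sketched as follows. The function name list_entries is hypothetical; treating whitespace-only lines as blank is an assumption of this sketch.

```shell
# Print the meaningful lines of a list file: skip blank lines and lines
# whose first character is '#'. The function name is illustrative.
list_entries() {
    # grep exits non-zero when nothing is left; that is not an error here.
    grep -v -e '^[[:space:]]*$' -e '^#' "$1" || true
}
```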
The backup-git.sh
script will clone bare repositories for each non-blank and non-comment line in
the git.list
file. These repositories will go in the BACKUP_DIR/git/
directory. Each repository
is split into an organisation and a repository, using the /
in the URL as a separator. These
then serve as sub-directories in the BACKUP_DIR/git/
directory. Each repository is then archived
into a .tar
file in the BACKUP_DIR/git/
directory, named after the organisation and repository.
For example, consider a git.list
file with the following contents:
https://github.com/linux/linux
https://git.blakerain.com/BlakeRain/backup
This will result in two bare repos being created and two archives of those repos:
BACKUP_DIR/git/linux/linux/
BACKUP_DIR/git/BlakeRain/backup/
BACKUP_DIR/git/linux.linux.tar
BACKUP_DIR/git/BlakeRain.backup.tar
Those last two files, the .tar
archives, will be written to the backup.index
file. They are
what will be copied to the output(s) specified in the BACKUP_OUTPUTS
environment variable by the
backup-output-*.sh
scripts.
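The naming scheme for git backups can be sketched as below. The function name git_backup_paths is hypothetical, and the sketch assumes the organisation and repository are simply the last two /-separated components of the URL; how backup-git.sh handles other URL shapes (for example a trailing .git) is not covered here.

```shell
# Print the bare repository directory and the archive path for one URL.
# Arguments: backup directory, repository URL.
git_backup_paths() {
    dir="$1"
    url="$2"
    repo="${url##*/}"    # text after the last '/'
    rest="${url%/*}"     # text before the last '/'
    org="${rest##*/}"    # last component of what remains
    printf '%s\n' "$dir/git/$org/$repo/" "$dir/git/$org.$repo.tar"
}
```

For the second URL in the example, git_backup_paths "$BACKUP_DIR" https://git.blakerain.com/BlakeRain/backup prints the BlakeRain/backup/ directory followed by the BlakeRain.backup.tar archive path.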
Backup Docker Volumes
To back up Docker volumes, create a hostname.docker.list
file, where hostname
is replaced with
the hostname on which the Docker volumes are located. Multiple .docker.list
files can be
specified. Each .docker.list
file contains a list of the Docker volume names to back up from that
host. Blank lines, or lines that start with a #
will be ignored.
The backup-docker.sh
script will use rsync
to copy from the /var/lib/docker/volumes
directory
for each volume named in the .docker.list
file. The _data
directory found within each volume
directory contains the files that will be copied. After copying with rsync
, the script will create
a .tar
of the Docker volume. Each archive will be named after the host and the volume, separated
by a period.
For example, consider a remote.docker.list
file with the following contents:
minecraft_data
minecraft_mods
When the backup-docker.sh
script runs, it will rsync
the following remote URLs into the
BACKUP_DIR/docker
directory.
me@remote:/var/lib/docker/volumes/minecraft_data/_data/ ->
BACKUP_DIR/docker/remote/minecraft_data/
me@remote:/var/lib/docker/volumes/minecraft_mods/_data/ ->
BACKUP_DIR/docker/remote/minecraft_mods/
Here me
will be replaced with the current hostname of the machine performing the backup. You will
want to create a user on the remote host with the same name as the hostname of the backup machine
and do the usual SSH key shuffle.
The Docker volumes given above will be archived into two files:
BACKUP_DIR/docker/remote.minecraft_data.tar
BACKUP_DIR/docker/remote.minecraft_mods.tar
These last two archive files are what will be written to the backup.index
file. They are what will
be copied to the output(s) specified in the BACKUP_OUTPUTS
environment variable by the
backup-output-*.sh
scripts.
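The paths involved for a single volume can be sketched as below. The function name docker_backup_paths is hypothetical, the archive name follows the example listing above, and uname -n stands in for however backup-docker.sh actually obtains the local hostname.

```shell
# Print the rsync source, the local destination, and the archive path for
# one Docker volume. Arguments: backup directory, remote host, volume name.
docker_backup_paths() {
    dir="$1"
    host="$2"
    volume="$3"
    # The remote user is named after the machine performing the backup.
    user=$(uname -n)
    printf '%s\n' \
        "$user@$host:/var/lib/docker/volumes/$volume/_data/" \
        "$dir/docker/$host/$volume/" \
        "$dir/docker/$host.$volume.tar"
}
```

The first two lines printed correspond to the source and destination arguments of the rsync call, and the third to the archive recorded in backup.index.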
Backup File Paths
To back up file paths, create a group.paths.list
file, where group
is a name used to group these
backup files. Multiple .paths.list
files can be specified. Each .paths.list
file contains a list
of backup path specifiers. Blank lines, or lines that start with a #
will be ignored.
A backup path specifier has the following syntax:
backup-path ::= mode ":" /.*$/
Here mode
is the mode of retrieval for the backup paths. Currently the only supported mode is
rsync
. In the rsync
mode, the remainder of the line following the :
is the source argument
to rsync
. For example, to back up log files in /var/log
on a host host
using a user user
, you
can specify the following line:
rsync: user@host:/var/log/*.log
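Splitting such a line into its mode and source can be sketched as below. The function name parse_backup_path is hypothetical, and dropping a single space after the colon is an assumption based on the example line, not something the grammar states.

```shell
# Split a specifier of the form "mode:source" at the first ':' and print the
# mode and the source on separate lines. Only the rsync mode is accepted.
parse_backup_path() {
    spec="$1"
    mode="${spec%%:*}"    # text before the first ':'
    src="${spec#*:}"      # text after the first ':'
    src="${src# }"        # assumption: one leading space is not part of the source
    # A line without ':' leaves mode equal to the whole line; reject it.
    if [ "$mode" = "$spec" ] || [ "$mode" != "rsync" ]; then
        echo "unsupported backup path: $spec" >&2
        return 1
    fi
    printf '%s\n' "$mode" "$src"
}
```

Because the split happens at the first colon only, the colon in user@host:/var/log/*.log stays part of the source.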
The backup-path.sh
script will parse the lines of a .paths.list
file and instruct rsync
to
copy the files from the source with the options -a, -u, and -v. These options have the following
effect:
- -a enables archive mode, which will perform recursion and preserve symbolic links, file permissions, ownership, and timestamps.
- -u will skip files that are newer on the receiver, which means that previous backups will simply be updated rather than a complete backup being taken. This is consistent with the other backup scripts in this repo.
- -v is used so that you can see the files that are being copied, and they will be written to the backup log.
The destination of the files will be in the paths/group
directory under the BACKUP_DIR
, where
group
is the same name given in group.paths.list
. For example, if you had a file called
logs.paths.list
, all the rsync
copies will be to the BACKUP_DIR/paths/logs/
directory.
The group
is also used when creating the final tar
file. In the previous example the list file
called logs.paths.list
will be archived into BACKUP_DIR/paths/logs.tar
, and this is what will be
added to the backup.index
file.
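Deriving the group, destination, and archive from a list file name can be sketched as below. The function name paths_backup_targets is hypothetical and the real script may structure this differently.

```shell
# Print the rsync destination directory and the archive path for a list file.
# Arguments: backup directory, path to a "<group>.paths.list" file.
paths_backup_targets() {
    dir="$1"
    list=$(basename "$2")
    group="${list%.paths.list}"    # strip the suffix to get the group name
    printf '%s\n' "$dir/paths/$group/" "$dir/paths/$group.tar"
}
```

The first line printed would be the destination argument of an rsync -auv call for each source in the list, and the second is the archive that ends up in backup.index.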