Backing up your Linux machine

Published Saturday 8 April 2023 at 12:00 WEST

I wrote the following advice for members of the Astrophysics Research Group at the University of Surrey. I publish it here, unchanged, in the hope that it might be of use to others too.

The University does not provide automated backup of Linux machines, but instead leaves you to worry about this yourself. I suspect that most Linux users back up their data in a happy-go-lucky way, using GitLab and external storage, performing commits and file transfers only when they remember. But with some intermediate-level Bash tinkering you can arrange for full backups to be made automatically to a schedule. Setting this up should take half an hour or so. The first backup will take considerably longer but will not need your attention (my home drive is currently 180 GB in size and took 15 hours to back up the first time), while subsequent backups will take only minutes.

Badly performed backups can result in the loss of data. Be careful. I will outline the prescription that I use for backing up data, but you must decide for yourself if you wish to adopt it too.

Your data is threatened by hardware failure, operating system failure, theft, and attack. But the greatest threat is likely yourself since (if you are like me) you are forever breaking code and deleting files. So you want to make frequent backups that are easily accessed. In doing so you should really follow the 3-2-1 principle, keeping three copies of your data (the original and two copies), using at least two types of media, one of which is off site. Ideally, one of the backup copies should be kept offline, and you should be able to check that both backups are functioning (this is sometimes called the `3-2-1-1-0 principle', the additional digits representing one offline copy, and zero repository errors).

The two backup media can be provided by a cloud file system and an external storage device. The cloud file system is, of course, off site and the external storage device is offline (or at least it is when you unplug it from your machine). The University allows you 1 TB of storage on Microsoft OneDrive and if you can live with the shame of using a Microsoft product this provides your cloud file system. However, the University does not provide you with external storage devices, which you must buy yourself. A 1 TB hard disc drive (HDD) or solid-state drive (SSD) costs less than £100, and a 256 GB thumb drive costs £25. These devices last for years and you will not regret buying one.

To make your backups you can use the programme Restic. This is fast and easy to use, and consequently takes the pain out of backing up your data. Importantly, it allows you to check that a backup is error free. With OneDrive, external storage, and Restic, you can build a backup system that satisfies all of your requirements.

Accessing OneDrive: Rclone

You have access to OneDrive file storage with your University IT account. OneDrive is integrated with Windows and various consumer and business applications like Microsoft 365, from which Microsoft expect you to access it. However, you can use Rclone to manage a cloud file system from any machine, as if it were local. To install Rclone and configure it to work with OneDrive follow the guide from It's FOSS. Ignore Step 6 (we will not need to mount OneDrive on startup). With OneDrive still mounted, create a backup repository (a OneDrive directory called Backup) using a new terminal, as follows.

$ mkdir OneDrive/Backup

Then unmount OneDrive by returning to the first terminal and stopping Rclone with Ctrl+C.
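
If you would rather not mount OneDrive just to create this directory, Rclone can also create it on the remote directly. The following is a minimal sketch, assuming that the remote was named onedrive when Rclone was configured (the name used in the repository paths throughout this post). The helper name make_backup_dir is hypothetical.

```shell
#!/bin/bash
# Sketch: create the Backup directory on the OneDrive remote without mounting.
# Assumes the Rclone remote is called "onedrive"; adjust if yours differs.

make_backup_dir () {
    remote=$1
    # Create the directory on the remote (does nothing if it already exists)
    rclone mkdir "$remote:Backup" || return 1
    # List the top-level directories of the remote to confirm it is there
    rclone lsd "$remote:"
}

# Usage: make_backup_dir onedrive
```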

Backing up to OneDrive with Restic

Restic performs full backups of a specified directory but avoids duplication by transmitting only new and modified files to the repository each time a backup is made. This makes all but the first backup very fast and the repository itself very light. Assuming that you are using Ubuntu, you can install Restic with APT as follows. (For other Linux distributions, see here.)

$ sudo apt install restic

The Restic documentation is available here and there are three handy tutorials at Remo Hoeppli's Medium blog, LabSrc, and J.L. Falcone's webpage. (Unfortunately, the last of these calls Restic backups `incremental'. In fact they are full.) Let us go through the essentials.

Initializing the backup repository

First you must initialize the backup repository with the following command, which will prompt you to choose and confirm a password. (I assume that OneDrive is secure and therefore use a trivial password, namely password.)

$ restic -r rclone:onedrive:Backup init

Performing the first backup

You are nearly ready to make the first backup. This will take considerable time, although subsequent backups will be much quicker. First, however, you should make a dry run, which reduces the risk of data loss. To do so, run the following, where Path/ is the path to the directory you wish to back up (I suggest you back up your home directory, ~/), entering your password when prompted.

$ restic -r rclone:onedrive:Backup backup Path/ --dry-run

Before the real thing, though, another word of warning: do not access your files while a backup is being made. It is not the end of the world if you do, but there is a chance that an open file will be corrupted. For this reason, it is best to perform the backup overnight, when you are not using your machine. You will probably want to use the University network for this. (When I checked, my home upload speed was about 50 times slower, at 18 kbps, than the University's, at 915 kbps.) With all these preliminaries out of the way, you can make your first backup by running the previous command without the dry-run flag.

$ restic -r rclone:onedrive:Backup backup Path/

Viewing the backup

Each backup of a directory is called a `snapshot'. Once your first backup is complete, you should have a single snapshot in your OneDrive repository. At any time you may use the following command to view a list of all available snapshots.

$ restic -r rclone:onedrive:Backup snapshots

This time you should see just one.

Checking the backup

You should also check that the repository is error free, as follows.

$ restic -r rclone:onedrive:Backup check

Again, you can issue this command at any time.
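
As far as I understand the Restic documentation, check by default verifies the structure and metadata of the repository without reading every stored data file back. Restic also has a --read-data flag that does read everything, which is more thorough but slow over a cloud connection such as OneDrive. A minimal sketch, in which the helper name deep_check is hypothetical:

```shell
#!/bin/bash
# Sketch: a slower, more thorough check that also reads the stored data back.
# Expect this to take a long time for a large repository on OneDrive.

deep_check () {
    repo=$1
    restic -r "$repo" check --read-data
}

# Usage: deep_check rclone:onedrive:Backup
```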

Automating the backup

You can schedule Restic to run automatically using the utility Cron. To do this you create a script that is run by the Cron daemon at specified times. Suppose you wish to keep daily snapshots for the last seven days, weekly snapshots for the last month, and monthly snapshots for the last year. You can do this using the Restic command forget.
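
Before letting forget remove anything, you may wish to preview what this retention policy would keep and what it would delete. The following sketch runs forget with its --dry-run flag, which reports the outcome without changing the repository. The helper name preview_forget is hypothetical; the keep options are those used in the script below.

```shell
#!/bin/bash
# Sketch: preview the retention policy without deleting any snapshots.
# --dry-run makes forget report what it would keep and what it would remove.

preview_forget () {
    repo=$1
    restic -r "$repo" forget --dry-run \
           --keep-within-daily 7d \
           --keep-within-weekly 1m \
           --keep-within-monthly 1y
}

# Usage: preview_forget rclone:onedrive:Backup
```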

First create the shell script /usr/local/bin/linux_backup.sh containing the following. (You will need to use sudo to write to this directory.)

#!/bin/bash

# Linux Backup: back up home directory to multiple repositories using Restic
# Copyright (C) 2023 Amery Gration (amerygration@proton.me)
# 
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# 
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

#############################################################################
# Back up home directory to a repository
# Arguments:
#   Repository path
# Returns:
#   0 if all operations successful, non-zero otherwise
#############################################################################
backup () {
    # Set log file names
    # (a single call to date, in a locale-independent format without spaces)
    TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
    BACKUP_LOG=~/.cache/restic/log/${TIMESTAMP}_backup.log
    FORGET_LOG=~/.cache/restic/log/${TIMESTAMP}_forget.log
    CHECK_LOG=~/.cache/restic/log/${TIMESTAMP}_check.log

    # Use Restic (https://restic.net/) to back up the home directory
    # (variables are quoted so that paths containing spaces are handled)
    restic -r "$1" backup --exclude-caches ~/ | tee "$BACKUP_LOG"
    restic -r "$1" forget --keep-within-daily 7d --keep-within-weekly 1m \
           --keep-within-monthly 1y --prune | tee "$FORGET_LOG"
    restic -r "$1" check | tee "$CHECK_LOG"
}

# Ensure log directory exists
if [ ! -d ~/.cache/restic/log ]; then
    mkdir -p ~/.cache/restic/log;
fi

# Move existing log files to trash
if [ ! -z "$(ls -A ~/.cache/restic/log)" ]; then
    gio trash ~/.cache/restic/log/*
fi

# Back up home directory to specified repositories
backup rclone:onedrive:Backup

The results of backup, forget and check are written to the log files ~/.cache/restic/log/${TIMESTAMP}_backup.log, ~/.cache/restic/log/${TIMESTAMP}_forget.log, and ~/.cache/restic/log/${TIMESTAMP}_check.log. You should inspect these from time to time to make sure that the repository is in good health. If you have the Mail utility (or equivalent) configured to send email from the command line, you can send an alert to yourself when a backup fails or a repository error is discovered. Instead of the previous script use the following one, changing the email address as required.

#!/bin/bash

# Linux Backup: back up home directory to multiple repositories using Restic
# Copyright (C) 2023 Amery Gration (amerygration@proton.me)
# 
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# 
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

#############################################################################
# Back up home directory to a repository
# Arguments:
#   Repository path
# Returns:
#   0 if all operations successful, non-zero otherwise
#############################################################################
backup () {
    # Set log file names
    # (a single call to date, in a locale-independent format without spaces)
    TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
    BACKUP_LOG=~/.cache/restic/log/${TIMESTAMP}_backup.log
    FORGET_LOG=~/.cache/restic/log/${TIMESTAMP}_forget.log
    CHECK_LOG=~/.cache/restic/log/${TIMESTAMP}_check.log

    # Use Restic (https://restic.net/) to back up the home directory
    # PIPESTATUS[0] is the exit status of restic, not of tee
    restic -r "$1" backup --exclude-caches ~/ | tee "$BACKUP_LOG"
    BACKUP_STATUS=${PIPESTATUS[0]}
    restic -r "$1" forget --keep-within-daily 7d --keep-within-weekly 1m \
           --keep-within-monthly 1y --prune | tee "$FORGET_LOG"
    FORGET_STATUS=${PIPESTATUS[0]}
    restic -r "$1" check | tee "$CHECK_LOG"
    CHECK_STATUS=${PIPESTATUS[0]}

    # Send email alert (with the log as the message body) if a step fails
    if [ "$BACKUP_STATUS" -ne 0 ]; then
        mail -s "Backup failed: $1" name@example.com < "$BACKUP_LOG"
    fi
    if [ "$FORGET_STATUS" -ne 0 ]; then
        mail -s "Forget failed: $1" name@example.com < "$FORGET_LOG"
    fi
    if [ "$CHECK_STATUS" -ne 0 ]; then
        mail -s "Backup repository errors: $1" name@example.com < "$CHECK_LOG"
    fi

    # Return 0 only if all three operations succeeded
    [ "$BACKUP_STATUS" -eq 0 ] && [ "$FORGET_STATUS" -eq 0 ] \
        && [ "$CHECK_STATUS" -eq 0 ]
}

# Ensure log directory exists
if [ ! -d ~/.cache/restic/log ]; then
    mkdir -p ~/.cache/restic/log;
fi

# Move existing log files to trash
if [ ! -z "$(ls -A ~/.cache/restic/log)" ]; then
    gio trash ~/.cache/restic/log/*
fi

# Back up home directory to specified repositories
backup rclone:onedrive:Backup

This will again write to file the outputs of the backup, forget, and check commands. If you are online it will also email you when any of these commands fails. If you are offline when this script runs, however, it will not be able to email you, and because you are offline your backup will have failed.
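
The script takes each exit status from Bash's PIPESTATUS array rather than from $? because every Restic command is piped through tee. In a pipeline, $? reports the status of the last command (here tee, which almost always succeeds), so a Restic failure would otherwise go unnoticed. The following small demonstration is not part of the backup script; it uses false as a stand-in for a failing Restic command.

```shell
#!/bin/bash
# Demonstration: $? versus PIPESTATUS for a failing command piped through tee.
# "false" stands in for a Restic command that fails. Requires Bash.

false | tee /dev/null
STATUSES=("${PIPESTATUS[@]}")   # copy immediately: the next command resets it
echo "first command: ${STATUSES[0]}, tee: ${STATUSES[1]}"

false | tee /dev/null
LAST_STATUS=$?                  # status of tee only
echo "\$? reports: $LAST_STATUS"
```

Here the first line printed shows a status of 1 for the failing command and 0 for tee, while $? alone reports 0.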

If you keep your machine on overnight you might schedule the script to be run at, say, 02.01. To do so, call Crontab from the command line.

$ crontab -e

This will open a Cron file (containing a table of scheduled jobs) in your default text editor. Append the following line, being sure to end it with a newline.

1 2 * * * bash /usr/local/bin/linux_backup.sh

If you do not leave your machine on overnight, it is probably best to schedule your backups for startup. In that case, append the following to your Cron file instead.

@reboot bash /usr/local/bin/linux_backup.sh

This will back up your machine when you reboot (i.e. start or restart) it.

You still want to avoid writing to files while the backup is being made. If you do write to file then it is possible that the logs will show the backup to have failed. However, it will not be the whole backup that has failed, only the backup of the files to which you have written.

To use Restic commands without having to manually enter the password for a repository (as we require when using Cron) you may set Restic's environment variable RESTIC_PASSWORD. To make this available system wide, add the following to the file /etc/environment.

RESTIC_PASSWORD=password

(This file should already exist and probably contains the PATH variable. If it doesn't exist then create it.)
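
For interactive use you can also set variables for the current shell session only. Besides RESTIC_PASSWORD, Restic reads the repository location from the environment variable RESTIC_REPOSITORY, so that the -r option can be omitted. A minimal sketch, assuming the OneDrive repository and the password chosen above:

```shell
#!/bin/bash
# Sketch: session-only settings for interactive Restic use.
# These last until the terminal is closed and do not affect Cron.

export RESTIC_REPOSITORY=rclone:onedrive:Backup
export RESTIC_PASSWORD=password

# With both set, commands such as these need no -r option or password prompt:
#   restic snapshots
#   restic check
```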

Be sure that the scheduled job is being run as expected by checking that a snapshot has been created and by checking the log files.
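
To make checking the log files less of a chore, a small helper can print the most recent check log. This is a sketch only: the function name show_latest_check_log is hypothetical, and LOG_DIR is assumed to be the log directory used by the backup script.

```shell
#!/bin/bash
# Sketch: print the most recent check log written by the backup script.
# LOG_DIR defaults to the log directory used by linux_backup.sh.

LOG_DIR=${LOG_DIR:-$HOME/.cache/restic/log}

show_latest_check_log () {
    # Newest file (by modification time) whose name ends in _check.log
    latest=$(ls -t "$LOG_DIR"/*_check.log 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "No check log found in $LOG_DIR" >&2
        return 1
    fi
    cat "$latest"
}

# Usage: show_latest_check_log
```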

Recovering the backup

To recover data from your backup, first create the directory to which this data will be restored, say /tmp/restored_work. You can restore entire snapshots or specific files and directories from a single snapshot. To restore a specific directory or file (with the path Path/To/Dir/Or/file) from a specific snapshot, with ID snapshotID, use the following command.

$ restic -r rclone:onedrive:Backup restore snapshotID --target /tmp/restored_work --include Path/To/Dir/Or/file

To restore the same directory or file from the latest snapshot use this.

$ restic -r rclone:onedrive:Backup restore latest --target /tmp/restored_work --include Path/To/Dir/Or/file

To restore a specific snapshot, with ID snapshotID, use this (it will of course take considerable time).

$ restic -r rclone:onedrive:Backup restore snapshotID --target /tmp/restored_work

And to restore the latest snapshot use this.

$ restic -r rclone:onedrive:Backup restore latest --target /tmp/restored_work

The really neat thing is that the directory /tmp/restored_work need not be on the machine from which your backup has been made. If this machine is, for example, irreparably damaged, you can simply restore data to a new machine in the same way (of course you will need your repository password).
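
The two steps of creating the target directory and restoring into it can be wrapped in one helper. The following is a minimal sketch rather than a definitive tool: the function name restore_from_latest is hypothetical, and the target defaults to /tmp/restored_work as above. If you do not remember where a lost file lived, Restic's find command (restic -r rclone:onedrive:Backup find PATTERN) can search the snapshots for it first.

```shell
#!/bin/bash
# Sketch: restore one file or directory from the latest snapshot.
# Arguments: repository, path to restore, optional target directory.

restore_from_latest () {
    repo=$1
    item=$2
    target=${3:-/tmp/restored_work}
    # Create the directory to which the data will be restored
    mkdir -p "$target" || return 1
    restic -r "$repo" restore latest --target "$target" --include "$item"
}

# Usage: restore_from_latest rclone:onedrive:Backup Path/To/Dir/Or/file
```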

Backing up to an external storage device with Restic

OneDrive is itself backed up, and this might make you think that a second storage medium is redundant. (For details on recovering data from OneDrive see Microsoft's advice here.) But the point is that OneDrive may be unavailable (if, say, your connection to it is broken) and that a third party has control of your account, which might be suspended or closed at any point. Hence the need for a second backup medium in the form of an external storage device.

On your external storage device again create a repository called Backup. This will be available as Path/To/External/Storage/Device/Backup.

$ restic -r Path/To/External/Storage/Device/Backup init

Then make a dry run of the backup using the following command.

$ restic -r Path/To/External/Storage/Device/Backup backup Path/ --dry-run

If all is well, run the same command without the dry-run flag. This will again take some considerable time.

$ restic -r Path/To/External/Storage/Device/Backup backup Path/

To schedule automatic backups using Cron we need change only the file /usr/local/bin/linux_backup.sh. Add the following line to it.

backup Path/To/External/Storage/Device/Backup

Again, check that the scheduled job runs as expected.
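
Since the external storage device is meant to be unplugged much of the time, the scheduled job will often run when it is absent. One option is to skip that repository quietly rather than let Restic fail. The following sketch is meant to be added to linux_backup.sh after the definition of backup; the wrapper name backup_if_available is hypothetical.

```shell
#!/bin/bash
# Sketch: back up to a local repository only if its directory is present.
# Relies on the backup function defined in linux_backup.sh.

backup_if_available () {
    repo=$1
    if [ -d "$repo" ]; then
        backup "$repo"
    else
        echo "Skipping $repo: not found (is the device plugged in?)" >&2
    fi
}

# Usage, in place of the plain call to backup for the external device:
#   backup_if_available Path/To/External/Storage/Device/Backup
```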

Final words

You are all done. If you have set things up so that Mail sends you alerts when a backup fails or a repository error is detected then you need never think about this again. If not then it is wise to check your log files from time to time.