Server Maintenance: Cleanup, Backup, and Restoration

Overview
Questions:
  • How can I back up my Galaxy?

  • What data should be included?

  • How can I ensure jobs get cleaned up appropriately?

  • How do I maintain a Galaxy server?

  • What happens if I lose everything?

Objectives:
  • Learn about different maintenance steps

  • Setup postgres backups

  • Setup cleanups

  • Learn what to back up and how to recover

Time estimation: 30 minutes
Published: Apr 16, 2023
Last modification: Jul 13, 2023
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00324
Revision: 5

Regularly cleaning up your Galaxy is an important way to reclaim disk space, which for many groups is the limiting factor in their deployment.

Additionally, backups are necessary so that you can safely recover from system-level failures.

Agenda
  1. Cleanups
    1. User Created Files
    2. Galaxy Created Files
  2. Backups
    1. Galaxy
    2. Database Backups
    3. Data Backup
  3. Restoration
    1. Restoring the Database
    2. Restoring Galaxy
    3. Restoring User Data
Comment: Galaxy Admin Training Path

The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.

  1. Step 1
    ansible-galaxy
  2. Step 2
    backup-cleanup
  3. Step 3
    customization
  4. Step 4
    tus
  5. Step 5
    cvmfs
  6. Step 6
    apptainer
  7. Step 7
    tool-management
  8. Step 8
    reference-genomes
  9. Step 9
    data-library
  10. Step 10
    dev/bioblend-api
  11. Step 11
    connect-to-compute-cluster
  12. Step 12
    job-destinations
  13. Step 13
    pulsar
  14. Step 14
    celery
  15. Step 15
    gxadmin
  16. Step 16
    reports
  17. Step 17
    monitoring
  18. Step 18
    tiaas
  19. Step 19
    sentry
  20. Step 20
    ftp
  21. Step 21
    beacon

Cleanups

Running a Galaxy produces two kinds of data that can be cleaned up to save space: files that users create and later delete or purge, and files that Galaxy creates itself.

User Created Files

You can use gxadmin to clean up user-created files. gxadmin is covered in more detail in its own dedicated tutorial.

Hands-on: Installing gxadmin with Ansible
  1. Edit your requirements.yml and add the following:

    --- a/requirements.yml
    +++ b/requirements.yml
    @@ -11,3 +11,6 @@
       version: 0.3.1
     - src: usegalaxy_eu.certbot
       version: 0.1.11
    +# gxadmin (used in cleanup, and later monitoring.)
    +- src: galaxyproject.gxadmin
    +  version: 0.0.12
       
    
  2. Install the role with:

    Input: Bash
    ansible-galaxy install -p roles -r requirements.yml
    
  3. Add the role to your playbook:

    --- a/galaxy.yml
    +++ b/galaxy.yml
    @@ -27,3 +27,4 @@
           become: true
           become_user: "{{ galaxy_user_name }}"
         - galaxyproject.nginx
    +    - galaxyproject.gxadmin
       
    
  4. Set up a cleanup task to run regularly:

    --- a/galaxy.yml
    +++ b/galaxy.yml
    @@ -28,3 +28,11 @@
           become_user: "{{ galaxy_user_name }}"
         - galaxyproject.nginx
         - galaxyproject.gxadmin
    +  post_tasks:
    +    - name: Setup gxadmin cleanup task
    +      ansible.builtin.cron:
    +        name: "Cleanup Old User Data"
    +        user: galaxy # Run as the Galaxy user
    +        minute: "0"
    +        hour: "0"
    +        job: "SHELL=/bin/bash source {{ galaxy_venv_dir }}/bin/activate &&  GALAXY_LOG_DIR=/tmp/gxadmin/ GALAXY_ROOT={{ galaxy_root }}/server GALAXY_CONFIG_FILE={{ galaxy_config_file }} /usr/local/bin/gxadmin galaxy cleanup 60"
       
    

    This purges datasets that have been deleted for more than 60 days. The job runs daily at midnight as the galaxy user.

  5. Run the playbook

    Input: Bash
    ansible-playbook galaxy.yml
    

Whenever gxadmin runs, it writes logs to /tmp/gxadmin, which you can check later.
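
One way to confirm that the cron job is actually running is to check whether the log directory has been written to recently. The following is a minimal sketch rather than part of gxadmin: `logs_are_fresh` is a hypothetical helper name, and it assumes GNU find (for `-mmin` and `-quit`).

```bash
# Hypothetical helper (not part of gxadmin): succeed if any file under DIR
# was modified within the last HOURS hours (default: 24).
logs_are_fresh() {
    local dir="$1" hours="${2:-24}"
    # -mmin -N matches files modified less than N minutes ago;
    # -print -quit stops at the first match.
    [ -n "$(find "$dir" -type f -mmin "-$((hours * 60))" -print -quit 2>/dev/null)" ]
}

# /tmp/gxadmin is the GALAXY_LOG_DIR set in the cron job above.
if logs_are_fresh /tmp/gxadmin 24; then
    echo "gxadmin wrote logs in the last 24 hours"
else
    echo "no recent gxadmin logs found; check the galaxy user's crontab"
fi
```

If no recent logs are found, `sudo crontab -l -u galaxy` shows whether the cleanup entry was installed.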

Galaxy Created Files

Before we begin backing up our Galaxy data, let’s set up automated cleanups to ensure we back up only the minimal required set of data.

Hands-on: Configuring Automated Cleanups
  1. Edit galaxy.yml to install tmpwatch (on RHEL/CentOS/Rocky) or tmpreaper (on Debian/Ubuntu):

    --- a/galaxy.yml
    +++ b/galaxy.yml
    @@ -21,6 +21,14 @@
         - name: Install Dependencies
           package:
             name: ['acl', 'bzip2', 'git', 'make', 'tar', 'python3-venv', 'python3-setuptools']
    +    - name: Install RHEL/CentOS/Rocky specific dependencies
    +      package:
    +        name: ['tmpwatch']
    +      when: ansible_os_family == 'RedHat'
    +    - name: Install Debian/Ubuntu specific dependencies
    +      package:
    +        name: ['tmpreaper']
    +      when: ansible_os_family == 'Debian'
       roles:
         - galaxyproject.galaxy
         - role: galaxyproject.miniconda
       
    
  2. Edit group_vars/galaxyservers.yml and add a variable to enable automated cleanup:

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -2,6 +2,7 @@
     galaxy_create_user: true # False by default, as e.g. you might have a 'galaxy' user provided by LDAP or AD.
     galaxy_separate_privileges: true # Best practices for security, configuration is owned by 'root' (or a different user) than the processes
     galaxy_manage_paths: true # False by default as your administrator might e.g. have root_squash enabled on NFS. Here we can create the directories so it's fine.
    +galaxy_manage_cleanup: true
     galaxy_layout: root-dir
     galaxy_root: /srv/galaxy
     galaxy_user: {name: "{{ galaxy_user_name }}", shell: /bin/bash}
       
    
  3. Run the playbook:

    Input: Bash
    ansible-playbook galaxy.yml
    
  4. Check out the cleanup task which has been generated in: /etc/cron.d/ansible_galaxy_tmpclean

This will set up tmpwatch (or tmpreaper on Debian/Ubuntu) to clean up a few folders:

  • the job working directory. This is important if you set cleanup: onsuccess, as it removes the working directories of old failed jobs once you have had time to debug them.
  • the new file upload path, to catch uploaded temporary files that are no longer necessary.
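
To illustrate what age-based cleanup does, here is a rough equivalent written with GNU find. This is an illustration only, not the command the role generates in /etc/cron.d/ansible_galaxy_tmpclean. The function name `clean_stale_files` and the 7-day threshold are assumptions made for the example.

```bash
# Illustration only: age-based cleanup similar in spirit to tmpwatch/tmpreaper.
# The third argument is the find action: -print (default, a dry run) or -delete.
clean_stale_files() {
    local dir="$1" days="$2" action="${3:--print}"
    # -mtime +N matches files last modified more than N whole days ago.
    find "$dir" -mindepth 1 -type f -mtime "+${days}" "$action"
}

# Demonstrate on a throwaway directory rather than real Galaxy paths.
demo="$(mktemp -d)"
touch "$demo/fresh.dat"
touch -d '10 days ago' "$demo/stale.dat"

clean_stale_files "$demo" 7            # dry run: lists only stale.dat
clean_stale_files "$demo" 7 -delete    # removes stale.dat
ls "$demo"                             # fresh.dat remains
rm -rf "$demo"
```

Running the dry run first, as above, is a cheap way to check a threshold before anything is deleted.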

Backups

There are a few important things to back up with your Ansible-managed Galaxy:

  • Galaxy
    • The Galaxy-managed config files
    • The playbooks
  • The Database
  • The Data

Galaxy

By using Ansible, as long as you are storing your playbooks on another system, you are generally safe from failures of the Galaxy node, and you’ll be able to re-run your playbook at a later date.

However, playbooks often do not include:

  • Which tools you’ve installed (have you ever installed a tool outside of ephemeris? This might be lost!)
  • Conda environments, which will not always resolve identically over time. If strong guarantees of reproducibility are important, then consider backing these up as well.
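
Both gaps can be narrowed by exporting this state to files that you keep alongside your playbooks. The sketch below assumes Ephemeris is installed (it provides `get-tool-list`) and that conda is on your PATH. The function names, URL, file names and environment name are placeholders, and Galaxy-managed conda environments may need to be addressed by path rather than by name.

```bash
# Sketch only: these functions are defined here but not run.

# Write the list of tools currently installed on a Galaxy server to a YAML file.
export_tool_list() {
    local galaxy_url="$1" out="$2"
    get-tool-list -g "$galaxy_url" -o "$out"
}

# Record the exact package versions of one conda environment.
export_conda_env() {
    local env_name="$1" out="$2"
    conda env export -n "$env_name" > "$out"
}

# Example invocations (commented out because they need a live server):
# export_tool_list https://galaxy.example.org our_tools.yml
# export_conda_env my-env my-env.yml
```

The resulting tool list is in the format that shed-tools consumes, so it can be reused during restoration.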

Database Backups

We’re setting a couple of variables to control the automatic backups; they’ll be placed in the /data/backups folder next to our user-uploaded Galaxy data.

Hands-on: Configuring PostgreSQL Backups
  1. Edit group_vars/dbservers.yml and add some variables to configure PostgreSQL backups:

    --- a/group_vars/dbservers.yml
    +++ b/group_vars/dbservers.yml
    @@ -5,3 +5,7 @@ postgresql_objects_users:
     postgresql_objects_databases:
       - name: "{{ galaxy_db_name }}"
         owner: "{{ galaxy_user_name }}"
    +
    +# PostgreSQL Backups
    +postgresql_backup_dir: /data/backups
    +postgresql_backup_local_dir: "{{ '~postgres' | expanduser }}/backups"
       
    

Once the playbook has been run, this will set up our backups to run as a cron job.

Data Backup

With Galaxy it is technically only necessary to back up your inputs, as the downstream files should, in theory, be re-creatable thanks to Galaxy’s reproducibility.

In practice, some groups choose either not to back up at all or to back up everything, often to extremely cheap and slow storage like Glacier or a tape library.

Most groups choose to implement this as a custom cron job, e.g.

post_tasks:
  - name: Setup backup cron job
    ansible.builtin.cron:
      name: "Backup User Data"
      minute: "0"
      hour: "5,2"
      job: "rsync -avr /data/galaxy/ backup@backup.example.org:/backups/$(date -I)/"

People who, let’s say, care strongly about backups will often insist that you need to version files. This is unnecessary in the Galaxy case, as files are essentially Write Once Read Many (WORM), which is a really good file storage practice. Files can be removed, so it isn’t the true WORM strategy that you’d use for e.g. audit logs, but it is close. Since files never get changed, keeping multiple versions is unnecessary.
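
Because files are written once and never modified, checking a backup largely reduces to checking that every source path also exists in the backup. The following is a minimal sketch of that idea. `missing_from_backup` is a hypothetical helper that compares relative file names only, not file contents.

```bash
# Hypothetical helper: list files present under SRC but absent under DST.
# Compares relative paths only, which is reasonable when files never change.
missing_from_backup() {
    local src="$1" dst="$2" tmp
    tmp="$(mktemp)"
    (cd "$dst" && find . -type f | LC_ALL=C sort) > "$tmp"
    # comm -23 keeps lines that appear only in the first (sorted) input.
    (cd "$src" && find . -type f | LC_ALL=C sort) | LC_ALL=C comm -23 - "$tmp"
    rm -f "$tmp"
}

# Demonstrate on throwaway directories rather than /data/galaxy.
src="$(mktemp -d)"
dst="$(mktemp -d)"
touch "$src/dataset_1.dat" "$src/dataset_2.dat" "$dst/dataset_1.dat"
missing_from_backup "$src" "$dst"    # prints ./dataset_2.dat
rm -rf "$src" "$dst"
```

An empty result means every file in the source has a counterpart in the backup.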

Please consider communicating very well with your users what the data backup policy is.

Comment: Got lost along the way?

If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.

If you’re using git to track your progress, remember to add your changes and commit with a good commit message!

Restoration

Sometimes failures happen! We’re sorry you have to read this section.

Restoring the Database

This procedure is more complicated; you can read about the restoration procedure in the associated PR.

This step assumes you have pre-existing backups in place. You must check this first:

ls /data/backups/
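
Rather than eyeballing the listing, a small guard can pick the newest backup and stop early if there is none. This sketch assumes each backup is a subdirectory of /data/backups, which you should verify against your own listing. `latest_backup` is a hypothetical helper, and it relies on GNU find's `-printf`.

```bash
# Hypothetical helper: print the most recently modified subdirectory of DIR.
# Returns non-zero if DIR contains no subdirectories.
latest_backup() {
    local dir="$1" newest
    # %T@ is the modification time in seconds since the epoch (GNU find).
    newest="$(find "$dir" -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\n' 2>/dev/null \
        | sort -n | tail -n 1 | cut -d' ' -f2-)"
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

if backup="$(latest_backup /data/backups)"; then
    echo "Latest backup: $backup"
else
    echo "No backups found in /data/backups; do not continue with the restore"
fi
```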

If you have backups, you’re ready to restore:

# Stop Galaxy, you do NOT want galaxy to connect mid-restoration in case it
# tries to modify the database.
sudo systemctl stop galaxy

# Stop the database
sudo systemctl stop postgresql
# Ensure that it is stopped
sudo systemctl status postgresql

# Begin the restore procedure by becoming postgres:
sudo su - postgres

# Move the current, live database to a backup location just in case:
mkdir /tmp/test/

# ====
# NOTE THAT THIS NUMBER MAY BE DIFFERENT FOR YOU!
# You will need to change 12 to whatever version of postgres you're running
# in every subsequent command
# ====
mv /var/lib/postgresql/12/main/* /tmp/test/

# Copy the backup into place
rsync -av /data/backups/YOUR_LATEST_BACKUP/ /var/lib/postgresql/12/main
# Add the restore_command to postgresql.auto.conf
# (adjust the WAL path to match your backup location):
# restore_command = 'cp "/tmp/backup/current/wal/%f" "%p"'
$EDITOR /var/lib/postgresql/12/main/postgresql.auto.conf

# Touch a recovery file
touch /var/lib/postgresql/12/main/recovery.signal

# Leave the postgres shell, then continue as a user with sudo rights
exit
sudo systemctl restart postgresql
sudo systemctl status postgresql
# Restart Galaxy
sudo systemctl start galaxy
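
The version number used in the paths above can be read from the directory layout instead of guessed. This sketch assumes the Debian/Ubuntu layout shown in the commands above (/var/lib/postgresql/<version>/main). `pg_major_versions` is a hypothetical helper name.

```bash
# Hypothetical helper: list PostgreSQL versions that have a "main" cluster
# directory under BASE (defaults to the Debian/Ubuntu location).
pg_major_versions() {
    local base="${1:-/var/lib/postgresql}" d
    for d in "$base"/*/main; do
        # Skip the unexpanded glob when nothing matches.
        [ -d "$d" ] || continue
        basename "$(dirname "$d")"
    done
}

# Prints one line per version directory found; prints nothing if there are none.
pg_major_versions
```

If more than one version is listed, check which one your running cluster uses before moving any files.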

If you encounter issues, we suggest reading Lucille’s log of her experiences restoring as you might encounter similar issues.

Restoring Galaxy

Restoring Galaxy is easy via Ansible. You may want to prevent users from logging in during the restore, for example by disabling the routes in nginx.

ansible-playbook galaxy.yml

And if you are following best practices, you probably have your tools stored in a YAML file to use with Ephemeris:

shed-tools install -g https://galaxy.example.org -a <api-key> -t our_tools.yml

Restoring User Data

This should simply be a matter of rsyncing your data from the backup location back into /data/galaxy.
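
A cautious way to do this is to preview the transfer before applying it. The wrapper below is a sketch under assumptions: `restore_user_data` is a hypothetical name, and the host and dated directory in the example mirror the backup cron job shown earlier and must be replaced with your own.

```bash
# Hypothetical wrapper: copy SRC into DEST with rsync, previewing by default.
# Pass "apply" as the third argument to actually transfer files.
restore_user_data() {
    local src="$1" dest="$2" mode="${3:-preview}"
    if [ "$mode" = "apply" ]; then
        rsync -av "$src" "$dest"
    else
        # --dry-run reports what would be copied without writing anything.
        rsync -av --dry-run "$src" "$dest"
    fi
}

# Example (commented out: host and date are placeholders):
# restore_user_data backup@backup.example.org:/backups/2023-07-13/ /data/galaxy/
# restore_user_data backup@backup.example.org:/backups/2023-07-13/ /data/galaxy/ apply
```

The trailing slash on the source matters to rsync: it copies the directory's contents rather than the directory itself.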
