Server Maintenance: Cleanup, Backup, and Restoration
Author(s): Helena Rasche, Lucille Delisle, Nate Coraor
Overview
Questions:
- How can I back up my Galaxy?
- What data should be included?
- How can I ensure jobs get cleaned up appropriately?
- How do I maintain a Galaxy server?
- What happens if I lose everything?
Objectives:
- Learn about different maintenance steps
- Set up PostgreSQL backups
- Set up cleanups
- Learn what to back up and how to recover
Requirements:
- Slides: Galaxy Installation with Ansible
- Hands-on: Galaxy Installation with Ansible
- A VM with at least 2 vCPUs and 4 GB RAM, preferably running Ubuntu 18.04 - 20.04.
Time estimation: 30 minutes
Published: Apr 16, 2023
Last modification: Jul 13, 2023
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
PURL: https://gxy.io/GTN:T00324
Revision: 5
Keeping your Galaxy cleaned up is an important way to reclaim storage space, which is the limiting factor for many deployments.
Additionally, backups are necessary to ensure that you can safely recover if you ever experience system-level failures.
Comment: Galaxy Admin Training Path
The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.
Step 1: ansible-galaxy, Step 2: backup-cleanup, Step 3: customization, Step 4: tus, Step 5: cvmfs, Step 6: apptainer, Step 7: tool-management, Step 8: reference-genomes, Step 9: data-library, Step 10: dev/bioblend-api, Step 11: connect-to-compute-cluster, Step 12: job-destinations, Step 13: pulsar, Step 14: celery, Step 15: gxadmin, Step 16: reports, Step 17: monitoring, Step 18: tiaas, Step 19: sentry, Step 20: ftp, Step 21: beacon
Cleanups
There are two kinds of data that are produced when running a Galaxy: files that users create and later delete or purge, and files that Galaxy creates itself. Both can be cleaned up to save space.
User Created Files
You can use gxadmin to clean up user-created files. gxadmin is covered in more detail in its own dedicated tutorial.
Hands-on: Installing gxadmin with Ansible
Edit your requirements.yml and add the following:
--- a/requirements.yml
+++ b/requirements.yml
@@ -11,3 +11,6 @@
   version: 0.3.1
 - src: usegalaxy_eu.certbot
   version: 0.1.11
+# gxadmin (used in cleanup, and later monitoring.)
+- src: galaxyproject.gxadmin
+  version: 0.0.12
Install the role with:
Input: Bash
ansible-galaxy install -p roles -r requirements.yml
Add the role to your playbook:
--- a/galaxy.yml
+++ b/galaxy.yml
@@ -27,3 +27,4 @@
       become: true
       become_user: "{{ galaxy_user_name }}"
     - galaxyproject.nginx
+    - galaxyproject.gxadmin
Set up a cleanup task to run regularly:
--- a/galaxy.yml
+++ b/galaxy.yml
@@ -28,3 +28,11 @@
       become_user: "{{ galaxy_user_name }}"
     - galaxyproject.nginx
     - galaxyproject.gxadmin
+  post_tasks:
+    - name: Setup gxadmin cleanup task
+      ansible.builtin.cron:
+        name: "Cleanup Old User Data"
+        user: galaxy # Run as the Galaxy user
+        minute: "0"
+        hour: "0"
+        job: "SHELL=/bin/bash source {{ galaxy_venv_dir }}/bin/activate && GALAXY_LOG_DIR=/tmp/gxadmin/ GALAXY_ROOT={{ galaxy_root }}/server GALAXY_CONFIG_FILE={{ galaxy_config_file }} /usr/local/bin/gxadmin galaxy cleanup 60"
This will cause datasets deleted for more than 60 days to be purged.
Run the playbook
Input: Bash
ansible-playbook galaxy.yml
Whenever gxadmin runs, it writes logs to /tmp/gxadmin, which you can check later.
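To make the 60-day retention and the log location more concrete, here is a minimal Bash sketch. The helper names purge_cutoff and latest_logs are hypothetical (they are not part of gxadmin), and GNU date is assumed. The sketch makes no assumption about how gxadmin names its log files; it only lists whatever was modified most recently in the directory.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of gxadmin.

# Print the date before which a deleted dataset counts as "older than N days".
# $1: retention in days; $2: reference date as YYYY-MM-DD (default: today, UTC)
purge_cutoff() {
    local days="$1"
    local today="${2:-$(date -u -I)}"
    date -u -I -d "${today} ${days} days ago"
}

# List the most recently modified entries in the gxadmin log directory.
# $1: log directory (default: /tmp/gxadmin); $2: number of entries to show
latest_logs() {
    local log_dir="${1:-/tmp/gxadmin}"
    local count="${2:-5}"
    # Nothing to show if the cron job has not run yet.
    [ -d "$log_dir" ] || return 0
    ls -1t "$log_dir" | head -n "$count"
}

purge_cutoff 60
latest_logs
```

Run on the Galaxy server after the cron job has fired at least once, this prints the cutoff date followed by the newest log entries, which you can then open with your pager of choice.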
Galaxy Created Files
Before we begin backing up our Galaxy data, let’s set up automated cleanups to ensure we back up only the minimal required set of data.
Hands-on: Configuring Automated Cleanups
Edit galaxy.yml to install tmpwatch (if using RHEL/CentOS/Rocky) or tmpreaper (if using Debian/Ubuntu):
--- a/galaxy.yml
+++ b/galaxy.yml
@@ -21,6 +21,14 @@
     - name: Install Dependencies
       package:
         name: ['acl', 'bzip2', 'git', 'make', 'tar', 'python3-venv', 'python3-setuptools']
+    - name: Install RHEL/CentOS/Rocky specific dependencies
+      package:
+        name: ['tmpwatch']
+      when: ansible_os_family == 'RedHat'
+    - name: Install Debian/Ubuntu specific dependencies
+      package:
+        name: ['tmpreaper']
+      when: ansible_os_family == 'Debian'
   roles:
     - galaxyproject.galaxy
     - role: galaxyproject.miniconda
Edit group_vars/galaxyservers.yml and enable the cleanup management of the Galaxy role:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -2,6 +2,7 @@
 galaxy_create_user: true # False by default, as e.g. you might have a 'galaxy' user provided by LDAP or AD.
 galaxy_separate_privileges: true # Best practices for security, configuration is owned by 'root' (or a different user) than the processes
 galaxy_manage_paths: true # False by default as your administrator might e.g. have root_squash enabled on NFS. Here we can create the directories so it's fine.
+galaxy_manage_cleanup: true
 galaxy_layout: root-dir
 galaxy_root: /srv/galaxy
 galaxy_user: {name: "{{ galaxy_user_name }}", shell: /bin/bash}
Run the playbook
Input: Bash
ansible-playbook galaxy.yml
- Check out the cleanup task which has been generated in /etc/cron.d/ansible_galaxy_tmpclean
This sets up tmpwatch (or tmpreaper on Debian/Ubuntu) to clean up a few folders:
- the job working directory, which is important if you set cleanup: onsuccess, so that old failed jobs are cleaned up once you’re done debugging their failures.
- the new file upload path, to catch uploaded temporary files that are no longer necessary.
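Before relying on the generated cron job, it can help to preview which files an age-based cleanup would catch. The sketch below approximates this with GNU find and modification times. It is an illustration, not what the role actually runs: preview_stale is a hypothetical helper, the 7-day default is an assumed example value, and tmpwatch or tmpreaper may look at access times instead, depending on the flags they are called with.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; it only prints, it never deletes.

# Print regular files under a directory that were last modified more than
# N days ago. find rounds ages down to whole days, so "+7" matches files
# that are at least 8 days old.
# $1: directory to inspect; $2: age threshold in days (assumed default: 7)
preview_stale() {
    local dir="$1"
    local days="${2:-7}"
    [ -d "$dir" ] || return 0
    find "$dir" -mindepth 1 -type f -mtime "+${days}" -print
}

# Example (replace the placeholder with your job working directory):
#   preview_stale "$JOB_WORKING_DIR" 7
```

Comparing this output with the directories listed in the generated cron file is a cheap way to confirm that nothing you still need for debugging is about to disappear.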
Backups
There are a few important things to back up with your Ansible-deployed Galaxy:
- Galaxy
  - The Galaxy-managed config files
  - The playbooks
- The Database
- The Data
Galaxy
By using Ansible, as long as you are storing your playbooks on another system, you are generally safe from failures of the Galaxy node, and you’ll be able to re-run your playbook at a later date.
However, playbooks often do not include:
- Which tools you’ve installed (have you ever installed a tool outside of ephemeris? This might be lost!)
- Conda environments, which will not always resolve identically over time. If strong guarantees of reproducibility are important, then consider backing these up as well.
Database Backups
We’re setting a couple of variables to control the automatic backups. They’ll be placed in the /data/backups folder, next to our user-uploaded Galaxy data.
Hands-on: Configuring PostgreSQL Backups
Edit group_vars/dbservers.yml and add some variables to configure PostgreSQL:
--- a/group_vars/dbservers.yml
+++ b/group_vars/dbservers.yml
@@ -5,3 +5,7 @@ postgresql_objects_users:
 postgresql_objects_databases:
   - name: "{{ galaxy_db_name }}"
     owner: "{{ galaxy_user_name }}"
+
+# PostgreSQL Backups
+postgresql_backup_dir: /data/backups
+postgresql_backup_local_dir: "{{ '~postgres' | expanduser }}/backups"
This will set up our backups to run as a cron job.
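Since the backups run unattended, it is worth checking now and then that new ones are actually arriving. The following Bash sketch does a coarse freshness check with GNU find. The function name has_recent_backup is hypothetical, and the check assumes nothing about the layout inside /data/backups: it only looks at whether any entry directly inside the directory was modified recently, so it may miss changes that happen only in nested subdirectories.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the PostgreSQL role.

# Succeed if at least one entry directly inside the backup directory was
# modified less than N*24 hours ago.
# $1: backup directory (default: /data/backups); $2: N in days (default: 1)
has_recent_backup() {
    local dir="${1:-/data/backups}"
    local days="${2:-1}"
    [ -d "$dir" ] || return 1
    [ -n "$(find "$dir" -mindepth 1 -maxdepth 1 -mtime "-${days}" -print -quit)" ]
}

if has_recent_backup /data/backups 1; then
    echo "OK: found a backup entry modified within the last day"
else
    echo "WARNING: no recent backup entry in /data/backups"
fi
```

A check like this could be hooked into whatever monitoring you already have, so that a silently failing backup job is noticed before you need to restore from it.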
Data Backup
With Galaxy it is technically only necessary to back up your inputs, as the downstream files should, in theory, be re-creatable thanks to the reproducibility of Galaxy.
In practice, some groups choose either not to back up at all or to back up everything, often to extremely cheap and slow storage like Glacier or a tape library.
Most groups choose to implement this as a custom cron job, e.g.
post_tasks:
  - name: Setup backup cron job
    ansible.builtin.cron:
      name: "Backup User Data"
      minute: "0"
      hour: "5,2"
      job: "rsync -avr /data/galaxy/ backup@backup.example.org:/backups/$(date -I)/"
People who, let’s say, care strongly about backups will often insist that you need to version files. This is of course unnecessary in the Galaxy case, as files are essentially Write Once Read Many (WORM), which is a really good file storage practice. Files can get removed, so it isn’t a true WORM strategy of the kind you’d use for e.g. audit logs, but it is close. Since files never get changed, keeping multiple versions is unnecessary.
Please consider communicating very well with your users what the data backup policy is.
Comment: Got lost along the way?
If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.
If you’re using git to track your progress, remember to add your changes and commit with a good commit message!
Restoration
Sometimes failures happen! We’re sorry you have to read this section.
Restoring the Database
This procedure is more complicated; you can read about the restoration procedure in the associated PR.
This step assumes you have pre-existing backups in place. Check this first:
ls /data/backups/
If you have backups, you’re ready to restore:
# Stop Galaxy, you do NOT want galaxy to connect mid-restoration in case it
# tries to modify the database.
sudo systemctl stop galaxy
# Stop the database
sudo systemctl stop postgresql
# Ensure that it is stopped
sudo systemctl status postgresql
# Begin the restore procedure by becoming postgres:
sudo su - postgres
# Move the current, live database to a backup location just in case:
mkdir /tmp/test/
# ====
# NOTE THAT THIS NUMBER MAY BE DIFFERENT FOR YOU!
# You will need to change 12 to whatever version of postgres you're running
# in every subsequent command
# ====
mv /var/lib/postgresql/12/main/* /tmp/test/
# Copy the backup into place
rsync -av /data/backups/YOUR_LATEST_BACKUP/ /var/lib/postgresql/12/main
# Add the restore_command to postgresql.auto.conf:
# restore_command = 'cp "/tmp/backup/current/wal/%f" "%p"'
$EDITOR /var/lib/postgresql/12/main/postgresql.auto.conf
# Create a recovery signal file
touch /var/lib/postgresql/12/main/recovery.signal
# Leave the postgres shell (exit), then continue as your own user (with sudo rights)
sudo systemctl restart postgresql
sudo systemctl status postgresql
# Restart Galaxy
sudo systemctl start galaxy
If you encounter issues, we suggest reading Lucille’s log of her experiences restoring as you might encounter similar issues.
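Because the version number in the paths above differs between installations, a small helper can report which data directories actually exist before you start moving files. This is a sketch that assumes the Debian/Ubuntu layout used above (/var/lib/postgresql/VERSION/main); list_pg_versions is a hypothetical name, not a PostgreSQL tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; assumes the Debian/Ubuntu data layout.

# Print each version directory under the base path that contains a "main"
# data directory, one per line.
# $1: base directory (default: /var/lib/postgresql)
list_pg_versions() {
    local base="${1:-/var/lib/postgresql}"
    local dir
    for dir in "$base"/*/main; do
        # If the glob matched nothing it stays a literal string; skip it.
        [ -d "$dir" ] || continue
        basename "$(dirname "$dir")"
    done
}

list_pg_versions
```

If this prints a single number, that is the value to substitute for 12 in the commands above. If it prints several, check which one your running cluster uses before touching anything.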
Restoring Galaxy
Restoring Galaxy is easy via Ansible (you may want to make sure users cannot log in during the restore, e.g. by disabling the routes in nginx):
ansible-playbook galaxy.yml
And if you are following best practices, you probably have your tools stored in a YAML file to use with Ephemeris:
shed-tools install -g https://galaxy.example.org -a <api-key> -t our_tools.yml
Restoring User Data
This should simply be a matter of rsyncing your data from the backup location back into /data/galaxy.
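If your backups follow the dated layout from the example cron job earlier in this tutorial (one directory per `date -I` day), a restore could look roughly like the sketch below. latest_backup_name is a hypothetical helper. The host backup@backup.example.org and the /backups path are the example values from that cron job, and the commands that touch real data are left commented out, with rsync in dry-run mode, so that nothing is overwritten by accident.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration.

# Read directory names on stdin and print the most recent one that looks
# like an ISO date (YYYY-MM-DD). ISO dates sort chronologically as text.
latest_backup_name() {
    grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' | sort | tail -n 1
}

# Example usage (values taken from the example cron job; adjust to your setup):
#   latest="$(ssh backup@backup.example.org ls /backups | latest_backup_name)"
#   rsync -av --dry-run "backup@backup.example.org:/backups/${latest}/" /data/galaxy/
# Remove --dry-run once the listed transfers look right.
```

Doing a dry run first costs little and shows exactly which files would be written into /data/galaxy.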