Setting up Celery Workers for Galaxy

Author(s)	Mira Kuntz
Editor(s)	Helena Rasche
Reviewers

Overview
Questions:

Objectives:

Have an understanding of what Celery is and how it works

Install Redis

Configure and start Celery workers

Install Flower to the Galaxy venv and configure it

Use an Ansible playbook for all of the above.

Monitor a Celery task using the Flower dashboard

Requirements:

Slides: Ansible

Hands-on: Ansible

Slides: Galaxy Installation with Ansible

Hands-on: Galaxy Installation with Ansible

Slides: Running Jobs on Remote Resources with Pulsar

Hands-on: Running Jobs on Remote Resources with Pulsar

Time estimation: 1 hour

Supporting Materials:

Slides

Published: Apr 16, 2023

Last modification: Apr 8, 2025

License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT

PURL: https://gxy.io/GTN:T00326

Revision: 12

Celery is a distributed task queue written in Python that can spawn multiple workers and enables asynchronous task processing on multiple nodes. It supports scheduling, but focuses more on real-time operations.

From the Celery website:

“Task queues are used as a mechanism to distribute work across threads or machines.

A task queue’s input is a unit of work called a task. Dedicated worker processes constantly monitor task queues for new work to perform.

Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task the client adds a message to the queue, the broker then delivers that message to a worker.

A Celery system can consist of multiple workers and brokers, giving way to high availability and horizontal scaling.

Celery is written in Python, but the protocol can be implemented in any language. In addition to Python there’s node-celery and node-celery-ts for Node.js, and a PHP client.

Language interoperability can also be achieved exposing an HTTP endpoint and having a task that requests it (webhooks).”
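
To make the client, broker and worker roles from the quote concrete, here is a minimal sketch that uses only the Python standard library. It is not Celery's API: a `queue.Queue` stands in for the broker, a dictionary stands in for the result backend, and the names `run_workers` and `add` are made up for this illustration.

```python
import queue
import threading


def run_workers(tasks, num_workers=2):
    """Run (func, args) tasks on worker threads; return results keyed by task id."""
    broker = queue.Queue()   # stands in for the message broker
    results = {}             # stands in for the result backend
    lock = threading.Lock()

    def worker():
        # Each worker keeps pulling messages until it sees the stop sentinel.
        while True:
            message = broker.get()
            if message is None:
                broker.task_done()
                return
            task_id, func, args = message
            outcome = func(*args)
            with lock:
                results[task_id] = outcome
            broker.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()

    # The "client" side: add one message per task to the queue.
    for task_id, (func, args) in enumerate(tasks):
        broker.put((task_id, func, args))
    # One stop sentinel per worker.
    for _ in threads:
        broker.put(None)

    broker.join()
    for thread in threads:
        thread.join()
    return results


def add(x, y):
    """Hypothetical task used only for this demonstration."""
    return x + y


if __name__ == "__main__":
    print(run_workers([(add, (1, 2)), (add, (10, 20))]))
```

In a real deployment the queue lives in RabbitMQ and the workers are separate processes, which is what lets Celery spread work across machines.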

A slideshow presentation on this subject is available.

If you are not interested in managing Redis and Flower, you might be interested in the lower-configuration deployment option.

Agenda

Installing and Configuring

Installing and Configuring

Test Celery

Comment: Galaxy Admin Training Path

The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.

Step 1

ansible-galaxy

Step 2

backup-cleanup

Step 3

customization

Step 4

tus

Step 5

cvmfs

Step 6

apptainer

Step 7

tool-management

Step 8

reference-genomes

Step 9

data-library

Step 10

dev/bioblend-api

Step 11

connect-to-compute-cluster

Step 12

job-destinations

Step 13

pulsar

Step 14

celery

Step 15

gxadmin

Step 16

reports

Step 17

monitoring

Step 18

tiaas

Step 19

sentry

Step 20

ftp

Step 21

beacon

In this tutorial we are going to enable and configure Celery, install a Redis server and the Flower dashboard, and start Celery workers.

Installing and Configuring

To proceed from here it is expected that:

Comment: Requirements for Running This Tutorial

You have set up a working Galaxy instance as described in the ansible-galaxy tutorial.

You have a working RabbitMQ server installed and added the connection string to the galaxy configuration. (RabbitMQ is installed when doing the Pulsar tutorial.)

Your VM has a public DNS name: this tutorial sets up SSL certificates from the start and as an integral part of the tutorial.

You have the following ports exposed:

22 for SSH; this can be a different port, or access can go via VPN or similar.

80 for HTTP; this needs to be available to the world if you want to follow the LetsEncrypt portion of the tutorial.

443 for HTTPS; this needs to be available to the world if you want to follow the LetsEncrypt portion of the tutorial.

5671 for AMQP for Pulsar; this is needed if you plan to set up Pulsar for remote job running.
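
If you want a quick way to confirm which of these ports accept connections, a small check along the following lines may help. This is a sketch using only the standard library; the function name `port_is_open` and the `localhost` host are placeholders, and a check run on the VM itself says nothing about firewalls in front of it, so ideally run it from another machine against your public DNS name.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    host = "localhost"  # assumption: replace with your VM's public DNS name
    for port, purpose in [(22, "SSH"), (80, "HTTP"), (443, "HTTPS"), (5671, "AMQP")]:
        state = "open" if port_is_open(host, port) else "closed or filtered"
        print(f"{port} ({purpose}): {state}")
```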

To run a production-ready Celery setup, we need to install some additional software that works together with Celery.
We already learned about RabbitMQ in the Pulsar tutorial. The RabbitMQ server you installed there will be our broker for Celery. As the result backend we are going to use Redis.
Redis is a very popular key-value store. It is very fast and makes an easy-to-set-up backend for Celery. If you want to learn more about Redis, visit their website: https://redis.io/

For monitoring and debugging Celery, we use the Flower dashboard.
Flower is lightweight, has a clear but powerful UI, and can be installed in Galaxy’s venv using our role.

Installing and Configuring

First we need to add our new Ansible Roles to the requirements.yml:

Hands On: Set up Redis, Flower, Systemd and Celery with Ansible
In your working directory, add the roles to your requirements.yml
--- a/requirements.yml
+++ b/requirements.yml
@@ -38,3 +38,8 @@
   version: 1.4.4
 - src: galaxyproject.pulsar
   version: 1.0.10
+# Celery, Redis, and Flower (dashboard)
+- name: geerlingguy.redis
+  version: 1.8.0
+- name: usegalaxy_eu.flower
+  version: 1.0.2
   
If you haven’t worked with diffs before, this can be something quite new or different.

Let’s say we have a grocery list in two files. We’ll call them ‘old’ and ‘new’.
Code In: Old
$ cat old
🍎
🍐
🍊
🍋
🍒
🥑
Code Out: New
$ cat new
🍎
🍐
🍊
🍋
🍍
🥑
We can see that they have some different entries. We’ve removed 🍒 because they’re awful, and replaced them with an 🍍

Diff lets us compare these files
$ diff old new
5c5
< 🍒
---
> 🍍
Here we see that 🍒 is only in old, and 🍍 is only in new. Otherwise the files are identical.

There are a couple of different diff formats; one is the ‘unified diff’:
$ diff -U2 old new
--- old	2022-02-16 14:06:19.697132568 +0100
+++ new	2022-02-16 14:06:36.340962616 +0100
@@ -3,4 +3,4 @@
 🍊
 🍋
-🍒
+🍍
 🥑
This is basically what you see in the training materials which gives you a lot of context about the changes:

--- old is the ‘old’ file in our view

+++ new is the ‘new’ file

@@ these lines tell us where the change occurs and how many lines are added or removed.

Lines starting with a - are removed from our ‘new’ file

Lines with a + have been added.

So when you go to apply these diffs to your files in the training:

Ignore the header

Remove lines starting with - from your file

Add lines starting with + to your file

The other lines (🍊/🍋 and 🥑) above just provide “context”, they help you know where a change belongs in a file, but should not be edited when you’re making the above change. Given the above diff, you would find a line with a 🍒, and replace it with a 🍍
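
If you would like to experiment with unified diffs outside the shell, Python's standard `difflib` module can produce the same hunk for the grocery lists. This is only an illustration; the helper name `grocery_diff` is made up here, and the timestamps that `diff -U2` prints in the header are left out.

```python
import difflib

old = ["🍎", "🍐", "🍊", "🍋", "🍒", "🥑"]
new = ["🍎", "🍐", "🍊", "🍋", "🍍", "🥑"]


def grocery_diff(a, b, context=2):
    """Return the unified diff of two lists of lines as a list of strings."""
    return list(
        difflib.unified_diff(a, b, fromfile="old", tofile="new", n=context, lineterm="")
    )


if __name__ == "__main__":
    # Prints the header, the @@ hunk line, then context, removed and added lines.
    print("\n".join(grocery_diff(old, new)))
```

Changing `context` plays the same role as the number after `-U` on the command line: it controls how many unchanged lines surround each change.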

Added & Removed Lines

Removals are very easy to spot, we just have removed lines
--- old	2022-02-16 14:06:19.697132568 +0100
+++ new	2022-02-16 14:10:14.370722802 +0100
@@ -4,3 +4,2 @@
 🍋
 🍒
-🥑
And additions likewise are very easy, just add a new line, between the other lines in your file.
--- old	2022-02-16 14:06:19.697132568 +0100
+++ new	2022-02-16 14:11:11.422135393 +0100
@@ -1,3 +1,4 @@
 🍎
+🍍
 🍐
 🍊
Completely new files

Completely new files look a bit different: there the “old” file is /dev/null, the empty file on a Linux machine.
$ diff -U2 /dev/null old
--- /dev/null	2022-02-15 11:47:16.100000270 +0100
+++ old	2022-02-16 14:06:19.697132568 +0100
@@ -0,0 +1,6 @@
+🍎
+🍐
+🍊
+🍋
+🍒
+🥑
And removed files are similar, except with the new file being /dev/null
--- old	2022-02-16 14:06:19.697132568 +0100
+++ /dev/null	2022-02-15 11:47:16.100000270 +0100
@@ -1,6 +0,0 @@
-🍎
-🍐
-🍊
-🍋
-🍒
-🥑
Install the roles with:
Code In: Bash
ansible-galaxy install -p roles -r requirements.yml
Let’s now go through the roles step by step.
Since we can stick to the basic default settings of Redis, we will look at only a few variables:

Variable	Type	Description
`redis_port`	integer	The port Redis should listen on. 6379 by default.
`redis_bind_interface`	string	The interface Redis should listen to. 127.0.0.1 is default.
`redis_conf_path`	string	The path where your redis configuration will be stored. Default: /etc/redis

Luckily we can leave them all at their defaults and don’t need to change anything for Redis in the vars.
We only need to add the Redis Python package in group_vars/galaxyservers.yml:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -282,3 +282,7 @@ rabbitmq_users:
 # TUS
 galaxy_tusd_port: 1080
 galaxy_tus_upload_store: /data/tus
+
+#Redis
+galaxy_additional_venv_packages:
+  - redis
       
Let’s add the role to our playbook then:
--- a/galaxy.yml
+++ b/galaxy.yml
@@ -42,6 +42,7 @@
     - role: galaxyproject.miniconda
       become: true
       become_user: "{{ galaxy_user_name }}"
+    - geerlingguy.redis
     - galaxyproject.nginx
     - geerlingguy.docker
     - usegalaxy_eu.rabbitmqserver
       
Because Flower needs its own RabbitMQ user, we should add that to the respective part of our vars. Edit your group_vars/secret.yml and define some random passwords:
Code In: Bash
ansible-vault edit group_vars/secret.yml
vault_rabbitmq_password_flower: "a-really-long-password-here"
vault_rabbitmq_password_galaxy: "a-different-really-long-password"
vault_flower_user_password: "another-different-really-long-password"
These go in the vault because they are secrets. Flower needs its own RabbitMQ user with admin access, and we want a separate vhost for Galaxy and Celery.

Replace each placeholder with its own long random string.
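
One possible way to come up with such strings is Python's standard `secrets` module. The sketch below prints one random value per vault key in a form you could paste into the vault editor; the helper name `generate_vault_entries` is made up for this example, and the 32-byte length is an arbitrary choice.

```python
import secrets

VAULT_KEYS = [
    "vault_rabbitmq_password_flower",
    "vault_rabbitmq_password_galaxy",
    "vault_flower_user_password",
]


def generate_vault_entries(keys, nbytes=32):
    """Map each key to a fresh random password made of URL-safe characters."""
    return {key: secrets.token_urlsafe(nbytes) for key in keys}


if __name__ == "__main__":
    for key, value in generate_vault_entries(VAULT_KEYS).items():
        print(f'{key}: "{value}"')
```

`token_urlsafe` only emits letters, digits, `-` and `_`, so the values can be dropped into the connection strings used later without any escaping.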
Now add new users to the RabbitMQ configuration:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -269,6 +269,7 @@ rabbitmq_config:
        
 rabbitmq_vhosts:
   - /pulsar/pulsar_au
+  - galaxy_internal
        
 rabbitmq_users:
   - user: admin
@@ -278,6 +279,13 @@ rabbitmq_users:
   - user: pulsar_au
     password: "{{ vault_rabbitmq_password_vhost }}"
     vhost: /pulsar/pulsar_au
+  - user: galaxy
+    password: "{{ vault_rabbitmq_password_galaxy }}"
+    vhost: galaxy_internal
+  - user: flower
+    password: "{{ vault_rabbitmq_password_flower }}"
+    tags: administrator
+    vhost: galaxy_internal
        
 # TUS
 galaxy_tusd_port: 1080
       
Flower

Flower has a few variables, too. For example, we need to point it to our virtual environment:

Variable	Type	Description
`flower_python_version`	string	Python version to use when installing flower to a venv. Default: python39
`flower_port`	integer	The port Flower should listen on. 5555 by default.
`flower_bind_interface`	string	The interface Flower should listen to. 0.0.0.0 is default.
`flower_conf_dir`	string	The path where your Flower configuration will be stored. Default: /etc/flower
`flower_venv_dir`	string	The path to the venv where Flower should be installed. Default: `/home//.local`
`flower_user`	string	User that owns the flower process. Default: galaxy
`flower_group`	string	Group that owns the flower process. Default: galaxy
`flower_ui_users`	list of dicts	Name and password of the UI users for basic auth.
`flower_app_dir`	string	Root directory of your Python app to run with Celery. In our case `galaxy_root`
`flower_app_name`	string	Python module to import. In our case ‘galaxy.celery’
`flower_python_path`	string	Should point to galaxy’s `server/lib` directory (default)
`flower_broker_api`	string	URL to broker’s API with login credentials.
`flower_broker_url`	string	Flower’s RabbitMQ connection string.
`flower_db_file`	string	When Flower is in persistent mode, use this path for the database.

Let’s add variables to our group_vars/galaxyservers.yml:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -294,3 +294,22 @@ galaxy_tus_upload_store: /data/tus
 #Redis
 galaxy_additional_venv_packages:
   - redis
+
+# Flower
+flower_python_version: python3
+flower_app_dir: "{{ galaxy_root }}"
+flower_python_path: "{{ galaxy_root }}/server/lib"
+flower_venv_dir: "{{ galaxy_venv_dir }}"
+flower_app_name: galaxy.celery
+flower_db_file: "{{ galaxy_root }}/var/flower.db"
+flower_persistent: true
+flower_broker_api: "https://flower:{{ vault_rabbitmq_password_flower }}@localhost:5671/api/"
+flower_broker_url: "amqp://flower:{{ vault_rabbitmq_password_flower }}@localhost:5671/galaxy_internal?ssl=true"
+flower_proxy_prefix: /flower
+
+flower_ui_users:
+  - name: admin
+    password: "{{ vault_flower_user_password }}"
+
+flower_environment_variables:
+  GALAXY_CONFIG_FILE: "{{ galaxy_config_file }}"
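
A note on the connection strings above: the password is embedded in a URL, so characters such as `@` or `/` in a password would need to be percent-encoded. The sketch below shows the general idea with the standard library; `amqp_url` is a hypothetical helper, the password is a placeholder, and it is an assumption here that the consuming client decodes percent-encoding (check this for your versions).

```python
from urllib.parse import quote


def amqp_url(user, password, host, port, vhost, ssl=True):
    """Build an AMQP URL, percent-encoding the user, password and vhost."""
    url = "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),
    )
    if ssl:
        url += "?ssl=true"
    return url


if __name__ == "__main__":
    # Placeholder password chosen to contain characters that need encoding.
    print(amqp_url("flower", "p@ss/word", "localhost", 5671, "galaxy_internal"))
```

Passwords made only of letters, digits, `-` and `_` pass through unchanged, which is one reason to prefer them in the vault.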
       
Flower has a dashboard, so we need to expose it via nginx:
--- a/templates/nginx/galaxy.j2
+++ b/templates/nginx/galaxy.j2
@@ -94,4 +94,13 @@ server {
 		proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
 		proxy_set_header X-Forwarded-Proto $scheme;
 	}
+
+	location /flower {
+		proxy_pass http://localhost:5555;
+		proxy_set_header Host $host;
+		proxy_redirect off;
+		proxy_http_version 1.1;
+		proxy_set_header Upgrade $http_upgrade;
+		proxy_set_header Connection "upgrade";
+	}
 }
       
Now we can add the Flower Role to our Playbook:
--- a/galaxy.yml
+++ b/galaxy.yml
@@ -43,6 +43,7 @@
       become: true
       become_user: "{{ galaxy_user_name }}"
     - geerlingguy.redis
+    - usegalaxy_eu.flower
     - galaxyproject.nginx
     - geerlingguy.docker
     - usegalaxy_eu.rabbitmqserver
       
Now it is time to edit group_vars/galaxyservers.yml and enable Celery in the Gravity configuration. Add the following lines to your file:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -130,6 +130,11 @@ galaxy_config:
       preload: true
     celery:
       concurrency: 2
+      enable_beat: true
+      enable: true
+      queues: celery,galaxy.internal,galaxy.external
+      pool: threads
+      memory_limit: 2
       loglevel: DEBUG
     tusd:
       enable: true
   
Now add the second part, Galaxy’s Celery configuration:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -111,6 +111,11 @@ galaxy_config:
     # Data Library Directories
     library_import_dir: /libraries/admin
     user_library_import_dir: /libraries/user
+    # Celery
+    amqp_internal_connection: "pyamqp://galaxy:{{ vault_rabbitmq_password_galaxy }}@localhost:5671/galaxy_internal?ssl=1"
+    celery_conf:
+      result_backend: "redis://localhost:6379/0"
+    enable_celery_tasks: true
   gravity:
     process_manager: systemd
     galaxy_root: "{{ galaxy_root }}/server"
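
The two URLs in this change play different roles: `amqp_internal_connection` points at the broker (RabbitMQ) and `result_backend` points at Redis. If a connection fails, it can help to split each URL into its parts and compare them with what you configured earlier. The following is a rough sketch with the standard library; `describe_url` is a made-up helper, `secret` is a placeholder password, and the password is deliberately left out of the result.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def describe_url(url):
    """Split a connection URL into its parts, omitting the password."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        # For AMQP this is the vhost, for Redis the database number.
        "path": unquote(parts.path.lstrip("/")),
        "options": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }


if __name__ == "__main__":
    for url in (
        "pyamqp://galaxy:secret@localhost:5671/galaxy_internal?ssl=1",
        "redis://localhost:6379/0",
    ):
        print(describe_url(url))
```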
   
We are done with the changes, so you can now run your playbook:
Code In: Bash
ansible-playbook galaxy.yml
This should also restart Galaxy and spawn the number of Celery workers that we defined in the Gravity configuration.

Test Celery

Now that everything is running, we want to test Celery and watch it process tasks. We can do that simply by starting an upload to our Galaxy.

Hands On: Test Celery and monitor tasks with Flower

First, open a new tab and enter your machine’s hostname followed by /flower/dashboard, then log in with the username admin and your password. You should see an overview with active workers.
Keep that tab open.

In split view, open a second browser window and open your Galaxy page. Click on Upload Data, select a file from your computer and click upload.

The workers should now receive a new task. Click on Succeeded and then on the UUID of the last upload task.
You should see all its details here and the info that it was successful.
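
Besides the dashboard, Flower also offers an HTTP API, which can be handy for scripted checks. The sketch below only builds a request with a Basic auth header and does not send it. Treat the details as assumptions to verify against your installed Flower version: the `api/workers` path, whether the API accepts the same basic auth as the UI, and the hostname `galaxy.example.org`. `flower_request` is a name invented for this example.

```python
import base64
import urllib.request


def flower_request(base_url, user, password, path="api/workers"):
    """Build (but do not send) a request to Flower with a Basic auth header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Placeholder host and password; use your own hostname and vault value.
    request = flower_request("https://galaxy.example.org/flower", "admin", "secret")
    print(request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```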

Comment: Got lost along the way?

If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.

If you’re using git to track your progress, remember to add your changes and commit with a good commit message!


You've Finished the Tutorial


Citing this Tutorial

Mira Kuntz, Setting up Celery Workers for Galaxy (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/admin/tutorials/celery/tutorial.html Online; accessed TODAY
Hiltemann, Saskia, Rasche, Helena et al., 2023 Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology 10.1371/journal.pcbi.1010752
Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

@misc{admin-celery,
	author = "Mira Kuntz",
	title = "Setting up Celery Workers for Galaxy (Galaxy Training Materials)",
	year = "",
	month = "",
	day = "",
	url = "\url{https://training.galaxyproject.org/training-material/topics/admin/tutorials/celery/tutorial.html}",
	note = "[Online; accessed TODAY]"
}
@article{Hiltemann_2023,
	doi = {10.1371/journal.pcbi.1010752},
	url = {https://doi.org/10.1371%2Fjournal.pcbi.1010752},
	year = 2023,
	month = {jan},
	publisher = {Public Library of Science ({PLoS})},
	volume = {19},
	number = {1},
	pages = {e1010752},
	author = {Saskia Hiltemann and Helena Rasche and Simon Gladman and Hans-Rudolf Hotz and Delphine Larivi{\`{e}}re and Daniel Blankenberg and Pratik D. Jagtap and Thomas Wollmann and Anthony Bretaudeau and Nadia Gou{\'{e}} and Timothy J. Griffin and Coline Royaux and Yvan Le Bras and Subina Mehta and Anna Syme and Frederik Coppens and Bert Droesbeke and Nicola Soranzo and Wendi Bacon and Fotis Psomopoulos and Crist{\'{o}}bal Gallardo-Alba and John Davis and Melanie Christine Föll and Matthias Fahrner and Maria A. Doyle and Beatriz Serrano-Solano and Anne Claire Fouilloux and Peter van Heusden and Wolfgang Maier and Dave Clements and Florian Heyl and Björn Grüning and B{\'{e}}r{\'{e}}nice Batut and},
	editor = {Francis Ouellette},
	title = {Galaxy Training: A powerful framework for teaching!},
	journal = {PLoS Comput Biol}
}

                   

Funding

These individuals or organisations provided funding support for the development of this resource

EuroScienceGateway

EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10038963.

Congratulations on successfully completing this tutorial!

You can use Ephemeris's shed-tools install command to install the tools used in this tutorial.
shed-tools install [-g GALAXY] [-a API_KEY] -t <(curl https://training.galaxyproject.org/training-material/api/topics/admin/tutorials/celery/tutorial.json | jq .admin_install_yaml -r)
Alternatively you can copy and paste the following YAML
---
install_tool_dependencies: true
install_repository_dependencies: true
install_resolver_dependencies: true
tools: []
