Setting up Celery Workers for Galaxy
Author(s) | Mira Kuntz |
Editor(s) | Helena Rasche |
Overview
Objectives:
- Have an understanding of what Celery is and how it works
- Install Redis
- Configure and start Celery workers
- Install Flower to the Galaxy venv and configure it
- Use an Ansible playbook for all of the above.
- Monitor a Celery task using the Flower dashboard
Requirements:
- Slides: Ansible
- Hands-on: Ansible
- Slides: Galaxy Installation with Ansible
- Hands-on: Galaxy Installation with Ansible
- Slides: Running Jobs on Remote Resources with Pulsar
- Hands-on: Running Jobs on Remote Resources with Pulsar
Time estimation: 1 hour
Published: Apr 16, 2023
Last modification: Nov 7, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00326
Revision: 11
Celery is a distributed task queue written in Python that can spawn multiple workers and enables asynchronous task processing on multiple nodes. It supports scheduling, but focuses more on real-time operations.
From the Celery website:
“Task queues are used as a mechanism to distribute work across threads or machines.
A task queue’s input is a unit of work called a task. Dedicated worker processes constantly monitor task queues for new work to perform.
Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task the client adds a message to the queue, the broker then delivers that message to a worker.
A Celery system can consist of multiple workers and brokers, giving way to high availability and horizontal scaling.
Celery is written in Python, but the protocol can be implemented in any language. In addition to Python there’s node-celery and node-celery-ts for Node.js, and a PHP client.
Language interoperability can also be achieved exposing an HTTP endpoint and having a task that requests it (webhooks).”
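To make the vocabulary above concrete, here is a minimal sketch of the task queue pattern using only the Python standard library. It is not Celery and does not use Celery's API: the in-process `queue.Queue` merely stands in for a broker, the threads stand in for workers, and the `results` dictionary stands in for a result backend. The function names `worker` and `run_tasks` are made up for this illustration.

```python
import queue
import threading


def worker(tasks, results):
    """Take tasks off the queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            break
        task_id, func, args = item
        # Store the outcome under the task id, like a result backend would.
        results[task_id] = func(*args)


def run_tasks(jobs, concurrency=2):
    """Run (func, args) jobs on `concurrency` worker threads."""
    tasks = queue.Queue()  # stands in for the broker
    results = {}           # stands in for the result backend
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(concurrency)
    ]
    for thread in threads:
        thread.start()
    # The "client" adds one message per task to the queue.
    for task_id, (func, args) in enumerate(jobs):
        tasks.put((task_id, func, args))
    # One sentinel per worker so that every thread stops.
    for _ in threads:
        tasks.put(None)
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_tasks([(pow, (2, 3)), (sum, ([1, 2, 3],))]))
```

In a real deployment the queue lives in a separate broker process and the workers may run on other machines, which is what makes horizontal scaling possible.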
A slideshow presentation on this subject is available.
If you are not interested in managing Redis and Flower, you might be interested in the lower-configuration deployment option.
Agenda
Comment: Galaxy Admin Training Path
The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.
- Step 1: ansible-galaxy
- Step 2: backup-cleanup
- Step 3: customization
- Step 4: tus
- Step 5: cvmfs
- Step 6: apptainer
- Step 7: tool-management
- Step 8: reference-genomes
- Step 9: data-library
- Step 10: dev/bioblend-api
- Step 11: connect-to-compute-cluster
- Step 12: job-destinations
- Step 13: pulsar
- Step 14: celery
- Step 15: gxadmin
- Step 16: reports
- Step 17: monitoring
- Step 18: tiaas
- Step 19: sentry
- Step 20: ftp
- Step 21: beacon
The agenda we’re going to follow today is: enable and configure Celery, install a Redis server and the Flower dashboard, and start Celery workers.
Installing and Configuring
To proceed from here it is expected that:
Comment: Requirements for Running This Tutorial
You have set up a working Galaxy instance as described in the ansible-galaxy tutorial.
You have a working RabbitMQ server installed and added the connection string to the galaxy configuration. (RabbitMQ is installed when doing the Pulsar tutorial.)
Your VM has a public DNS name: this tutorial sets up SSL certificates from the start and as an integral part of the tutorial.
You have the following ports exposed:
- 22 for SSH. This can be a different port, or access can go via VPN or similar.
- 80 for HTTP. This needs to be available to the world if you want to follow the LetsEncrypt portion of the tutorial.
- 443 for HTTPS. This needs to be available to the world if you want to follow the LetsEncrypt portion of the tutorial.
- 5671 for AMQP for Pulsar. This is needed if you plan to set up Pulsar for remote job running.
To run a production-ready Celery setup, we need to discuss and install some other software that works together with Celery.
We already learned about RabbitMQ in the Pulsar tutorial. The RabbitMQ server you already installed there will be our broker for Celery.
As a backend we are going to use Redis.
Redis is a very popular key-value store. It is very fast and is an easy-to-set-up backend for Celery.
If you want to learn more about Redis, visit their website: https://redis.io/
For monitoring and debugging Celery, we use the Flower dashboard.
Flower is lightweight, has a clear but powerful UI, and can be installed in Galaxy’s venv using our role.
Installing and Configuring
First we need to add our new Ansible roles to the requirements.yml:
Hands-on: Set up Redis, Flower, Systemd and Celery with Ansible
In your working directory, add the roles to your requirements.yml:
--- a/requirements.yml
+++ b/requirements.yml
@@ -38,3 +38,8 @@
   version: 1.4.4
 - src: galaxyproject.pulsar
   version: 1.0.10
+# Celery, Redis, and Flower (dashboard)
+- name: geerlingguy.redis
+  version: 1.8.0
+- name: usegalaxy_eu.flower
+  version: 1.0.2
If you haven’t worked with diffs before, this can be something quite new or different.
Say we have two versions of a grocery list in two files. We’ll call them ‘old’ and ‘new’.
Input: Old
$ cat old
🍎
🍐
🍊
🍋
🍒
🥑
Output: New
$ cat new
🍎
🍐
🍊
🍋
🍍
🥑
We can see that they have some different entries. We’ve removed 🍒 because they’re awful, and replaced them with a 🍍
Diff lets us compare these files
$ diff old new
5c5
< 🍒
---
> 🍍
Here we see that 🍒 is only in old, and 🍍 is only in new. But otherwise the files are identical.
There are a couple different formats to diffs, one is the ‘unified diff’
$ diff -U2 old new
--- old 2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:06:36.340962616 +0100
@@ -3,4 +3,4 @@
🍊
🍋
-🍒
+🍍
🥑
This is basically what you see in the training materials, which gives you a lot of context about the changes:
- --- old is the ‘old’ file in our view
- +++ new is the ‘new’ file
- @@ these lines tell us where the change occurs and how many lines are added or removed.
- Lines starting with a - are removed from our ‘new’ file
- Lines with a + have been added.
So when you go to apply these diffs to your files in the training:
- Ignore the header
- Remove lines starting with - from your file
- Add lines starting with + to your file
The other lines (🍊/🍋 and 🥑) above just provide “context”, they help you know where a change belongs in a file, but should not be edited when you’re making the above change. Given the above diff, you would find a line with a 🍒, and replace it with a 🍍
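If you would like to experiment with unified diffs yourself, the grocery example can be reproduced with Python's standard `difflib` module. This is only a sketch for exploration; the helper name `grocery_diff` is made up here, and the training materials themselves are not generated this way.

```python
import difflib

old = ["🍎", "🍐", "🍊", "🍋", "🍒", "🥑"]
new = ["🍎", "🍐", "🍊", "🍋", "🍍", "🥑"]


def grocery_diff(a, b, context=2):
    """Return the unified diff of two lists of lines, like `diff -U2`."""
    return list(
        difflib.unified_diff(
            a, b, fromfile="old", tofile="new", n=context, lineterm=""
        )
    )


if __name__ == "__main__":
    # Prints the header, one @@ hunk line, then context, - and + lines.
    for line in grocery_diff(old, new):
        print(line)
```

The output has the same shape as the `diff -U2 old new` example above, except that no timestamps follow the file names.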
Added & Removed Lines
Removals are very easy to spot, we just have removed lines
--- old 2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:10:14.370722802 +0100
@@ -4,3 +4,2 @@
🍋
🍒
-🥑
And additions likewise are very easy, just add a new line, between the other lines in your file.
--- old 2022-02-16 14:06:19.697132568 +0100
+++ new 2022-02-16 14:11:11.422135393 +0100
@@ -1,3 +1,4 @@
🍎
+🍍
🍐
🍊
Completely new files
Completely new files look a bit different, there the “old” file is /dev/null, the empty file in a Linux machine.
$ diff -U2 /dev/null old
--- /dev/null 2022-02-15 11:47:16.100000270 +0100
+++ old 2022-02-16 14:06:19.697132568 +0100
@@ -0,0 +1,6 @@
+🍎
+🍐
+🍊
+🍋
+🍒
+🥑
And removed files are similar, except with the new file being /dev/null
--- old 2022-02-16 14:06:19.697132568 +0100
+++ /dev/null 2022-02-15 11:47:16.100000270 +0100
@@ -1,6 +0,0 @@
-🍎
-🍐
-🍊
-🍋
-🍒
-🥑
Install the roles with:
Input: Bash
ansible-galaxy install -p roles -r requirements.yml
Let’s now go through all the roles step by step:
Redis
Since we can stick to the basic default settings of Redis, we will look only at a few variables:
Variable | Type | Description
redis_port | integer | The port Redis should listen on. 6379 by default.
redis_bind_interface | string | The interface Redis should listen to. 127.0.0.1 is default.
redis_conf_path | string | The path where your redis configuration will be stored. Default: /etc/redis
Luckily we can leave them all on default and don’t need to change anything for Redis in the vars.
We only need to add Redis’ Python package in the group_vars/galaxyservers.yml:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -282,3 +282,7 @@ rabbitmq_users:
 # TUS
 galaxy_tusd_port: 1080
 galaxy_tus_upload_store: /data/tus
+
+#Redis
+galaxy_additional_venv_packages:
+  - redis
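Once the playbook has been run, you may want to confirm that Redis is listening on the default interface and port from the table above. The following is a minimal sketch using only the Python standard library; the function name `redis_ping` is made up for this example. It relies on Redis answering an inline `PING` command with `+PONG`, and it assumes that no password is configured (a server that requires authentication replies with an error instead, and the function returns `False`).

```python
import socket


def redis_ping(host="127.0.0.1", port=6379, timeout=2.0):
    """Return True if a server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # Redis accepts inline commands
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timeout, unreachable host, ...
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis reachable:", redis_ping())
```

Run it on the Galaxy server itself, since with the default `redis_bind_interface` Redis only listens on 127.0.0.1.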
Let’s add the role to our playbook then:
--- a/galaxy.yml
+++ b/galaxy.yml
@@ -42,6 +42,7 @@
     - role: galaxyproject.miniconda
       become: true
       become_user: "{{ galaxy_user_name }}"
+    - geerlingguy.redis
     - galaxyproject.nginx
     - geerlingguy.docker
     - usegalaxy_eu.rabbitmqserver
Because Flower needs its own RabbitMQ user, we should add that to the respective part of our vars. Edit your group_vars/secret.yml and define some random passwords:
Input: Bash
ansible-vault edit group_vars/secret.yml
vault_rabbitmq_password_flower: "a-really-long-password-here"
vault_rabbitmq_password_galaxy: "a-different-really-long-password"
vault_flower_user_password: "another-different-really-long-password"
These go in the vault because they are secrets. Flower needs its own RabbitMQ user with admin access, and we want a separate vhost for Galaxy and Celery.
Replace all three with long random (or not) strings.
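One possible way to produce such passwords is Python's standard `secrets` module. This is just a sketch, not part of the playbook; the helper name `generate_vault_passwords` is made up here. `secrets.token_urlsafe` only emits letters, digits, `-` and `_`, which is convenient because two of these passwords are later embedded in connection URLs.

```python
import secrets

# The three vault variables defined in group_vars/secret.yml above.
VAULT_KEYS = [
    "vault_rabbitmq_password_flower",
    "vault_rabbitmq_password_galaxy",
    "vault_flower_user_password",
]


def generate_vault_passwords(keys=VAULT_KEYS, nbytes=32):
    """Return a dict mapping each vault key to a fresh random password."""
    return {key: secrets.token_urlsafe(nbytes) for key in keys}


if __name__ == "__main__":
    # Print lines that can be pasted into `ansible-vault edit`.
    for key, value in generate_vault_passwords().items():
        print(f'{key}: "{value}"')
```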
Now add new users to the RabbitMQ configuration:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -269,6 +269,7 @@ rabbitmq_config:

 rabbitmq_vhosts:
   - /pulsar/pulsar_au
+  - galaxy_internal

 rabbitmq_users:
   - user: admin
@@ -278,6 +279,13 @@ rabbitmq_users:
   - user: pulsar_au
     password: "{{ vault_rabbitmq_password_vhost }}"
     vhost: /pulsar/pulsar_au
+  - user: galaxy
+    password: "{{ vault_rabbitmq_password_galaxy }}"
+    vhost: galaxy_internal
+  - user: flower
+    password: "{{ vault_rabbitmq_password_flower }}"
+    tags: administrator
+    vhost: galaxy_internal

 # TUS
 galaxy_tusd_port: 1080
Flower
Flower has a few variables, too. For example, we need to point it to our virtual environment:
Variable | Type | Description
flower_python_version | string | Python version to use when installing flower to a venv. Default: python39
flower_port | integer | The port Flower should listen on. 5555 by default.
flower_bind_interface | string | The interface Flower should listen to. 0.0.0.0 is default.
flower_conf_dir | string | The path where your Flower configuration will be stored. Default: /etc/flower
flower_venv_dir | string | The path to the venv where Flower should be installed. Default: /home//.local
flower_user | string | User that owns the flower process. Default: galaxy
flower_group | string | Group that owns the flower process. Default: galaxy
flower_ui_users | list of dicts | Name and password of the UI users for basic auth.
flower_app_dir | string | Root directory of your Python app to run with Celery. In our case galaxy_root
flower_app_name | string | Python module to import. In our case ‘galaxy.celery’
flower_python_path | string | Should point to galaxy’s server/lib directory (default)
flower_broker_api | string | URL to broker’s API with login credentials.
flower_broker_url | string | Flower’s RabbitMQ connection string.
flower_db_file | string | When Flower is in persistent mode, use this path for the database.
Let’s add variables to our group_vars/galaxyservers.yml:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -294,3 +294,22 @@ galaxy_tus_upload_store: /data/tus
 #Redis
 galaxy_additional_venv_packages:
   - redis
+
+# Flower
+flower_python_version: python3
+flower_app_dir: "{{ galaxy_root }}"
+flower_python_path: "{{ galaxy_root }}/server/lib"
+flower_venv_dir: "{{ galaxy_venv_dir }}"
+flower_app_name: galaxy.celery
+flower_db_file: "{{ galaxy_root }}/var/flower.db"
+flower_persistent: true
+flower_broker_api: "https://flower:{{ vault_rabbitmq_password_flower }}@localhost:5671/api/"
+flower_broker_url: "amqp://flower:{{ vault_rabbitmq_password_flower }}@localhost:5671/galaxy_internal?ssl=true"
+flower_proxy_prefix: /flower
+
+flower_ui_users:
+  - name: admin
+    password: "{{ vault_flower_user_password}}"
+
+flower_environment_variables:
+  GALAXY_CONFIG_FILE: "{{ galaxy_config_file }}"
Flower has a dashboard, so we need to expose that via nginx:
--- a/templates/nginx/galaxy.j2
+++ b/templates/nginx/galaxy.j2
@@ -94,4 +94,13 @@ server {
         proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
         proxy_set_header X-Forwarded-Proto $scheme;
     }
+
+    location /flower {
+        proxy_pass http://localhost:5555;
+        proxy_set_header Host $host;
+        proxy_redirect off;
+        proxy_http_version 1.1;
+        proxy_set_header Upgrade $http_upgrade;
+        proxy_set_header Connection "upgrade";
+    }
 }
Now we can add the Flower role to our playbook:
--- a/galaxy.yml
+++ b/galaxy.yml
@@ -43,6 +43,7 @@
       become: true
       become_user: "{{ galaxy_user_name }}"
     - geerlingguy.redis
+    - usegalaxy_eu.flower
     - galaxyproject.nginx
     - geerlingguy.docker
     - usegalaxy_eu.rabbitmqserver
Now it is time to change the group_vars/galaxyservers.yml and enable Celery in the galaxy.gravity config. Add the following lines to your file:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -130,6 +130,11 @@ galaxy_config:
       preload: true
     celery:
       concurrency: 2
+      enable_beat: true
+      enable: true
+      queues: celery,galaxy.internal,galaxy.external
+      pool: threads
+      memory_limit: 2
       loglevel: DEBUG
     tusd:
       enable: true
Now add the second part, Galaxy’s Celery configuration:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -111,6 +111,11 @@ galaxy_config:
     # Data Library Directories
     library_import_dir: /libraries/admin
     user_library_import_dir: /libraries/user
+    # Celery
+    amqp_internal_connection: "pyamqp://galaxy:{{ vault_rabbitmq_password_galaxy }}@localhost:5671/galaxy_internal?ssl=1"
+    celery_conf:
+      result_backend: "redis://localhost:6379/0"
+    enable_celery_tasks: true
   gravity:
     process_manager: systemd
     galaxy_root: "{{ galaxy_root }}/server"
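The two connection strings above pack several settings into one URL each. As a reading aid, the sketch below splits such a URL into its parts with Python's standard `urllib.parse`; the helper name `describe_connection` and the placeholder password `secret` are made up for this example, and this is not how Galaxy or Celery parse the values internally. For the AMQP URL the path is the RabbitMQ vhost, and for the Redis URL it is the database number.

```python
from urllib.parse import parse_qs, urlsplit


def describe_connection(url):
    """Split a broker or backend URL into its individual settings."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,           # transport, e.g. pyamqp or redis
        "user": parts.username,           # None if the URL has no user
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path.lstrip("/"),   # vhost (AMQP) or db number (Redis)
        "options": parse_qs(parts.query), # e.g. {"ssl": ["1"]}
    }


if __name__ == "__main__":
    # "secret" is a placeholder for the vaulted password.
    print(describe_connection(
        "pyamqp://galaxy:secret@localhost:5671/galaxy_internal?ssl=1"
    ))
    print(describe_connection("redis://localhost:6379/0"))
```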
We are done with the changes and you can enter the command to run your playbook:
Input: Bash
ansible-playbook galaxy.yml
This should also restart Galaxy and spawn the number of Celery workers that we defined in the Gravity configuration.
Test Celery
Now that everything is running, we want to test Celery and watch it process tasks. We can do that simply by starting an upload to our Galaxy.
Hands-on: Test Celery and monitor tasks with Flower
- First, open a new tab and enter your machine’s hostname followed by /flower/dashboard, then log in with username: admin and your password. You should see an overview with active workers. Keep that tab open.
- In split view, open a second browser window and open your Galaxy page. Click on Upload Data, select a file from your computer and click upload.
- The workers should now receive a new task. Click on Succeeded and then on the UUID of the last upload task. You should see all its details here and the info that it was successful.
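If you prefer a quick check from a terminal before opening the browser, the dashboard URL from the first step can also be requested with HTTP basic auth. The sketch below only builds the request with the standard library; the helper name `flower_dashboard_request` and the hostname `galaxy.example.org` are placeholders invented for this example, and the password is the one you stored as `vault_flower_user_password`.

```python
import base64
import urllib.request


def flower_dashboard_request(hostname, user, password):
    """Build a basic-auth request for the Flower dashboard behind nginx."""
    url = f"https://{hostname}/flower/dashboard"
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )


# Usage (performs a real HTTP request, so it is left commented out):
# request = flower_dashboard_request("galaxy.example.org", "admin", "...")
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

A successful response suggests that nginx, Flower, and the UI user are all configured as intended; an HTTP error raised by `urlopen` points at the proxy or the credentials.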
Comment: Got lost along the way?
If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.
If you’re using git to track your progress, remember to add your changes and commit with a good commit message!