Setting up Celery Workers for Galaxy

Author(s): Mira Kuntz
Editor(s): Helena Rasche
Reviewers: Nate Coraor, Helena Rasche, Marius van den Beek, Björn Grüning, Saskia Hiltemann
Overview
Creative Commons License: CC-BY
Objectives:
  • Have an understanding of what Celery is and how it works

  • Install Redis

  • Configure and start Celery workers

  • Install Flower into the Galaxy venv and configure it

  • Use an Ansible playbook for all of the above

  • Monitor a Celery task using the Flower dashboard

Time estimation: 1 hour
Published: Apr 16, 2023
Last modification: Nov 7, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00326
Revision: 11

Celery is a distributed task queue written in Python that can spawn multiple workers and enables asynchronous task processing on multiple nodes. It supports scheduling, but focuses more on real-time operations.

From the Celery website:

“Task queues are used as a mechanism to distribute work across threads or machines.

A task queue’s input is a unit of work called a task. Dedicated worker processes constantly monitor task queues for new work to perform.

Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task the client adds a message to the queue, the broker then delivers that message to a worker.

A Celery system can consist of multiple workers and brokers, giving way to high availability and horizontal scaling.

Celery is written in Python, but the protocol can be implemented in any language. In addition to Python there’s node-celery and node-celery-ts for Node.js, and a PHP client.

Language interoperability can also be achieved exposing an HTTP endpoint and having a task that requests it (webhooks).”

A slideshow presentation on this subject is available.

If you are not interested in managing Redis and Flower, you might prefer the lower-configuration deployment option.

Agenda
  1. Installing and Configuring
  2. Installing and Configuring
  3. Test Celery
Comment: Galaxy Admin Training Path

The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.

  1. Step 1
    ansible-galaxy
  2. Step 2
    backup-cleanup
  3. Step 3
    customization
  4. Step 4
    tus
  5. Step 5
    cvmfs
  6. Step 6
    apptainer
  7. Step 7
    tool-management
  8. Step 8
    reference-genomes
  9. Step 9
    data-library
  10. Step 10
    dev/bioblend-api
  11. Step 11
    connect-to-compute-cluster
  12. Step 12
    job-destinations
  13. Step 13
    pulsar
  14. Step 14
    celery
  15. Step 15
    gxadmin
  16. Step 16
    reports
  17. Step 17
    monitoring
  18. Step 18
    tiaas
  19. Step 19
    sentry
  20. Step 20
    ftp
  21. Step 21
    beacon

In this tutorial we will enable and configure Celery, install a Redis server and the Flower dashboard, and start Celery workers.

Installing and Configuring

To proceed from here it is expected that:

Comment: Requirements for Running This Tutorial
  1. You have set up a working Galaxy instance as described in the ansible-galaxy tutorial.

  2. You have a working RabbitMQ server installed and have added the connection string to the Galaxy configuration. (RabbitMQ is installed when doing the Pulsar tutorial.)

  3. Your VM has a public DNS name: this tutorial sets up SSL certificates from the start and as an integral part of the tutorial.

  4. You have the following ports exposed:

    • 22 for SSH; this can be a different port, or reachable via VPN or similar.
    • 80 for HTTP; this needs to be available to the world if you want to follow the LetsEncrypt portion of the tutorial.
    • 443 for HTTPS; this needs to be available to the world if you want to follow the LetsEncrypt portion of the tutorial.
    • 5671 for AMQP for Pulsar; needed if you plan to set up Pulsar for remote job running.

In order to run a production-ready Celery setup, we need to discuss and install some other software that works together with Celery.
We already learned about RabbitMQ in the Pulsar tutorial. The RabbitMQ server you installed there will be our broker for Celery. As the result backend we are going to use Redis.
Redis is a very popular key-value store. It is very fast and is an easy-to-set-up backend for Celery. If you want to learn more about Redis, visit their website: https://redis.io/

For monitoring and debugging Celery, we use the Flower dashboard.
Flower is lightweight and has a clear but powerful UI and can be installed in Galaxy’s venv using our role.
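
To make the division of labour between broker, workers, and result backend concrete, here is a minimal sketch that uses only the Python standard library. It is an analogy, not Celery's implementation: the `queue.Queue` stands in for RabbitMQ, the dictionary stands in for Redis, and the names `broker`, `result_backend`, `submit`, `worker`, and `run_demo` are made up for this illustration.

```python
import queue
import threading
import uuid

broker = queue.Queue()          # stands in for RabbitMQ: holds pending task messages
result_backend = {}             # stands in for Redis: maps task id -> result
backend_lock = threading.Lock()


def submit(func, *args):
    """Client side: put a task message on the queue and return its id."""
    task_id = str(uuid.uuid4())
    broker.put((task_id, func, args))
    return task_id


def worker():
    """Worker side: take messages off the queue until a None sentinel arrives."""
    while True:
        message = broker.get()
        if message is None:
            broker.task_done()
            break
        task_id, func, args = message
        outcome = func(*args)
        with backend_lock:
            result_backend[task_id] = outcome
        broker.task_done()


def run_demo(concurrency=2):
    """Start workers, submit three tasks, and return their results in order."""
    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    for thread in threads:
        thread.start()
    task_ids = [submit(pow, 2, exponent) for exponent in (3, 4, 5)]
    broker.join()               # wait until every submitted task is processed
    for _ in threads:
        broker.put(None)        # one shutdown sentinel per worker
    for thread in threads:
        thread.join()
    return [result_backend[task_id] for task_id in task_ids]


if __name__ == "__main__":
    print(run_demo())
```

Running the script prints `[8, 16, 32]`. The client never talks to a worker directly: it only sees the queue and, later, the stored result. This is why the setup in this tutorial needs both a broker and a backend.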

Installing and Configuring

First we need to add our new Ansible Roles to the requirements.yml:

Hands-on: Set up Redis, Flower, Systemd and Celery with Ansible
  1. In your working directory, add the roles to your requirements.yml

    --- a/requirements.yml
    +++ b/requirements.yml
    @@ -38,3 +38,8 @@
       version: 1.4.4
     - src: galaxyproject.pulsar
       version: 1.0.10
    +# Celery, Redis, and Flower (dashboard)
    +- name: geerlingguy.redis
    +  version: 1.8.0
    +- name: usegalaxy_eu.flower
    +  version: 1.0.2
       
    

    If you haven’t worked with diffs before, this can be something quite new or different.

    Suppose we have two versions of a grocery list, stored in two files. We’ll call them ‘old’ and ‘new’.

    Input: Old
    $ cat old
    🍎
    🍐
    🍊
    🍋
    🍒
    🥑
    Output: New
    $ cat new
    🍎
    🍐
    🍊
    🍋
    🍍
    🥑

    We can see that they have some different entries. We’ve removed 🍒 because they’re awful, and replaced them with a 🍍.

    Diff lets us compare these files:

    $ diff old new
    5c5
    < 🍒
    ---
    > 🍍

    Here we see that 🍒 is only in old, and 🍍 is only in new. Otherwise the files are identical.

    There are a couple of different diff formats; one is the ‘unified diff’:

    $ diff -U2 old new
    --- old 2022-02-16 14:06:19.697132568 +0100
    +++ new 2022-02-16 14:06:36.340962616 +0100
    @@ -3,4 +3,4 @@
    🍊
    🍋
    -🍒
    +🍍
    🥑

    This is basically what you see in the training materials which gives you a lot of context about the changes:

    • --- old is the ‘old’ file in our view
    • +++ new is the ‘new’ file
    • @@ these lines tell us where the change occurs and how many lines are added or removed.
    • Lines starting with a - are removed from our ‘new’ file
    • Lines with a + have been added.

    So when you go to apply these diffs to your files in the training:

    1. Ignore the header
    2. Remove lines starting with - from your file
    3. Add lines starting with + to your file

    The other lines (🍊/🍋 and 🥑) above just provide “context”, they help you know where a change belongs in a file, but should not be edited when you’re making the above change. Given the above diff, you would find a line with a 🍒, and replace it with a 🍍

    Added & Removed Lines

    Removals are very easy to spot, we just have removed lines

    --- old	2022-02-16 14:06:19.697132568 +0100
    +++ new 2022-02-16 14:10:14.370722802 +0100
    @@ -4,3 +4,2 @@
    🍋
    🍒
    -🥑

    And additions likewise are very easy, just add a new line, between the other lines in your file.

    --- old	2022-02-16 14:06:19.697132568 +0100
    +++ new 2022-02-16 14:11:11.422135393 +0100
    @@ -1,3 +1,4 @@
    🍎
    +🍍
    🍐
    🍊

    Completely new files

    Completely new files look a bit different: there the “old” file is /dev/null, the empty file on a Linux machine.

    $ diff -U2 /dev/null old
    --- /dev/null 2022-02-15 11:47:16.100000270 +0100
    +++ old 2022-02-16 14:06:19.697132568 +0100
    @@ -0,0 +1,6 @@
    +🍎
    +🍐
    +🍊
    +🍋
    +🍒
    +🥑

    And removed files are similar, except with the new file being /dev/null

    --- old	2022-02-16 14:06:19.697132568 +0100
    +++ /dev/null 2022-02-15 11:47:16.100000270 +0100
    @@ -1,6 +0,0 @@
    -🍎
    -🍐
    -🍊
    -🍋
    -🍒
    -🥑
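
If you would like to experiment with the unified format without creating files, the grocery example above can be reproduced with Python's standard `difflib` module. This is only a sketch for exploration; the function name `grocery_diff` is made up for this illustration.

```python
import difflib

old = ["🍎", "🍐", "🍊", "🍋", "🍒", "🥑"]
new = ["🍎", "🍐", "🍊", "🍋", "🍍", "🥑"]


def grocery_diff(before, after, context=2):
    """Return the unified diff between two lists of lines as a list of strings."""
    return list(
        difflib.unified_diff(
            before, after, fromfile="old", tofile="new", n=context, lineterm=""
        )
    )


if __name__ == "__main__":
    for line in grocery_diff(old, new):
        print(line)
```

The output contains the same `@@ -3,4 +3,4 @@` hunk as the `diff -U2` example, without the timestamps in the header. Changing `context` shows how the number of surrounding lines affects the hunk header.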

  2. Install the roles with:

    Input: Bash
    ansible-galaxy install -p roles -r requirements.yml
    
  3. Let’s go now through all the Roles step-by-step:

    1. Since we can stick to the basic default settings of Redis, we will look only at a few variables:

      | Variable | Type | Description |
      |----------|------|-------------|
      | redis_port | integer | The port Redis should listen on. 6379 by default. |
      | redis_bind_interface | string | The interface Redis should listen to. 127.0.0.1 is default. |
      | redis_conf_path | string | The path where your redis configuration will be stored. Default: /etc/redis |

      Luckily we can leave them all on default and don’t need to change anything for Redis in the vars.

    2. We only need to add Redis’ Python package in the group_vars/galaxyservers.yml:

      --- a/group_vars/galaxyservers.yml
      +++ b/group_vars/galaxyservers.yml
      @@ -282,3 +282,7 @@ rabbitmq_users:
       # TUS
       galaxy_tusd_port: 1080
       galaxy_tus_upload_store: /data/tus
      +
      +#Redis
      +galaxy_additional_venv_packages:
      +  - redis
             
      
    3. Let’s add the role to our playbook then:

      --- a/galaxy.yml
      +++ b/galaxy.yml
      @@ -42,6 +42,7 @@
           - role: galaxyproject.miniconda
             become: true
             become_user: "{{ galaxy_user_name }}"
      +    - geerlingguy.redis
           - galaxyproject.nginx
           - geerlingguy.docker
           - usegalaxy_eu.rabbitmqserver
             
      
    4. Because Flower needs its own RabbitMQ user, we should add that to the respective part of our vars. Edit your group_vars/secret.yml and define some random passwords:

      Input: Bash
      ansible-vault edit group_vars/secret.yml
      
      vault_rabbitmq_password_flower: "a-really-long-password-here"
      vault_rabbitmq_password_galaxy: "a-different-really-long-password"
      vault_flower_user_password: "another-different-really-long-password"
      

      These go in the vault because they are secrets. Flower needs its own RabbitMQ user with admin access, and we want a separate vhost for Galaxy and Celery.

      Replace all three values with long random (or not) strings.
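
One possible way to produce such values is Python's standard `secrets` module. The sketch below prints lines you could paste into the vault. The helper name `make_password` and the length of 32 bytes are assumptions made for this example, not requirements of the roles.

```python
import secrets


def make_password(n_bytes=32):
    """Return a URL-safe random string built from n_bytes of randomness."""
    return secrets.token_urlsafe(n_bytes)


if __name__ == "__main__":
    for name in (
        "vault_rabbitmq_password_flower",
        "vault_rabbitmq_password_galaxy",
        "vault_flower_user_password",
    ):
        print(f'{name}: "{make_password()}"')
```

`token_urlsafe` only emits letters, digits, `-` and `_`, which is convenient here because two of these passwords are later embedded in connection URLs.
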
      Now add new users to the RabbitMQ configuration:

      --- a/group_vars/galaxyservers.yml
      +++ b/group_vars/galaxyservers.yml
      @@ -269,6 +269,7 @@ rabbitmq_config:
              
       rabbitmq_vhosts:
         - /pulsar/pulsar_au
      +  - galaxy_internal
              
       rabbitmq_users:
         - user: admin
      @@ -278,6 +279,13 @@ rabbitmq_users:
         - user: pulsar_au
           password: "{{ vault_rabbitmq_password_vhost }}"
           vhost: /pulsar/pulsar_au
      +  - user: galaxy
      +    password: "{{ vault_rabbitmq_password_galaxy }}"
      +    vhost: galaxy_internal
      +  - user: flower
      +    password: "{{ vault_rabbitmq_password_flower }}"
      +    tags: administrator
      +    vhost: galaxy_internal
              
       # TUS
       galaxy_tusd_port: 1080
             
      
    5. Flower has a few variables, too. For example, we need to point it to our virtual environment:

      | Variable | Type | Description |
      |----------|------|-------------|
      | flower_python_version | string | Python version to use when installing flower to a venv. Default: python39 |
      | flower_port | integer | The port Flower should listen on. 5555 by default. |
      | flower_bind_interface | string | The interface Flower should listen to. 0.0.0.0 is default. |
      | flower_conf_dir | string | The path where your Flower configuration will be stored. Default: /etc/flower |
      | flower_venv_dir | string | The path to the venv where Flower should be installed. Default: /home//.local |
      | flower_user | string | User that owns the flower process. Default: galaxy |
      | flower_group | string | Group that owns the flower process. Default: galaxy |
      | flower_ui_users | list of dicts | Name and password of the UI users for basic auth. |
      | flower_app_dir | string | Root directory of your Python app to run with Celery. In our case galaxy_root |
      | flower_app_name | string | Python module to import. In our case ‘galaxy.celery’ |
      | flower_python_path | string | Should point to galaxy’s server/lib directory (default) |
      | flower_broker_api | string | URL to broker’s API with login credentials. |
      | flower_broker_url | string | Flower’s RabbitMQ connection string. |
      | flower_db_file | string | When Flower is in persistent mode, use this path for the database. |

      Let’s add variables to our group_vars/galaxyservers.yml:

      --- a/group_vars/galaxyservers.yml
      +++ b/group_vars/galaxyservers.yml
      @@ -294,3 +294,22 @@ galaxy_tus_upload_store: /data/tus
       #Redis
       galaxy_additional_venv_packages:
         - redis
      +
      +# Flower
      +flower_python_version: python3
      +flower_app_dir: "{{ galaxy_root }}"
      +flower_python_path: "{{ galaxy_root }}/server/lib"
      +flower_venv_dir: "{{ galaxy_venv_dir }}"
      +flower_app_name: galaxy.celery
      +flower_db_file: "{{ galaxy_root }}/var/flower.db"
      +flower_persistent: true
      +flower_broker_api: "https://flower:{{ vault_rabbitmq_password_flower }}@localhost:5671/api/"
      +flower_broker_url: "amqp://flower:{{ vault_rabbitmq_password_flower }}@localhost:5671/galaxy_internal?ssl=true"
      +flower_proxy_prefix: /flower
      +
      +flower_ui_users:
      +  - name: admin
      +    password: "{{ vault_flower_user_password }}"
      +
      +flower_environment_variables:
      +  GALAXY_CONFIG_FILE: "{{ galaxy_config_file }}"
             
      
    6. Flower has a dashboard, so we need to expose it via nginx:

      --- a/templates/nginx/galaxy.j2
      +++ b/templates/nginx/galaxy.j2
      @@ -94,4 +94,13 @@ server {
       		proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
       		proxy_set_header X-Forwarded-Proto $scheme;
       	}
      +
      +	location /flower {
      +		proxy_pass http://localhost:5555;
      +		proxy_set_header Host $host;
      +		proxy_redirect off;
      +		proxy_http_version 1.1;
      +		proxy_set_header Upgrade $http_upgrade;
      +		proxy_set_header Connection "upgrade";
      +	}
       }
             
      
    7. Now we can add the Flower Role to our Playbook:

      --- a/galaxy.yml
      +++ b/galaxy.yml
      @@ -43,6 +43,7 @@
             become: true
             become_user: "{{ galaxy_user_name }}"
           - geerlingguy.redis
      +    - usegalaxy_eu.flower
           - galaxyproject.nginx
           - geerlingguy.docker
           - usegalaxy_eu.rabbitmqserver
             
      
  4. Now it is time to change group_vars/galaxyservers.yml and enable Celery in the gravity section of galaxy_config. Add the following lines to your file:

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -130,6 +130,11 @@ galaxy_config:
           preload: true
         celery:
           concurrency: 2
    +      enable_beat: true
    +      enable: true
    +      queues: celery,galaxy.internal,galaxy.external
    +      pool: threads
    +      memory_limit: 2
           loglevel: DEBUG
         tusd:
           enable: true
       
    

    Now add the second part, Galaxy’s Celery configuration:

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -111,6 +111,11 @@ galaxy_config:
         # Data Library Directories
         library_import_dir: /libraries/admin
         user_library_import_dir: /libraries/user
    +    # Celery
    +    amqp_internal_connection: "pyamqp://galaxy:{{ vault_rabbitmq_password_galaxy }}@localhost:5671/galaxy_internal?ssl=1"
    +    celery_conf:
    +      result_backend: "redis://localhost:6379/0"
    +    enable_celery_tasks: true
       gravity:
         process_manager: systemd
         galaxy_root: "{{ galaxy_root }}/server"
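
The two connection strings above pack several settings into a single URL. The following sketch, using only the standard library, takes them apart and shows one way to assemble the broker URL safely. The helpers `build_amqp_url` and `describe` and the password `example-password` are placeholders invented for this illustration; a password containing characters such as `@` or `/` would otherwise change how the URL is split, so the sketch percent-encodes it.

```python
from urllib.parse import quote, urlsplit, parse_qs


def build_amqp_url(user, password, host, port, vhost, scheme="pyamqp"):
    """Assemble a broker URL, percent-encoding the user, password, and vhost."""
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(vhost, safe='')}?ssl=1"
    )


def describe(url):
    """Split a connection URL into the parts named in the configuration."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path.lstrip("/"),   # vhost for AMQP, database number for Redis
        "options": parse_qs(parts.query),
    }


if __name__ == "__main__":
    # "example-password" is a placeholder, not a value from the tutorial.
    broker = build_amqp_url(
        "galaxy", "example-password", "localhost", 5671, "galaxy_internal"
    )
    print(describe(broker))
    print(describe("redis://localhost:6379/0"))
```

For the broker URL, the path component is the RabbitMQ vhost (`galaxy_internal`). For the Redis URL, it is the database number (`0`).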
       
    
  5. We are done with the changes and you can enter the command to run your playbook:

    Input: Bash
    ansible-playbook galaxy.yml
    

    This should also restart Galaxy and spawn the number of Celery workers that we defined in the Gravity configuration.
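
If you want a quick sanity check after the playbook finishes, the sketch below tests whether the three services accept TCP connections on the ports used in this tutorial. It assumes you run it on the Galaxy server itself and that you kept the default ports. The names `SERVICES` and `port_open` are made up for this example. An open port only shows that something is listening, not that the service is configured correctly.

```python
import socket

# Ports taken from the defaults used in this tutorial.
SERVICES = {"Redis": 6379, "RabbitMQ (AMQPS)": 5671, "Flower": 5555}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "listening" if port_open("localhost", port) else "not reachable"
        print(f"{name:18} port {port}: {state}")
```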

Test Celery

Now that everything is running, we want to test Celery and watch it process tasks. We can do that simply by starting an upload to our Galaxy.

Hands-on: Test Celery and monitor tasks with Flower
  1. First, open a new tab and enter your machine’s hostname followed by /flower/dashboard, then log in with the username admin and your password (the vault_flower_user_password you set earlier). You should see an overview with active workers.
    Keep that tab open.
  2. In split view, open a second browser window and open your Galaxy page. Click on Upload Data, select a file from your computer and click upload.
  3. The workers should now receive a new task. Click on Succeeded and then on the UUID of the last upload task.
    You should see all its details here and the info that it was successful.
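
Besides the dashboard, Flower also provides a REST API, which can be handy for scripted checks. The sketch below only builds the request. It assumes the API is reachable under the same /flower prefix, protected by the same basic auth, and that the task list lives at `api/tasks` with `state` and `limit` query parameters; please verify these against the documentation of your Flower version. The hostname, the password, and the function name `build_tasks_request` are placeholders.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_tasks_request(base_url, user, password, state="SUCCESS", limit=5):
    """Build a GET request for Flower's task list, authenticated with basic auth."""
    query = urlencode({"state": state, "limit": limit})
    url = f"{base_url.rstrip('/')}/api/tasks?{query}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Hostname and password are placeholders; substitute your own values.
    request = build_tasks_request(
        "https://galaxy.example.org/flower", "admin", "change-me"
    )
    print(request.full_url)
    # To actually send it:
    #     import json
    #     with urllib.request.urlopen(request) as response:
    #         tasks = json.load(response)
```
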
Comment: Got lost along the way?

If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.

If you’re using git to track your progress, remember to add your changes and commit with a good commit message!
