Alternative Celery Deployment for Galaxy

Author(s): Helena Rasche
Overview
Questions:
  • What is required for Celery to work in Galaxy?

Objectives:
  • Set up the bare minimum configuration to get tasks working

  • Avoid deploying, securing, and managing RabbitMQ, Redis, and Flower

Time estimation: 1 hour
Published: Nov 7, 2024
Last modification: Nov 7, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00468
Revision: 1

Celery is a distributed task queue that can be used to run tasks asynchronously, and it is a fairly new component in the Galaxy world (ca. 2023). It isn’t mandatory, but without it you might find that some features you expect to use are missing.

If you are running a large production deployment you probably want to follow the Celery+Redis+Flower Tutorial.

However, if you are running a smaller Galaxy, you may not want to manage a Celery deployment beyond what Gravity does for you automatically, you may not want to add Redis to your stack, and you may have no need for Flower!

Agenda
  1. Configuring Galaxy to use Postgres
  2. Configuring Celery to use Postgres
  3. Configuring with Ansible

Configuring Galaxy to use Postgres

AMQP is a message queue protocol that processes can use to pass messages to each other. While a dedicated message queue like RabbitMQ is perhaps the most robust choice, there is an easier option: Postgres.

Add the following to your Galaxy configuration to use Postgres:

amqp_internal_connection: "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql"
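To see how that connection string breaks down, here is a minimal Python sketch using only the standard library. The helper name `describe_connection` is made up for illustration and is not part of Galaxy. It assumes the part before the first `+` selects the transport and the rest is an ordinary SQLAlchemy database URL; since the URL itself carries no host, the `host=/var/run/postgresql` query parameter is taken to be the directory holding the Postgres Unix socket.

```python
from urllib.parse import parse_qs, urlsplit


def describe_connection(url: str) -> dict:
    """Split a prefixed connection string into its parts (illustrative helper)."""
    # Everything before the first "+" is the transport prefix,
    # everything after it is treated as a plain database URL.
    prefix, _, database_url = url.partition("+")
    parts = urlsplit(database_url)
    query = parse_qs(parts.query)
    return {
        "transport": prefix,
        "dialect": parts.scheme,
        "database": parts.path.lstrip("/"),
        # Assumption: a "host" query parameter names the Unix socket directory.
        "socket_dir": query.get("host", [None])[0],
    }


info = describe_connection(
    "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql"
)
print(info)
```

If the database is reached over TCP instead of a local socket, the URL would carry a host and port in the usual place and the sketch would report no socket directory.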

Configuring Celery to use Postgres

Celery would prefer that you use Redis (a key-value store) as the backend for storing task results. But we already have a database! So let’s try using that instead:

enable_celery_tasks: true
celery_conf:
  broker_url: null  # This should default to using amqp_internal_connection
  result_backend: "db+postgresql:///galaxy?host=/var/run/postgresql"
  task_routes:
    galaxy.fetch_data: galaxy.external
    galaxy.set_job_metadata: galaxy.external

With that in place, we should now be able to use features such as:

  • Changing the datatype of a collection
  • Exporting histories
  • Other things!
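
The two connection strings in the configuration above wrap the same Postgres URL and differ only in their prefix: `sqlalchemy+` for message passing and `db+` for the Celery result backend. A small Python sketch of that relationship follows; `celery_urls` is a hypothetical helper, not a Galaxy or Celery function, and it assumes a single database URL is reused for both purposes as in this tutorial.

```python
def celery_urls(database_url: str) -> dict:
    """Build both connection strings from one database URL (illustrative)."""
    return {
        # Messages are passed through the database via the "sqlalchemy+" prefix.
        "amqp_internal_connection": f"sqlalchemy+{database_url}",
        # Task results are stored in the database via the "db+" prefix.
        "result_backend": f"db+{database_url}",
    }


urls = celery_urls("postgresql:///galaxy?host=/var/run/postgresql")
for key, value in urls.items():
    print(f"{key}: {value}")
```

Keeping both values derived from one URL means a change to the database location only has to be made in one place.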

Configuring with Ansible

If you’re using Ansible, the same configuration could look like:

amqp_internal_connection: "sqlalchemy+"
enable_celery_tasks: true
celery_conf:
  broker_url: null  # This should default to using amqp_internal_connection
  result_backend: "db+"
  task_routes:
    galaxy.fetch_data: galaxy.external
    galaxy.set_job_metadata: galaxy.external