Alternative Celery Deployment for Galaxy
Author(s): Helena Rasche
Overview

Questions:
- What is required for Celery to work in Galaxy?

Objectives:
- Set up the bare minimum configuration to get tasks working
- Avoid deploying, securing, and managing RabbitMQ, Redis, and Flower

Requirements:
- Slides: Ansible
- Hands-on: Ansible
- Slides: Galaxy Installation with Ansible
- Hands-on: Galaxy Installation with Ansible

Time estimation: 1 hour
Published: Nov 7, 2024
Last modification: Nov 7, 2024
License: Tutorial content is licensed under the Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
PURL: https://gxy.io/GTN:T00468
Revision: 1
Celery is a relatively new component in the Galaxy world (ca. 2023): a distributed task queue that can be used to run tasks asynchronously. It isn’t mandatory, but without it some features you might expect to use will be missing.
If you are running a large production deployment you probably want to follow the Celery+Redis+Flower Tutorial.
However, if you are running a smaller Galaxy, you may not want to manage deploying Celery (beyond what Gravity does for you automatically), you may not want to add Redis to your stack, and you may have no need of Flower!
Agenda
Configuring Galaxy to use Postgres
AMQP is a message-queuing protocol that processes can use to pass messages to each other. While a dedicated message queue like RabbitMQ is perhaps the most robust choice, there is an easier option: Postgres.
Add the following to your Galaxy configuration to use Postgres:
```yaml
amqp_internal_connection: "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql"
```
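To see what this connection string actually encodes, here is a hedged breakdown using only the Python standard library (the variable names are illustrative, not part of Galaxy): the `sqlalchemy+` prefix selects Kombu's SQLAlchemy transport, and the remainder is an ordinary SQLAlchemy Postgres URL that connects over a unix socket rather than TCP.

```python
from urllib.parse import parse_qs, urlsplit

# Galaxy's amqp_internal_connection value from above
dsn = "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql"

# Split off the transport prefix ("sqlalchemy") from the SQLAlchemy URL
transport, _, sa_url = dsn.partition("+")
parts = urlsplit(sa_url)

print(transport)                       # sqlalchemy
print(parts.path.lstrip("/"))          # galaxy (the database name)
print(parse_qs(parts.query)["host"])   # ['/var/run/postgresql'] (socket directory)
```

Because no hostname or port appears before `/galaxy`, the driver falls back to the `host` query parameter, i.e. the local Postgres socket directory.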
Configuring Celery to use Postgres
Celery would prefer you use Redis (a key-value store) as a backend to store results. But we already have a database! So let’s try using that instead:
```yaml
enable_celery_tasks: true
celery_conf:
  broker_url: null # This should default to using amqp_internal_connection
  result_backend: "db+postgresql:///galaxy?host=/var/run/postgresql"
  task_routes:
    galaxy.fetch_data: galaxy.external
    galaxy.set_job_metadata: galaxy.external
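The `task_routes` mapping above sends those two task names to the `galaxy.external` queue while everything else falls back to Celery's default queue (named `celery`). The sketch below mimics that lookup in plain Python; `queue_for` is an illustrative helper, not a Galaxy or Celery API:

```python
# Mirror of the task_routes mapping from the YAML above
task_routes = {
    "galaxy.fetch_data": "galaxy.external",
    "galaxy.set_job_metadata": "galaxy.external",
}

def queue_for(task_name, default="celery"):
    """Return the queue a task would be routed to, falling back to the default."""
    return task_routes.get(task_name, default)

print(queue_for("galaxy.fetch_data"))       # galaxy.external
print(queue_for("galaxy.sync_some_state"))  # celery
```

This matters because Gravity can run a separate worker pool consuming only the `galaxy.external` queue, keeping heavier tasks like data fetching off the default workers.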
With that we should now be able to use useful features like:
- Changing the datatype of a collection
- Exporting histories
- Other things!
Configuring with Ansible
If you’re using Ansible, this could also look like:
```yaml
amqp_internal_connection: "sqlalchemy+"
enable_celery_tasks: true
celery_conf:
  broker_url: null # This should default to using amqp_internal_connection
  result_backend: "db+"
  task_routes:
    galaxy.fetch_data: galaxy.external
    galaxy.set_job_metadata: galaxy.external
```
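If you followed the Galaxy Installation with Ansible tutorial, these settings would typically live in your group variables under the `galaxy_config` variable of the `galaxyproject.galaxy` role. The placement sketched below is an assumption based on that role's conventions; adjust it to match your own playbook layout:

```yaml
# group_vars/galaxyservers.yml (illustrative placement)
galaxy_config:
  galaxy:
    enable_celery_tasks: true
    # ... the amqp_internal_connection and celery_conf settings from above ...
```

After updating your variables, re-run your playbook so the configuration is templated out and Galaxy is restarted.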