Alternative Celery Deployment for Galaxy
Author(s): Helena Rasche
Overview

Questions:
- What is required for Celery to work in Galaxy?

Objectives:
- Set up the bare minimum configuration to get tasks working
- Avoid deploying, securing, and managing RabbitMQ, Redis, and Flower

Requirements:
- Slides: Ansible
- Hands-on: Ansible
- Slides: Galaxy Installation with Ansible
- Hands-on: Galaxy Installation with Ansible

Time estimation: 1 hour
Published: Nov 7, 2024
Last modification: Nov 7, 2024
License: Tutorial content is licensed under the Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
PURL: https://gxy.io/GTN:T00468
Revision: 1
Celery is a relatively new component in the Galaxy world (ca. 2023): a distributed task queue that can be used to run tasks asynchronously. It isn’t mandatory, but without it some features you might expect to use will be missing.
If you are running a large production deployment you probably want to follow the Celery+Redis+Flower Tutorial.
However, if you are running a smaller Galaxy, you may not want to manage deploying Celery (beyond what Gravity does for you automatically), you may not want to add Redis to your stack, and you may have no need of Flower!
Agenda
Configuring Galaxy to use Postgres
AMQP is a message-queuing protocol that processes can use to pass messages to each other. While a dedicated message queue like RabbitMQ is perhaps the most robust choice, there is an easier option: Postgres.
Add the following to your Galaxy configuration to use Postgres:
```yaml
amqp_internal_connection: "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql"
```
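To see what this connection string actually encodes, here is a hedged breakdown using only the Python standard library (the variable names are illustrative, not part of Galaxy): the `sqlalchemy+` prefix selects Kombu's SQLAlchemy transport, and the remainder is an ordinary SQLAlchemy Postgres URL that connects over a unix socket rather than TCP.

```python
from urllib.parse import parse_qs, urlsplit

# Galaxy's amqp_internal_connection value from above
dsn = "sqlalchemy+postgresql:///galaxy?host=/var/run/postgresql"

# Split off the transport prefix ("sqlalchemy") from the SQLAlchemy URL
transport, _, sa_url = dsn.partition("+")
parts = urlsplit(sa_url)

print(transport)                       # sqlalchemy
print(parts.path.lstrip("/"))          # galaxy (the database name)
print(parse_qs(parts.query)["host"])   # ['/var/run/postgresql'] (socket directory)
```

Because no hostname or port appears before `/galaxy`, the driver falls back to the `host` query parameter, i.e. the local Postgres socket directory.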
Configuring Celery to use Postgres
Celery would prefer you use Redis (a key-value store) as a backend to store results. But we already have a database! So let’s try using that instead:
```yaml
enable_celery_tasks: true
celery_conf:
  broker_url: null # This should default to using amqp_internal_connection
  result_backend: "db+postgresql:///galaxy?host=/var/run/postgresql"
  task_routes:
    galaxy.fetch_data: galaxy.external
    galaxy.set_job_metadata: galaxy.external
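The `task_routes` mapping above sends those two task names to the `galaxy.external` queue while everything else falls back to Celery's default queue (named `celery`). The sketch below mimics that lookup in plain Python; `queue_for` is an illustrative helper, not a Galaxy or Celery API:

```python
# Mirror of the task_routes mapping from the YAML above
task_routes = {
    "galaxy.fetch_data": "galaxy.external",
    "galaxy.set_job_metadata": "galaxy.external",
}

def queue_for(task_name, default="celery"):
    """Return the queue a task would be routed to, falling back to the default."""
    return task_routes.get(task_name, default)

print(queue_for("galaxy.fetch_data"))       # galaxy.external
print(queue_for("galaxy.sync_some_state"))  # celery
```

This matters because Gravity can run a separate worker pool consuming only the `galaxy.external` queue, keeping heavier tasks like data fetching off the default workers.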
With that we should now be able to use useful features like:
- Changing the datatype of a collection
- Exporting histories
- Other things!
Configuring with Ansible
If you’re using Ansible, this could also look like:
```yaml
amqp_internal_connection: "sqlalchemy+"
enable_celery_tasks: true
celery_conf:
  broker_url: null # This should default to using amqp_internal_connection
  result_backend: "db+"
  task_routes:
    galaxy.fetch_data: galaxy.external
    galaxy.set_job_metadata: galaxy.external
```
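If you followed the Galaxy Installation with Ansible tutorial, these settings would typically live in your group variables under the `galaxy_config` variable of the `galaxyproject.galaxy` role. The placement sketched below is an assumption based on that role's conventions; adjust it to match your own playbook layout:

```yaml
# group_vars/galaxyservers.yml (illustrative placement)
galaxy_config:
  galaxy:
    enable_celery_tasks: true
    # ... the amqp_internal_connection and celery_conf settings from above ...
```

After updating your variables, re-run your playbook so the configuration is templated out and Galaxy is restarted.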