Galaxy and Celery

name: inverse
layout: true
class: center, middle, inverse

---

# Galaxy and Celery

<div class="contributors-line">
		
	
<ul class="text-list">
			
			<li>
				<a href="/training-material/hall-of-fame/mira-miracoli/" class="contributor-badge contributor-mira-miracoli"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/mira-miracoli?s=36" alt="Mira Kuntz avatar" width="36" class="avatar" />
    Mira Kuntz</a></li>
</ul>

</div>

<div class="footnote" style="bottom: 8em;">
  <i class="far fa-calendar" aria-hidden="true"></i><span class="visually-hidden">last_modification</span> Updated:   
  <i class="fas fa-fingerprint" aria-hidden="true"></i><span class="visually-hidden">purl</span><abbr title="Persistent URL">PURL</abbr>: <a href="https://gxy.io/GTN:S00003">gxy.io/GTN:S00003</a>
</div>

<div class="footnote" style="bottom: 5em;">

<i class="fas fa-file-alt" aria-hidden="true"></i><span class="visually-hidden">text-document</span><a href="slides-plain.html"> Plain-text slides</a> |

</div>

<div class="footnote" style="bottom: 2em;">
    <strong>Tip: </strong>press <kbd>P</kbd> to view the presenter notes
    | <i class="fa fa-arrows" aria-hidden="true"></i><span class="visually-hidden">arrow-keys</span> Use arrow keys to move between slides

</div>

???
Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off

Press `C` to create a new window where the same presentation will be displayed.
This window is linked to the main window. Changing slides on one will cause the
slide to change on the other.

Useful when presenting.

---

## Can you eat it?

.pull-left[
Celery is an asynchronous distributed task queue.

It consists of:

- Your application, which sends tasks
- A broker with queues that holds the tasks
- Celery workers that execute the tasks
- A result backend that stores the task results

It is written in Python, but multiple other languages are supported.
]

.pull-right[
	![celery logo, a cartoon of a piece of celery](images/celery.png)
]
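
The four components listed above can be sketched with the Python standard library alone. This is a toy illustration, not Celery's API: a `queue.Queue` stands in for the broker, a thread for the worker, and a plain dict for the result backend. The names `send_task`, `worker` and `recalculate_disk_usage` are hypothetical.

```python
import queue
import threading
import uuid

broker = queue.Queue()   # stands in for a broker queue, e.g. in RabbitMQ
result_backend = {}      # stands in for a result backend, e.g. Redis


def send_task(func, *args):
    """Application side: enqueue a task and return its id immediately."""
    task_id = str(uuid.uuid4())
    broker.put((task_id, func, args))
    return task_id


def worker():
    """Worker side: fetch and execute tasks until a stop sentinel arrives."""
    while True:
        item = broker.get()
        if item is None:  # sentinel: shut the worker down
            break
        task_id, func, args = item
        result_backend[task_id] = func(*args)


def recalculate_disk_usage(sizes):
    # Hypothetical side task; a real one would query the database.
    return sum(sizes)


thread = threading.Thread(target=worker)
thread.start()
task_id = send_task(recalculate_disk_usage, [10, 20, 12])
broker.put(None)          # ask the worker to stop after the queued task
thread.join()
disk_usage = result_backend[task_id]
```

The application only ever touches `send_task` and the result backend, so the worker could just as well run on another machine.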

---

## So many features

.pull-left[
- Different worker/process [pool options](https://distributedpython.com/posts/celery-execution-pools-what-is-it-all-about/), depending on your needs – I/O or CPU bound
- Celery Beat, a scheduler for periodic tasks
- Flower, a monitoring interface for
	- Showing tasks, queues and workers
	- Prometheus + Grafana integration
]

.pull-right[
	![celery logo, a cartoon of a piece of celery, now with the text celery next to it](images/celery-logo.png)
]

---

# How does Celery improve Galaxy?

---

**The Problem**

The Galaxy server should respond quickly to every request.  
This is easy for small instances with few users.  
When scaling to thousands of users and millions of jobs without Celery, however,  
Gunicorn and the job handlers spend much of their time on I/O-bound side tasks,  
for example packing a zip archive for a history export.  
  
This leads to slow responses and slow job scheduling.

---
.pull-left[
**The Solution**

Queue these side tasks with Celery and run them asynchronously  
on a different node, or even a whole cluster.

]

.pull-right[
	![galaxy logo next to celery logo, with two purple hearts](images/galaxy+celery.png)
]

---

![A workflow diagram is shown with logos and arrows. Galaxy on the left sends tasks to RabbitMQ. Celery fetches tasks from RabbitMQ and processes them. Then Celery sends results to the backend database. Finally Galaxy fetches those same results back from the backend.](images/workflow.png)

---

## How does the magic work?

- Celery loads the Galaxy code from NFS when you start the workers
- Workers connect to the broker and fetch tasks from the queue
- Since all the Python modules are already loaded, a worker can execute a task directly, with almost no delay
- The worker then runs the code for that task, for example an SQL update or a file deletion on the NFS
- Results are either sent directly back to the broker or stored, e.g. in a Redis DB
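
The reason workers need the Galaxy code loaded can be seen in a small sketch: a message on the queue carries only a task name and its arguments, so the worker has to look the function up in code it already imported. The example below is a simplified stand-in using only the standard library. The message layout, the `TASKS` registry and the `change_datatype` task are assumptions made for illustration and do not reflect Celery's actual wire format.

```python
import json
import queue

# Filled once at worker start-up, standing in for importing the Galaxy code.
TASKS = {}


def task(func):
    """Register a function under its name so messages can refer to it."""
    TASKS[func.__name__] = func
    return func


@task
def change_datatype(dataset_id, new_type):
    # Hypothetical task body; a real one might run an SQL update.
    return f"dataset {dataset_id} is now {new_type}"


def run_worker(broker, results):
    """Fetch messages until the queue is empty and store each result."""
    while True:
        try:
            message = broker.get_nowait()
        except queue.Empty:
            return
        payload = json.loads(message)
        func = TASKS[payload["task"]]  # a lookup only, nothing is imported here
        results[payload["id"]] = func(*payload["args"])


broker = queue.Queue()
broker.put(json.dumps({"id": "t1", "task": "change_datatype", "args": [7, "fastq"]}))
results = {}
run_worker(broker, results)
```

Because the lookup is a dictionary access on already-loaded code, the per-task start-up cost stays small, which is the point made in the third bullet.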

---

## What is Celery used for?

- Processing upload jobs
- Processing metadata
- Recalculating disk usage
- Purging datasets
- Changing datatypes
- Preparing compressed downloads (histories, etc.)
- Creating PDFs for Galaxy workflow reports
- Cleaning up short term storage
- Preparing history exports
- ...

---

.pull-left[
## What do you need to enable Celery?

- A properly set up broker, for example with the [UseGalaxy.eu RabbitMQ Ansible Role](https://github.com/usegalaxy-eu/ansible-rabbitmq)
- A result backend such as a Redis server, for example set up with [geerlingguy's Ansible Role](https://github.com/geerlingguy/ansible-role-redis)
- A shared filesystem (e.g. NFS) on which you export Galaxy's root dir and which is mounted on the Celery nodes
- Optional: Flower, the Celery UI, for example with the [UseGalaxy.eu Flower Ansible Role](https://github.com/usegalaxy-eu/flower-ansible-role)
]

.pull-right[
![Collection of three logos, Galaxy, RabbitMQ, and celery](images/logos.png)
]

---

.pull-left[
![celery logo, a cartoon of a piece of celery](images/celery.png)
]

.pull-right[

## Where to get Celery?

- It is in your Galaxy virtual environment (`venv`) already!
- Mount the Galaxy root with `venv` and config dirs on your Celery node
- Create an Ansible playbook, e.g. with [UseGalaxy.eu's Systemd Role](https://github.com/usegalaxy-eu/ansible-galaxy-systemd)
- For inspiration, you can check out UseGalaxy.eu's [vars file](https://github.com/usegalaxy-eu/infrastructure-playbook/blob/master/group_vars/celerycluster.yml)
]

---
.pull-left[
	![screenshot of galaxy documentation page, linked in next section](images/docs.png)
]
.pull-right[
## How to set the Galaxy Config
- [Configuration in `galaxy.yml` file](https://docs.galaxyproject.org/en/latest/admin/options.html#amqp-internal-connection)
- The documentation is still a work in progress
- Set `celery_broker` to connect to your broker; if it is unset, it defaults to `amqp_internal_connection`
- `celery_conf` accepts nearly all [documented options](https://docs.celeryq.dev/en/stable/getting-started/first-steps-with-celery.html)
- Set your [`result_backend`](https://docs.celeryq.dev/en/stable/getting-started/first-steps-with-celery.html#keeping-results) / Redis connection here
- Celery, Flower and RabbitMQ are all well documented
]
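
The fallback between the two broker options can be sketched in a few lines of Python. This is not Galaxy's own code: `resolve_broker_url` is a hypothetical helper, the dict merely mirrors the option names from `galaxy.yml`, and all hosts and credentials are placeholders.

```python
def resolve_broker_url(config):
    """Return `celery_broker` if set, else fall back to `amqp_internal_connection`."""
    return config.get("celery_broker") or config.get("amqp_internal_connection")


# Placeholder values; hosts, ports and credentials are made up.
galaxy_config = {
    "amqp_internal_connection": "amqp://galaxy:secret@mq.example.org:5672/galaxy",
    "celery_conf": {
        # The result backend (here a Redis connection) lives inside celery_conf.
        "result_backend": "redis://redis.example.org:6379/0",
    },
}

# No celery_broker set: the AMQP connection is used.
broker_url = resolve_broker_url(galaxy_config)

# An explicit celery_broker takes precedence.
galaxy_config["celery_broker"] = "amqp://celery:secret@mq.example.org:5672/celery"
explicit_broker_url = resolve_broker_url(galaxy_config)

result_backend_url = galaxy_config["celery_conf"]["result_backend"]
```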

---

.pull-top[
## How to monitor Celery
In the Flower dashboard you can monitor your workers live.  
The screenshot below shows all registered workers and their status.
]

.pull-bottom[
	![screenshot of the flower dashboard showing a bunch of workers](images/flower-screenshot.png)
]

---

.pull-top[
## How to monitor Celery Tasks
In the 'Tasks' tab, you can click on any task to see its details, such as  
args, timestamp, worker, result and, if it failed, the stack trace.
]

.pull-bottom[
	![screenshot of a simple flower task](images/flower-screenshot-task.png)
]

---

## Take Home Message
 - Celery is a nice way to offload work from your head node
 - On the other hand, you have to maintain a broker and monitor your Celery node or cluster
 - If your load fluctuates strongly, you might need a way to scale your Celery cluster flexibly
 - Smaller instances do not necessarily need it, but consider it if your head node suffers from high load or I/O

---

## Thank You!

This material is the result of collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!

<div class="contributors-line">
		
<table class="contributions">
	
	<tr>
		<td><abbr title="These people wrote the bulk of the tutorial, they may have done the analysis, built the workflow, and wrote the text themselves.">Author(s)</abbr></td>
		<td>
			<a href="/training-material/hall-of-fame/mira-miracoli/" class="contributor-badge contributor-mira-miracoli"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/mira-miracoli?s=36" alt="Mira Kuntz avatar" width="36" class="avatar" />
    Mira Kuntz</a>
		</td>
	</tr>

<tr>
		<td><abbr title="These people edited the text, either for spelling and grammar, flow, GTN-fit, or other similar editing categories">Editor(s)</abbr></td>
		<td>
			<a href="/training-material/hall-of-fame/hexylena/" class="contributor-badge contributor-hexylena"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/hexylena?s=36" alt="Helena Rasche avatar" width="36" class="avatar" />
    Helena Rasche</a></td>
	</tr>

<tr class="reviewers">
		<td><abbr title="These people reviewed this material for accuracy and correctness">Reviewers</abbr></td>
		<td>
			<a href="/training-material/hall-of-fame/mira-miracoli/" class="contributor-badge contributor-badge-small contributor-mira-miracoli"><img src="https://avatars.githubusercontent.com/mira-miracoli?s=36" alt="Mira Kuntz avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/natefoo/" class="contributor-badge contributor-badge-small contributor-natefoo"><img src="https://avatars.githubusercontent.com/natefoo?s=36" alt="Nate Coraor avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/hexylena/" class="contributor-badge contributor-badge-small contributor-hexylena"><img src="https://avatars.githubusercontent.com/hexylena?s=36" alt="Helena Rasche avatar" width="36" class="avatar" /></a></td>
	</tr>

</table>

</div>


Tutorial Content is licensed under <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.<br/>