Connecting Galaxy to a compute cluster

Published: Jan 7, 2018
Last Updated: Apr 21, 2023

Galaxy Job Configuration


Why cluster?

Running jobs on the Galaxy server negatively impacts Galaxy UI performance

Even adding one other host helps

Can restart Galaxy without interrupting jobs


Runners

Correspond to job runner plugins in lib/galaxy/jobs/runners

Plugins for: Slurm, DRMAA, HTCondor, PBS/Torque, Pulsar, CLI (via SSH), Kubernetes, and others
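
As a rough illustration, loading a cluster runner plugin next to the local one might look like the fragment below. This is a minimal sketch, not a complete configuration: the runner ID `slurm` is an arbitrary label, and the `drmaa_library_path` value is a placeholder that depends on where the DRMAA library is installed on your system.

```yaml
runners:
  local:
    load: galaxy.jobs.runners.local:LocalJobRunner
    workers: 4
  slurm:
    # Slurm plugin from lib/galaxy/jobs/runners; it submits through DRMAA
    load: galaxy.jobs.runners.slurm:SlurmJobRunner
    # Placeholder path (assumption): point this at your site's libdrmaa
    drmaa_library_path: /usr/lib/slurm-drmaa/lib/libdrmaa.so.1
```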


Cluster library stack (DRMAA)

[Figure: cluster library stack]


Handlers

Control how jobs are assigned to handlers (use db-skip-locked)

Can statically define handler configuration (uncommon)
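
A minimal sketch of the handler assignment setting mentioned above, assuming the YAML job configuration format used elsewhere in these slides:

```yaml
handling:
  assign:
    # Handlers pick up new jobs from the database using SKIP LOCKED row locking
    - db-skip-locked
```

With this method, handlers do not need to be listed statically in the job configuration.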


Environments

Formerly “Destinations”

Define how jobs should be run


The default job configuration

runners:
  local:
    load: galaxy.jobs.runners.local:LocalJobRunner
    workers: 4

execution:
  default: local
  environments:
    local:
      runner: local


Job Config - Tags

Both environments and handlers can be grouped by tags


Job Environment

env key in environments: configures the job execution environment

Syntax and function:
- {name: NAME, value: VALUE}: set $NAME to VALUE
- {file: /path/to/file}: source the shell file at /path/to/file
- {execute: CMD}: execute CMD

Files are sourced and commands are executed on the remote destination, so they do not need to work on the Galaxy server
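
The three forms can be combined in one environment. The fragment below is only a sketch: the environment ID `cluster`, the variable, the file path, and the command are all placeholders chosen for illustration, and the `slurm` runner is assumed to be defined under `runners`.

```yaml
execution:
  default: cluster
  environments:
    cluster:
      runner: slurm
      env:
        # Set $TMPDIR for the job (placeholder value)
        - name: TMPDIR
          value: /scratch/tmp
        # Source a shell file on the compute node (placeholder path)
        - file: /etc/profile.d/modules.sh
        # Run a command before the tool starts (placeholder command)
        - execute: module load samtools
```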


Limits

Available limits
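
As a hedged example, run time and output size limits might be declared as follows. The values are illustrative only, not recommendations.

```yaml
limits:
  # Stop jobs that run longer than 24 hours (illustrative value)
  - type: walltime
    value: '24:00:00'
  # Fail jobs whose outputs grow beyond 10 GB (illustrative value)
  - type: output_size
    value: 10GB
```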


Concurrency Limits

Available limits
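
A small sketch of per-user concurrency limits, again with illustrative values:

```yaml
limits:
  # Each logged-in user may have at most 2 jobs running at once
  - type: registered_user_concurrent_jobs
    value: 2
  # Each anonymous user may have at most 1 job running at once
  - type: anonymous_user_concurrent_jobs
    value: 1
```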


Shared Filesystem

Most job plugins require a shared filesystem between the Galaxy server and compute.

The exception is Pulsar. More on this in Running Jobs on Remote Resources with Pulsar.


Shared Filesystem

Our simple example works because of two important principles:

  1. Some things are located at the same path on Galaxy server and node(s)
    • Galaxy application (/srv/galaxy/server)
    • Tool dependencies
  2. Some things are the same on Galaxy server and node(s)
    • Job working directory
    • Input and output datasets

The first can be worked around with symlinks, copies, or Pulsar embedded

The second can be worked around with Pulsar REST/MQ (with a performance/throughput penalty)


Multiprocessing

Some tools can greatly improve performance by using multiple cores

Galaxy automatically sets $GALAXY_SLOTS to the CPU/core count you specify when submitting the job (for example, 4)

Tool configs: Consume \${GALAXY_SLOTS:-4}
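
For Slurm, a core request could be sketched as below. The environment ID `multicore_slurm` is just a label, and the `slurm` runner is assumed to be defined under `runners`.

```yaml
execution:
  environments:
    multicore_slurm:
      runner: slurm
      # Request 4 tasks from Slurm; Galaxy then exports GALAXY_SLOTS=4 to the job
      native_specification: '--ntasks=4'
```

The `:-4` in the tool's shell expression is only a fallback that applies when $GALAXY_SLOTS is unset.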


Memory requirements

For Slurm and Gridengine only, Galaxy will set $GALAXY_MEMORY_MB and $GALAXY_MEMORY_MB_PER_SLOT as integers.

Other DRMs: Please PR the appropriate code.

For Java tools, be sure to set -Xmx, e.g.:

    java_cluster:
      runner: drmaa
      env:
        - name: '_JAVA_OPTIONS'
          value: '-Xmx6G'


Run jobs as the “real” user

If your Galaxy users == System users:

See: Cluster documentation


Job Config - Mapping Tools to Environments

Problem: Tool A uses a single core, Tool B uses multiple cores


Job Config - Mapping Tools to Environments

Solution:

execution:
  default: singlecore_slurm
  environments:
    singlecore_slurm:
      runner: slurm

    multicore_slurm:
      runner: slurm
      native_specification: '--ntasks=4'
tools:
- id: hisat2
  environment: multicore_slurm


The Dynamic Job Runner

For when basic tool-to-environment mapping isn’t enough


The Dynamic Job Runner

A special built-in job runner plugin

Map jobs to environments (destinations) on more than just tool IDs

Two types: Total Perspective Vortex (TPV) and arbitrary Python functions

See: Dynamic Destination Mapping


Total Perspective Vortex (TPV)

Powerful, fully dynamic tool-to-environment mapping based on tool, user, resource requirements, tags, and more.

Discussed in detail in its own tutorial.

See also: TPV Documentation.
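
As a sketch of how TPV plugs into the job configuration, it is typically wired in as a dynamic environment. The environment ID `tpv_dispatcher` and the rules file path are placeholders; the mapping rules themselves live in the referenced TPV file and are not shown here.

```yaml
execution:
  default: tpv_dispatcher
  environments:
    tpv_dispatcher:
      runner: dynamic
      type: python
      # Entry point provided by the TPV package
      function: map_tool_to_destination
      rules_module: tpv.rules
      tpv_config_files:
        # Placeholder path (assumption): your site's TPV rules file
        - /srv/galaxy/config/tpv_rules_local.yml
```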


Arbitrary Python Functions

Programmable mappings:
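
A minimal sketch of routing jobs through a custom rule function. The function name `my_rule` is hypothetical: it stands for a Python function you would write yourself in Galaxy's dynamic rules directory, returning the ID of one of the environments defined alongside it.

```yaml
execution:
  default: dynamic
  environments:
    dynamic:
      runner: dynamic
      type: python
      # Hypothetical rule function; it decides which environment each job uses
      function: my_rule
    singlecore_slurm:
      runner: slurm
    multicore_slurm:
      runner: slurm
      native_specification: '--ntasks=4'
```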


Thank you!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors! Tutorial content is licensed under the Creative Commons Attribution 4.0 International License.