Galaxy Workflow Format 2 Description §

The traditional Galaxy workflow description (.ga) is not meant to be concise and is neither readily human readable nor human writable. Format 2 addresses all three of these limitations while also converging (where it makes sense without sacrificing these other goals) with the workflow description used by the Common Workflow Language.

This standard is in active development and a moving target in many ways, but we will try to keep what is ingestible by Galaxy backward-compatible going forward.

GalaxyWorkflow §

A Galaxy workflow description. This record corresponds to the description of a workflow that should be executable on a Galaxy server that includes the contained tool definitions.

A document defining this record should be importable through the workflows API or the user interface of any Galaxy instance at version 19.09 or newer.

A note about the label field. §

This is the name of the workflow in the Galaxy user interface, and it is the primary way users identify the workflow. For legacy support, this field may also be called 'name': Galaxy will consume the workflow document and treat the attribute correctly. However, to validate against this workflow definition schema, the attribute should be called label.
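
The legacy handling of 'name' described above can be sketched in Python. This is a minimal illustration rather than Galaxy's own code: the helper normalize_label is hypothetical, and letting an existing label take precedence over name is an assumption.

```python
def normalize_label(workflow: dict) -> dict:
    """Return a copy of a workflow document that uses 'label' instead of legacy 'name'."""
    normalized = dict(workflow)
    # Assumption: if both keys are present, 'label' wins and 'name' is left untouched.
    if "label" not in normalized and "name" in normalized:
        normalized["label"] = normalized.pop("name")
    return normalized


legacy = {"class": "GalaxyWorkflow", "name": "Simple workflow"}
print(normalize_label(legacy))
```

Running a document through such a helper before schema validation keeps older files importable while still producing the label attribute the schema expects.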

Fields

field
required
type
description
inputs
required

Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.

When accepting an input object, all input parameters must have a value. If an input parameter is missing from the input object, it must be assigned a value of null (or the value of default for that parameter, if provided) for the purposes of validation and evaluation of expressions.

outputs
required

Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.

class
required
steps
required

The individual steps that make up the workflow. Each step is executed when all of its input data links are fulfilled.

tags
required
array<string> | null

Tags for the workflow.

id
optional

The unique identifier for this object.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

uuid
optional

UUID uniquely representing this element.

report
optional

Workflow invocation report template.

creator
optional

Can be a schema.org Person (https://schema.org/Person) or Organization (https://schema.org/Organization) entity

license
optional

Must be a valid license listed at https://spdx.org/licenses/

release
optional

If listed, this should correspond to the release of the workflow in its source repository.
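
As a rough illustration of the fields above, here is a small workflow document written as a Python dict, with a check for the fields the table marks as required. Several details are assumptions not stated in this section: the map-keyed-by-id layout for inputs, outputs and steps, the "step/output" form of outputSource, and the parameter names input1 and out_file1. The helper missing_required is hypothetical.

```python
import json

# Fields marked "required" in the GalaxyWorkflow table.
REQUIRED_FIELDS = ("inputs", "outputs", "class", "steps", "tags")

workflow = {
    "class": "GalaxyWorkflow",
    "label": "Simple workflow",
    "doc": "Runs the cat1 tool on one dataset.",
    "tags": ["example"],
    "license": "MIT",  # an identifier from https://spdx.org/licenses/
    "inputs": {"the_input": {"type": "data"}},
    # Assumption: a step output is referenced as "<step id>/<output id>".
    "outputs": {"the_output": {"outputSource": "cat/out_file1"}},
    "steps": {
        "cat": {
            "tool_id": "cat1",
            "in": {"input1": {"source": "the_input"}},
            "out": ["out_file1"],
        }
    },
}


def missing_required(document: dict) -> list:
    """List the required top-level fields that the document does not define."""
    return [field for field in REQUIRED_FIELDS if field not in document]


print(missing_required(workflow))
print(json.dumps(workflow, indent=2))
```

The check covers only the presence of top-level keys. It does not validate field types or the nested records.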

WorkflowInputParameter §

Fields

field
required
type
description
type
required

Specify valid types of data that may be assigned to this parameter.

optional
required

If set to true, a value for this WorkflowInputParameter is not required to submit the workflow.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

default
optional

The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).

position
optional
format
optional
array<string>

Specify datatype extension for valid input datasets.

collection_type
optional

Collection type (defaults to list if type is collection). Nested collection types are separated with colons, e.g. list:list:paired.
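
The default and collection_type rules above can be sketched as follows. This is an illustrative reading of the field descriptions, not Galaxy's implementation: resolve_inputs and collection_nesting are hypothetical helpers, and the parameters are assumed to be given as a map keyed by id.

```python
def resolve_inputs(parameters: dict, input_object: dict) -> dict:
    """Fill in each declared parameter from the input object, its default, or None."""
    resolved = {}
    for param_id, parameter in parameters.items():
        value = input_object.get(param_id)
        if value is None:
            # Missing or null: use the declared default, else the value stays null (None).
            value = parameter.get("default")
        resolved[param_id] = value
    return resolved


def collection_nesting(parameter: dict) -> list:
    """Split collection_type into its colon-separated levels, in the order written."""
    if parameter.get("type") != "collection":
        return []
    return parameter.get("collection_type", "list").split(":")


parameters = {
    "reads": {"type": "collection", "collection_type": "list:paired"},
    "threshold": {"type": "float", "default": 0.5, "optional": True},
}
print(resolve_inputs(parameters, {"threshold": None}))
print(collection_nesting(parameters["reads"]))
```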

Any §

The Any type validates for any non-null value.

Symbols

symbol description
Any

StepPosition §

This field specifies the location of the step's node when rendered in the workflow editor.

Fields

field
required
type
description
top
required

Relative vertical position of the step's node when rendered in the workflow editor.

left
required

Relative horizontal position of the step's node when rendered in the workflow editor.

GalaxyType §

Extends primitive types with native Galaxy concepts such as datasets and collections.

Symbols

symbol description
null no value
boolean a binary value
int 32-bit signed integer
long 64-bit signed integer
float single precision (32-bit) IEEE 754 floating-point number
double double precision (64-bit) IEEE 754 floating-point number
string Unicode character sequence
integer an alias for int type - matches syntax used by Galaxy tools
text an alias for string type - matches syntax used by Galaxy tools
File an alias for data - there are subtle differences between a plain file, the CWL concept of 'File', and the Galaxy concept of a dataset - this may have subtly different semantics in the future
data a Galaxy dataset
collection a Galaxy dataset collection
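
The aliases in the symbol table can be resolved with a simple lookup. The sketch below is hypothetical (canonical_type is not a Galaxy function) and treats File as fully equivalent to data, which the table notes may not remain true.

```python
# Aliases listed in the GalaxyType symbol table, mapped to the symbol they stand for.
TYPE_ALIASES = {"integer": "int", "text": "string", "File": "data"}

GALAXY_TYPES = {
    "null", "boolean", "int", "long", "float", "double", "string",
    "data", "collection",
} | set(TYPE_ALIASES)


def canonical_type(type_name: str) -> str:
    """Map a GalaxyType symbol to its non-alias form, rejecting unknown symbols."""
    if type_name not in GALAXY_TYPES:
        raise ValueError(f"not a GalaxyType symbol: {type_name!r}")
    return TYPE_ALIASES.get(type_name, type_name)


print(canonical_type("integer"), canonical_type("File"), canonical_type("collection"))
```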

WorkflowOutputParameter §

Describe an output parameter of a workflow. The parameter must be connected to one parameter defined in the workflow that will provide the value of the output parameter. It is legal to connect a WorkflowInputParameter to a WorkflowOutputParameter.

Fields

field
required
type
description
label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

outputSource
optional

Specifies the workflow parameter that supplies the value of the output parameter.

type
optional

Specify valid types of data that may be assigned to this parameter.
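
The sketch below illustrates how an outputSource might be resolved to either a workflow input or a step output. It rests on assumptions this section does not state: step outputs are written as "<step id>/<output id>", and inputs and steps are maps keyed by id. The helper resolve_output_source is hypothetical.

```python
def resolve_output_source(workflow: dict, output_source: str) -> tuple:
    """Classify an outputSource as a workflow input or a step output."""
    if output_source in workflow.get("inputs", {}):
        # Connecting a WorkflowInputParameter directly to an output is legal.
        return ("input", output_source)
    # Assumption: step outputs are referenced as "<step id>/<output id>".
    step_id, separator, output_id = output_source.partition("/")
    if separator and output_id and step_id in workflow.get("steps", {}):
        return ("step", step_id, output_id)
    raise ValueError(f"unresolved outputSource: {output_source!r}")


workflow = {
    "inputs": {"the_input": {"type": "data"}},
    "steps": {"cat": {"tool_id": "cat1", "out": ["out_file1"]}},
    "outputs": {
        "copy": {"outputSource": "the_input"},
        "result": {"outputSource": "cat/out_file1"},
    },
}
for output_id, output in workflow["outputs"].items():
    print(output_id, resolve_output_source(workflow, output["outputSource"]))
```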

WorkflowStep §

This represents a non-input step of a Galaxy workflow.

A note about state and tool_state fields. §

Only one or the other should be specified. These are two ways to represent the "state" of a tool at this workflow step. Both are essentially maps from parameter names to parameter values.

tool_state is much more low-level and expects a flat dictionary with each value a JSON dump. Nested tool structures such as conditionals and repeats should have all their values in the JSON dumped string. In general tool_state may be present in workflows exported from Galaxy but shouldn't be written by humans.

state can contain a typed map. Repeat values can be represented as YAML arrays. An alternative to representing state this way is defining inputs with default values.
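
To make the difference between the two representations concrete, here is a small Python sketch. The parameter names (seed, mode, filters) and the tool ID are invented for illustration. The conversion follows only the prose description above, so the exact strings Galaxy exports may differ.

```python
import json

# Structured form: a typed map; the conditional is a nested map, the repeat a list.
state = {
    "seed": 42,
    "mode": {"selector": "advanced", "threshold": 0.5},
    "filters": [{"column": 1}, {"column": 3}],
}

# Low-level form as described above: flat, with every value a JSON-dumped string.
tool_state = {name: json.dumps(value) for name, value in state.items()}


def check_state_fields(step: dict) -> None:
    """Raise if a step specifies both state and tool_state."""
    if "state" in step and "tool_state" in step:
        raise ValueError("specify only one of 'state' and 'tool_state'")


check_state_fields({"tool_id": "hypothetical_tool", "state": state})
print(tool_state["filters"])
```

Nested values end up inside the dumped strings, which is why tool_state is awkward to write by hand.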

Fields

field
required
type
description
out
required
array<string | WorkflowStepOutput> | map<id, source | string | WorkflowStepOutput> | null

Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.

This can also be called 'outputs' for legacy reasons - but the resulting workflow document is not a valid instance of this schema.

id
optional

The unique identifier for this object.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

position
optional
tool_id
optional

The tool ID used to run this step of the workflow (e.g. 'cat1' or 'toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.0').

tool_shed_repository
optional

The Galaxy Tool Shed repository that should be installed in order to use this tool.

tool_version
optional

The tool version used to run this step of the workflow. For tool shed installed tools, the ID generally uniquely specifies a version and this field is optional.

errors
optional

During Galaxy export there may be problems validating the tool state, the tool used, etc., which will be indicated by this field. The Galaxy user should be warned of these problems before the workflow can be used in Galaxy.

This field should not be used in human written Galaxy workflow files.

A typical problem is that the referenced tool is not installed. This can be fixed by installing the tool, re-saving the workflow, and then re-exporting it.

uuid
optional

UUID uniquely representing this element.

in
optional
array<WorkflowStepInput> | map<id, source | WorkflowStepInput>

Defines the input parameters of the workflow step. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.

state
optional

Structured tool state.

tool_state
optional

Unstructured tool state.

type
optional

Workflow step module's type (defaults to 'tool').

run
optional

Specifies a subworkflow to run.

runtime_inputs
optional
array<string>
when
optional

If defined, only run the step when the expression evaluates to true. If false the step is skipped. A skipped step produces a null on each output.

The expression should be an ECMAScript 5.1 expression.
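
Two of the step fields interact: out names the step's outputs in one of several shapes, and when determines whether those outputs are produced or set to null. The sketch below is illustrative only. Evaluating the expression needs a JavaScript engine, so its result is passed in as a boolean; the helper names, the expression text "$(inputs.enabled)" and the output id log are assumptions.

```python
def output_ids(step: dict) -> list:
    """Collect output ids from the array, map, or null form of 'out'."""
    out = step.get("out") or []  # 'out' may be null
    if isinstance(out, dict):
        return list(out)  # map form: the keys are the output ids
    # Array form: plain strings or WorkflowStepOutput records (assumed to carry an id).
    return [entry if isinstance(entry, str) else entry["id"] for entry in out]


def step_outputs(step: dict, when_result: bool, run) -> dict:
    """Run the step unless its 'when' expression evaluated to false."""
    if "when" in step and not when_result:
        # A skipped step produces a null on each output.
        return {output_id: None for output_id in output_ids(step)}
    return run(step)


step = {
    "tool_id": "cat1",
    "when": "$(inputs.enabled)",
    "out": ["out_file1", {"id": "log", "hide": True}],
}
print(step_outputs(step, when_result=False, run=lambda s: {}))
```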

WorkflowStepInput §

TODO:

Fields

field
required
type
description
id
optional

The unique identifier for this object.

source
optional
string | array<string>

Specifies one or more workflow parameters that will provide input to the underlying step parameter.

label
optional

A short, human-readable label of this object.

default
optional

The default value for this parameter to use if either there is no source field, or the value produced by the source is null. The default must be applied prior to scattering or evaluating valueFrom.
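
The rule for default can be sketched as a small resolver. This is a hypothetical helper, not Galaxy's implementation; produced stands for a map from source names to the values already available in the workflow.

```python
def step_input_value(step_input: dict, produced: dict):
    """Pick the value for a step input from its source, falling back to its default."""
    source = step_input.get("source")
    default = step_input.get("default")
    if source is None:
        return default  # no source field: the default applies
    if isinstance(source, str):
        value = produced.get(source)
        return default if value is None else value  # a null source value also falls back
    # Several sources: the text does not say how 'default' combines with them,
    # so the values are returned as a list without applying it.
    return [produced.get(name) for name in source]


produced = {"the_input": "dataset-1", "filter/out": None}
print(step_input_value({"source": "filter/out", "default": "fallback"}, produced))
```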

WorkflowStepOutput §

Associate an output parameter of the underlying process with a workflow parameter. The workflow parameter (given in the id field) may be used as a source to connect with input parameters of other workflow steps, or with an output parameter of the process.

A unique identifier for this workflow output parameter. This is the identifier to use in the source field of WorkflowStepInput to connect the output value to downstream parameters.

Fields

field
required
type
description
id
optional

The unique identifier for this object.

add_tags
optional
array<string>
change_datatype
optional
delete_intermediate_datasets
optional
hide
optional
remove_tags
optional
array<string>
rename
optional
set_columns
optional
array<string>

ToolShedRepository §

Fields

field
required
type
description
changeset_revision
required

The revision of the tool shed repository this tool can be found in.

name
required

The name of the tool shed repository this tool can be found in.

owner
required

The owner of the tool shed repository this tool can be found in.

tool_shed
required

The URI of the tool shed containing the repository this tool can be found in - typically this should be toolshed.g2.bx.psu.edu.
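
Three of these fields also appear inside a Tool Shed style tool ID, such as the example ID given for tool_id in WorkflowStep. The sketch below assumes the layout "<tool_shed>/repos/<owner>/<name>/<tool>/<version>", which is inferred from that example and not stated in this document. The helper is hypothetical, and changeset_revision cannot be derived from the ID.

```python
def repository_from_tool_id(tool_id: str):
    """Extract tool_shed, owner and name from a Tool Shed style tool ID, else None."""
    parts = tool_id.split("/")
    # Assumed layout: <tool_shed>/repos/<owner>/<name>/<tool>/<version>
    if len(parts) < 6 or parts[1] != "repos":
        return None  # e.g. 'cat1' is not a Tool Shed style ID
    tool_shed, _, owner, name = parts[:4]
    return {"tool_shed": tool_shed, "owner": owner, "name": name}


example_id = "toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.0"
print(repository_from_tool_id(example_id))
print(repository_from_tool_id("cat1"))
```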

WorkflowStepType §

Module types used by Galaxy steps. Galaxy's native format allows additional types such as data_input, data_input_collection, and parameter_type, but these should be represented as inputs in Format 2.

Symbols

symbol description
tool Run a tool.
subworkflow Run a subworkflow.
pause Pause computation on this branch of workflow until user allows it to continue.
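
A converter from the native format would need to decide whether a native module becomes a Format 2 step or a workflow input. The sketch below uses only the type names given in this section; classify_step_type is a hypothetical helper, not part of Galaxy.

```python
# Step types allowed in Format 2, per the symbol table.
FORMAT2_STEP_TYPES = {"tool", "subworkflow", "pause"}
# Native-only module types named in this section; these belong in 'inputs'.
NATIVE_INPUT_TYPES = {"data_input", "data_input_collection", "parameter_type"}


def classify_step_type(step_type=None) -> str:
    """Return 'step' or 'input' depending on where the module belongs in Format 2."""
    step_type = step_type or "tool"  # the step's type field defaults to 'tool'
    if step_type in FORMAT2_STEP_TYPES:
        return "step"
    if step_type in NATIVE_INPUT_TYPES:
        return "input"
    raise ValueError(f"unknown module type: {step_type!r}")


print(classify_step_type(), classify_step_type("pause"), classify_step_type("data_input"))
```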

Report §

Definition of an invocation report for this workflow. Currently the only field is 'markdown'.

Fields

field
required
type
description
markdown
required

Galaxy-flavored Markdown that defines the invocation report.