Galaxy Workflow Format 2 Description §

The traditional Galaxy workflow description (.ga) is not meant to be concise and is neither readily human-readable nor human-writable. Format 2 addresses all three of these limitations while also converging with the workflow description used by the Common Workflow Language, where that makes sense and does not sacrifice these other goals.

This standard is in active development and a moving target in many ways, but we will try to keep what is ingestible by Galaxy backward-compatible going forward.

GalaxyWorkflow §

A Galaxy workflow description. This record corresponds to the description of a workflow that should be executable on a Galaxy server that includes the contained tool definitions.

The workflows API or the user interface of Galaxy instances that are of version 19.09 or newer should be able to import a document defining this record.

A note about the label field. §

This is the name of the workflow in the Galaxy user interface and the primary way users identify the workflow. For legacy support, this attribute may also be called 'name': Galaxy will consume the workflow document and treat the attribute correctly. To validate against this workflow definition schema, however, the attribute should be called label.
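The legacy handling of 'name' can be sketched as a small normalization step. This is only an illustration on a plain Python dict standing in for a parsed workflow document; the helper normalize_label is hypothetical and is not part of Galaxy or gxformat2.

```python
def normalize_label(workflow: dict) -> dict:
    """Return a copy of the workflow dict that uses 'label' instead of legacy 'name'."""
    result = dict(workflow)
    if "name" in result:
        legacy = result.pop("name")
        # If both keys are present, keep the explicit label (an assumption of this sketch).
        result.setdefault("label", legacy)
    return result


# Hypothetical document that still uses the legacy attribute.
document = {"class": "GalaxyWorkflow", "name": "My workflow"}
print(normalize_label(document))
```

After this step the document carries only label, which is the spelling the schema validates.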

Fields

field
required
type
description
inputs
required

Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.

When accepting an input object, all input parameters must have a value. If an input parameter is missing from the input object, it must be assigned a value of null (or the value of default for that parameter, if provided) for the purposes of validation and evaluation of expressions.

outputs
required

Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.

class
required
constant value GalaxyWorkflow
steps
required

The individual steps that make up the workflow. Each step is executed when all of its input data links are fulfilled.

tags
required
array<string> | null

Tags for the workflow.

id
optional

The unique identifier for this object.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

uuid
optional

UUID uniquely representing this element.

report
optional

Workflow invocation report template.

comments
optional

Visual annotations for the workflow editor canvas. Comments are non-functional and do not affect workflow execution. May be specified as a list or as a mapping keyed by label.

creator
optional

Workflow creators. Can be schema.org Person (https://schema.org/Person) or Organization (https://schema.org/Organization) entities.

license
optional

Must be a valid license listed at https://spdx.org/licenses/

release
optional

If listed, should correspond to the release of the workflow in its source repository.
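As a rough illustration of the required fields above, the following sketch checks a parsed workflow document held in a Python dict. The helper missing_required and the example values are hypothetical; tags, which the table marks as required but which also accepts null, is left out of the check.

```python
REQUIRED_KEYS = ("class", "inputs", "outputs", "steps")


def missing_required(workflow: dict) -> list:
    """List problems with the top-level keys of a GalaxyWorkflow record."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in workflow]
    # 'class' has a constant value according to the field table.
    if "class" in workflow and workflow["class"] != "GalaxyWorkflow":
        problems.append("class must be GalaxyWorkflow")
    return problems


# A deliberately tiny, hypothetical document (normally written as YAML).
workflow = {
    "class": "GalaxyWorkflow",
    "label": "Example workflow",
    "inputs": [{"id": "input1", "type": "data", "optional": False}],
    "outputs": [],
    "steps": [],
}
print(missing_required(workflow))
```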

WorkflowDataParameter §

A data input parameter for a Galaxy workflow. Represents one Galaxy dataset. Normalized gxformat2 output uses type: data. type: File is accepted as an alias, but should not be confused with workflow test job syntax where type: File means stage a file as test input data.

Fields

field
required
type
description
optional
required

Controls whether Galaxy allows invocation of the workflow without a user-supplied value for this input. If true, the input may be omitted at invocation time. optional and default are independent: a required input (optional: false) may still declare a default, and an optional input may have no default. default supplies a value when the invocation input is missing or null; optional controls whether the missing case is even permitted.

type
required

Specify valid types of data that may be assigned to this parameter.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

default
optional

The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).

position
optional
format
optional
array<string>

Specify datatype extensions for valid input datasets.
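The interplay of optional and default described above can be sketched as follows. This is one reading of the text, not confirmed Galaxy behavior: the sketch assumes that a declared default is consulted first and that the optional check applies only when no value remains. The function resolve_input is hypothetical.

```python
def resolve_input(declaration: dict, supplied=None):
    """Pick the effective value for one workflow input at invocation time."""
    if supplied is not None:
        return supplied
    # Missing or null: the default supplies a value if one is declared.
    if "default" in declaration:
        return declaration["default"]
    # No value and no default: only permitted for optional inputs.
    if not declaration.get("optional", False):
        raise ValueError("a value is required for this input")
    return None


print(resolve_input({"optional": False, "default": 5}))       # default is used
print(resolve_input({"optional": True}))                       # omitted, no default
print(resolve_input({"optional": False, "default": 5}, 7))     # supplied value wins
```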

Any §

The Any type validates for any non-null value.

Symbols

symbol description
Any

StepPosition §

This field specifies the location of the step's node when rendered in the workflow editor.

Fields

field
required
type
description
top
required

Relative vertical position of the step's node when rendered in the workflow editor.

left
required

Relative horizontal position of the step's node when rendered in the workflow editor.

WorkflowCollectionParameter §

A collection input parameter for a Galaxy workflow - represents a dataset collection.

Fields

field
required
type
description
optional
required

Controls whether Galaxy allows invocation of the workflow without a user-supplied value for this input. If true, the input may be omitted at invocation time. optional and default are independent: a required input (optional: false) may still declare a default, and an optional input may have no default. default supplies a value when the invocation input is missing or null; optional controls whether the missing case is even permitted.

type
required

Must be collection.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

default
optional

The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).

position
optional
format
optional
array<string>

Specify datatype extensions for valid input datasets.

collection_type
optional

Collection type (defaults to list if type is collection). Nested collection types are separated with colons, e.g. list:list:paired.

column_definitions
optional

Column schema for sample-sheet collection inputs. Only meaningful when collection_type begins with sample_sheet - cross-field validation is applied in the pydantic post-validator.

fields
optional

Field schema for record collection inputs. Only meaningful when collection_type contains record (e.g. record, list:record, sample_sheet:record).
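A minimal sketch of how the collection_type rules above fit together, assuming the declaration is available as a Python dict. The helper describe_collection_type is hypothetical; it only restates the defaulting, colon-splitting, and applicability rules from the field descriptions.

```python
def describe_collection_type(declaration: dict) -> dict:
    """Summarize the collection_type of a collection input declaration."""
    # collection_type defaults to list when it is not given.
    collection_type = declaration.get("collection_type") or "list"
    levels = collection_type.split(":")  # nested types are colon-separated
    return {
        "collection_type": collection_type,
        "levels": levels,
        # column_definitions only matters for sample_sheet collections.
        "uses_column_definitions": collection_type.startswith("sample_sheet"),
        # fields only matters when a record level is present.
        "uses_fields": "record" in levels,
    }


print(describe_collection_type({"type": "collection"}))
print(describe_collection_type({"type": "collection", "collection_type": "list:list:paired"}))
```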

SampleSheetColumnDefinition §

Describes one column of a sample-sheet collection input. Used in column_definitions on a collection_type: sample_sheet[:<type>] workflow input.

Fields

field
required
type
description
name
required

Column name. Must not contain special characters (matches ^[\w\-_ \?]*$).

type
required

Value type for this column. One of string, int, float, boolean, or element_identifier. Mirrors Galaxy's runtime SampleSheetColumnType.

optional
required

If true, rows may omit a value for this column.

description
optional

Optional human-readable column description.

default_value
optional

Default value used when a row omits this column. Type must be compatible with type - validated by the pydantic post-validator.

validators
optional
array<Any>

Galaxy-style parameter validators. Modelled as opaque records here - full validator schema lives in galaxy.tool_util_models.

restrictions
optional
array<string | int | float | boolean>

Closed set of permitted values for this column. Item type must be compatible with the column type (post-validated).

suggestions
optional
array<string | int | float | boolean>

Open suggestion list for this column.
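The name pattern and the list of column types above can be checked with a few lines of Python. This is a sketch only: column_problems is a hypothetical helper, and the real cross-field checks (for example default_value and restrictions compatibility) are not reproduced here.

```python
import re

# Pattern and type names are taken from the field descriptions above.
COLUMN_NAME_PATTERN = re.compile(r"^[\w\-_ \?]*$")
COLUMN_TYPES = {"string", "int", "float", "boolean", "element_identifier"}


def column_problems(column: dict) -> list:
    """Return a list of problems found in one column definition."""
    problems = []
    if not COLUMN_NAME_PATTERN.match(column["name"]):
        problems.append("name contains unsupported characters")
    if column["type"] not in COLUMN_TYPES:
        problems.append("unknown column type")
    return problems


# Hypothetical column definitions.
print(column_problems({"name": "treated?", "type": "boolean", "optional": False}))
print(column_problems({"name": "a/b", "type": "date", "optional": True}))
```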

RecordFieldDefinition §

Describes one field of a record collection input. Used in fields on a collection_type containing record (e.g. record, list:record, sample_sheet:record). Mirrors a subset of the CWL InputRecordSchema shape that Galaxy persists on DatasetCollection.fields.

Fields

field
required
type
description
name
required

Field name. Must equal the corresponding element identifier in the materialized record collection.

type
required
string | array<string>

Field value type. A subset of the CWL primitive types: File, null, boolean, int, float, string. May be a list to express a union (e.g. ["File", "null"] for an optional file).

format
optional

Optional Galaxy datatype hint for File-typed fields.
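Because type may be a single string or a list expressing a union, code that reads record field definitions usually normalizes it first. The sketch below assumes fields are plain Python dicts; field_types and is_optional_field are hypothetical helper names.

```python
def field_types(field: dict) -> list:
    """Return the declared type of a record field as a list of type names."""
    declared = field["type"]
    return [declared] if isinstance(declared, str) else list(declared)


def is_optional_field(field: dict) -> bool:
    """A field is optional when 'null' is one member of its type union."""
    return "null" in field_types(field)


# Hypothetical record fields.
required_reads = {"name": "reads", "type": "File", "format": "fastq"}
optional_index = {"name": "index", "type": ["File", "null"]}
print(is_optional_field(required_reads), is_optional_field(optional_index))
```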

WorkflowIntegerParameter §

A scalar integer workflow parameter. Normalized gxformat2 output uses type: int. type: integer is accepted for compatibility with native Galaxy parameter state and Galaxy tool XML terminology.

Fields

field
required
type
description
optional
required

Controls whether Galaxy allows invocation of the workflow without a user-supplied value for this input. If true, the input may be omitted at invocation time. optional and default are independent: a required input (optional: false) may still declare a default, and an optional input may have no default. default supplies a value when the invocation input is missing or null; optional controls whether the missing case is even permitted.

min
required

Minimum allowed value (inclusive).

max
required

Maximum allowed value (inclusive).

type
required

Must be integer or int.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

default
optional

The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).

position
optional
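Since both min and max are inclusive, a range check on an integer parameter can be sketched as below. The helper in_range is hypothetical, and treating an absent or null bound as "no limit" is an assumption of this sketch rather than something the schema states.

```python
def in_range(declaration: dict, value: int) -> bool:
    """Check a value against the inclusive min/max of an integer parameter."""
    low = declaration.get("min")
    high = declaration.get("max")
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


# Hypothetical declaration of an integer input.
threads = {"type": "int", "optional": False, "min": 1, "max": 10}
print([in_range(threads, candidate) for candidate in (0, 1, 10, 11)])
```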

WorkflowFloatParameter §

A float input parameter for a Galaxy workflow.

Fields

field
required
type
description
optional
required

Controls whether Galaxy allows invocation of the workflow without a user-supplied value for this input. If true, the input may be omitted at invocation time. optional and default are independent: a required input (optional: false) may still declare a default, and an optional input may have no default. default supplies a value when the invocation input is missing or null; optional controls whether the missing case is even permitted.

min
required

Minimum allowed value (inclusive).

max
required

Maximum allowed value (inclusive).

type
required

Must be float.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

default
optional

The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).

position
optional

WorkflowTextParameter §

A scalar text workflow parameter. Normalized gxformat2 output uses type: string. type: text is accepted for compatibility with native Galaxy parameter state and Galaxy tool XML terminology.

Fields

field
required
type
description
optional
required

Controls whether Galaxy allows invocation of the workflow without a user-supplied value for this input. If true, the input may be omitted at invocation time. optional and default are independent: a required input (optional: false) may still declare a default, and an optional input may have no default. default supplies a value when the invocation input is missing or null; optional controls whether the missing case is even permitted.

type
required

Must be text or string.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

default
optional

The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).

position
optional
restrictions
optional

Closed set of permitted values. When present, Galaxy renders the runtime input as a select. Items may be plain strings or {value, label} records.

suggestions
optional

Open suggestion list. Galaxy still treats the input as text but offers these as suggestions.

restrictOnConnections
optional

Ask Galaxy to derive valid choices from connected tool or subworkflow select inputs at runtime. Falls back to free text when derivation fails.

WorkflowTextOption §

A {value, label} option used in restrictions or suggestions on a text workflow parameter. Plain strings are also accepted in those arrays as shorthand for {value: <str>, label: <str>}.

Fields

field
required
type
description
value
required

Machine value submitted to the connected tool input.

label
optional

Human label shown in Galaxy. Defaults to value when omitted.
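The shorthand rules for restrictions and suggestions can be sketched as a normalization function that expands plain strings and fills in missing labels. normalize_options is a hypothetical helper, and the option values are made up for illustration.

```python
def normalize_options(options: list) -> list:
    """Expand plain strings and missing labels into {value, label} records."""
    normalized = []
    for option in options:
        if isinstance(option, str):
            # A plain string is shorthand for a record with equal value and label.
            normalized.append({"value": option, "label": option})
            continue
        label = option.get("label")
        if label is None:
            label = option["value"]  # label defaults to value
        normalized.append({"value": option["value"], "label": label})
    return normalized


print(normalize_options(["hg38", {"value": "mm10", "label": "Mouse (mm10)"}, {"value": "dm6"}]))
```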

WorkflowBooleanParameter §

A boolean input parameter for a Galaxy workflow.

Fields

field
required
type
description
optional
required

Controls whether Galaxy allows invocation of the workflow without a user-supplied value for this input. If true, the input may be omitted at invocation time. optional and default are independent: a required input (optional: false) may still declare a default, and an optional input may have no default. default supplies a value when the invocation input is missing or null; optional controls whether the missing case is even permitted.

type
required

Must be boolean.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

default
optional

The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).

position
optional

WorkflowInputParameter §

An input parameter to a Galaxy workflow. This is the catch-all type used by the Schema Salad codegen. The pydantic layer uses a discriminated union of the specific parameter types instead.

Fields

field
required
type
description
optional
required

Controls whether Galaxy allows invocation of the workflow without a user-supplied value for this input. If true, the input may be omitted at invocation time. optional and default are independent: a required input (optional: false) may still declare a default, and an optional input may have no default. default supplies a value when the invocation input is missing or null; optional controls whether the missing case is even permitted.

min
required

Minimum allowed value (inclusive).

max
required

Maximum allowed value (inclusive).

type
required

Specify valid types of data that may be assigned to this parameter.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

default
optional

The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).

position
optional
format
optional
array<string>

Specify datatype extensions for valid input datasets.

collection_type
optional

Collection type (defaults to list if type is collection). Nested collection types are separated with colons, e.g. list:list:paired.

column_definitions
optional

Column schema for sample-sheet collection inputs. Only meaningful when collection_type begins with sample_sheet.

fields
optional

Field schema for record collection inputs. Only meaningful when collection_type contains record.

restrictions
optional

Closed set of permitted values for text-typed inputs. See WorkflowTextParameter.restrictions.

suggestions
optional

Open suggestion list for text-typed inputs.

restrictOnConnections
optional

For text-typed inputs - derive runtime choices from connected tool/subworkflow select inputs.

GalaxyType §

Extends primitive types with the native Galaxy concepts such as datasets and collections. Normalized gxformat2 workflow input declaration spellings are data, collection, string, int, float, and boolean. Other spellings are accepted as compatibility aliases on import but normalized gxformat2 output emits the normalized spellings.

Symbols

symbol description
null no value
boolean a binary value
int normalized gxformat2 spelling for native Galaxy integer workflow parameters.
long 64-bit signed integer
float single precision (32-bit) IEEE 754 floating-point number
double double precision (64-bit) IEEE 754 floating-point number
string normalized gxformat2 spelling for native Galaxy text workflow parameters.
integer accepted alias for int because native Galaxy parameter state and Galaxy tool XML terminology use integer.
text accepted alias for string because native Galaxy parameter state and Galaxy tool XML terminology use text.
File accepted alias for data, but normalized gxformat2 output emits data. Note: workflow test job YAML uses type: File to mean 'stage this file as test input data', which is a separate concept from workflow input declaration.
data one Galaxy dataset input. Native Galaxy data_input converts to this spelling.
collection one Galaxy dataset collection input. Native Galaxy data_collection_input converts to this spelling.
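The alias handling listed above amounts to a small lookup. The sketch below is illustrative only; normalize_type is a hypothetical name, and the mapping contains just the three aliases named in the symbol list.

```python
# Accepted aliases and the normalized spelling each one maps to.
TYPE_ALIASES = {"integer": "int", "text": "string", "File": "data"}


def normalize_type(declared: str) -> str:
    """Map an accepted alias to its normalized gxformat2 spelling."""
    return TYPE_ALIASES.get(declared, declared)


for spelling in ("integer", "text", "File", "collection"):
    print(spelling, "->", normalize_type(spelling))
```

Spellings that are already normalized, such as collection, pass through unchanged.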

WorkflowOutputParameter §

Describe an output parameter of a workflow. The parameter must be connected to one parameter defined in the workflow that will provide the value of the output parameter. It is legal to connect a WorkflowInputParameter to a WorkflowOutputParameter.

Fields

field
required
type
description
label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

outputSource
optional

Specifies the workflow parameter that supplies the value of the output parameter.

type
optional

Specify valid types of data that may be assigned to this parameter.

WorkflowStep §

This represents a non-input step of a Galaxy workflow.

A note about state and tool_state fields. §

Only one or the other should be specified. These are two ways to represent the "state" of a tool at this workflow step. Both are essentially maps from parameter names to parameter values.

tool_state is much more low-level and expects a flat dictionary with each value a JSON dump. Nested tool structures such as conditionals and repeats should have all their values in the JSON dumped string. In general tool_state may be present in workflows exported from Galaxy but shouldn't be written by humans.

state can contain a typed map. Repeat values can be represented as YAML arrays. An alternative to representing state this way is defining inputs with default values.
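The difference between the two representations can be illustrated with a short sketch. It assumes only what is stated above: state is a typed map, while tool_state is a flat dictionary whose values are JSON dumps. The parameter names are hypothetical, and the encoding Galaxy actually exports may contain details this sketch does not model.

```python
import json


def to_flat_tool_state(state: dict) -> dict:
    """Dump every top-level value of a typed state map to a JSON string."""
    return {name: json.dumps(value) for name, value in state.items()}


def check_state_fields(step: dict) -> None:
    """Only one of state and tool_state should be specified on a step."""
    if "state" in step and "tool_state" in step:
        raise ValueError("specify either state or tool_state, not both")


# Hypothetical structured state with a repeat expressed as a list.
state = {"lineNum": 5, "queries": [{"pattern": "chr1"}, {"pattern": "chr2"}]}
check_state_fields({"tool_id": "cat1", "state": state})
print(to_flat_tool_state(state))
```

The nested repeat ends up inside a single JSON string, which is why tool_state is hard to write by hand.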

Fields

field
required
type
description
out
required
array<string | WorkflowStepOutput> | map<id, source | string | WorkflowStepOutput> | null

Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.

This can also be called 'outputs' for legacy reasons - but the resulting workflow document is not a valid instance of this schema.

id
optional

The unique identifier for this object.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

position
optional
tool_id
optional

The tool ID used to run this step of the workflow (e.g. 'cat1' or 'toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.0').

tool_shed_repository
optional

The Galaxy Tool Shed repository that should be installed in order to use this tool.

tool_version
optional

The tool version used to run this step of the workflow. For tool shed installed tools, the ID generally uniquely specifies a version and this field is optional.

errors
optional

During Galaxy export there may be problems validating the tool state, the tool used, etc.; these are indicated by this field. The Galaxy user should be warned of these problems before the workflow can be used in Galaxy.

This field should not be used in human written Galaxy workflow files.

A typical problem is that the referenced tool is not installed. This can be fixed by installing the tool, re-saving the workflow, and then re-exporting it.

uuid
optional

UUID uniquely representing this element.

in
optional
array<WorkflowStepInput> | map<id, source | WorkflowStepInput>

Defines the input parameters of the workflow step. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.

state
optional

Structured tool state.

tool_state
optional

Unstructured tool state.

post_job_actions
optional

Optional dict of post-job actions keyed by {ActionType}{OutputName} compound strings. Same shape as the native post_job_actions field; each value is a record with action_type, output_name, action_arguments. Use the out: shorthand (rename:, hide:, change_datatype:, etc.) for common actions; this explicit form covers actions without an out: shorthand (ValidateOutputsAction, etc.) and any case where the typed record is preferred.

type
optional

Workflow step module's type (defaults to 'tool').

run
optional

Specifies a subworkflow to run. May be an inline workflow definition, a URL string, or an @import reference dict.

runtime_inputs
optional
array<string>
when
optional

If defined, only run the step when the expression evaluates to true. If false the step is skipped. A skipped step produces a null on each output.

The expression should be an ECMAScript 5.1 expression.
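The explicit post_job_actions form described above keys each record by a compound {ActionType}{OutputName} string. A minimal sketch of building one entry follows; post_job_action is a hypothetical helper, and the action type, output name, and argument name in the example are assumptions chosen for illustration rather than values defined on this page.

```python
def post_job_action(action_type: str, output_name: str, action_arguments=None):
    """Build the compound key and record for one explicit post-job action."""
    key = f"{action_type}{output_name}"
    record = {
        "action_type": action_type,
        "output_name": output_name,
        "action_arguments": dict(action_arguments or {}),
    }
    return key, record


# Assumed action type and argument name; the output name is hypothetical.
key, record = post_job_action("RenameDatasetAction", "out_file1", {"newname": "filtered reads"})
post_job_actions = {key: record}
print(post_job_actions)
```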

WorkflowStepInput §

TODO:

Fields

field
required
type
description
id
optional

The unique identifier for this object.

source
optional
string | array<string>

Specifies one or more workflow parameters that will provide input to the underlying step parameter.

label
optional

A short, human-readable label of this object.

default
optional

The default value for this parameter to use if either there is no source field, or the value produced by the source is null. The default must be applied prior to scattering or evaluating valueFrom.

WorkflowStepOutput §

Associate an output parameter of the underlying process with a workflow parameter. The workflow parameter (given in the id field) may be used as a source to connect with input parameters of other workflow steps, or with an output parameter of the process.

A unique identifier for this workflow output parameter. This is the identifier to use in the source field of WorkflowStepInput to connect the output value to downstream parameters.

Fields

field
required
type
description
id
optional

The unique identifier for this object.

add_tags
optional
array<string>
change_datatype
optional
delete_intermediate_datasets
optional
hide
optional
remove_tags
optional
array<string>
rename
optional
set_columns
optional

ToolShedRepository §

Fields

field
required
type
description
changeset_revision
required

The revision of the tool shed repository this tool can be found in.

name
required

The name of the tool shed repository this tool can be found in.

owner
required

The owner of the tool shed repository this tool can be found in.

tool_shed
required

The URI of the tool shed containing the repository this tool can be found in - typically this should be toolshed.g2.bx.psu.edu.

WorkflowStepType §

Module types used by Galaxy steps. Galaxy's native format allows additional types such as data_input, data_collection_input, and parameter_input, but these should be represented as inputs in Format 2.

Symbols

symbol description
tool Run a tool.
subworkflow Run a subworkflow.
pause Pause computation on this branch of workflow until user allows it to continue.
pick_value Select the first non-null value from multiple inputs. Used to merge branches of conditional or optional workflow paths.
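The selection rule given for pick_value can be sketched in a few lines. This models only the one-sentence description above; the function name is hypothetical, and returning None when every input is null is an assumption of the sketch, not documented behavior of the Galaxy module.

```python
def pick_first_non_null(values):
    """Return the first value that is not None, or None if there is none."""
    for value in values:
        if value is not None:
            return value
    return None


# Outputs of two conditional branches; the first branch is modelled as null.
branch_outputs = [None, "dataset_from_branch_b"]
print(pick_first_non_null(branch_outputs))
```

Note that the check is for null specifically, so falsy values such as 0 or an empty string are still selected.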

Report §

Definition of an invocation report for this workflow. Currently the only field is 'markdown'.

Fields

field
required
type
description
markdown
required

Galaxy flavored Markdown to define an invocation report.