Galaxy Workflow Format 2 Description §
The traditional Galaxy workflow description (.ga) is not meant to be concise and is neither readily human readable nor human writable. Format 2 addresses all three of these limitations while also converging (where it makes sense without sacrificing these other goals) with the workflow description used by the Common Workflow Language.
This standard is in active development and a moving target in many ways, but we will try to keep what is ingestible by Galaxy backward-compatible going forward.
GalaxyWorkflow §
A Galaxy workflow description. This record corresponds to the description of a workflow that should be executable on a Galaxy server that includes the contained tool definitions.
The workflows API or the user interface of Galaxy instances that are of version 19.09 or newer should be able to import a document defining this record.
A note about the label field. §
This is the name of the workflow in the Galaxy user interface, and it is the primary way users will identify the workflow. For legacy support, this attribute may also be called 'name': Galaxy will consume the workflow document and treat the attribute correctly. However, in order to validate against this workflow definition schema, the attribute should be called label.
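The legacy 'name' handling described above can be sketched as a small pre-validation step. This is a minimal illustration, assuming the workflow document has already been loaded into a Python dict; the helper name normalize_label is hypothetical and is not part of Galaxy or gxformat2.

```python
def normalize_label(workflow: dict) -> dict:
    """Return a copy of the workflow that uses 'label' instead of legacy 'name'.

    Hypothetical helper: if both keys are present, the document is left
    unchanged and 'label' is treated as authoritative.
    """
    result = dict(workflow)
    if "label" not in result and "name" in result:
        result["label"] = result.pop("name")
    return result


legacy = {
    "class": "GalaxyWorkflow",
    "name": "My workflow",
    "inputs": {},
    "outputs": {},
    "steps": {},
}
normalized = normalize_label(legacy)
```

After this step the document carries label rather than name, which is the form the schema expects.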
Fields
inputs
Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.
When accepting an input object, all input parameters must have a value. If an input parameter is missing from the input object, it must be assigned a value of null (or the value of default for that parameter, if provided) for the purposes of validation and evaluation of expressions.
outputs
Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.
class
GalaxyWorkflow
steps
The individual steps that make up the workflow. Each step is executed when all of its input data links are fulfilled.
doc
A documentation string for this object, or an array of strings which should be concatenated.
creator
Can be a schema.org Person (https://schema.org/Person) or Organization (https://schema.org/Organization) entity
release
If listed, this should correspond to the release of the workflow in its source repository.
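The rule given for inputs (a missing parameter is assigned null, or its default if one is provided) can be sketched as follows. This is an illustration only, assuming parameters and the input object are plain Python dicts; resolve_inputs and the parameter names are hypothetical.

```python
def resolve_inputs(parameters: dict, input_object: dict) -> dict:
    """Give every declared input parameter a value before validation.

    A parameter that is missing from the input object (or explicitly null)
    receives its 'default' if one is declared, otherwise None (null).
    """
    resolved = {}
    for name, spec in parameters.items():
        value = input_object.get(name)
        if value is None:
            value = spec.get("default")
        resolved[name] = value
    return resolved


# Hypothetical parameter declarations, for illustration only.
parameters = {
    "threshold": {"type": "float", "default": 0.5},
    "sample": {"type": "data"},
    "note": {"type": "text", "optional": True},
}
resolved = resolve_inputs(parameters, {"sample": "reads.fastq"})
```

Here threshold falls back to its default, sample keeps the supplied value, and note becomes null because it has neither a supplied value nor a default.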
WorkflowInputParameter §
Fields
type
Specify valid types of data that may be assigned to this parameter.
optional
If set to true, a value for this WorkflowInputParameter is not required to submit the workflow.
doc
A documentation string for this object, or an array of strings which should be concatenated.
default
The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).
collection_type
Collection type (defaults to list if type is collection). Nested collection types are separated with colons, e.g. list:list:paired.
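The collection_type convention above can be sketched as a short parser. This is a minimal illustration, assuming an input parameter is represented as a Python dict; collection_levels is a hypothetical helper name.

```python
def collection_levels(parameter: dict) -> list:
    """Split a parameter's collection_type into its nesting levels.

    Non-collection parameters yield an empty list; a collection without an
    explicit collection_type is treated as a plain 'list'.
    """
    if parameter.get("type") != "collection":
        return []
    collection_type = parameter.get("collection_type", "list")
    return collection_type.split(":")


nested = collection_levels({"type": "collection", "collection_type": "list:list:paired"})
default = collection_levels({"type": "collection"})
```

The outermost collection type comes first, so list:list:paired reads as a list of lists of paired datasets.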
Any §
The Any type validates for any non-null value.
Symbols
symbol | description |
---|---|
Any |
StepPosition §
This field specifies the location of the step's node when rendered in the workflow editor.
Fields
top
Relative vertical position of the step's node when rendered in the workflow editor.
GalaxyType §
Extends primitive types with native Galaxy concepts such as datasets and collections.
Symbols
symbol | description |
---|---|
null | no value |
boolean | a binary value |
int | 32-bit signed integer |
long | 64-bit signed integer |
float | single precision (32-bit) IEEE 754 floating-point number |
double | double precision (64-bit) IEEE 754 floating-point number |
string | Unicode character sequence |
integer | an alias for int type - matches syntax used by Galaxy tools |
text | an alias for string type - matches syntax used by Galaxy tools |
File | an alias for data - there are subtle differences between a plain file, the CWL concept of 'File', and the Galaxy concept of a dataset - this may have subtly different semantics in the future |
data | a Galaxy dataset |
collection | a Galaxy dataset collection |
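The aliases in the table above can be resolved with a simple lookup. This is a sketch under the assumption that a consumer wants one canonical symbol per type; TYPE_ALIASES and canonical_type are hypothetical names, and the mapping only restates the alias rows of the table.

```python
# Alias -> canonical symbol, taken from the GalaxyType table.
TYPE_ALIASES = {
    "integer": "int",
    "text": "string",
    "File": "data",
}


def canonical_type(symbol: str) -> str:
    """Map an alias to its canonical type; other symbols pass through unchanged."""
    return TYPE_ALIASES.get(symbol, symbol)


canonical = [canonical_type(s) for s in ("integer", "text", "File", "collection")]
```

Because File may acquire different semantics in the future, a real consumer might prefer to keep the original symbol alongside the canonical one.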
WorkflowOutputParameter §
Describe an output parameter of a workflow. The parameter must be connected to one parameter defined in the workflow that will provide the value of the output parameter. It is legal to connect a WorkflowInputParameter to a WorkflowOutputParameter.
Fields
doc
A documentation string for this object, or an array of strings which should be concatenated.
outputSource
Specifies the workflow parameter that supplies the value of the output parameter.
WorkflowStep §
This represents a non-input step of a Galaxy workflow.
A note about the state and tool_state fields. §
Only one or the other should be specified. These are two ways to represent the "state" of a tool at this workflow step. Both are essentially maps from parameter names to parameter values.
tool_state is much more low-level and expects a flat dictionary with each value a JSON dump. Nested tool structures such as conditionals and repeats should have all their values in the JSON dumped string. In general tool_state may be present in workflows exported from Galaxy but shouldn't be written by humans.
state can contain a typed map. Repeat values can be represented as YAML arrays. An alternative to representing state this way is defining inputs with default values.
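The difference between the two representations can be sketched by decoding a tool_state dictionary into the typed form that state uses. This is an illustration only, assuming every tool_state value is a JSON-dumped string as described above; step_state is a hypothetical helper and the parameter names are made up for the example.

```python
import json


def step_state(step: dict) -> dict:
    """Return a step's parameter map as typed Python values.

    Assumes 'tool_state' is a flat dict whose values are JSON dumps, and that
    'state' is already a typed map. Specifying both is rejected.
    """
    if "state" in step and "tool_state" in step:
        raise ValueError("specify only one of 'state' or 'tool_state'")
    if "tool_state" in step:
        return {name: json.loads(dumped) for name, dumped in step["tool_state"].items()}
    return dict(step.get("state", {}))


# Hypothetical exported step: each value is a JSON-dumped string.
exported = {
    "tool_id": "cat1",
    "tool_state": {
        "queries": json.dumps([{"input2": None}]),
        "lines": "10",
    },
}
decoded = step_state(exported)
```

The decoded map has the same shape a hand-written state block would have, with the repeat represented as a list.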
Fields
out
Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.
This can also be called 'outputs' for legacy reasons - but the resulting workflow document is not a valid instance of this schema.
doc
A documentation string for this object, or an array of strings which should be concatenated.
tool_id
The tool ID used to run this step of the workflow (e.g. 'cat1' or 'toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.0').
tool_shed_repository
The Galaxy Tool Shed repository that should be installed in order to use this tool.
tool_version
The tool version used to run this step of the workflow. For tool shed installed tools, the ID generally uniquely specifies a version and this field is optional.
errors
During Galaxy export there may be problems validating the tool state, the tool used, etc.; any such problems will be indicated by this field. The Galaxy user should be warned of these problems before the workflow can be used in Galaxy.
This field should not be used in human written Galaxy workflow files.
A typical problem is that the referenced tool is not installed; this can be fixed by installing the tool, re-saving the workflow, and then re-exporting it.
in
Defines the input parameters of the workflow step. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.
when
If defined, only run the step when the expression evaluates to true. If false, the step is skipped. A skipped step produces a null on each output.
The expression should be an ECMAScript 5.1 expression.
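The skip behavior of when can be sketched as below. This is a simplified illustration: evaluating the ECMAScript expression itself is out of scope, so its boolean result is passed in as when_result. It also assumes that out is a list of output names. run_conditionally and the run callback are hypothetical, not Galaxy APIs.

```python
def run_conditionally(step: dict, when_result: bool, run) -> dict:
    """Run a step unless its 'when' expression evaluated to false.

    'when_result' stands in for the already-evaluated expression. A skipped
    step yields None (null) for each of its declared outputs.
    """
    if "when" in step and not when_result:
        return {name: None for name in step.get("out", [])}
    return run(step)


# Hypothetical step and runner, for illustration only.
step = {"tool_id": "cat1", "when": "$(inputs.enabled)", "out": ["out_file1"]}
skipped = run_conditionally(step, False, lambda s: {"out_file1": "result.txt"})
executed = run_conditionally(step, True, lambda s: {"out_file1": "result.txt"})
```

Downstream steps connected to a skipped step therefore see null values rather than missing outputs.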
WorkflowStepInput §
TODO:
Fields
source
Specifies one or more workflow parameters that will provide input to the underlying step parameter.
default
The default value for this parameter to use if either there is no source field, or the value produced by the source is null. The default must be applied prior to scattering or evaluating valueFrom.
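The default rule for a step input can be sketched as follows. This is a minimal illustration, assuming a single source given as a string and a dict of values already produced by upstream parameters; step_input_value and the names in the example are hypothetical.

```python
def step_input_value(step_input: dict, produced: dict):
    """Resolve a step input, falling back to 'default' when needed.

    The default applies when there is no 'source' field, or when the value
    produced by the source is None (null).
    """
    source = step_input.get("source")
    value = produced.get(source) if source is not None else None
    if value is None:
        value = step_input.get("default")
    return value


# Hypothetical upstream values, keyed by workflow parameter name.
produced = {"trim/out_file1": "trimmed.fastq", "filter/out_file1": None}
connected = step_input_value({"source": "trim/out_file1", "default": "x"}, produced)
null_source = step_input_value({"source": "filter/out_file1", "default": "fallback"}, produced)
unconnected = step_input_value({"default": 10}, produced)
```

Resolving defaults at this point matches the requirement that they be applied before scattering or evaluating valueFrom.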
WorkflowStepOutput §
Associate an output parameter of the underlying process with a workflow parameter. The workflow parameter (given in the id field) may be used as a source to connect with input parameters of other workflow steps, or with an output parameter of the process.
A unique identifier for this workflow output parameter. This is the identifier to use in the source field of WorkflowStepInput to connect the output value to downstream parameters.
Fields
ToolShedRepository §
Fields
changeset_revision
The revision of the tool shed repository this tool can be found in.
tool_shed
The URI of the tool shed containing the repository this tool can be found in - typically this should be toolshed.g2.bx.psu.edu.
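The relationship between a tool shed tool ID and its tool shed can be sketched by splitting the example ID given for tool_id. This is an assumption-laden illustration: the positional meaning of the segments (host, literal "repos", owner, repository, tool, version) is inferred from that example rather than stated by this schema, and split_tool_shed_id is a hypothetical helper.

```python
def split_tool_shed_id(tool_id: str):
    """Split a tool shed style tool ID into its assumed components.

    Returns None for IDs that do not follow the six-segment
    '<tool_shed>/repos/<owner>/<repository>/<tool>/<version>' shape,
    such as built-in tools like 'cat1'.
    """
    parts = tool_id.split("/")
    if len(parts) != 6 or parts[1] != "repos":
        return None
    tool_shed, _, owner, repository, tool, version = parts
    return {
        "tool_shed": tool_shed,
        "owner": owner,
        "repository": repository,
        "tool": tool,
        "version": version,
    }


parsed = split_tool_shed_id(
    "toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.0"
)
builtin = split_tool_shed_id("cat1")
```

Note that the changeset_revision is not encoded in the tool ID, which is why it is recorded separately in this record.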
WorkflowStepType §
Module types used by Galaxy steps. Galaxy's native format allows additional types such as data_input, data_input_collection, and parameter_type, but these should be represented as inputs in Format2.
Symbols
symbol | description |
---|---|
tool | Run a tool. |
subworkflow | Run a subworkflow. |
pause | Pause computation on this branch of workflow until user allows it to continue. |
Report §
Definition of an invocation report for this workflow. Currently the only field is 'markdown'.