Validate the mutated draft against draft-contract rules; with --concrete, also gate the extracted concrete subset (including the step just implemented) against full gxformat2.
Trigger: After implementing or modifying a concrete tool step in the draft.
on-demand runtime sidecar hypothesis deterministic 4.7 KB
- bundle: references/cli/draft-validate.json
- source: content/cli/gxwf/draft-validate.md
Preview json
{
"type": "cli-command",
"tool": "gxwf",
"command": "draft-validate",
"summary": "Validate a `class: GalaxyWorkflowDraft` workflow against draft-contract rules; with --concrete, also validate the extracted concrete subset.",
"source_path": "content/cli/gxwf/draft-validate.md",
"source_revision": 1,
"body": "# `gxwf draft-validate`\n\nValidate a draft Galaxy workflow against the **draft contract**: sentinel form, dangling edge references, top-level `_plan_*` placement, `_plan_*` on fully-resolved tool steps, and recursive draft subworkflows. Native (.ga) input is rejected — drafts are format2-only.\n\nDistinct from [[validate]], which validates a fully concrete `class: GalaxyWorkflow` and would reject the draft relaxations outright. Use `draft-validate` during the per-step authoring loop; use `validate` at the terminal pass once `promoteFullyConcreteDrafts` has flipped the class.\n\n## Output\n\nDefault output is human-readable: counted buckets for structure / topology / semantic errors and warnings, plus a one-line survey (TODO sentinel count, paths carrying `_plan_*`). `--json` emits a `SingleDraftValidationReport`; `--report-html` and `--report-markdown` write the same data as a self-contained HTML page or templated Markdown. With `--concrete`, the report carries an optional `concrete: ConcreteValidationReport` whose buckets (`structure_errors`, `strict_structure_errors`, `strict_encoding_errors`, `strict_state_errors`, `tool_state`, `connection_report`) are **absent when the corresponding check did not run** — readers should treat absence as \"not run,\" not as \"passed.\"\n\n## Flags\n\n`--concrete` runs the extract+promote pipeline (`extractConcreteSubset` → `stripPlanFields` → `promoteFullyConcreteDrafts`) and applies the full concrete `gxformat2` validation surface to the result. The following pass-through flags only take effect under `--concrete`; passing them without it prints a stderr warning and no-ops:\n\n- `--cache-dir <dir>` — tool cache for tool-state lookups.\n- `--no-tool-state` — skip tool-state validation on the concrete pass. Combined with `--strict-state`, the strict flag warns + no-ops (there's no state to be strict about).\n- `--connections` — run connection validation on the concrete subset.\n- `--strict` — escalate every strict bucket (structure, encoding, state) to error.\n- `--strict-structure` / `--strict-encoding` /
...
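The "absent means not run" rule in the preview above is easy to get wrong when reading a report. The following is a minimal sketch, assuming the `concrete` section has already been parsed into a Python dict. The function name `bucket_status` is hypothetical, and the assumption that the four `*_errors` buckets are lists of findings is not confirmed by the preview.

```python
from typing import Any, Dict

# Bucket names as listed in the draft-validate description.
CONCRETE_BUCKETS = (
    "structure_errors",
    "strict_structure_errors",
    "strict_encoding_errors",
    "strict_state_errors",
    "tool_state",
    "connection_report",
)


def bucket_status(concrete: Dict[str, Any]) -> Dict[str, str]:
    """Classify every known bucket of a parsed `concrete` report section.

    Assumption (not confirmed by the source): the `*_errors` buckets are
    lists of findings, so an empty list means the check ran cleanly.
    The shape of `tool_state` and `connection_report` is not assumed;
    they are only reported as "ran".
    """
    status: Dict[str, str] = {}
    for name in CONCRETE_BUCKETS:
        if name not in concrete:
            # Absence means the check did not run, never that it passed.
            status[name] = "not run"
        elif name.endswith("_errors"):
            status[name] = "failed" if concrete[name] else "passed"
        else:
            status[name] = "ran"
    return status


if __name__ == "__main__":
    example = {"structure_errors": [], "tool_state": {}}
    for bucket, state in bucket_status(example).items():
        print(f"{bucket}: {state}")
```

Keeping "not run" as its own state stops a consumer from reporting a clean result for checks that were skipped, for example when `--connections` was not passed.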
Implement identifier-derived collection reshaping via Apply Rules.
Trigger: When collection element identifiers need regex parsing, nesting-level swaps, regrouping, or paired identifier assignment.
on-demand runtime verbatim corpus-observed deterministic 34.9 KB
- bundle: references/notes/galaxy-apply-rules-dsl.md
- source: content/research/galaxy-apply-rules-dsl.md
Preview md
---
type: research
subtype: component
title: "Galaxy Apply Rules DSL"
tags:
- research/component
- target/galaxy
status: draft
created: 2026-04-30
revised: 2026-05-02
revision: 2
ai_generated: true
related_notes:
- "[[galaxy-collection-tools]]"
- "[[galaxy-collection-semantics]]"
- "[[nextflow-to-galaxy-channel-shape-mapping]]"
- "[[nextflow-operators-to-galaxy-collection-recipes]]"
- "[[collection-build-list-paired-with-apply-rules]]"
- "[[collection-build-named-bundle]]"
- "[[collection-cleanup-after-mapover-failure]]"
- "[[collection-flatten-after-fanout]]"
- "[[collection-split-identifier-via-rules]]"
- "[[collection-swap-nesting-with-apply-rules]]"
- "[[collection-unbox-singleton]]"
- "[[relabel-via-rules-and-find-replace]]"
- "[[iwc-transformations-survey]]"
sources:
- "https://github.com/jmchilton/galaxy-agentic-collection-transform (initial research seed)"
- "https://github.com/galaxyproject/galaxy/blob/main/lib/galaxy/util/rules_dsl.py"
- "https://github.com/galaxyproject/galaxy/blob/main/lib/galaxy/util/rules_dsl_spec.yml"
summary: "Reference for Galaxy's Apply Rules DSL: rule operations, mapping operations, composition patterns, pitfalls."
---
Reference for Galaxy's Apply Rules DSL — the rule grammar consumed by `__APPLY_RULES__` (see [[galaxy-collection-tools]] for the surrounding tool catalog and [[galaxy-collection-semantics]] for collection mapping/reduction semantics).
**Key principle:** rules transform collection metadata (identifiers, indices, tags) as tabular data; mapping operations turn the resulting columns back into collection structure.
**Sources of truth in Galaxy:**
- `lib/galaxy/util/rules_dsl.py` — rule implementation
- `lib/galaxy/util/rules_dsl_spec.yml` — test spec covering every rule type
- `lib/gala
...
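The key principle in the preview above (identifiers become table rows, then mapping operations rebuild structure from columns) can be illustrated with a small conceptual sketch. This is not Galaxy's implementation and does not use the Apply Rules JSON format. The function names, the sample identifiers, and the regular expression are all hypothetical.

```python
import re
from typing import Dict, List


def identifiers_to_rows(identifiers: List[str]) -> List[List[str]]:
    """Start the table: one row per element, column 0 holds the identifier."""
    return [[identifier] for identifier in identifiers]


def split_column_by_regex(
    rows: List[List[str]], target_column: int, expression: str
) -> List[List[str]]:
    """Append one new column per regex capture group (a rule-like step)."""
    pattern = re.compile(expression)
    result = []
    for row in rows:
        match = pattern.fullmatch(row[target_column])
        if match is None:
            raise ValueError(f"identifier does not match: {row[target_column]!r}")
        result.append(row + list(match.groups()))
    return result


def map_to_list_paired(
    rows: List[List[str]], list_column: int, paired_column: int
) -> Dict[str, Dict[str, str]]:
    """Turn columns back into nested structure (a mapping-like step)."""
    nested: Dict[str, Dict[str, str]] = {}
    for row in rows:
        nested.setdefault(row[list_column], {})[row[paired_column]] = row[0]
    return nested


if __name__ == "__main__":
    flat = ["sampleA_forward", "sampleA_reverse", "sampleB_forward", "sampleB_reverse"]
    table = split_column_by_regex(identifiers_to_rows(flat), 0, r"(.+)_(forward|reverse)")
    print(map_to_list_paired(table, list_column=1, paired_column=2))
```

The sketch separates the two phases on purpose: the regex step only edits the table, and the nesting appears only when the columns are mapped back to identifiers.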
Connect concrete Galaxy tool inputs/outputs while preserving collection mapping and reduction semantics.
Trigger: When implementing a step with data_collection inputs, mapped outputs, reductions, or nested collection wiring.
on-demand runtime verbatim corpus-observed deterministic 1.9 KB
- bundle: references/notes/galaxy-collection-semantics.md
- source: content/research/galaxy-collection-semantics.md
Preview md
---
type: research
subtype: component
title: "Galaxy collection semantics"
tags:
- research/component
- target/galaxy
status: draft
created: 2026-04-30
revised: 2026-05-05
revision: 3
ai_generated: false
related_notes:
- "[[galaxy-xsd]]"
- "[[galaxy-collection-tools]]"
- "[[galaxy-apply-rules-dsl]]"
- "[[nextflow-to-galaxy-channel-shape-mapping]]"
- "[[nextflow-operators-to-galaxy-collection-recipes]]"
- "[[galaxy-tool-job-failure-reference]]"
- "[[galaxy-workflow-invocation-failure-reference]]"
- "[[iwc-transformations-survey]]"
- "[[galaxy-discover-datasets]]"
sources:
- "https://github.com/galaxyproject/galaxy/blob/7765fae934fbfdee77e3be5f5b235e43735273ae/lib/galaxy/model/dataset_collections/types/collection_semantics.yml"
companions:
- "galaxy-collection-semantics.yml"
- "galaxy-collection-semantics.upstream.myst"
summary: "Vendored formal spec of Galaxy dataset-collection mapping/reduction semantics, with labeled examples and pinned test references."
---
> **Vendored from upstream**, pinned at SHA `7765fae`. Two files live next to this note:
>
> - `galaxy-collection-semantics.yml` — the structured source. **Agents and casting should consume this.** It carries the `tests:` blocks that pin concrete Galaxy test names; the rendered upstream view drops them.
> - `galaxy-collection-semantics.upstream.myst` — Galaxy's auto-generated MyST/LaTeX rendering of the YAML, vendored only so the human view below has something to render. Sync is manual.
>
> **When to consult:** authoring or reasoning about Molds and patterns that touch `data_collection` inputs, map-over / reduction shape changes, sub-collection mapping, `paired_or_unpaired`, or `sample_sheet`.
```vendored-myst
file: galaxy-collection-semantics.upstream.myst
source: https://github.com/g
...
Connect concrete Galaxy tool inputs/outputs while preserving collection mapping and reduction semantics.
Trigger: When implementing a step with data_collection inputs, mapped outputs, reductions, or nested collection wiring.
on-demand runtime verbatim corpus-observed deterministic 33.4 KB
- bundle: references/notes/galaxy-collection-semantics.upstream.myst
- source: content/research/galaxy-collection-semantics.upstream.myst
Preview myst
# Collection Semantics
This document describes the semantics around working with Galaxy dataset collections.
In particular it describes how they operate within Galaxy tools and workflows.
:::{admonition} You Probably Don't Need to Read This
:class: caution
Any significantly sophisticated workflow language will have ways to collect data
into arrays or vectors or dictionaries and apply operations across this data (mapping)
or reduce the dimensionality of this data (reductions). Typically, this is explicitly
annotated with map functions or for loops. Galaxy however is designed to be a point
and click interface for connecting steps and running tools. It is important that steps
just connect and just do the most natural thing - and this is what Galaxy does.
This document just provides a mathematical formalism to that "what should just
intuitively work" that can be used to document test cases and help with implementation.
This is reference documentation not user documentation, Galaxy should just work.
:::
## Mapping
If a tool consumes a simple dataset parameter and produces a simple dataset parameter,
then any collection type may be "mapped over" the data input to that tool. The result of
that is the tool being applied to each element of the collection and "implicit collections"
being created from the outputs that are produced from those operations. Those implicit
collections have the same element identifiers in the same order as the input collection that is
mapped over. Each element of the implicit collections correspond to their own job and
Galaxy very naturally and intuitively parallelizes jobs without extra work from the user
and without any knowledge of the tool.
(BASIC_MAPPING_PAIRED)=
(BASIC_MAPPING_PAIRED_OR_UNPAIRED_PAIRED)=
(BASIC_MAPPING_PAIRED_OR_UNPAIRED_UN
...
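The mapping rule described in the preview above can be sketched in a few lines. This is a conceptual model only, assuming a collection is represented as an ordered Python dict keyed by element identifier, with nested dicts for nested collections. The name `map_over` and the stand-in "tool" are hypothetical and do not reflect Galaxy's internals.

```python
from typing import Any, Callable, Dict


def map_over(collection: Dict[str, Any], tool: Callable[[Any], Any]) -> Dict[str, Any]:
    """Apply a single-dataset tool to every leaf of a (possibly nested) collection.

    The result has the same element identifiers in the same order as the
    input, which models the "implicit collection" described in the text.
    Each call to `tool` stands in for one independent job.
    """
    result: Dict[str, Any] = {}
    for identifier, element in collection.items():
        if isinstance(element, dict):
            # Nested collection: recurse, keeping the outer identifier.
            result[identifier] = map_over(element, tool)
        else:
            result[identifier] = tool(element)
    return result


if __name__ == "__main__":
    list_paired = {
        "sample1": {"forward": "ACGT", "reverse": "TTGACA"},
        "sample2": {"forward": "GG", "reverse": "C"},
    }
    # len() plays the role of a tool that consumes one dataset.
    print(map_over(list_paired, len))
```

Python dicts preserve insertion order (3.7 and later), so the output identifiers come back in input order without extra bookkeeping.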
Connect concrete Galaxy tool inputs/outputs while preserving collection mapping and reduction semantics.
Trigger: When implementing a step with data_collection inputs, mapped outputs, reductions, or nested collection wiring.
on-demand runtime verbatim corpus-observed deterministic 43.8 KB
- bundle: references/notes/galaxy-collection-semantics.yml
- source: content/research/galaxy-collection-semantics.yml
Preview yml
- doc: |
    # Collection Semantics
    This document describes the semantics around working with Galaxy dataset collections.
    In particular it describes how they operate within Galaxy tools and workflows.
    :::{admonition} You Probably Don't Need to Read This
    :class: caution
    Any significantly sophisticated workflow language will have ways to collect data
    into arrays or vectors or dictionaries and apply operations across this data (mapping)
    or reduce the dimensionality of this data (reductions). Typically, this is explicitly
    annotated with map functions or for loops. Galaxy however is designed to be a point
    and click interface for connecting steps and running tools. It is important that steps
    just connect and just do the most natural thing - and this is what Galaxy does.
    This document just provides a mathematical formalism to that "what should just
    intuitively work" that can be used to document test cases and help with implementation.
    This is reference documentation not user documentation, Galaxy should just work.
    :::
    ## Mapping
    If a tool consumes a simple dataset parameter and produces a simple dataset parameter,
    then any collection type may be "mapped over" the data input to that tool. The result of
    that is the tool being applied to each element of the collection and "implicit collections"
    being created from the outputs that are produced from those operations. Those implicit
    collections have the same element identifiers in the same order as the input collection that is
    mapped over. Each element of the implicit collections correspond to their own job and
    Galaxy very naturally and intuitively parallelizes jobs without extra work from the user
    and without any knowledge of the tool.
...
Insert built-in Galaxy collection-operation steps when a direct tool connection cannot express the needed shape.
Trigger: When a step needs collection construction, filtering, extraction, zipping, unzipping, flattening, merging, or relabeling.
on-demand runtime verbatim corpus-observed deterministic 12.5 KB
- bundle: references/notes/galaxy-collection-tools.md
- source: content/research/galaxy-collection-tools.md
Preview md
---
type: research
subtype: component
title: "Galaxy collection-operation tools"
tags:
- research/component
- target/galaxy
status: draft
created: 2026-04-30
revised: 2026-05-02
revision: 2
ai_generated: true
related_notes:
- "[[galaxy-collection-semantics]]"
- "[[galaxy-apply-rules-dsl]]"
- "[[nextflow-to-galaxy-channel-shape-mapping]]"
- "[[nextflow-operators-to-galaxy-collection-recipes]]"
- "[[iwc-transformations-survey]]"
sources:
- "https://github.com/jmchilton/galaxy-agentic-collection-transform (initial research seed)"
- "https://github.com/galaxyproject/galaxy/tree/main/lib/galaxy/tools (XML wrappers; source of truth)"
summary: "Catalog of Galaxy's collection-operation tools — purpose, IO, parameters, selection guide. Companion to galaxy-collection-semantics."
---
Catalog of Galaxy's built-in collection-operation tools (the `__BUILD_LIST__`, `__FILTER_FROM_FILE__`, `__APPLY_RULES__`, … family) — what each tool does, its inputs and outputs, and when to reach for it. Source of truth is the Galaxy XML wrappers under `lib/galaxy/tools/`; this is the high-level catalog. Pairs with [[galaxy-collection-semantics]], which describes the underlying mapping and reduction semantics rather than the user-facing tools.
These are **model operations** — they manipulate collection structure without processing file contents, so they're fast and don't grow storage.
## Tool Categories
### 1. Collection Creation Tools
#### Build List (`__BUILD_LIST__`)
**Version:** 1.2.0
**Purpose:** Build a new list collection from individual datasets or collections.
**Inputs:**
- `datasets` (repeat): Input datasets or collections
- `input`: Data input (optional)
- `id_cond/id_select`: Label selection method
- `idx`: Use index (0, 1, 2...)
- `identifier`: Use da
...
Preserve concrete tool/job failure evidence while implementing step labels, tool ids, output labels, and collection wiring.
Trigger: When a selected wrapper has explicit failure semantics, dynamic outputs, non-default stdio rules, strict-shell behavior, or runtime-only failure risk.
on-demand runtime verbatim corpus-observed deterministic 7.3 KB
- bundle: references/notes/galaxy-tool-job-failure-reference.md
- source: content/research/galaxy-tool-job-failure-reference.md
Preview md
---
type: research
subtype: component
title: "Galaxy tool and job failure reference"
tags:
- research/component
- target/galaxy
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
related_notes:
- "[[galaxy-workflow-invocation-failure-reference]]"
- "[[planemo-workflow-test-architecture]]"
- "[[galaxy-collection-semantics]]"
related_molds:
- "[[implement-galaxy-tool-step]]"
- "[[debug-galaxy-workflow-output]]"
sources:
- "~/projects/repositories/galaxy/lib/galaxy/tools/__init__.py"
- "~/projects/repositories/galaxy/lib/galaxy/tool_util/parser/xml.py"
- "~/projects/repositories/galaxy/lib/galaxy/tool_util/output_checker.py"
- "~/projects/repositories/galaxy/lib/galaxy/jobs"
- "~/projects/repositories/galaxy/lib/galaxy/webapps/galaxy/api/jobs.py"
summary: "Reference for Galaxy tool stdio rules, job failure detection, job states, and job API failure surfaces."
---
# Galaxy Tool And Job Failure Reference
This is reference material, not a debug recipe. Use it to understand what Galaxy can know about a failed tool job and which API surfaces preserve that evidence.
## Model
Galaxy tool failure handling is layered:
- The tool wrapper defines expected failure semantics through `detect_errors`, `<stdio>`, exit-code checks, regex checks, and command strictness.
- The job runner executes the command and captures exit code plus tool/job stdout and stderr streams.
- Galaxy evaluates configured failure rules and records structured `job_messages`.
- The job reaches a terminal state, output datasets may become `error`, and dependent jobs may pause or fail later.
- Workflow invocation APIs summarize those jobs, but job APIs preserve the most detailed tool-level evidence.
## Tool Wrapper Failure Controls
Important wrapper con
...
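The layered model in the preview above (wrapper rules, captured exit code and streams, evaluated rules recorded as structured messages) can be illustrated with a toy evaluator. This is a sketch under stated assumptions, not Galaxy's code: the classes `ExitCodeRule` and `RegexRule`, the function names, the message fields, and the `"fatal"` level string are all hypothetical stand-ins for whatever the wrapper and Galaxy actually define.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExitCodeRule:
    """Hypothetical rule: exit codes in [low, high] trigger a message."""
    low: int
    high: int
    level: str


@dataclass
class RegexRule:
    """Hypothetical rule: a pattern found in stdout or stderr triggers a message."""
    pattern: str
    level: str


def evaluate_rules(
    exit_code: int,
    stdout: str,
    stderr: str,
    exit_rules: List[ExitCodeRule],
    regex_rules: List[RegexRule],
) -> List[Dict[str, object]]:
    """Return structured messages, loosely modelled on the `job_messages` idea."""
    messages: List[Dict[str, object]] = []
    for rule in exit_rules:
        if rule.low <= exit_code <= rule.high:
            messages.append({"type": "exit_code", "level": rule.level, "exit_code": exit_code})
    for regex_rule in regex_rules:
        for stream, text in (("stdout", stdout), ("stderr", stderr)):
            if re.search(regex_rule.pattern, text):
                messages.append(
                    {"type": "regex", "level": regex_rule.level,
                     "stream": stream, "pattern": regex_rule.pattern}
                )
    return messages


def job_failed(messages: List[Dict[str, object]]) -> bool:
    """Assumed convention: any message at level 'fatal' fails the job."""
    return any(message["level"] == "fatal" for message in messages)


if __name__ == "__main__":
    found = evaluate_rules(
        1, "", "Error: out of memory",
        [ExitCodeRule(1, 255, "fatal")],
        [RegexRule("out of memory", "fatal")],
    )
    print(found, job_failed(found))
```

Returning the full message list, instead of a single pass/fail flag, mirrors the point in the text that job-level surfaces keep more evidence than workflow-level summaries.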
Preserve testable output labels and collection element identifiers while replacing abstract steps with concrete gxformat2 steps.
Trigger: When a concrete step changes output labels, emits collection outputs, creates a diagnostic checkpoint, or makes a final output too weakly assertable.
on-demand runtime verbatim corpus-observed deterministic 11.4 KB
- bundle
references/notes/galaxy-workflow-testability-design.md - source
content/research/galaxy-workflow-testability-design.md
Preview md
---
type: research
subtype: component
tags:
- research/component
- target/galaxy
status: draft
created: 2026-05-03
revised: 2026-05-06
revision: 2
ai_generated: true
related_notes:
- "[[iwc-workflow-testability-survey]]"
- "[[iwc-test-data-conventions]]"
- "[[planemo-asserts-idioms]]"
- "[[iwc-shortcuts-anti-patterns]]"
- "[[planemo-workflow-test-architecture]]"
- "[[implement-galaxy-workflow-test]]"
- "[[gxformat2-schema]]"
- "[[gxformat2-workflow-inputs]]"
- "[[galaxy-datatypes-conf]]"
summary: "Design guidance for Galaxy workflow inputs, outputs, and checkpoints that make IWC-style workflow tests possible."
---
# Galaxy workflow testability design
Use this note when authoring or translating a Galaxy workflow **before** the `-tests.yml` file exists. It covers workflow structure choices that make later IWC-style tests meaningful: labels, promoted checkpoints, collection identifiers, and fixture-compatible inputs.
This is not a `content/patterns/` page. It is cross-cutting design guidance for Molds that need testable Galaxy workflows. Assertion syntax lives in [[planemo-asserts-idioms]]. Test YAML fixture shapes live in [[iwc-test-data-conventions]]. Accepted shortcut vs smell calls live in [[iwc-shortcuts-anti-patterns]]. Corpus evidence trail lives in [[iwc-workflow-testability-survey]].
## 1. Treat labels as API
Workflow input and output labels are not cosmetic. Planemo and IWC tests address workflow inputs and outputs by label, and the survey found exact label matches for every asserted output across 114 matched workflow/test pairs. A generated workflow should therefore pick stable, descriptive labels before test authoring starts.
Rules:
- Label every output that may need a test assertion.
- Treat input/output renames as breaking changes
...
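Because tests address outputs by exact label, a label check can catch renames before a test run. The sketch below assumes the workflow's output labels and the labels asserted by a test have already been extracted into plain lists. The function name `missing_output_labels` and the sample labels are hypothetical, and no Planemo or gxformat2 API is used.

```python
from typing import Iterable, List


def missing_output_labels(
    workflow_output_labels: Iterable[str], asserted_labels: Iterable[str]
) -> List[str]:
    """Return asserted labels that have no exact match among workflow outputs.

    Matching is deliberately exact (case- and whitespace-sensitive), since
    the text treats labels as API rather than as cosmetic names.
    """
    known = set(workflow_output_labels)
    return [label for label in asserted_labels if label not in known]


if __name__ == "__main__":
    outputs = ["Filtered reads", "MultiQC report"]
    asserted = ["Filtered reads", "multiqc report"]
    # The second label differs only in case, so it is reported as missing.
    print(missing_output_labels(outputs, asserted))
```

An empty result means every asserted label resolves. Any entry in the list points to a rename or typo that would break the corresponding assertion.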
Turn operator-derived abstract transforms into concrete Galaxy wiring, collection operations, or review requests.
Trigger: When a concrete step implements behavior traced to map, join, groupTuple, branch, mix, combine, or multiMap.
on-demand runtime verbatim corpus-observed deterministic 6.5 KB
- bundle: references/notes/nextflow-operators-to-galaxy-collection-recipes.md
- source: content/research/nextflow-operators-to-galaxy-collection-recipes.md
Preview md
---
type: research
subtype: component
title: "Nextflow operators to Galaxy collection recipes"
tags:
- research/component
- source/nextflow
- target/galaxy
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
related_notes:
- "[[nextflow-to-galaxy-channel-shape-mapping]]"
- "[[galaxy-collection-semantics]]"
- "[[galaxy-collection-tools]]"
- "[[galaxy-apply-rules-dsl]]"
- "[[iwc-transformations-survey]]"
- "[[iwc-tabular-operations-survey]]"
- "[[galaxy-data-flow-draft-contract]]"
- "[[iwc-map-over-lifecycle-survey]]"
- "[[nextflow-patterns]]"
related_molds:
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[implement-galaxy-tool-step]]"
- "[[debug-galaxy-workflow-output]]"
sources:
- "https://github.com/galaxyproject/foundry/issues/53"
summary: "Classifies common Nextflow operators as Galaxy wiring, collection semantics, explicit steps, or review triggers."
---
# Nextflow Operators To Galaxy Collection Recipes
Most Nextflow operators are not Galaxy tools. Translate them first as source-side data-flow intent, then decide whether the Galaxy representation is simple wiring, collection semantics, an explicit Galaxy step, or a user-review checkpoint.
## Decision Vocabulary
| Label | Meaning |
|---|---|
| `channel-only rewiring` | The operator disappears into Galaxy connections, labels, branch wiring, or output selection. |
| `Galaxy collection semantics` | Translation relies on collection identifiers, collection type, map-over, reduction, or nesting behavior. |
| `explicit Galaxy step` | Add a collection-operation, tabular, text-processing, or domain tool step. |
| `user review` | Translation is likely lossy or semantically ambiguous. |
## Operator Recipes
| Nextflow operator | Galaxy recipe | Class | Confi
...
Check whether a concrete tool input/output can preserve the intended source-derived Galaxy collection shape.
Trigger: When implementing concrete steps for source-derived File/list/paired/list:paired/list:list inputs or outputs.
on-demand runtime verbatim corpus-observed deterministic 8.6 KB
- bundle: references/notes/nextflow-to-galaxy-channel-shape-mapping.md
- source: content/research/nextflow-to-galaxy-channel-shape-mapping.md
Preview md
---
type: research
subtype: component
title: "Nextflow-to-Galaxy channel shape mapping"
tags:
- research/component
- source/nextflow
- target/galaxy
status: draft
created: 2026-05-02
revised: 2026-05-06
revision: 2
ai_generated: true
related_notes:
- "[[nextflow-workflow-io-semantics]]"
- "[[nextflow-params-to-galaxy-inputs]]"
- "[[nextflow-path-glob-to-galaxy-datatype]]"
- "[[galaxy-collection-semantics]]"
- "[[galaxy-collection-tools]]"
- "[[galaxy-apply-rules-dsl]]"
- "[[iwc-transformations-survey]]"
- "[[nextflow-operators-to-galaxy-collection-recipes]]"
- "[[galaxy-data-flow-draft-contract]]"
- "[[iwc-conditionals-survey]]"
- "[[manifest-to-mapped-collection-lifecycle]]"
- "[[map-workflow-enum-to-tool-parameter]]"
- "[[regex-relabel-via-tabular]]"
- "[[relabel-via-rules-and-find-replace]]"
- "[[reshape-relabel-remap-by-collection-axis]]"
- "[[sync-collections-by-identifier]]"
- "[[tabular-compute-new-column]]"
- "[[tabular-concatenate-collection-to-table]]"
- "[[tabular-cut-and-reorder-columns]]"
- "[[tabular-filter-by-column-value]]"
- "[[tabular-filter-by-regex]]"
- "[[tabular-group-and-aggregate-with-datamash]]"
- "[[tabular-join-on-key]]"
- "[[tabular-pivot-collection-to-wide]]"
- "[[tabular-prepend-header]]"
- "[[tabular-relabel-by-row-counter]]"
- "[[tabular-split-taxonomy-string]]"
- "[[tabular-sql-query]]"
- "[[tabular-synthesize-bed-from-3col]]"
- "[[tabular-to-collection-by-row]]"
- "[[iwc-map-over-lifecycle-survey]]"
- "[[nextflow-patterns]]"
related_molds:
- "[[nextflow-summary-to-galaxy-interface]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
- "[[implement-galaxy-tool-step]]"
sources:
- "https://github.com/galaxyproject/foundry/
...