Ground collection reshape, relabel, cleanup, and map-over choices in corpus-observed Galaxy recipes.
Trigger: When CWL scatter, arrays, nested arrays, records, or secondary-file contracts require explicit Galaxy collection operations.
on-demand runtime verbatim corpus-observed deterministic 4.4 KB
- bundle: references/patterns/galaxy-collection-patterns.md
- source: content/patterns/galaxy-collection-patterns.md
Preview md
---
type: pattern
pattern_kind: moc
evidence: corpus-observed
title: "Galaxy: collection patterns"
aliases:
- "Galaxy collection pattern MOC"
- "collection transformation patterns"
- "IWC collection pattern map"
tags:
- pattern
- target/galaxy
- topic/galaxy-transform
- topic/collection-transform
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
summary: "Use this MOC to choose corpus-grounded Galaxy collection transformation patterns."
related_notes:
- "[[iwc-transformations-survey]]"
- "[[iwc-conditionals-survey]]"
related_patterns:
- "[[manifest-to-mapped-collection-lifecycle]]"
- "[[cleanup-sync-and-publish-nonempty-results]]"
- "[[reshape-relabel-remap-by-collection-axis]]"
- "[[fan-in-bundle-consume-and-flatten]]"
- "[[collection-cleanup-after-mapover-failure]]"
- "[[sync-collections-by-identifier]]"
- "[[harmonize-by-sortlist-from-identifiers]]"
- "[[regex-relabel-via-tabular]]"
- "[[relabel-via-rules-and-find-replace]]"
- "[[collection-swap-nesting-with-apply-rules]]"
- "[[collection-split-identifier-via-rules]]"
- "[[collection-build-list-paired-with-apply-rules]]"
- "[[tabular-to-collection-by-row]]"
- "[[tabular-concatenate-collection-to-table]]"
- "[[tabular-pivot-collection-to-wide]]"
related_molds:
- "[[implement-galaxy-tool-step]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[cwl-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
- "[[cwl-summary-to-galaxy-template]]"
- "[[freeform-summary-to-galaxy-template]]"
- "[[compare-against-iwc-exemplar]]"
---
# Galaxy: collection patterns
This is the runtime-facing map for Galaxy collection transformation choices. Use it before loading raw survey notes. The survey remains evidence backin
...
Ground conditional-branch and optional-step choices in curated, corpus-observed Galaxy when/pick_value patterns.
Trigger: When data-flow translation needs optional steps, gating on non-empty results, routing between alternative outputs, or transform-or-pass-through branches.
on-demand runtime verbatim corpus-observed deterministic 2.6 KB
- bundle: references/patterns/galaxy-conditionals-patterns.md
- source: content/patterns/galaxy-conditionals-patterns.md
Preview md
---
type: pattern
pattern_kind: moc
evidence: corpus-observed
title: "Galaxy: conditionals patterns"
aliases:
- "Galaxy conditional pattern MOC"
- "Galaxy when patterns"
- "conditional workflow patterns"
tags:
- pattern
- target/galaxy
- topic/galaxy-transform
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
summary: "Use this MOC to choose corpus-grounded Galaxy when and pick_value conditional patterns."
related_notes:
- "[[iwc-conditionals-survey]]"
related_patterns:
- "[[conditional-run-optional-step]]"
- "[[conditional-route-between-alternative-outputs]]"
- "[[conditional-gate-on-nonempty-result]]"
- "[[conditional-transform-or-pass-through]]"
- "[[collection-cleanup-after-mapover-failure]]"
related_molds:
- "[[implement-galaxy-tool-step]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[cwl-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
- "[[cwl-summary-to-galaxy-template]]"
- "[[freeform-summary-to-galaxy-template]]"
- "[[compare-against-iwc-exemplar]]"
---
# Galaxy: conditionals patterns
This is the runtime-facing map for Galaxy conditional workflow choices. Use it before loading raw survey notes. The survey remains evidence backing; the operation and recipe pages are the actionable references.
## Direct Gates
- [[conditional-run-optional-step]] — expose or derive a boolean, connect it as `inputs.when`, and use `when: $(inputs.when)` to skip optional steps.
- [[conditional-gate-on-nonempty-result]] — compute a boolean from empty/non-empty dataset or collection state before gating downstream reporting/export. The MGnify recipe is corpus-backed but clunky pending verified-pattern workflow work.
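The two direct gates above share one idea: derive a boolean, then let it decide whether a step runs. The sketch below models only that control flow in plain Python; it is not Galaxy's API. The names `is_nonempty`, `run_if`, and `publish_report` are hypothetical, and representing a skipped step's output as `None` is an assumption of this sketch.

```python
from typing import Callable, Optional, Sequence


def is_nonempty(elements: Sequence[object]) -> bool:
    """Hypothetical helper: derive the gate boolean from collection state."""
    return len(elements) > 0


def run_if(when: bool, step: Callable[[], str]) -> Optional[str]:
    """Run `step` only when the gate is true; a skipped step yields None (assumption)."""
    return step() if when else None


def publish_report() -> str:
    """Stand-in for a downstream reporting/export step."""
    return "report.html"


# A non-empty result opens the gate; an empty collection skips the step.
ran = run_if(is_nonempty(["sample_a", "sample_b"]), publish_report)
skipped = run_if(is_nonempty([]), publish_report)
```

In a workflow the same boolean would be wired to the step as `inputs.when`, as the first bullet describes.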
## Routes and Fallbacks
- [[conditional-route-between-alternati
...
Ground genomic-interval operation choices in curated, corpus-observed Galaxy interval recipes.
Trigger: When the workflow operates on genomic intervals (BED/GFF/VCF coordinate features) and data-flow translation needs overlap, merge, coverage, windowing, masking, or set-algebra steps.
on-demand runtime verbatim corpus-observed deterministic 5.3 KB
- bundle: references/patterns/galaxy-interval-patterns.md
- source: content/patterns/galaxy-interval-patterns.md
Preview md
---
type: pattern
pattern_kind: moc
evidence: corpus-observed
title: "Galaxy: genomic interval patterns"
aliases:
- "Galaxy interval pattern MOC"
- "genomic interval transformation patterns"
- "IWC interval pattern map"
tags:
- pattern
- target/galaxy
- topic/galaxy-transform
- topic/interval-transform
status: draft
created: 2026-06-10
revised: 2026-06-10
revision: 1
ai_generated: true
summary: "Use this MOC to choose corpus-grounded Galaxy genomic interval operations and recipes on coordinate features."
related_notes:
- "[[iwc-interval-operations-survey]]"
related_patterns:
- "[[interval-overlap-filter]]"
- "[[interval-coverage]]"
- "[[interval-merge-overlapping]]"
- "[[interval-window-flank]]"
- "[[interval-consensus-by-multi-intersect]]"
- "[[interval-mask-by-set-algebra]]"
- "[[interval-windowed-coverage]]"
- "[[tabular-synthesize-bed-from-3col]]"
related_molds:
- "[[implement-galaxy-tool-step]]"
- "[[nextflow-summary-to-galaxy-data-flow]]"
- "[[cwl-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-template]]"
- "[[cwl-summary-to-galaxy-template]]"
- "[[paper-summary-to-galaxy-template]]"
- "[[compare-against-iwc-exemplar]]"
---
# Galaxy: genomic interval patterns
The runtime-facing map for Galaxy **coordinate-feature** choices — operations that understand `chrom/start/end/strand`, as opposed to opaque-column [[galaxy-tabular-patterns]] or container-shaped [[galaxy-collection-patterns]]. Use it before loading raw survey notes; [[iwc-interval-operations-survey]] is the evidence backing, these pages are the actionable references.
This is the smallest of the three data-shape MOCs by design. Interval algebra is a real but moderate cluster in IWC — concentrated in epigenetics peak-consensus and SARS-CoV-2 mask
...
Map CWL pickValue (first_non_null / the_only_non_null / all_non_null) on workflow outputs or step inputs into Galaxy's native `pick_value` workflow module added by galaxy#22222.
Trigger: When any summary-cwl edge `via` contains a `pickValue:*` marker, OR any workflow_outputs[].output_source is multi-valued with pickValue, OR any steps[].in[].pick_value is non-null in the source workflow or referenced subworkflows.
on-demand runtime verbatim corpus-observed deterministic 10.9 KB
- bundle: references/notes/cwl-pickvalue-to-galaxy.md
- source: content/research/cwl-pickvalue-to-galaxy.md
Preview md
---
type: research
subtype: component
title: "CWL pickValue → Galaxy pick_value (post galaxy#22222)"
tags:
- research/component
- source/cwl
- target/galaxy
status: draft
created: 2026-05-11
revised: 2026-05-11
revision: 1
ai_generated: true
related_notes:
- "[[component-cwl-workflow-anatomy]]"
- "[[galaxy-data-flow-draft-contract]]"
- "[[galaxy-workflow-draft-format]]"
related_molds:
- "[[cwl-summary-to-galaxy-data-flow]]"
- "[[cwl-summary-to-galaxy-template]]"
summary: "CWL `pickValue` (first_non_null / the_only_non_null / all_non_null) → Galaxy's native `pick_value` workflow step added by galaxyproject/galaxy#22222."
---
# CWL `pickValue` → Galaxy `pick_value`
Audience: a Mold author who just saw a `pickValue:*` marker in a `summary-cwl.json` edge `via:` array (or a `WorkflowOutputParameter.output_source` multi-value carrying a `pickValue` hint) and needs to emit gxformat2.
## CWL `pickValue` — canonical semantics
Source: CWL v1.2 schema `Workflow.yml` (`PickValueMethod`) and the rendered spec at <https://www.commonwl.org/v1.2/Workflow.html#PickValueMethod>.
- **`first_non_null`** — "For the first level of a list input, pick the first non-null element. The result is a scalar. **It is an error if there is no non-null element.**"
- **`the_only_non_null`** — "For the first level of a list input, pick the single non-null element. The result is a scalar. **It is an error if there is more than one non-null element.**"
- **`all_non_null`** — "For the first level of a list input, pick all non-null values. **The result is a list, which may be empty.**"
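The three methods quoted above can be sketched as plain Python functions over a list in which `None` stands for CWL `null`. This is an illustrative model of the quoted semantics, not code from CWL or Galaxy. Raising when `the_only_non_null` sees zero non-null elements is an assumption of this sketch, since the quoted text names only the more-than-one case.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_non_null(values: Sequence[Optional[T]]) -> T:
    """Return the first non-null element; error if there is none."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("first_non_null: no non-null element")


def the_only_non_null(values: Sequence[Optional[T]]) -> T:
    """Return the single non-null element; error if there is more than one."""
    non_null = [v for v in values if v is not None]
    if len(non_null) > 1:
        raise ValueError("the_only_non_null: more than one non-null element")
    if not non_null:
        # Assumption: an all-null input is also treated as an error here.
        raise ValueError("the_only_non_null: no non-null element")
    return non_null[0]


def all_non_null(values: Sequence[Optional[T]]) -> List[T]:
    """Return every non-null element; the result may be an empty list."""
    return [v for v in values if v is not None]


# Only the first level is inspected: the nested list [None] counts as non-null.
picked = first_non_null([None, [None], None])
```

Because only the first level of the list is examined, a nested `[None]` is itself a non-null element and is returned unchanged.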
Placement: declared on **both** `WorkflowStepInput` and `WorkflowOutputParameter` with identical semantics. Operates on the array produced when `source:` / `outputSource:` is multi-valued. First level only
...
Translate CWL arrays, records, scatter, and secondary-file shapes into Galaxy dataset and collection semantics.
Trigger: When CWL input/output or step wiring implies Galaxy collections, map-over, reduction, or shape changes.
on-demand runtime verbatim corpus-observed deterministic 1.9 KB
- bundle: references/notes/galaxy-collection-semantics.md
- source: content/research/galaxy-collection-semantics.md
Preview md
---
type: research
subtype: component
title: "Galaxy collection semantics"
tags:
- research/component
- target/galaxy
status: draft
created: 2026-04-30
revised: 2026-05-05
revision: 3
ai_generated: false
related_notes:
- "[[galaxy-xsd]]"
- "[[galaxy-collection-tools]]"
- "[[galaxy-apply-rules-dsl]]"
- "[[nextflow-to-galaxy-channel-shape-mapping]]"
- "[[nextflow-operators-to-galaxy-collection-recipes]]"
- "[[galaxy-tool-job-failure-reference]]"
- "[[galaxy-workflow-invocation-failure-reference]]"
- "[[iwc-transformations-survey]]"
- "[[galaxy-discover-datasets]]"
sources:
- "https://github.com/galaxyproject/galaxy/blob/7765fae934fbfdee77e3be5f5b235e43735273ae/lib/galaxy/model/dataset_collections/types/collection_semantics.yml"
companions:
- "galaxy-collection-semantics.yml"
- "galaxy-collection-semantics.upstream.myst"
summary: "Vendored formal spec of Galaxy dataset-collection mapping/reduction semantics, with labeled examples and pinned test references."
---
> **Vendored from upstream**, pinned at SHA `7765fae`. Two files live next to this note:
>
> - `galaxy-collection-semantics.yml` — the structured source. **Agents and casting should consume this.** It carries the `tests:` blocks that pin concrete Galaxy test names; the rendered upstream view drops them.
> - `galaxy-collection-semantics.upstream.myst` — Galaxy's auto-generated MyST/LaTeX rendering of the YAML, vendored only so the human view below has something to render. Sync is manual.
>
> **When to consult:** authoring or reasoning about Molds and patterns that touch `data_collection` inputs, map-over / reduction shape changes, sub-collection mapping, `paired_or_unpaired`, or `sample_sheet`.
```vendored-myst
file: galaxy-collection-semantics.upstream.myst
source: https://github.com/g
...
Translate CWL arrays, records, scatter, and secondary-file shapes into Galaxy dataset and collection semantics.
Trigger: When CWL input/output or step wiring implies Galaxy collections, map-over, reduction, or shape changes.
on-demand runtime verbatim corpus-observed deterministic 33.4 KB
- bundle: references/notes/galaxy-collection-semantics.upstream.myst
- source: content/research/galaxy-collection-semantics.upstream.myst
Preview myst
# Collection Semantics
This document describes the semantics around working with Galaxy dataset collections.
In particular it describes how they operate within Galaxy tools and workflows.
:::{admonition} You Probably Don't Need to Read This
:class: caution
Any significantly sophisticated workflow language will have ways to collect data
into arrays or vectors or dictionaries and apply operations across this data (mapping)
or reduce the dimensionality of this data (reductions). Typically, this is explicitly
annotated with map functions or for loops. Galaxy however is designed to be a point
and click interface for connecting steps and running tools. It is important that steps
just connect and just do the most natural thing - and this is what Galaxy does.
This document just provides a mathematical formalism to that "what should just
intuitively work" that can be used to document test cases and help with implementation.
This is reference documentation not user documentation, Galaxy should just work.
:::
## Mapping
If a tool consumes a simple dataset parameter and produces a simple dataset parameter,
then any collection type may be "mapped over" the data input to that tool. The result of
that is the tool being applied to each element of the collection and "implicit collections"
being created from the outputs that are produced from those operations. Those implicit
collections have the same element identifiers in the same order as the input collection that is
mapped over. Each element of the implicit collections corresponds to its own job, and
Galaxy very naturally and intuitively parallelizes jobs without extra work from the user
and without any knowledge of the tool.
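As a rough illustration of the mapping rule described above, the sketch below models a collection as an ordered mapping from element identifier to dataset and applies a single-dataset "tool" to each element. The function `map_over`, the `trim` stand-in, and the file names are hypothetical; this is a toy model of the semantics, not Galaxy code.

```python
from typing import Callable, Dict, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def map_over(collection: Dict[str, A], tool: Callable[[A], B]) -> Dict[str, B]:
    """Apply `tool` once per element (one 'job' each).

    The result keeps the same element identifiers in the same order,
    relying on dict insertion order (Python 3.7+).
    """
    return {identifier: tool(element) for identifier, element in collection.items()}


def trim(path: str) -> str:
    """Stand-in for a tool that takes one dataset and produces one dataset."""
    return path + ".trimmed"


paired = {"forward": "r1.fastq", "reverse": "r2.fastq"}
implicit = map_over(paired, trim)
```

The output `implicit` plays the role of the implicit collection: one output element per input element, keyed by the input's identifiers.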
(BASIC_MAPPING_PAIRED)=
(BASIC_MAPPING_PAIRED_OR_UNPAIRED_PAIRED)=
(BASIC_MAPPING_PAIRED_OR_UNPAIRED_UN
...
Translate CWL arrays, records, scatter, and secondary-file shapes into Galaxy dataset and collection semantics.
Trigger: When CWL input/output or step wiring implies Galaxy collections, map-over, reduction, or shape changes.
on-demand runtime verbatim corpus-observed deterministic 43.8 KB
- bundle: references/notes/galaxy-collection-semantics.yml
- source: content/research/galaxy-collection-semantics.yml
Preview yml
- doc: |
# Collection Semantics
This document describes the semantics around working with Galaxy dataset collections.
In particular it describes how they operate within Galaxy tools and workflows.
:::{admonition} You Probably Don't Need to Read This
:class: caution
Any significantly sophisticated workflow language will have ways to collect data
into arrays or vectors or dictionaries and apply operations across this data (mapping)
or reduce the dimensionality of this data (reductions). Typically, this is explicitly
annotated with map functions or for loops. Galaxy however is designed to be a point
and click interface for connecting steps and running tools. It is important that steps
just connect and just do the most natural thing - and this is what Galaxy does.
This document just provides a mathematical formalism to that "what should just
intuitively work" that can be used to document test cases and help with implementation.
This is reference documentation not user documentation, Galaxy should just work.
:::
## Mapping
If a tool consumes a simple dataset parameter and produces a simple dataset parameter,
then any collection type may be "mapped over" the data input to that tool. The result of
that is the tool being applied to each element of the collection and "implicit collections"
being created from the outputs that are produced from those operations. Those implicit
collections have the same element identifiers in the same order as the input collection that is
mapped over. Each element of the implicit collections corresponds to its own job, and
Galaxy very naturally and intuitively parallelizes jobs without extra work from the user
and without any knowledge of the tool.
...
When the interface brief adopted a `paired_or_unpaired` shape, model inner-tool branching via `has_single_item` semantics instead of a Galaxy-level mode switch.
Trigger: When the preceding cwl-galaxy-interface brief uses `paired_or_unpaired` (or `list:paired_or_unpaired`) as a workflow input, OR the data-flow brief is considering it as an option.
on-demand runtime verbatim corpus-observed deterministic 8.3 KB
- bundle: references/notes/galaxy-paired-or-unpaired-collections.md
- source: content/research/galaxy-paired-or-unpaired-collections.md
Preview md
---
type: research
subtype: component
title: "Galaxy paired_or_unpaired collection type"
tags:
- research/component
- target/galaxy
status: draft
created: 2026-05-11
revised: 2026-05-11
revision: 1
ai_generated: true
related_notes:
- "[[galaxy-collection-semantics]]"
- "[[component-cwl-workflow-anatomy]]"
related_molds:
- "[[cwl-summary-to-galaxy-interface]]"
- "[[cwl-summary-to-galaxy-data-flow]]"
- "[[nextflow-summary-to-galaxy-interface]]"
summary: "Galaxy's `paired_or_unpaired` collection type: discriminated-union shape for paired-or-single reads, no workflow-level mode switch needed. Galaxy PR #19377."
---
# Galaxy `paired_or_unpaired` collections
Audience: a Mold author shaping a Galaxy workflow interface from an upstream (CWL / Nextflow / paper) source whose reads can be paired-end *or* single-end *or* a mixed batch of both.
## The shape
`paired_or_unpaired` is a Galaxy collection type modeling a **discriminated union of 1 or 2 elements**:
- **Unpaired variant** — one element with identifier `unpaired`.
- **Paired variant** — two elements with identifiers `forward` and `reverse`.
`list:paired_or_unpaired` lifts the same shape to a *heterogeneous* batch where some samples are paired and some are single-end — a representation that did not exist before this type. A `list:paired` forces every sample to be paired; a plain `list` of flat datasets loses pairing structure.
`paired_or_unpaired` may appear inside nested types of any depth (`list:paired_or_unpaired`, `list:list:paired_or_unpaired`), but **only at the deepest (innermost) rank**, because the subtyping logic is implemented at the suffix level. See "Limitation: only deepest rank" below.
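The shape rules in this section can be sketched as two small checks: one that classifies a single `paired_or_unpaired` collection by its element identifiers, and one that tests whether a colon-separated collection type keeps `paired_or_unpaired` at the innermost rank. Both function names are hypothetical, and treating `forward` before `reverse` as the required element order is an assumption of this sketch; neither function is part of Galaxy.

```python
from typing import Sequence


def classify_variant(identifiers: Sequence[str]) -> str:
    """Return 'unpaired' or 'paired' from a collection's element identifiers."""
    ids = list(identifiers)
    if ids == ["unpaired"]:
        return "unpaired"
    if ids == ["forward", "reverse"]:  # assumed order: forward, then reverse
        return "paired"
    raise ValueError(f"not a paired_or_unpaired shape: {ids!r}")


def paired_or_unpaired_rank_ok(collection_type: str) -> bool:
    """True when paired_or_unpaired, if present, is only the innermost rank."""
    ranks = collection_type.split(":")
    return "paired_or_unpaired" not in ranks[:-1]


# A heterogeneous batch: some samples paired, some single-end.
batch = {
    "sample_a": ["forward", "reverse"],
    "sample_b": ["unpaired"],
}
variants = {name: classify_variant(ids) for name, ids in batch.items()}
```

Here `batch` stands in for a `list:paired_or_unpaired` input, where each sample resolves to its own variant without a workflow-level mode switch.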
## When to reach for it (decision rule for translators)
Reach for `paired_or_unpaired` when the
...