Claude skill · cast

cwl-summary-to-galaxy-data-flow

Translate a CWL summary into a Galaxy data-flow design brief.


Install

/plugin marketplace add galaxyproject/foundry
/plugin install foundry-skills@galaxy-workflow-foundry

Then invoke as:

/foundry-skills:cwl-summary-to-galaxy-data-flow

Skill Bundle

attached files: 12
upfront: 4
on demand: 8
cast rev: n/a
validated: 0

Produces: 1 artifact.

Consumes: 2 artifacts.

Artifact Contract

Produces

cwl-galaxy-data-flow

Reviewable Markdown brief: abstract topology, Galaxy collection semantics, placeholder transformations, unresolved Galaxy tool needs.

markdown · cwl-galaxy-data-flow.md
Raw artifact contract
{
  "id": "cwl-galaxy-data-flow",
  "kind": "markdown",
  "default_filename": "cwl-galaxy-data-flow.md",
  "description": "Reviewable Markdown brief: abstract topology, Galaxy collection semantics, placeholder transformations, unresolved Galaxy tool needs."
}

Consumes

summary-cwl

Structured CWL summary emitted by [[summarize-cwl]]; consumed alongside the Galaxy interface brief.

Raw artifact contract
{
  "id": "summary-cwl",
  "description": "Structured CWL summary emitted by [[summarize-cwl]]; consumed alongside the Galaxy interface brief.",
  "inherited_schema": "[[summary-cwl]]",
  "producers": [
    "summarize-cwl"
  ]
}

cwl-galaxy-interface

Preceding Galaxy interface brief from [[cwl-summary-to-galaxy-interface]] that pins inputs, outputs, and labels.

Raw artifact contract
{
  "id": "cwl-galaxy-interface",
  "description": "Preceding Galaxy interface brief from [[cwl-summary-to-galaxy-interface]] that pins inputs, outputs, and labels.",
  "producers": [
    "cwl-summary-to-galaxy-interface"
  ]
}

Attached Files

Load upfront

research

component-cwl-workflow-anatomy

packaged

Use CWL's native graph and mark only the features that need Galaxy reinterpretation.

upfront · runtime · verbatim · hypothesis · deterministic · 5.5 KB
bundle: references/notes/component-cwl-workflow-anatomy.md
source: content/research/component-cwl-workflow-anatomy.md
Preview (md):
---
type: research
subtype: component
title: "CWL workflow anatomy"
tags:
  - research/component
  - source/cwl
status: draft
created: 2026-05-10
revised: 2026-05-10
revision: 1
ai_generated: true
related_notes:
  - "[[summary-cwl]]"
  - "[[cwl-v1.2-schemas]]"
  - "[[galaxy-collection-semantics]]"
related_molds:
  - "[[summarize-cwl]]"
  - "[[cwl-summary-to-galaxy-interface]]"
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[cwl-summary-to-galaxy-template]]"
sources:
  - "https://www.commonwl.org/v1.2/Workflow.html"
  - "https://cwltool.readthedocs.io/en/stable/"
  - "https://github.com/common-workflow-language/cwl-utils#normalize-a-cwl-document"
  - "https://pypi.org/project/cwl-utils/"
  - "https://github.com/common-workflow-language/cwldep"
summary: "CWL structure relevant to summarize-cwl: normalized documents, steps, scatter, conditionals, requirements, and dependency handling."
---

# CWL Workflow Anatomy

CWL is a structured workflow language, not a pipeline framework that must be inferred from ecosystem conventions. The `summarize-cwl` Mold should therefore start from CWL's own validated object model and avoid recreating the heavy Nextflow extraction stack.

## Normalization Posture

Use `cwltool --validate` as the first gate. If validation fails, the summary should emit provenance plus validation diagnostics and stop before producing downstream-looking graph claims.

Use `cwl-normalizer` from `cwl-utils` as the default normalization surface. The cwl-utils README describes it as producing JSON CWL documents with dependencies packed together, upgrading to CWL v1.2 as needed, and optionally refactoring CWL expressions into separate steps. This is the right handoff for `summarize-cwl`: structured enough for extraction, still source-faithful, and not a Galaxy design
...
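The validation gate above can be sketched in Python. This is a minimal sketch under stated assumptions, not the `summarize-cwl` implementation. Only the `cwltool --validate` invocation comes from the note. The `gate_summary` helper, the stub's keys, and the injectable `run` parameter are hypothetical; `run` lets the gate be exercised without cwltool installed. Normalization with `cwl-normalizer` is left out because its exact command line is not given here.

```python
import subprocess
from typing import Callable, Optional


def gate_summary(
    workflow_path: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[dict]:
    """Validation gate: return None if `cwltool --validate` passes.

    On failure, return a diagnostics-only stub so no graph claims are made.
    The stub's keys are hypothetical, not fields of the summary-cwl schema.
    """
    cmd = ["cwltool", "--validate", workflow_path]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return None  # Valid: normalization and extraction may proceed.
    return {
        "provenance": {"entrypoint": workflow_path, "command": cmd},
        "validation_diagnostics": (result.stderr or result.stdout or "").strip(),
        "graph": None,  # Deliberately empty: stop before any graph claims.
    }
```

In real use, the default `subprocess.run` shells out to cwltool. Tests can pass a fake runner that returns a `subprocess.CompletedProcess`.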
research

cwl-when-pickvalue-to-galaxy-branching

packaged

Default reference for translating CWL when:/pickValue branching: pick among `paired_or_unpaired` collection input, native `pick_value` workflow step, or sibling workflows per mode.

upfront · runtime · verbatim · corpus-observed · deterministic · 7.7 KB
bundle: references/notes/cwl-when-pickvalue-to-galaxy-branching.md
source: content/research/cwl-when-pickvalue-to-galaxy-branching.md
Preview (md):
---
type: research
subtype: design-spec
title: "CWL when:/pickValue → Galaxy branching translation"
tags:
  - research/design-spec
  - source/cwl
  - target/galaxy
status: draft
created: 2026-05-11
revised: 2026-05-11
revision: 1
ai_generated: true
related_notes:
  - "[[cwl-pickvalue-to-galaxy]]"
  - "[[galaxy-paired-or-unpaired-collections]]"
  - "[[galaxy-collection-semantics]]"
  - "[[component-cwl-workflow-anatomy]]"
  - "[[galaxy-data-flow-draft-contract]]"
related_molds:
  - "[[cwl-summary-to-galaxy-interface]]"
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[cwl-summary-to-galaxy-template]]"
  - "[[compare-against-iwc-exemplar]]"
summary: "CWL `when:`/`pickValue` → Galaxy. Three honest translations (paired_or_unpaired input, native pick_value step, sibling workflows) plus how to pick among them."
---

# CWL `when:`/`pickValue` → Galaxy branching translation

Audience: a Mold author looking at a `summary-cwl.json` whose steps carry `when:` predicates and/or whose workflow outputs use `pickValue`, deciding which Galaxy translation to recommend.

## The three honest translations

CWL has two related branching mechanisms with no 1:1 gxformat2 equivalent (until galaxy#22222 — see `cwl-pickvalue-to-galaxy`):

- **`when:` on a step** — execute conditionally on a JS predicate.
- **`pickValue:` on a step input or workflow output** — fan in N candidate sources and pick `first_non_null` / `the_only_non_null` / `all_non_null`.

Three Galaxy-idiomatic translations are available; each is honest for a different source shape.

### Translation A — `paired_or_unpaired` collection (preferred when the discriminator is paired-vs-single)

When the CWL `when:` predicates discriminate the **paired-vs-single mode of read inputs** (the seqprep-subwf pattern: `single_reads: File?` trigger
...
research

galaxy-data-flow-draft-contract

packaged

Keep the data-flow brief separate from gxformat2 templating and concrete step implementation.

upfront · runtime · verbatim · hypothesis · deterministic · 6.5 KB
bundle: references/notes/galaxy-data-flow-draft-contract.md
source: content/research/galaxy-data-flow-draft-contract.md
Preview (md):
---
type: research
subtype: design-spec
title: "Galaxy data-flow draft contract"
tags:
  - research/design-spec
  - target/galaxy
status: draft
created: 2026-05-02
revised: 2026-05-03
revision: 2
ai_generated: true
related_notes:
  - "[[nextflow-to-galaxy-channel-shape-mapping]]"
  - "[[nextflow-operators-to-galaxy-collection-recipes]]"
  - "[[galaxy-workflow-draft]]"
related_molds:
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[freeform-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-template]]"
  - "[[cwl-summary-to-galaxy-template]]"
  - "[[freeform-summary-to-galaxy-template]]"
  - "[[compare-against-iwc-exemplar]]"
  - "[[advance-galaxy-draft-step]]"
sources:
  - "https://github.com/galaxyproject/foundry/issues/54"
summary: "Defines the proposed boundary between Galaxy data-flow drafts, gxformat2 templates, and concrete step implementation."
---

# Galaxy Data-Flow Draft Contract

This is an architectural contract, not a schema. Evidence is strongest for Mold and Pipeline boundaries. Proposed fields are speculative until exercised by two or three worked translations.

## Boundary

The data-flow draft owns a target-shaped abstract DAG for Galaxy. It should not be valid `gxformat2` and should not resolve exact Tool Shed tools.

Data-flow draft owns:

- Galaxy-facing workflow inputs and outputs.
- Abstract nodes, edges, branches, collection mapping, collection reduction, and placeholder transformations.
- Input/output shape decisions such as `File`, `list`, `paired`, `list:paired`, or `list:list`.
- Conceptual Galaxy idioms: map-over, reduction, Apply Rules, collection cleanup, identifier synchronization, tabular bridge.
- Abstract unresolved tool needs with input and output shapes.
- Confidence and rat
...
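The note defines an "abstract unresolved tool need" only by its input and output shapes. A hypothetical record for one such need might look like the Python sketch below. The class name, field names, and bullet format are illustrative assumptions, since the note says its proposed fields stay speculative until exercised. The record deliberately carries no Tool Shed identifier, because the data-flow draft must not resolve exact tools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UnresolvedToolNeed:
    """Hypothetical record for one abstract Galaxy tool need.

    There is no tool_id / Tool Shed field on purpose: the data-flow draft
    describes the need by shape only and leaves resolution to later stages.
    """
    name: str
    input_shape: str   # e.g. "File", "list", "paired", "list:paired", "list:list"
    output_shape: str
    idiom: str = "map-over"  # e.g. map-over, reduction, Apply Rules
    confidence: str = "low"
    open_questions: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render one bullet (plus nested open questions) for the Markdown brief."""
        line = (f"- **{self.name}**: `{self.input_shape}` -> `{self.output_shape}` "
                f"via {self.idiom} (confidence: {self.confidence})")
        for question in self.open_questions:
            line += f"\n  - open question: {question}"
        return line
```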
schema

summary-cwl

packaged

Read CWL step graph, edge markers, scatter, conditionals, secondary files, and tool requirements while drafting Galaxy-facing data flow.

upfront · runtime · verbatim · cast-validated · deterministic · 19.3 KB
bundle: references/schemas/summary-cwl.schema.json
source: package://@galaxy-foundry/foundry#summaryCwlSchema
Preview (json):
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://galaxyproject.org/foundry/schemas/summary-cwl.schema.json",
  "$comment": "Canonical source: packages/foundry/src/schemas/summary-cwl/summary-cwl.schema.json in galaxyproject/foundry. Mold frontmatter cites this schema via [[summary-cwl]] wiki-links; the cast pipeline imports the `summaryCwlSchema` runtime export and serializes it into cast bundles.",
  "title": "CWL Workflow Summary",
  "description": "Structured per-source summary emitted by the summarize-cwl Mold. CWL is already a typed workflow language, so this schema records validated and normalized workflow/tool structure rather than inferred pipeline semantics.",
  "type": "object",
  "additionalProperties": false,
  "required": [
    "summary_version",
    "source",
    "documents",
    "workflow_inputs",
    "workflow_outputs",
    "steps",
    "tools",
    "graph",
    "tests",
    "warnings"
  ],
  "properties": {
    "summary_version": {
      "type": "string",
      "enum": [
        "1"
      ],
      "description": "Summary schema major version."
    },
    "source": {
      "$ref": "#/$defs/SourceRecord"
    },
    "documents": {
      "$ref": "#/$defs/DocumentSet"
    },
    "workflow_inputs": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/WorkflowInput"
      }
    },
    "workflow_outputs": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/WorkflowOutput"
      }
    },
    "steps": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/WorkflowStep"
      }
    },
    "tools": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/CommandLineTool"
      }
    },
    "graph": {
      "$ref": "#/$defs/WorkflowGraph"
    },
    "tests": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/TestCase"
      }
    },
    "warnings": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/Warning"
      }
    }
  },
  "$defs": {
    "SourceRecord": {
      "type": "object",
      "additionalProperties": false,
      "required": [
        "ecosystem",
        "workflow",
        "url",
        "version",
        "license",
        "slug",
        "cwl_version",
        "entrypoint"
      ],
      "properties": {
        "ecosystem": {
          "type": "string",
          "enum": [
            "cwl"
          ],
          "description": 
...
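Full validation belongs to a JSON Schema (draft-07) validator run against the packaged schema file. The stdlib Python sketch below only checks the top-level contract visible in the preview: the ten required keys, `additionalProperties: false`, the array-typed properties, and the `summary_version` enum. The helper name is hypothetical. It does not inspect anything under `$defs`, so `source`, `documents`, and `graph` are not type-checked.

```python
from typing import Any, Dict, List

# Copied from the schema's top-level "required" array; "properties" lists
# the same ten keys and "additionalProperties" is false.
REQUIRED_TOP_LEVEL = [
    "summary_version", "source", "documents", "workflow_inputs",
    "workflow_outputs", "steps", "tools", "graph", "tests", "warnings",
]
ARRAY_KEYS = {"workflow_inputs", "workflow_outputs", "steps", "tools", "tests", "warnings"}


def top_level_problems(summary: Dict[str, Any]) -> List[str]:
    """Cheap top-level pre-check; nothing under $defs is validated."""
    problems = [f"missing required key: {k}" for k in REQUIRED_TOP_LEVEL if k not in summary]
    problems += [f"unexpected key: {k}" for k in summary if k not in REQUIRED_TOP_LEVEL]
    for key in REQUIRED_TOP_LEVEL:
        if key in ARRAY_KEYS and key in summary and not isinstance(summary[key], list):
            problems.append(f"{key} must be an array")
    if "summary_version" in summary and summary["summary_version"] != "1":
        problems.append('summary_version must be "1"')
    return problems
```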

Load on demand

pattern

galaxy-collection-patterns

packaged

Ground collection reshape, relabel, cleanup, and map-over choices in corpus-observed Galaxy recipes.

Trigger: When CWL scatter, arrays, nested arrays, records, or secondary-file contracts require explicit Galaxy collection operations.

on-demand · runtime · verbatim · corpus-observed · deterministic · 4.4 KB
bundle: references/patterns/galaxy-collection-patterns.md
source: content/patterns/galaxy-collection-patterns.md
Preview (md):
---
type: pattern
pattern_kind: moc
evidence: corpus-observed
title: "Galaxy: collection patterns"
aliases:
  - "Galaxy collection pattern MOC"
  - "collection transformation patterns"
  - "IWC collection pattern map"
tags:
  - pattern
  - target/galaxy
  - topic/galaxy-transform
  - topic/collection-transform
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
summary: "Use this MOC to choose corpus-grounded Galaxy collection transformation patterns."
related_notes:
  - "[[iwc-transformations-survey]]"
  - "[[iwc-conditionals-survey]]"
related_patterns:
  - "[[manifest-to-mapped-collection-lifecycle]]"
  - "[[cleanup-sync-and-publish-nonempty-results]]"
  - "[[reshape-relabel-remap-by-collection-axis]]"
  - "[[fan-in-bundle-consume-and-flatten]]"
  - "[[collection-cleanup-after-mapover-failure]]"
  - "[[sync-collections-by-identifier]]"
  - "[[harmonize-by-sortlist-from-identifiers]]"
  - "[[regex-relabel-via-tabular]]"
  - "[[relabel-via-rules-and-find-replace]]"
  - "[[collection-swap-nesting-with-apply-rules]]"
  - "[[collection-split-identifier-via-rules]]"
  - "[[collection-build-list-paired-with-apply-rules]]"
  - "[[tabular-to-collection-by-row]]"
  - "[[tabular-concatenate-collection-to-table]]"
  - "[[tabular-pivot-collection-to-wide]]"
related_molds:
  - "[[implement-galaxy-tool-step]]"
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-template]]"
  - "[[cwl-summary-to-galaxy-template]]"
  - "[[freeform-summary-to-galaxy-template]]"
  - "[[compare-against-iwc-exemplar]]"
---

# Galaxy: collection patterns

This is the runtime-facing map for Galaxy collection transformation choices. Use it before loading raw survey notes. The survey remains evidence backin
...
pattern

galaxy-conditionals-patterns

packaged

Ground conditional-branch and optional-step choices in curated, corpus-observed Galaxy when/pick_value patterns.

Trigger: When data-flow translation needs optional steps, gating on non-empty results, routing between alternative outputs, or transform-or-pass-through branches.

on-demand · runtime · verbatim · corpus-observed · deterministic · 2.6 KB
bundle: references/patterns/galaxy-conditionals-patterns.md
source: content/patterns/galaxy-conditionals-patterns.md
Preview (md):
---
type: pattern
pattern_kind: moc
evidence: corpus-observed
title: "Galaxy: conditionals patterns"
aliases:
  - "Galaxy conditional pattern MOC"
  - "Galaxy when patterns"
  - "conditional workflow patterns"
tags:
  - pattern
  - target/galaxy
  - topic/galaxy-transform
status: draft
created: 2026-05-02
revised: 2026-05-02
revision: 1
ai_generated: true
summary: "Use this MOC to choose corpus-grounded Galaxy when and pick_value conditional patterns."
related_notes:
  - "[[iwc-conditionals-survey]]"
related_patterns:
  - "[[conditional-run-optional-step]]"
  - "[[conditional-route-between-alternative-outputs]]"
  - "[[conditional-gate-on-nonempty-result]]"
  - "[[conditional-transform-or-pass-through]]"
  - "[[collection-cleanup-after-mapover-failure]]"
related_molds:
  - "[[implement-galaxy-tool-step]]"
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-template]]"
  - "[[cwl-summary-to-galaxy-template]]"
  - "[[freeform-summary-to-galaxy-template]]"
  - "[[compare-against-iwc-exemplar]]"
---

# Galaxy: conditionals patterns

This is the runtime-facing map for Galaxy conditional workflow choices. Use it before loading raw survey notes. The survey remains evidence backing; the operation and recipe pages are the actionable references.

## Direct Gates

- [[conditional-run-optional-step]] — expose or derive a boolean, connect it as `inputs.when`, and use `when: $(inputs.when)` to skip optional steps.
- [[conditional-gate-on-nonempty-result]] — compute a boolean from empty/non-empty dataset or collection state before gating downstream reporting/export. The MGnify recipe is corpus-backed but clunky pending verified-pattern workflow work.

## Routes and Fallbacks

- [[conditional-route-between-alternati
...
pattern

galaxy-interval-patterns

packaged

Ground genomic-interval operation choices in curated, corpus-observed Galaxy interval recipes.

Trigger: When the workflow operates on genomic intervals (BED/GFF/VCF coordinate features) and data-flow translation needs overlap, merge, coverage, windowing, masking, or set-algebra steps.

on-demand · runtime · verbatim · corpus-observed · deterministic · 5.3 KB
bundle: references/patterns/galaxy-interval-patterns.md
source: content/patterns/galaxy-interval-patterns.md
Preview (md):
---
type: pattern
pattern_kind: moc
evidence: corpus-observed
title: "Galaxy: genomic interval patterns"
aliases:
  - "Galaxy interval pattern MOC"
  - "genomic interval transformation patterns"
  - "IWC interval pattern map"
tags:
  - pattern
  - target/galaxy
  - topic/galaxy-transform
  - topic/interval-transform
status: draft
created: 2026-06-10
revised: 2026-06-10
revision: 1
ai_generated: true
summary: "Use this MOC to choose corpus-grounded Galaxy genomic interval operations and recipes on coordinate features."
related_notes:
  - "[[iwc-interval-operations-survey]]"
related_patterns:
  - "[[interval-overlap-filter]]"
  - "[[interval-coverage]]"
  - "[[interval-merge-overlapping]]"
  - "[[interval-window-flank]]"
  - "[[interval-consensus-by-multi-intersect]]"
  - "[[interval-mask-by-set-algebra]]"
  - "[[interval-windowed-coverage]]"
  - "[[tabular-synthesize-bed-from-3col]]"
related_molds:
  - "[[implement-galaxy-tool-step]]"
  - "[[nextflow-summary-to-galaxy-data-flow]]"
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-template]]"
  - "[[cwl-summary-to-galaxy-template]]"
  - "[[paper-summary-to-galaxy-template]]"
  - "[[compare-against-iwc-exemplar]]"
---

# Galaxy: genomic interval patterns

The runtime-facing map for Galaxy **coordinate-feature** choices — operations that understand `chrom/start/end/strand`, as opposed to opaque-column [[galaxy-tabular-patterns]] or container-shaped [[galaxy-collection-patterns]]. Use it before loading raw survey notes; [[iwc-interval-operations-survey]] is the evidence backing, these pages are the actionable references.

This is the smallest of the three data-shape MOCs by design. Interval algebra is a real but moderate cluster in IWC — concentrated in epigenetics peak-consensus and SARS-CoV-2 mask
...
research

cwl-pickvalue-to-galaxy

packaged

Map CWL pickValue (first_non_null / the_only_non_null / all_non_null) on workflow outputs or step inputs into Galaxy's native `pick_value` workflow module added by galaxy#22222.

Trigger: When any summary-cwl edge `via` contains a `pickValue:*` marker, OR any workflow_outputs[].output_source is multi-valued with pickValue, OR any steps[].in[].pick_value is non-null in the source workflow or referenced subworkflows.

on-demand · runtime · verbatim · corpus-observed · deterministic · 10.9 KB
bundle: references/notes/cwl-pickvalue-to-galaxy.md
source: content/research/cwl-pickvalue-to-galaxy.md
Preview (md):
---
type: research
subtype: component
title: "CWL pickValue → Galaxy pick_value (post galaxy#22222)"
tags:
  - research/component
  - source/cwl
  - target/galaxy
status: draft
created: 2026-05-11
revised: 2026-05-11
revision: 1
ai_generated: true
related_notes:
  - "[[component-cwl-workflow-anatomy]]"
  - "[[galaxy-data-flow-draft-contract]]"
  - "[[galaxy-workflow-draft-format]]"
related_molds:
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[cwl-summary-to-galaxy-template]]"
summary: "CWL `pickValue` (first_non_null / the_only_non_null / all_non_null) → Galaxy's native `pick_value` workflow step added by galaxyproject/galaxy#22222."
---

# CWL `pickValue` → Galaxy `pick_value`

Audience: a Mold author who just saw a `pickValue:*` marker in a `summary-cwl.json` edge `via:` array (or a multi-valued `WorkflowOutputParameter.outputSource` with a `pickValue` field) and needs to emit gxformat2.

## CWL `pickValue` — canonical semantics

Source: CWL v1.2 schema `Workflow.yml` (`PickValueMethod`) and the rendered spec at <https://www.commonwl.org/v1.2/Workflow.html#PickValueMethod>.

- **`first_non_null`** — "For the first level of a list input, pick the first non-null element. The result is a scalar. **It is an error if there is no non-null element.**"
- **`the_only_non_null`** — "For the first level of a list input, pick the single non-null element. The result is a scalar. **It is an error if there is more than one non-null element.**"
- **`all_non_null`** — "For the first level of a list input, pick all non-null values. **The result is a list, which may be empty.**"
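The Python sketch below models the three methods from the quoted semantics, with `None` standing in for CWL `null`. `pick_value` here is a hypothetical helper, not Galaxy's `pick_value` workflow module. One behavior goes beyond the quoted text: the sketch also rejects an all-null list for `the_only_non_null`, which, to our reading, matches the `[null, null, null]` example in the CWL v1.2 spec. Only the first level is inspected, so a nested `[null]` counts as a non-null element.

```python
from typing import Any, List, Optional


def pick_value(method: str, values: List[Optional[Any]]) -> Any:
    """Apply a CWL v1.2 pickValue method to the first level of `values`.

    None stands in for CWL null. Nested lists are not inspected, so
    [None, [None]] has one non-null element (the inner list).
    """
    non_null = [v for v in values if v is not None]
    if method == "first_non_null":
        if not non_null:
            raise ValueError("first_non_null: no non-null element")
        return non_null[0]
    if method == "the_only_non_null":
        if len(non_null) != 1:
            raise ValueError(
                f"the_only_non_null: expected exactly one non-null element, got {len(non_null)}"
            )
        return non_null[0]
    if method == "all_non_null":
        return non_null  # May be empty; never an error.
    raise ValueError(f"unknown pickValue method: {method}")
```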

Placement: declared on **both** `WorkflowStepInput` and `WorkflowOutputParameter` with identical semantics. Operates on the array produced when `source:` / `outputSource:` is multi-valued. First level only
...
research

galaxy-collection-semantics

packaged

Translate CWL arrays, records, scatter, and secondary-file shapes into Galaxy dataset and collection semantics.

Trigger: When CWL input/output or step wiring implies Galaxy collections, map-over, reduction, or shape changes.

on-demand · runtime · verbatim · corpus-observed · deterministic · 1.9 KB
bundle: references/notes/galaxy-collection-semantics.md
source: content/research/galaxy-collection-semantics.md
Preview (md):
---
type: research
subtype: component
title: "Galaxy collection semantics"
tags:
  - research/component
  - target/galaxy
status: draft
created: 2026-04-30
revised: 2026-05-05
revision: 3
ai_generated: false
related_notes:
  - "[[galaxy-xsd]]"
  - "[[galaxy-collection-tools]]"
  - "[[galaxy-apply-rules-dsl]]"
  - "[[nextflow-to-galaxy-channel-shape-mapping]]"
  - "[[nextflow-operators-to-galaxy-collection-recipes]]"
  - "[[galaxy-tool-job-failure-reference]]"
  - "[[galaxy-workflow-invocation-failure-reference]]"
  - "[[iwc-transformations-survey]]"
  - "[[galaxy-discover-datasets]]"
sources:
  - "https://github.com/galaxyproject/galaxy/blob/7765fae934fbfdee77e3be5f5b235e43735273ae/lib/galaxy/model/dataset_collections/types/collection_semantics.yml"
companions:
  - "galaxy-collection-semantics.yml"
  - "galaxy-collection-semantics.upstream.myst"
summary: "Vendored formal spec of Galaxy dataset-collection mapping/reduction semantics, with labeled examples and pinned test references."
---

> **Vendored from upstream**, pinned at SHA `7765fae`. Two files live next to this note:
>
> - `galaxy-collection-semantics.yml` — the structured source. **Agents and casting should consume this.** It carries the `tests:` blocks that pin concrete Galaxy test names; the rendered upstream view drops them.
> - `galaxy-collection-semantics.upstream.myst` — Galaxy's auto-generated MyST/LaTeX rendering of the YAML, vendored only so the human view below has something to render. Sync is manual.
>
> **When to consult:** authoring or reasoning about Molds and patterns that touch `data_collection` inputs, map-over / reduction shape changes, sub-collection mapping, `paired_or_unpaired`, or `sample_sheet`.

```vendored-myst
file: galaxy-collection-semantics.upstream.myst
source: https://github.com/g
...
research

galaxy-collection-semantics

packaged

Translate CWL arrays, records, scatter, and secondary-file shapes into Galaxy dataset and collection semantics.

Trigger: When CWL input/output or step wiring implies Galaxy collections, map-over, reduction, or shape changes.

on-demand · runtime · verbatim · corpus-observed · deterministic · 33.4 KB
bundle: references/notes/galaxy-collection-semantics.upstream.myst
source: content/research/galaxy-collection-semantics.upstream.myst
Preview (myst):
# Collection Semantics

This document describes the semantics around working with Galaxy dataset collections.
In particular it describes how they operate within Galaxy tools and workflows.

:::{admonition} You Probably Don't Need to Read This
:class: caution

Any significantly sophisticated workflow language will have ways to collect data
into arrays or vectors or dictionaries and apply operations across this data (mapping)
or reduce the dimensionality of this data (reductions). Typically, this is explicitly
annotated with map functions or for loops. Galaxy however is designed to be a point
and click interface for connecting steps and running tools. It is important that steps
just connect and just do the most natural thing - and this is what Galaxy does.
This document just provides a mathematical formalism to that "what should just
intuitively work" that can be used to document test cases and help with implementation.
This is reference documentation, not user documentation; Galaxy should just work.
:::

## Mapping

If a tool consumes a simple dataset parameter and produces a simple dataset parameter,
then any collection type may be "mapped over" the data input to that tool. The result of
that is the tool being applied to each element of the collection and "implicit collections"
being created from the outputs that are produced from those operations. Those implicit
collections have the same element identifiers in the same order as the input collection that is
mapped over. Each element of the implicit collections corresponds to its own job and
Galaxy very naturally and intuitively parallelizes jobs without extra work from the user
and without any knowledge of the tool.
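The mapping rule above can be illustrated with a toy Python model. This is not Galaxy's implementation. Collections are modeled as dicts from element identifier to either a dataset (a string) or a sub-collection (a nested dict, as in `list:paired`); Python dicts preserve insertion order. `map_over` is a hypothetical helper that applies a dataset-to-dataset tool to every leaf, so the implicit output keeps the same identifiers, order, and nesting.

```python
from typing import Any, Callable, Dict


def map_over(collection: Dict[str, Any], tool: Callable[[Any], Any]) -> Dict[str, Any]:
    """Toy model of mapping a collection over a dataset -> dataset tool.

    Each leaf dataset stands for one job; the implicit output collection
    keeps the input's element identifiers, order, and nesting.
    """
    return {
        identifier: map_over(element, tool) if isinstance(element, dict) else tool(element)
        for identifier, element in collection.items()
    }
```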


(BASIC_MAPPING_PAIRED)=
(BASIC_MAPPING_PAIRED_OR_UNPAIRED_PAIRED)=
(BASIC_MAPPING_PAIRED_OR_UNPAIRED_UN
...
research

galaxy-collection-semantics

packaged

Translate CWL arrays, records, scatter, and secondary-file shapes into Galaxy dataset and collection semantics.

Trigger: When CWL input/output or step wiring implies Galaxy collections, map-over, reduction, or shape changes.

on-demand · runtime · verbatim · corpus-observed · deterministic · 43.8 KB
bundle: references/notes/galaxy-collection-semantics.yml
source: content/research/galaxy-collection-semantics.yml
Preview (yml):
- doc: |
    # Collection Semantics

    This document describes the semantics around working with Galaxy dataset collections.
    In particular it describes how they operate within Galaxy tools and workflows.

    :::{admonition} You Probably Don't Need to Read This
    :class: caution

    Any significantly sophisticated workflow language will have ways to collect data
    into arrays or vectors or dictionaries and apply operations across this data (mapping)
    or reduce the dimensionality of this data (reductions). Typically, this is explicitly
    annotated with map functions or for loops. Galaxy however is designed to be a point
    and click interface for connecting steps and running tools. It is important that steps
    just connect and just do the most natural thing - and this is what Galaxy does.
    This document just provides a mathematical formalism to that "what should just
    intuitively work" that can be used to document test cases and help with implementation.
    This is reference documentation, not user documentation; Galaxy should just work.
    :::

    ## Mapping

    If a tool consumes a simple dataset parameter and produces a simple dataset parameter,
    then any collection type may be "mapped over" the data input to that tool. The result of
    that is the tool being applied to each element of the collection and "implicit collections"
    being created from the outputs that are produced from those operations. Those implicit
    collections have the same element identifiers in the same order as the input collection that is
    mapped over. Each element of the implicit collections corresponds to its own job and
    Galaxy very naturally and intuitively parallelizes jobs without extra work from the user
    and without any knowledge of the tool.

...
research

galaxy-paired-or-unpaired-collections

packaged

When the interface brief adopted a `paired_or_unpaired` shape, model inner-tool branching via `has_single_item` semantics instead of a Galaxy-level mode switch.

Trigger: When the preceding cwl-galaxy-interface brief uses `paired_or_unpaired` (or `list:paired_or_unpaired`) as a workflow input, OR the data-flow brief is considering it as an option.

on-demand · runtime · verbatim · corpus-observed · deterministic · 8.3 KB
bundle: references/notes/galaxy-paired-or-unpaired-collections.md
source: content/research/galaxy-paired-or-unpaired-collections.md
Preview (md):
---
type: research
subtype: component
title: "Galaxy paired_or_unpaired collection type"
tags:
  - research/component
  - target/galaxy
status: draft
created: 2026-05-11
revised: 2026-05-11
revision: 1
ai_generated: true
related_notes:
  - "[[galaxy-collection-semantics]]"
  - "[[component-cwl-workflow-anatomy]]"
related_molds:
  - "[[cwl-summary-to-galaxy-interface]]"
  - "[[cwl-summary-to-galaxy-data-flow]]"
  - "[[nextflow-summary-to-galaxy-interface]]"
summary: "Galaxy's `paired_or_unpaired` collection type: discriminated-union shape for paired-or-single reads, no workflow-level mode switch needed. Galaxy PR #19377."
---

# Galaxy `paired_or_unpaired` collections

Audience: a Mold author shaping a Galaxy workflow interface from an upstream (CWL / Nextflow / paper) source whose reads can be paired-end *or* single-end *or* a mixed batch of both.

## The shape

`paired_or_unpaired` is a Galaxy collection type modeling a **discriminated union of 1 or 2 elements**:

- **Unpaired variant** — one element with identifier `unpaired`.
- **Paired variant** — two elements with identifiers `forward` and `reverse`.
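The two variants can be told apart by their element identifiers alone. The Python sketch below uses hypothetical helpers, not Galaxy code. It accepts `forward`/`reverse` in either order, rejects anything else, and classifies each sample of a `list:paired_or_unpaired` batch independently, so a mixed batch is valid.

```python
from typing import Dict, Sequence


def paired_or_unpaired_variant(element_ids: Sequence[str]) -> str:
    """Classify one paired_or_unpaired element set by its identifiers."""
    ids = list(element_ids)
    if ids == ["unpaired"]:
        return "unpaired"
    if sorted(ids) == ["forward", "reverse"]:
        return "paired"
    raise ValueError(f"not a paired_or_unpaired element set: {ids}")


def batch_variants(batch: Dict[str, Sequence[str]]) -> Dict[str, str]:
    """Classify each sample of a list:paired_or_unpaired batch; mixing is allowed."""
    return {sample: paired_or_unpaired_variant(ids) for sample, ids in batch.items()}
```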

`list:paired_or_unpaired` lifts the same shape to a *heterogeneous* batch where some samples are paired and some are single-end — a representation that did not exist before this type. A `list:paired` forces every sample to be paired; a plain `list` of flat datasets loses pairing structure.

`paired_or_unpaired` may appear inside nested types (`list:paired_or_unpaired`, `list:list:paired_or_unpaired`), but **only at the deepest (innermost) rank**, because the subtyping logic is implemented at the suffix level. See "Limitation: only deepest rank" below.
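The deepest-rank rule can be checked as a plain string test on a collection type, as in the Python sketch below. The helper is hypothetical and is not Galaxy's subtyping code. It only checks that `paired_or_unpaired` never appears before the last `:`-separated rank.

```python
def paired_or_unpaired_rank_ok(collection_type: str) -> bool:
    """True unless paired_or_unpaired appears above the innermost rank."""
    ranks = collection_type.split(":")
    # Every rank except the last must not be paired_or_unpaired.
    return "paired_or_unpaired" not in ranks[:-1]
```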

## When to reach for it (decision rule for translators)

Reach for `paired_or_unpaired` when the
...

SKILL.md


# cwl-summary-to-galaxy-data-flow

Follow the procedure below and use the artifact/reference sections as the runtime contract.

## When To Use

- Translate a CWL summary into a Galaxy data-flow design brief.

## Inputs

- Read artifact `summary-cwl`. Schema: summary-cwl. Produced by `summarize-cwl`. Structured CWL summary emitted by summarize-cwl; consumed alongside the Galaxy interface brief.
- Read artifact `cwl-galaxy-interface`. Produced by `cwl-summary-to-galaxy-interface`. Preceding Galaxy interface brief from cwl-summary-to-galaxy-interface that pins inputs, outputs, and labels.

## Outputs

- Write artifact `cwl-galaxy-data-flow` as `cwl-galaxy-data-flow.md`. Format: `markdown`. Reviewable Markdown brief: abstract topology, Galaxy collection semantics, placeholder transformations, unresolved Galaxy tool needs.

## Required Tools

- None declared. Procedure should not assume external CLIs are present.

## Load Upfront

- `references/notes/component-cwl-workflow-anatomy.md`: Research note copied verbatim into the bundle. Use CWL's native graph and mark only the features that need Galaxy reinterpretation.
- `references/notes/cwl-when-pickvalue-to-galaxy-branching.md`: Research note copied verbatim into the bundle. Default reference for translating CWL when:/pickValue branching: pick among `paired_or_unpaired` collection input, native `pick_value` workflow step, or sibling workflows per mode.
- `references/notes/galaxy-data-flow-draft-contract.md`: Research note copied verbatim into the bundle. Keep the data-flow brief separate from gxformat2 templating and concrete step implementation.
- `references/schemas/summary-cwl.schema.json`: Schema file copied verbatim into the bundle. Read CWL step graph, edge markers, scatter, conditionals, secondary files, and tool requirements while drafting Galaxy-facing data flow.

## Load On Demand

- `references/patterns/galaxy-collection-patterns.md`: Pattern note copied verbatim into the bundle. Ground collection reshape, relabel, cleanup, and map-over choices in corpus-observed Galaxy recipes. Use when: CWL scatter, arrays, nested arrays, records, or secondary-file contracts require explicit Galaxy collection operations.
- `references/patterns/galaxy-conditionals-patterns.md`: Pattern note copied verbatim into the bundle. Ground conditional-branch and optional-step choices in curated, corpus-observed Galaxy when/pick_value patterns. Use when: data-flow translation needs optional steps, gating on non-empty results, routing between alternative outputs, or transform-or-pass-through branches.
- `references/patterns/galaxy-interval-patterns.md`: Pattern note copied verbatim into the bundle. Ground genomic-interval operation choices in curated, corpus-observed Galaxy interval recipes. Use when: the workflow operates on genomic intervals (BED/GFF/VCF coordinate features) and data-flow translation needs overlap, merge, coverage, windowing, masking, or set-algebra steps.
- `references/notes/cwl-pickvalue-to-galaxy.md`: Research note copied verbatim into the bundle. Map CWL pickValue (first_non_null / the_only_non_null / all_non_null) on workflow outputs or step inputs into Galaxy's native `pick_value` workflow module added by galaxy#22222. Use when: any summary-cwl edge `via` contains a `pickValue:*` marker, OR any workflow_outputs[].output_source is multi-valued with pickValue, OR any steps[].in[].pick_value is non-null in the source workflow or referenced subworkflows.
- `references/notes/galaxy-collection-semantics.md`: Research note copied verbatim into the bundle. Translate CWL arrays, records, scatter, and secondary-file shapes into Galaxy dataset and collection semantics. Use when: CWL input/output or step wiring implies Galaxy collections, map-over, reduction, or shape changes.
- `references/notes/galaxy-collection-semantics.upstream.myst`: Companion file copied verbatim into the bundle. Sibling of `references/notes/galaxy-collection-semantics.md`; read it where that note directs.
- `references/notes/galaxy-collection-semantics.yml`: Companion file copied verbatim into the bundle. Sibling of `references/notes/galaxy-collection-semantics.md`; read it where that note directs.
- `references/notes/galaxy-paired-or-unpaired-collections.md`: Research note copied verbatim into the bundle. When the interface brief adopted a `paired_or_unpaired` shape, model inner-tool branching via `has_single_item` semantics instead of a Galaxy-level mode switch. Use when: the preceding cwl-galaxy-interface brief uses `paired_or_unpaired` (or `list:paired_or_unpaired`) as a workflow input, OR the data-flow brief is considering it as an option.
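The `cwl-pickvalue-to-galaxy` trigger above combines three checks on the summary. These can be sketched in Python as follows. The field paths (`graph.edges[].via`, `workflow_outputs[].output_source`, `steps[].in[].pick_value`) follow the trigger text. Their exact JSON shapes are assumptions, because the schema preview is truncated: `via` is taken to be a list of strings, a multi-valued `output_source` a list, and an output's `pick_value` a sibling key. Referenced subworkflows are not walked in this sketch.

```python
from typing import Any, Dict


def needs_pickvalue_note(summary: Dict[str, Any]) -> bool:
    """Hypothetical check for the cwl-pickvalue-to-galaxy load trigger."""
    edges = (summary.get("graph") or {}).get("edges", [])
    # 1. Any edge `via` marker of the form pickValue:*
    if any(str(m).startswith("pickValue:") for e in edges for m in e.get("via", [])):
        return True
    # 2. Any multi-valued workflow output that carries a pickValue.
    for out in summary.get("workflow_outputs", []):
        src = out.get("output_source")
        if isinstance(src, list) and len(src) > 1 and out.get("pick_value"):
            return True
    # 3. Any step input with a non-null pick_value.
    for step in summary.get("steps", []):
        if any(inp.get("pick_value") is not None for inp in step.get("in", [])):
            return True
    return False
```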

## Validation

- None declared.

## Procedure

Read a CWL summary plus the preceding Galaxy interface brief and emit a reviewable Markdown data-flow brief. Capture abstract topology, Galaxy collection semantics, placeholder transformations, unresolved Galaxy tool needs, confidence, and open questions.
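One way to keep the brief reviewable is to emit every required section even when it is empty. The Python sketch below shows that approach. The section titles mirror the list in the procedure, but the exact headings and the `render_brief` helper are assumptions, not a declared template. Empty sections are written explicitly so that missing evidence stays visible instead of being silently omitted.

```python
from typing import Dict, List

# Section titles follow the procedure's list; exact headings are an assumption.
SECTIONS = [
    "Abstract topology",
    "Galaxy collection semantics",
    "Placeholder transformations",
    "Unresolved Galaxy tool needs",
    "Confidence",
    "Open questions",
]


def render_brief(title: str, body: Dict[str, List[str]]) -> str:
    """Render the Markdown skeleton, marking empty sections explicitly."""
    lines = [f"# {title}", ""]
    for section in SECTIONS:
        lines.append(f"## {section}")
        lines.append("")
        for item in body.get(section) or ["(none recorded)"]:
            lines.append(f"- {item}")
        lines.append("")
    return "\n".join(lines)
```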

CWL already carries structured workflow shape, so this skill should be lighter than `nextflow-summary-to-galaxy-data-flow`: it translates an existing graph rather than reconstructing one.

Start from `summary-cwl.graph.edges[]` instead of rediscovering the DAG. The main work is translation pressure: CWL scatter into Galaxy map-over or collection steps, `linkMerge`/`pickValue` into explicit fan-in choices, secondary files into output contracts, and `valueFrom`/`when` into reviewable placeholders when Galaxy cannot express them directly.
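The translation-pressure pass described above can be sketched in Python as a grouping of edges by their `via` markers. Only the `pickValue:*` marker spelling appears in this bundle. The other prefixes stand in for scatter, `linkMerge`, secondary-file, `valueFrom`, and `when` markers, and their exact strings are hypothetical. The edge `source`/`target` keys are also an assumed shape.

```python
from typing import Dict, List

# Hypothetical `via` marker prefixes; only "pickValue:" is named in this bundle.
PRESSURE = [
    ("scatter", "Galaxy map-over or explicit collection step"),
    ("linkMerge:", "explicit fan-in choice"),
    ("pickValue:", "explicit fan-in choice (pick_value)"),
    ("secondaryFiles", "output contract"),
    ("valueFrom", "reviewable placeholder"),
    ("when", "reviewable placeholder or conditional step"),
]


def translation_pressure(edges: List[Dict]) -> Dict[str, List[str]]:
    """Group edges (labeled 'source->target', an assumed edge shape) by the
    Galaxy translation decision each `via` marker calls for."""
    grouped: Dict[str, List[str]] = {}
    for edge in edges:
        label = f"{edge.get('source')}->{edge.get('target')}"
        for marker in edge.get("via", []):
            for prefix, action in PRESSURE:
                if marker.startswith(prefix):
                    grouped.setdefault(action, []).append(label)
                    break  # First matching prefix wins for this marker.
    return grouped
```

Edges without markers drop out of the grouping. This matches the anatomy note's advice to mark only the features that need Galaxy reinterpretation.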

## Runtime Notes

- Do not read Foundry source files at runtime; use only files packaged in this skill bundle and user-supplied artifacts.
- Preserve declared artifact filenames unless the user or harness supplies explicit paths.
- Carry unresolved assumptions into the output artifact instead of silently inventing missing source evidence.