
find-test-data

Search IWC fixtures and public sources for test data matching a data-flow shape.

draft mold

Mold health

ok
  • Source layout

    Directory Mold with only index.md frontmatter.

  • Axis fields

    generic fields are coherent.

  • Eval plan

    eval.md declares cases and check type.

  • Typed refs

    2 typed references; 0 resolver issues.

  • On-demand triggers

    All on-demand references describe triggers.

  • Evidence checks

    Hypothesis references include verification.

axis
generic
name
find-test-data
contract

Reference Loading

Typed Mold references describe what casting consumes and when the generated skill should load each artifact.

Research: iwc-test-data-conventions

Background synthesis loaded by explicit progressive-disclosure metadata.

Purpose
Match IWC test-data conventions — Zenodo/remote-URL first, SHA-1 integrity, per-input collection layout — when selecting or describing candidate fixtures.
Trigger
When choosing where test data should come from or describing the shape a candidate file must have.

Artifact handoffs / pipeline contract

Produces

find-test-data

Resolve concrete test data for the workflow’s inputs. Read the upstream data-flow / interface brief, and for each workflow input search the IWC corpus and public sources for data that matches its Galaxy collection shape and datatype. Emit test-data-refs.json: one entry per input, each carrying a URL or path plus the expected shape, ready for implement-galaxy-workflow-test to stage.

This Mold is the first leg of the harness’s test-data-resolution branch. It resolves what it can and reports gaps; the harness routes any unresolved input to the user-supplied fallthrough. Deciding to ask the user is a harness concern, not this Mold’s — its job is an honest, source-backed match.

Sequence

  1. Read the brief. From the data-flow / interface brief, enumerate the workflow inputs: label, Galaxy collection shape (File / list / paired / list:paired / record), and datatype.
  2. Search IWC fixtures first. Prefer existing IWC test data for the same domain — it already conforms to the conventions in iwc-test-data-conventions (remote URL, recorded hash, known collection layout). A near-neighbour IWC workflow’s -tests.yml is the strongest source.
  3. Fall back to public sources. When no IWC fixture fits, look for small public data (Zenodo, reference data archives) sized for a fast test run, matching the datatype and shape.
  4. Emit refs. Write one test-data-refs.json entry per input: the URL/path, the expected Galaxy shape, datatype, and integrity hash when known. Per galaxy-workflow-testability-design, make sure each entry maps to an addressable input label.
  5. Report gaps. For any input with no acceptable match, emit the input with resolved: false and a short reason rather than a guessed URL. These unresolved inputs are what the harness hands to the user-supplied fallthrough.
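
The emit and report-gaps steps can be sketched in Python as follows. This is a minimal illustration, not the Mold's actual schema: apart from resolved, every field name (label, shape, datatype, location, sha1, reason) is an assumption, as are the helper names WorkflowInput, build_ref and emit_refs. The example.org location is a placeholder, not a real fixture.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Collection shapes named in step 1 of the sequence.
GALAXY_SHAPES = {"File", "list", "paired", "list:paired", "record"}


@dataclass
class WorkflowInput:
    label: str     # addressable input label taken from the brief
    shape: str     # one of GALAXY_SHAPES
    datatype: str  # Galaxy datatype, e.g. "fasta"


def build_ref(inp: WorkflowInput, location: Optional[str],
              sha1: Optional[str] = None, reason: str = "") -> dict:
    """Build one entry. Field names are hypothetical apart from `resolved`."""
    if inp.shape not in GALAXY_SHAPES:
        raise ValueError(f"unknown collection shape: {inp.shape!r}")
    entry = {"label": inp.label, "shape": inp.shape, "datatype": inp.datatype}
    if location is None:
        # Step 5: report the gap instead of guessing a URL.
        entry["resolved"] = False
        entry["reason"] = reason or "no acceptable match found"
        return entry
    entry["resolved"] = True
    entry["location"] = location
    if sha1 is not None:
        entry["sha1"] = sha1  # integrity hash is recorded only when known
    return entry


def emit_refs(inputs, matches) -> str:
    """Serialize one entry per input; `matches` maps label -> (location, sha1)."""
    entries = []
    for inp in inputs:
        location, sha1 = matches.get(inp.label, (None, None))
        entries.append(build_ref(inp, location, sha1))
    return json.dumps(entries, indent=2)


inputs = [
    WorkflowInput("reads", "list:paired", "fastqsanger.gz"),
    WorkflowInput("reference", "File", "fasta"),
]
# Placeholder location for illustration only; it does not point at real data.
matches = {"reads": ("https://example.org/placeholder/reads", None)}
refs = json.loads(emit_refs(inputs, matches))
```

Here the reads input comes out as a resolved entry without a hash, and reference comes out as a gap with a reason, so every input from the brief still appears exactly once in the output.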

No fabrication

Never invent a URL, accession, or path to make an input look resolved. A wrong-but-plausible fixture reference is worse than an honest gap: it survives static checks and fails only at run time, far from this Mold. Every emitted ref must point at data that exists; everything else is a reported gap.
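
A structural self-check can catch some slips before the refs leave this Mold. The sketch below uses the same hypothetical field names as above (label, location, sha1, reason); only resolved comes from this page. It cannot prove that the referenced data exists, which still has to be confirmed against the source itself; it only flags entries that are internally inconsistent.

```python
import re

# A SHA-1 digest is 40 hexadecimal characters.
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def check_entry(entry: dict) -> list:
    """Return a list of problems for one entry (field names are assumptions)."""
    problems = []
    label = entry.get("label") or "<unlabelled>"
    if not entry.get("label"):
        problems.append("entry has no input label")
    if entry.get("resolved"):
        if not entry.get("location"):
            problems.append(f"{label}: resolved but has no URL or path")
        sha1 = entry.get("sha1")
        if sha1 is not None:
            if not isinstance(sha1, str) or not SHA1_RE.fullmatch(sha1.lower()):
                problems.append(f"{label}: sha1 is not 40 hex characters")
    else:
        # A gap must stay a gap: no location, but a stated reason.
        if entry.get("location"):
            problems.append(f"{label}: unresolved entry carries a location")
        if not entry.get("reason"):
            problems.append(f"{label}: unresolved entry needs a reason")
    return problems


def check_refs(entries: list) -> list:
    """Collect problems across all entries."""
    return [p for e in entries for p in check_entry(e)]
```

An empty result means only that the entries are well-formed; a wrong-but-plausible URL would still pass, which is why the rule above has to be enforced at the point where the match is made.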

Incoming References (3)

  • INTERVIEW → GALAXY (phase of pipeline): Interview-driven path to a Galaxy gxformat2 workflow through the shared freeform-summary handoff.
  • PAPER → CWL (phase of pipeline): Direct path from a paper to a CWL Workflow + CommandLineTool set.
  • PAPER → GALAXY (phase of pipeline): Direct path from a paper to a Galaxy gxformat2 workflow. No CWL intermediate.