find-test-data
Resolve concrete test data for the workflow’s inputs. Read the upstream data-flow / interface brief and, for each workflow input, search the IWC corpus and public sources for data that matches its Galaxy collection shape and datatype. Emit test-data-refs.json with one entry per input, each carrying a URL or path plus the expected shape, ready for implement-galaxy-workflow-test to stage.
This Mold is the first leg of the harness’s test-data-resolution branch. It resolves what it can and reports gaps; the harness routes any unresolved input to the user-supplied fallthrough. Deciding to ask the user is a harness concern, not this Mold’s; this Mold’s job is an honest, source-backed match.
Sequence
- Read the brief. From the data-flow / interface brief, enumerate the workflow inputs: label, Galaxy collection shape (File / list / paired / list:paired / record), and datatype.
- Search IWC fixtures first. Prefer existing IWC test data for the same domain: it already conforms to the conventions in iwc-test-data-conventions (remote URL, recorded hash, known collection layout). A near-neighbour IWC workflow’s -tests.yml file is the strongest source.
- Fall back to public sources. When no IWC fixture fits, look for small public datasets (for example on Zenodo or in reference-data archives) that are sized for a fast test run and match the input’s datatype and shape.
- Emit refs. Write one test-data-refs.json entry per input: the URL or path, the expected Galaxy shape, the datatype, and the integrity hash when known. Per galaxy-workflow-testability-design, make sure each entry maps to an addressable input label.
- Report gaps. For any input with no acceptable match, emit the input with resolved: false and a short reason rather than a guessed URL. These gap entries are what the harness hands to the user-supplied fallthrough.
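The Read-the-brief step amounts to turning the brief into a list of typed inputs. The sketch below assumes, hypothetically, that the brief is JSON with an "inputs" array; the real brief format is not fixed here, and WorkflowInput and read_brief are illustrative names. It rejects shapes outside the five listed above and duplicate labels, since either would make a later ref ambiguous.

```python
import json
from dataclasses import dataclass

# Shapes named in the Sequence: "File" is a single dataset, the rest are collections.
SHAPES = {"File", "list", "paired", "list:paired", "record"}


@dataclass(frozen=True)
class WorkflowInput:
    label: str     # addressable input label in the workflow
    shape: str     # one of SHAPES
    datatype: str  # Galaxy datatype, e.g. "fastqsanger.gz"


def read_brief(brief_json: str) -> list[WorkflowInput]:
    """Enumerate workflow inputs from a brief.

    ASSUMPTION: the brief is JSON shaped like
    {"inputs": [{"label": ..., "shape": ..., "datatype": ...}, ...]};
    the real brief format is not fixed by this Mold.
    """
    inputs = []
    for raw in json.loads(brief_json)["inputs"]:
        if raw["shape"] not in SHAPES:
            raise ValueError(f"unknown shape {raw['shape']!r} for input {raw['label']!r}")
        inputs.append(WorkflowInput(raw["label"], raw["shape"], raw["datatype"]))
    labels = [i.label for i in inputs]
    if len(labels) != len(set(labels)):
        # Duplicate labels would make a ref ambiguous about which input it feeds.
        raise ValueError("duplicate input labels in brief")
    return inputs
```

For example, a brief listing a list:paired "reads" input and a File "reference" input yields two WorkflowInput values in brief order.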
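The IWC-first search step can be sketched as a strict shape-and-datatype match over job entries already loaded from -tests.yml files. This assumes the job dicts were parsed beforehand (for example with PyYAML) and follow a planemo-style layout where File inputs carry class, location and filetype, and collections carry class, collection_type and elements; find_iwc_fixture, entry_shape and leaf_filetypes are hypothetical helpers, not part of any IWC tooling.

```python
from typing import Any, Optional

# ASSUMPTION: each job dict was already parsed (e.g. with PyYAML) from the
# "job" section of a test in some -tests.yml, using a planemo-style layout:
# File inputs carry class/location/filetype, collections carry
# class/collection_type/elements. Scalar parameters are plain values.
Job = dict[str, Any]


def entry_shape(entry: dict[str, Any]) -> Optional[str]:
    """Map a job entry onto the brief's shape vocabulary."""
    if entry.get("class") == "File":
        return "File"
    if entry.get("class") == "Collection":
        return entry.get("collection_type")
    return None


def leaf_filetypes(entry: dict[str, Any]) -> set[str]:
    """Datatypes of every leaf dataset in a File or (nested) collection entry."""
    if entry.get("class") == "File":
        return {entry["filetype"]} if "filetype" in entry else set()
    found: set[str] = set()
    for element in entry.get("elements", []):
        found |= leaf_filetypes(element)
    return found


def find_iwc_fixture(shape: str, datatype: str,
                     jobs: list[tuple[str, Job]]) -> Optional[tuple[str, str, dict[str, Any]]]:
    """Return (source, job_label, entry) for the first entry whose shape matches
    and whose leaves all carry exactly `datatype`; None means no IWC fixture fits."""
    for source, job in jobs:
        for job_label, entry in job.items():
            if not isinstance(entry, dict):
                continue  # scalar workflow parameters are not data inputs
            if entry_shape(entry) == shape and leaf_filetypes(entry) == {datatype}:
                return source, job_label, entry
    return None
```

Passing jobs ordered with near-neighbour workflows first makes the first hit the strongest source. Requiring the leaf datatypes to equal the requested one exactly is a deliberately conservative choice: a mismatched fixture should fall through rather than be accepted on a loose match.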
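The public-source fallback can be reduced to a filter plus a "smallest wins" rule. The Mold only says the data should be sized for a fast test run, so MAX_TEST_BYTES below is a hypothetical cap, and the candidate records (with "shape", "datatype" and "size_bytes" keys) are an assumed output of whatever search found them.

```python
from typing import Any, Optional

# ASSUMPTION: a hypothetical cap standing in for "sized for a fast test run";
# the Mold does not give a number.
MAX_TEST_BYTES = 50 * 1024 * 1024


def pick_public_candidate(shape: str, datatype: str,
                          candidates: list[dict[str, Any]],
                          max_bytes: int = MAX_TEST_BYTES) -> Optional[dict[str, Any]]:
    """Smallest candidate matching shape and datatype under the size cap.

    Each candidate is a hypothetical record produced by the search, with
    "shape", "datatype" and "size_bytes" keys. A candidate whose size is
    unknown (None) is skipped rather than assumed to be small.
    """
    fitting = [
        c for c in candidates
        if c["shape"] == shape
        and c["datatype"] == datatype
        and c.get("size_bytes") is not None
        and c["size_bytes"] <= max_bytes
    ]
    return min(fitting, key=lambda c: c["size_bytes"], default=None)
```

A None result means the input has no acceptable public match and should be reported as a gap rather than filled with the nearest-looking dataset.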
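The Emit refs and Report gaps steps together can be sketched as writing one JSON entry per input label. Only the resolved: false marker for gaps comes from the text above; the other keys (label, location, shape, datatype, hash, reason) and the helper names resolved_ref, gap_ref and write_refs are hypothetical. The writer refuses refs for labels the brief does not declare, duplicates, and inputs left with neither a ref nor a gap.

```python
import json
from typing import Any, Optional

# ASSUMPTION: hypothetical entry schema. Only "resolved": false for gaps comes
# from the Mold text; the other keys are illustrative.


def resolved_ref(label: str, location: str, shape: str, datatype: str,
                 sha1: Optional[str] = None) -> dict[str, Any]:
    ref: dict[str, Any] = {"label": label, "resolved": True, "location": location,
                           "shape": shape, "datatype": datatype}
    if sha1 is not None:  # record the integrity hash only when it is known
        ref["hash"] = {"hash_function": "SHA-1", "hash_value": sha1}
    return ref


def gap_ref(label: str, shape: str, datatype: str, reason: str) -> dict[str, Any]:
    # No location key at all: a gap never carries a guessed URL.
    return {"label": label, "resolved": False, "shape": shape,
            "datatype": datatype, "reason": reason}


def write_refs(path: str, input_labels: list[str], refs: list[dict[str, Any]]) -> list[str]:
    """Write one entry per input label, in brief order; return the gap labels."""
    by_label: dict[str, dict[str, Any]] = {}
    for ref in refs:
        if ref["label"] not in input_labels:
            raise ValueError(f"ref for unknown input label {ref['label']!r}")
        if ref["label"] in by_label:
            raise ValueError(f"more than one ref for input {ref['label']!r}")
        by_label[ref["label"]] = ref
    missing = [label for label in input_labels if label not in by_label]
    if missing:
        raise ValueError(f"inputs with neither a ref nor a gap: {missing}")
    ordered = [by_label[label] for label in input_labels]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(ordered, fh, indent=2)
    return [ref["label"] for ref in ordered if not ref["resolved"]]
```

The returned gap labels are the inputs the harness would route onward; keeping them in the same file as the resolved refs means every input label appears exactly once.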
No fabrication
Never invent a URL, accession, or path to make an input look resolved. A wrong-but-plausible fixture reference is worse than an honest gap: it survives static checks and fails only at run time, far from this Mold. Every emitted ref must point at data that exists; everything else is a reported gap.
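One way to enforce this rule mechanically is a last check that downgrades any ref whose data cannot be confirmed into a gap. The sketch assumes the same hypothetical entry keys as the emit step; verify_ref, head_ok and sha1_of are illustrative names. Remote URLs are probed with an HTTP HEAD request via urllib, local paths are checked on disk, and a recorded SHA-1 is compared only for local files, since checking a remote hash would mean downloading the data.

```python
import hashlib
import os
import urllib.request
from typing import Any, Callable


def head_ok(url: str, timeout: float = 10.0) -> bool:
    """True if a HEAD request to url ends in a 2xx status (urllib follows redirects)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):  # URLError/HTTPError are OSError subclasses
        return False


def sha1_of(path: str) -> str:
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_ref(ref: dict[str, Any],
               url_exists: Callable[[str], bool] = head_ok) -> dict[str, Any]:
    """Return ref unchanged if its data exists, otherwise the same input as a gap."""
    if not ref.get("resolved"):
        return ref  # already an honest gap
    location = ref["location"]
    if location.startswith(("http://", "https://")):
        ok, reason = url_exists(location), f"URL not reachable: {location}"
    elif "://" in location:
        ok, reason = False, f"cannot verify location scheme: {location}"
    else:
        ok, reason = os.path.isfile(location), f"path does not exist: {location}"
        recorded = ref.get("hash") or {}
        # Only SHA-1 hashes are checked here; other hash functions are left alone.
        if ok and recorded.get("hash_function") == "SHA-1" \
                and sha1_of(location) != recorded.get("hash_value"):
            ok, reason = False, f"recorded hash does not match {location}"
    if ok:
        return ref
    return {"label": ref["label"], "resolved": False, "shape": ref["shape"],
            "datatype": ref["datatype"], "reason": reason}
```

The url_exists parameter lets the network probe be swapped out, for instance for a stricter check or an offline stub. A failed check never produces a substitute location; it only turns the entry into a reported gap.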