compare-against-iwc-exemplar
Find the nearest IWC exemplar workflow(s) for the upstream Galaxy design briefs and emit a structural diff that guides the downstream *-summary-to-galaxy-template Mold before per-step authoring effort is spent.
This Mold is the corpus-first check in Galaxy-targeting pipelines. It runs after the source-specific interface and data-flow briefs and before the gxformat2 template Mold. Discovery, ranking, and comparison are one action — there is no separate retrieval Mold.
Procedure
- Clone or pull and merge the IWC corpus (
https://github.com/galaxyproject/iwc) to ~/.foundry/iwc.
- Normalize candidate workflows with convert as needed for structural comparison.
- Find the closest workflow and rank it.
- Surface the nearest exemplar forward (skip when the result is “no nearest exemplar”): see Nearest exemplar (gxformat2) view.
Feature Hierarchy
- Domain or analysis intent.
- Input collection topology.
- Primary tool families.
- DAG motifs and structural recipes.
- Output types and report shape.
- Test style and fixture topology.
Domain comes first so a structurally similar workflow in the wrong science area does not become a misleading exemplar. Topology comes second because collection shape is one of the most important Galaxy-specific design decisions. Test style is useful after a workflow match, but should not drive initial retrieval. Briefs with no domain signal should not produce a high-confidence exemplar even if they share generic tools.
Confidence Levels
| Level | Meaning |
|---|
| High | Same domain/subdomain, same input topology, same primary tool families, same major DAG motifs, and matching test fixture shape. |
| Medium | Same domain and topology, but partial tool-family or output match. Useful exemplar, not canonical. |
| Low | Cross-domain structural match only. Useful for a pattern comparison, not a nearest domain exemplar. |
| No nearest exemplar | Candidate lacks domain or topology alignment, or only shares generic tools such as MultiQC. |
Routing findings forward
Each finding should name the authoring surface most likely to own the fix:
- Template/data-flow issue: missing node, wrong collection shape, wrong branch, placeholder too vague — surfaced for the downstream
*-summary-to-galaxy-template Mold to apply.
- Pattern issue: recurring Galaxy idiom should become or update a pattern page.
- Tool-step issue: exact wrapper or parameterization will be handled later in the per-step loop.
- Test issue: defer to
*-test-to-galaxy-test-plan or implement-galaxy-workflow-test.
Do not block downstream authoring on low-confidence exemplar mismatches. Report them as review guidance for the template Mold and the user.
The schema tells the template Mold what is legal gxformat2; it does not show idiom. A real, domain-adjacent gxformat2 workflow is high-value signal for constructing the draft — input/collection shapes, map-over wiring, output promotion, post-job actions. Agents have seen far more legacy .ga JSON than gxformat2 YAML in training, so surface the converted view rather than leaving it as prose.
Once the nearest exemplar is chosen (High or Medium confidence):
- Convert it with convert (
--to format2 --compact).
- Write the cleaned gxformat2 of the relevant subgraph to the
iwc-exemplar.gxwf.yml sibling artifact — the slice that matches the briefs’ structure, not the whole workflow.
- Inline a bounded excerpt (~10–40 lines) of that subgraph under a labeled section in
iwc-comparison-notes.md, and cite the abstract IWC workflow ID (e.g. transcriptomics/rnaseq-pe/rnaseq-pe) plus the step labels it covers. Cross-reference the sibling file for the fuller view.
Keep it size-bounded — a whole large workflow is noise; the relevant subgraph is the signal. When the result is “no nearest exemplar,” emit neither the sibling file nor the excerpt. A Low-confidence cross-domain match may surface a short excerpt for pattern comparison but should be labeled as such, not as a domain exemplar.
Non-goals
- No tool discovery. Do not replace discover-shed-tool.
- No automatic rewrite. This Mold emits structural diff guidance; the harness or user decides which changes to apply.
- No forced nearest. A no-match result is valid when IWC lacks a close exemplar.