debug-galaxy-workflow-output
Triage a failing Galaxy workflow test. Take the structured handoff from run-workflow-test, classify the failure surface before proposing any repair, and capture the reference evidence the surface requires. When the failure cannot be classified from existing references, recommend a focused follow-up rather than converting uncertainty into a guessed fix.
Classify before repairing. The same red output can be a tool/job failure, a workflow invocation failure, a collection-output mismatch, a missing workflow output, or an assertion mismatch — and each routes to a different reference surface and a different fix. Locate where the evidence lives first (planemo-workflow-test-architecture).
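The routing idea can be sketched as a small lookup table. This is only an illustration: the `FailureSurface` enum, the `REFERENCES` mapping, and the `references_for` helper are hypothetical names, not part of Planemo or Galaxy. The reference names are the ones this document cites, and a surface with no reference named here maps to an empty list.

```python
from enum import Enum


class FailureSurface(Enum):
    """The five failure surfaces named in this document (enum itself is hypothetical)."""
    TOOL_JOB = "tool/job failure"
    INVOCATION = "workflow invocation failure"
    COLLECTION_MISMATCH = "collection-output mismatch"
    MISSING_OUTPUT = "missing workflow output"
    ASSERTION_MISMATCH = "assertion mismatch"


# Surface -> references to consult. The reference names come from this
# document; the mapping structure is an illustrative assumption.
REFERENCES = {
    FailureSurface.TOOL_JOB: ["galaxy-tool-job-failure-reference"],
    FailureSurface.INVOCATION: ["galaxy-workflow-invocation-failure-reference"],
    FailureSurface.COLLECTION_MISMATCH: [
        "galaxy-collection-semantics",
        "nextflow-operators-to-galaxy-collection-recipes",
    ],
    FailureSurface.ASSERTION_MISMATCH: [
        "planemo-asserts-idioms",
        "iwc-shortcuts-anti-patterns",
    ],
}


def references_for(surface):
    """Return the references for a surface, or an empty list if none is named."""
    return REFERENCES.get(surface, [])
```

An empty result is itself a signal: it suggests a reference gap to report rather than a fix to guess.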
Sequence
- Classify the first failure surface. From the run’s structured result, decide whether the first failure is a tool/job failure, a workflow invocation failure, a collection-output mismatch, a missing workflow output, or an assertion mismatch.
- Capture job-failure evidence. When a job is in error/failed/stopped, record job id, tool id, exit code, job messages, the stdout/stderr distinction, and output dataset state per galaxy-tool-job-failure-reference; check whether the wrapper’s failure semantics already explain it.
- Capture invocation-failure evidence. When the invocation state or messages indicate scheduling, materialization, cancellation, conditional, or output-resolution failure, record invocation state, the structured message reason, the affected step, any subworkflow path, and the jobs summary per galaxy-workflow-invocation-failure-reference; note whether Planemo surfaced or hid the relevant Galaxy API detail.
- Trace collection mismatches. When a failing output is a collection or mapped output, diagnose shape, mapping, reduction, and element-identifier mismatches with galaxy-collection-semantics; for workflows translated from Nextflow, trace wrong nesting, missing elements, or bad joins back to possibly lossy operator translations via nextflow-operators-to-galaxy-collection-recipes.
- Read assertion failures honestly. When the failure is an assertion, use planemo-asserts-idioms to decide whether it is an assertion-choice/tolerance problem or a real output regression. Before weakening an assertion, widening a delta, or switching to an existence check, confirm against iwc-shortcuts-anti-patterns that the relaxation is an accepted IWC shortcut and not masking a real failure.
- Discover reference gaps. When the failure cannot be classified confidently from the references above, recommend a focused follow-up — reference documentation, pattern capture, API verification, or eval coverage — rather than emitting a repair recipe built on a guess.
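The sequence above can be sketched as a single classification pass. This is a minimal sketch under stated assumptions, not the actual handoff format of run-workflow-test: the `handoff` dictionary shape, every field name in it, the `classify_first_failure` function, and the set of invocation failure states are all hypothetical. The surfaces are checked in the order this document lists them, which is also an assumption about what "first" means.

```python
# Job states named in this document as failures.
JOB_FAILURE_STATES = {"error", "failed", "stopped"}
# Assumed invocation states that indicate failure (hypothetical set).
INVOCATION_FAILURE_STATES = {"failed", "cancelled"}


def classify_first_failure(handoff):
    """Return (surface, evidence) for the first matching failure surface.

    `handoff` is a hypothetical dict with optional "jobs", "invocation",
    and "outputs" keys. Returns ("unclassified", ...) when nothing matches,
    so the caller recommends a follow-up instead of guessing a fix.
    """
    # 1. Tool/job failure: record the evidence fields the document lists.
    for job in handoff.get("jobs", []):
        if job.get("state") in JOB_FAILURE_STATES:
            fields = ("id", "tool_id", "exit_code", "messages",
                      "stdout", "stderr", "output_states")
            return "tool/job failure", {f: job.get(f) for f in fields}

    # 2. Workflow invocation failure: failure state or structured messages.
    invocation = handoff.get("invocation", {})
    if (invocation.get("state") in INVOCATION_FAILURE_STATES
            or invocation.get("messages")):
        fields = ("state", "messages", "step", "subworkflow_path",
                  "jobs_summary")
        return ("workflow invocation failure",
                {f: invocation.get(f) for f in fields})

    outputs = handoff.get("outputs", [])

    # 3. Collection-output mismatch (shape, mapping, element identifiers).
    for out in outputs:
        if out.get("collection_mismatch"):
            return "collection-output mismatch", {
                "output": out.get("name"),
                "detail": out["collection_mismatch"],
            }

    # 4. Missing workflow output.
    for out in outputs:
        if not out.get("present", True):
            return "missing workflow output", {"output": out.get("name")}

    # 5. Assertion mismatch.
    for out in outputs:
        if out.get("assertion_failures"):
            return "assertion mismatch", {
                "output": out.get("name"),
                "failures": out["assertion_failures"],
            }

    # Nothing matched: report the gap rather than inventing a repair.
    return "unclassified", {
        "recommendation": "focused follow-up: reference documentation, "
                          "pattern capture, API verification, or eval coverage",
    }
```

For example, a handoff whose only signal is a failed assertion on an output named `report` would be classified as an assertion mismatch, while an empty handoff falls through to `unclassified`. Returning the evidence alongside the surface keeps classification and evidence capture in one step, so a repair is never proposed without the record that justifies it.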