Interval: consensus regions by multi-intersect
Use this recipe when you have one feature set per replicate (per-replicate MACS2 peaks, per-sample regions) and want the features that are reproducible across enough replicates. Three IWC consensus-peaks workflows implement it identically; it is the most reusable interval construct in the corpus.
The move has three stages plus a computed threshold:
1. Multi-intersect the replicate collection
Feed the collection of per-replicate feature sets into bedtools_multiintersectbed (“compute multi intersect”). It returns one row per distinct genomic segment, annotated with which replicates cover it (and a count). Corpus state: tag_select: tag, cluster: false, filler: "0", header: false, with the collection connected to tag.inputs.
tool_id: toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_multiintersectbed/2.31.1
tool_state:
tag: { tag_select: tag, inputs: { __class__: ConnectedValue } }
cluster: false
filler: "0"
header: false
2. Compute the replicate threshold
The “at least X replicates” count is not hard-coded — it is derived from the number of inputs. The corpus counts replicates with wc_gnu over the assembled per-replicate table and reduces with table_compute (matrixapply, vector_op: min) to get the threshold integer (get nb of replicates, get min value). See derive-parameter-from-file for the file→parameter mechanics.
3. Threshold by replicate count
Filter1 keeps the multi-intersect rows whose replicate-count column meets the threshold (filter multi intersect). This is a tabular filter on an interval-derived table — the count column is opaque to coordinates, so tabular-filter-by-column-value is the right tool, not a coordinate op.
4. Intersect back against the merged-sample call
Finally, intersect the peaks called on the pooled/merged sample against the thresholded consensus regions with bedtools_intersectbed in iterate mode, overlap_mode: [-wa, -wb] (get merged peaks overlapping at least x replicates). This promotes the merged-sample peak coordinates while keeping only those backed by enough replicates. See interval-overlap-filter for the intersect parameters.
Why this shape
Multi-intersect alone tells you where replicates agree but fragments the genome into overlap segments; the final intersect-back recovers clean, biologically meaningful peak boundaries (from the merged call) filtered to the reproducible set. Doing it the other way — intersecting every replicate pair — does not scale and loses the count semantics.
Pitfalls
- The threshold must track replicate count. Hard-coding “≥2” breaks when the workflow runs on a different number of replicates. The compute-the-threshold step (stage 2) is load-bearing, not incidental.
- Stage 3 is tabular, stage 4 is coordinate-aware. Mixing them up — trying to threshold with an interval tool, or recover boundaries with a tabular filter — loses information. The recipe deliberately crosses the interval↔tabular seam twice.
- Collection axis. The replicate sets are a Galaxy collection; multi-intersect consumes the whole collection at once (not map-over), while the final intersect runs in
iteratemode. See galaxy-collection-patterns.
See also
- galaxy-interval-patterns — the interval pattern map.
- interval-overlap-filter — the intersect operations (stages 1 and 4).
- tabular-filter-by-column-value — the replicate-count threshold (stage 3).
- iwc-interval-operations-survey — corpus evidence.