Home Pattern

Interval: consensus regions by multi-intersect

Find features reproducible across replicates: multi-intersect per-replicate sets, threshold by replicate count, then intersect back against the merged call.

Revised
2026-06-10
Rev
1

Pattern health

warn
  • IWC exemplar anchors

    3 abstract workflow anchors declared.

  • Foundry verification fixture

    No structural verification fixture yet.

  • Pattern map coverage

    1 pattern map link here.

  • Metadata contract

    Pattern frontmatter matches the site contract.

Interval: consensus regions by multi-intersect

Use this recipe when you have one feature set per replicate (per-replicate MACS2 peaks, per-sample regions) and want the features that are reproducible across enough replicates. Three IWC consensus-peaks workflows implement it identically; it is the most reusable interval construct in the corpus.

The move has three stages plus a computed threshold:

1. Multi-intersect the replicate collection

Feed the collection of per-replicate feature sets into bedtools_multiintersectbed (“compute multi intersect”). It returns one row per distinct genomic segment, annotated with which replicates cover it (and a count). Corpus state: tag_select: tag, cluster: false, filler: "0", header: false, with the collection connected to tag.inputs.

tool_id: toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_multiintersectbed/2.31.1
tool_state:
  tag: { tag_select: tag, inputs: { __class__: ConnectedValue } }
  cluster: false
  filler: "0"
  header: false

2. Compute the replicate threshold

The “at least X replicates” count is not hard-coded — it is derived from the number of inputs. The corpus counts replicates with wc_gnu over the assembled per-replicate table and reduces with table_compute (matrixapply, vector_op: min) to get the threshold integer (get nb of replicates, get min value). See derive-parameter-from-file for the file→parameter mechanics.

3. Threshold by replicate count

Filter1 keeps the multi-intersect rows whose replicate-count column meets the threshold (filter multi intersect). This is a tabular filter on an interval-derived table — the count column is opaque to coordinates, so tabular-filter-by-column-value is the right tool, not a coordinate op.

4. Intersect back against the merged-sample call

Finally, intersect the peaks called on the pooled/merged sample against the thresholded consensus regions with bedtools_intersectbed in iterate mode, overlap_mode: [-wa, -wb] (get merged peaks overlapping at least x replicates). This promotes the merged-sample peak coordinates while keeping only those backed by enough replicates. See interval-overlap-filter for the intersect parameters.

Why this shape

Multi-intersect alone tells you where replicates agree but fragments the genome into overlap segments; the final intersect-back recovers clean, biologically meaningful peak boundaries (from the merged call) filtered to the reproducible set. Doing it the other way — intersecting every replicate pair — does not scale and loses the count semantics.

Pitfalls

  • The threshold must track replicate count. Hard-coding “≥2” breaks when the workflow runs on a different number of replicates. The compute-the-threshold step (stage 2) is load-bearing, not incidental.
  • Stage 3 is tabular, stage 4 is coordinate-aware. Mixing them up — trying to threshold with an interval tool, or recover boundaries with a tabular filter — loses information. The recipe deliberately crosses the interval↔tabular seam twice.
  • Collection axis. The replicate sets are a Galaxy collection; multi-intersect consumes the whole collection at once (not map-over), while the final intersect runs in iterate mode. See galaxy-collection-patterns.

See also

IWC exemplars3 anchors

IWC Exemplars

epigenetics/consensus-peaks/consensus-peaks-atac-cutandrunhigh

Full multi-intersect -> count-threshold -> intersect-back consensus shape with a computed replicate threshold.

  • compute multi intersect
  • filter multi intersect
  • get merged peaks overlapping at least x replicates
epigenetics/consensus-peaks/consensus-peaks-chip-pehigh

Same recipe in the ChIP paired-end sibling.

epigenetics/consensus-peaks/consensus-peaks-chip-srhigh

Same recipe in the ChIP single-read sibling.

Incoming References (3)

  • Galaxy: genomic interval patternsrelated pattern— Use this MOC to choose corpus-grounded Galaxy genomic interval operations and recipes on coordinate features.
  • Interval: filter or annotate by overlaprelated pattern— Keep, drop, or annotate coordinate features by overlap with a second feature set; bedtools intersect (BED) or vcfvcfintersect (VCF), mapped over a collection.
  • Iwc Interval Operations Surveyrelated note— IWC corpus survey of coordinate-aware genomic interval operations; sizing and candidate boundaries for a galaxy-interval-patterns MOC, with hold-if-thin gate.