Interval: merge overlapping features

Collapse overlapping or book-ended intervals within one set into single spans; bedtools mergebed or the gops_merge Operate-on-Genomic-Intervals tool.

Revised: 2026-06-10 (Rev 1)

Pattern health

warn
  • IWC exemplar anchors

    3 abstract workflow anchors declared.

  • Foundry verification fixture

    No structural verification fixture yet.

  • Pattern map coverage

    1 pattern map link here.

  • Metadata contract

    Pattern frontmatter matches the site contract.

Operation

Collapse overlapping or book-ended intervals within a single set into single spans. This is the within-set companion to interval-overlap-filter (which compares two sets). Two tool families do it, and the corpus uses both as co-equals — pick by which family the surrounding workflow already uses.

  • toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_mergebed (“bedtools Merge”) — the modern path; epigenetics and VGP.
  • toolshed.g2.bx.psu.edu/repos/devteam/merge/gops_merge_1 (“Merge the overlapping intervals of a dataset”, part of “Operate on Genomic Intervals”) — the SARS-CoV-2 masking pipeline.

Parameter names are corpus-inferred from tool_state.

When to reach for it

  • Deduplicate overlapping windows before quantifying (after interval-windowed-coverage’s slop step, so a region isn’t counted twice).
  • Union a concatenated set of regions into non-overlapping spans (the merge step of interval-mask-by-set-algebra).
  • Reduce a noisy feature set to its covered footprint.

This merges a set with itself. To combine and compare two different sets, use interval-overlap-filter.
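
The union and footprint use cases above can be illustrated with a minimal Python sketch. It is not how bedtools or gops_merge is implemented; the function names (union_spans, footprint_bp), the chromosome name chrA, and the coordinates are hypothetical, and intervals are assumed to be BED-style half-open (chrom, start, end) tuples.

```python
from itertools import chain
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), assumed BED-style half-open


def union_spans(*sets: Iterable[Interval]) -> List[Interval]:
    """Concatenate several interval sets and collapse them into non-overlapping spans."""
    spans: List[Interval] = []
    # Sorting by (chrom, start) lets one pass see every overlap.
    for chrom, start, end in sorted(chain.from_iterable(sets)):
        if spans and spans[-1][0] == chrom and start <= spans[-1][2]:
            # Overlapping or book-ended with the previous span: extend it.
            spans[-1] = (chrom, spans[-1][1], max(spans[-1][2], end))
        else:
            spans.append((chrom, start, end))
    return spans


def footprint_bp(spans: Iterable[Interval]) -> int:
    """Total bases covered; only meaningful on non-overlapping spans."""
    return sum(end - start for _, start, end in spans)


# Hypothetical inputs: two region sets that would be concatenated before merging.
low_coverage = [("chrA", 0, 50), ("chrA", 400, 450)]
failed_sites = [("chrA", 40, 41), ("chrA", 50, 51), ("chrA", 900, 901)]

mask = union_spans(low_coverage, failed_sites)
print(mask)                # [('chrA', 0, 51), ('chrA', 400, 450), ('chrA', 900, 901)]
print(footprint_bp(mask))  # 102
```

Summing lengths before the merge would give 103 bp here, because the site at 40-41 lies inside the first low-coverage region and would be counted twice.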

Parameters (bedtools mergebed)

  • input: connected interval set.
  • distance: -d gap tolerance. "0" (corpus) merges only truly overlapping/book-ended features; a positive value also merges features within that many bp.
  • strand: "" (corpus) merges regardless of strand.
  • c_and_o_argument_repeat: list of {column, operation} pairs to carry/aggregate attribute columns through the merge (bedtools -c/-o). Empty [] in the corpus — coordinates only.
  • header: false in the corpus.

tool_id: toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_mergebed/2.31.1
tool_state:
  input: { __class__: ConnectedValue }
  distance: "0"
  strand: ""
  c_and_o_argument_repeat: []
  header: false
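
To make the distance semantics concrete, here is a minimal Python sketch of a gap-tolerant merge. It is an illustration under stated assumptions, not the bedtools implementation: merge_intervals is a hypothetical helper, coordinates are assumed BED-style half-open, and the rule "merge when next start minus current end is at most distance" is this sketch's reading of the parameter as described above.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), assumed BED-style half-open


def merge_intervals(intervals: List[Interval], distance: int = 0) -> List[Interval]:
    """Merge intervals on the same chromosome whose gap is <= distance.

    distance=0 merges overlapping and book-ended features; a positive value
    also bridges gaps of up to that many bp; a negative value demands overlap.
    """
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):  # sort by chrom, then start
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= distance:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical features: two book-ended, one 5 bp away, one on another chromosome.
features = [("chr1", 100, 200), ("chr1", 200, 300), ("chr1", 305, 400), ("chr2", 50, 80)]

print(merge_intervals(features, distance=0))
# [('chr1', 100, 300), ('chr1', 305, 400), ('chr2', 50, 80)]
print(merge_intervals(features, distance=10))
# [('chr1', 100, 400), ('chr2', 50, 80)]
print(merge_intervals(features, distance=-1))
# [('chr1', 100, 200), ('chr1', 200, 300), ('chr1', 305, 400), ('chr2', 50, 80)]
```

The sketch ignores strand and attribute columns, matching the corpus settings strand: "" and c_and_o_argument_repeat: [].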

Parameters (gops_merge — Operate on Genomic Intervals)

  • input1: connected interval set.
  • returntype: true returns the merged intervals as 1-based interval records (the corpus value in consensus-from-variation).

tool_id: toolshed.g2.bx.psu.edu/repos/devteam/merge/gops_merge_1/1.0.0
tool_state:
  input1: { __class__: ConnectedValue }
  returntype: true

The gops_* “Operate on Genomic Intervals” family (gops_merge, gops_subtract, gops_concat) is co-equal with bedtools in the corpus, not deprecated — the SARS-CoV-2 masking recipe uses it throughout, and gops_subtract/gops_concat have no bedtools equivalent in the corpus at all. See iwc-interval-operations-survey for the open redundancy question.

Pitfalls

  • Sort before merge. Both tools assume coordinate-sorted input to merge correctly; unsorted input merges only adjacent-in-file overlaps. IWC pipelines arrive pre-sorted from upstream peak/coverage steps, so the corpus rarely shows an explicit sort — but a hand-built set may need one. (Coordinate-aware sort is itself absent from the IWC corpus; see galaxy-interval-patterns §Gaps.)
  • distance: 0 is not “no merge.” 0 still merges overlapping and book-ended (touching) features; it is the correct default for “collapse true overlaps.” Only a negative value would require an actual overlap of ≥1 bp.
  • Attribute columns are dropped unless you ask. c_and_o_argument_repeat: [] keeps coordinates only; score/name columns vanish. Populate the repeat with {column, operation} pairs to carry them.
  • gops_merge returntype. Leaving it unset changes the output coordinate convention; the corpus sets true.
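
The sort and attribute-column pitfalls can be reproduced with a toy single-pass merge in Python. This is a sketch only: adjacent_merge is a hypothetical function that compares each row with the previous output span, which is one way to model "merges only adjacent-in-file overlaps"; the real tools may behave differently on unsorted input. The op argument stands in for a single {column, operation} pair applied to a score column.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = Tuple[str, int, int, float]  # (chrom, start, end, score), assumed half-open


def adjacent_merge(
    rows: Sequence[Row],
    op: Optional[Callable[[List[float]], float]] = None,
) -> List[tuple]:
    """Merge each row into the previous span only if the two overlap or touch.

    With op=None the score column is dropped (coordinates only); otherwise
    op aggregates the scores of all rows folded into a span.
    """
    spans: List[list] = []  # each entry: [chrom, start, end, [scores]]
    for chrom, start, end, score in rows:
        last = spans[-1] if spans else None
        if last and last[0] == chrom and start <= last[2] and end >= last[1]:
            last[1] = min(last[1], start)
            last[2] = max(last[2], end)
            last[3].append(score)
        else:
            spans.append([chrom, start, end, [score]])
    if op is None:
        return [(c, s, e) for c, s, e, _ in spans]
    return [(c, s, e, op(scores)) for c, s, e, scores in spans]


# Hypothetical hand-built set: rows 1 and 3 overlap but are not adjacent in the file.
rows = [("chr1", 100, 200, 1.0), ("chr1", 500, 600, 2.0), ("chr1", 150, 250, 3.0)]

print(adjacent_merge(rows))
# [('chr1', 100, 200), ('chr1', 500, 600), ('chr1', 150, 250)]  overlap survives
print(adjacent_merge(sorted(rows)))
# [('chr1', 100, 250), ('chr1', 500, 600)]
print(adjacent_merge(sorted(rows), op=sum))
# [('chr1', 100, 250, 4.0), ('chr1', 500, 600, 2.0)]
```

Sorting by chromosome and start before the pass is what lets the overlap at 100-250 collapse, and passing an aggregation is what keeps the score column in the output.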

See also

IWC Exemplars (3 anchors)

epigenetics/atacseq/atacseq (high)

bedtools mergebed collapses overlapping summit windows before coverage.

  • Merge summits +/-500kb

VGP-assembly-v2/hi-c-contact-map-for-assembly-manual-curation/hi-c-map-for-assembly-manual-curation (medium)

bedtools mergebed in an assembly-curation context.

sars-cov-2-variant-calling/sars-cov-2-consensus-from-variation/consensus-from-variation (high)

gops_merge (Operate on Genomic Intervals) collapses a concatenated interval set inside the masking recipe.

  • Combined low-coverage regions and filter-failed sites

Incoming References (7)

  • Galaxy: genomic interval patterns (related pattern). Use this MOC to choose corpus-grounded Galaxy genomic interval operations and recipes on coordinate features.
  • Interval: consensus regions by multi-intersect (related pattern). Find features reproducible across replicates: multi-intersect per-replicate sets, threshold by replicate count, then intersect back against the merged call.
  • Interval: build a mask by set algebra (related pattern). Compute regions from regions: concatenate candidate intervals, merge into non-overlapping spans, then subtract the set to keep. The gops_* set-algebra recipe.
  • Interval: filter or annotate by overlap (related pattern). Keep, drop, or annotate coordinate features by overlap with a second feature set; bedtools intersect (BED) or vcfvcfintersect (VCF), mapped over a collection.
  • Interval: window or flank features (related pattern). Extend features by a fixed or fractional amount to build neighborhood windows, clamped to chromosome ends; bedtools slopbed with a genome file.
  • Interval: windowed coverage around features (related pattern). Quantify signal in fixed neighborhoods around point features: window the features (slop), collapse overlaps (merge), then count reads in each window (coverage).
  • IWC Interval Operations Survey (related note). IWC corpus survey of coordinate-aware genomic interval operations; sizing and candidate boundaries for a galaxy-interval-patterns MOC, with hold-if-thin gate.