CHOP: haplotype-aware path indexing in population graphs

Tom Mokveld; Jasper Linthorst; Zaid Al-Ars; Henne Holstege; Marcel Reinders

doi: 10.1186/s13059-020-01963-y

Research output: Contribution to journal › Article › Academic › peer-review

6 Citations (Scopus)

Abstract

The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. Instead of heuristic filtering or pruning steps to reduce the complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information, bounding the search space to the number of haplotypes so that a combinatorial explosion is prevented. We show that CHOP can be applied to large and complex datasets, by applying it on a graph-based representation of the human genome encoding all 80 million variants reported by the 1000 Genomes Project.
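The abstract's central argument is that restricting queries to observed haplotypes bounds the search space, which would otherwise grow combinatorially. A toy Python sketch can illustrate that argument. It is not CHOP's actual index construction. The chain-of-bubbles graph model and the names `count_paths`, `all_path_kmers` and `haplotype_kmers` are hypothetical. With n biallelic sites inside a query window, a graph admits up to 2^n allele combinations. An index restricted to haplotypes only sees the combinations that some haplotype actually carries.

```python
from itertools import product
from math import prod

# Hypothetical toy model (not CHOP's data structure): a population graph as a
# linear chain of segments, each a list of alternative alleles. One-allele
# segments are shared sequence; multi-allele segments are variant sites.


def count_paths(segments):
    """Number of source-to-sink paths: the product of allele counts per segment."""
    return prod(len(alleles) for alleles in segments)


def kmers(seq, k):
    """Set of all length-k substrings of a linear sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def all_path_kmers(segments, k):
    """k-mers spelled by every path; the number of paths grows exponentially."""
    found = set()
    for alleles in product(*segments):  # one allele choice per segment
        found |= kmers("".join(alleles), k)
    return found


def haplotype_kmers(segments, haplotypes, k):
    """k-mers spelled only along haplotypes (one allele index per segment)."""
    found = set()
    for hap in haplotypes:
        seq = "".join(seg[i] for seg, i in zip(segments, hap))
        found |= kmers(seq, k)
    return found


# Three biallelic sites separated by short shared segments.
segments = [["AC"], ["G", "T"], ["CA"], ["A", "G"], ["TT"], ["C", "A"], ["GA"]]
# Two observed haplotypes, given as allele indices per segment.
haplotypes = [[0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0]]

print(count_paths(segments))                          # 8
print(len(all_path_kmers(segments, 4)))               # 20
print(len(haplotype_kmers(segments, haplotypes, 4)))  # 16
```

Four of the 4-mers in the unconstrained set, such as GCAG, pair alleles from two adjacent sites that no haplotype carries together. As the number of sites per window grows, these unobserved combinations multiply. The haplotype-constrained set stays bounded by at most one k-mer per position per haplotype.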

Original language	English
Article number	65
Number of pages	1
Journal	Genome Biology
Volume	21
Issue number	1
DOI	https://doi.org/10.1186/s13059-020-01963-y
Publication status	Published - 11 Mar 2020

Keywords

Graph-based reference genomes
Haplotype-aware graph indexes
Read alignment

Access to Document

https://doi.org/10.1186/s13059-020-01963-y

Cite this

@article{9985a80fe4794e83a8dc9d33cfe9054d,
  title = "{CHOP}: haplotype-aware path indexing in population graphs",
  abstract = "The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. Instead of heuristic filtering or pruning steps to reduce the complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information, bounding the search space to the number of haplotypes so that a combinatorial explosion is prevented. We show that CHOP can be applied to large and complex datasets, by applying it on a graph-based representation of the human genome encoding all 80 million variants reported by the 1000 Genomes Project.",
  keywords = "Graph-based reference genomes, Haplotype-aware graph indexes, Read alignment",
  author = "Tom Mokveld and Jasper Linthorst and Zaid Al-Ars and Henne Holstege and Marcel Reinders",
  year = "2020",
  month = mar,
  day = "11",
  doi = "10.1186/s13059-020-01963-y",
  language = "English",
  volume = "21",
  journal = "Genome Biology",
  issn = "1465-6906",
  publisher = "BioMed Central",
  number = "1",
  pages = "65",
}

TY  - JOUR
T1  - CHOP: haplotype-aware path indexing in population graphs
AU  - Mokveld, Tom
AU  - Linthorst, Jasper
AU  - Al-Ars, Zaid
AU  - Holstege, Henne
AU  - Reinders, Marcel
PY  - 2020/3/11
Y1  - 2020/3/11
N2  - The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. Instead of heuristic filtering or pruning steps to reduce the complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information, bounding the search space to the number of haplotypes so that a combinatorial explosion is prevented. We show that CHOP can be applied to large and complex datasets, by applying it on a graph-based representation of the human genome encoding all 80 million variants reported by the 1000 Genomes Project.
AB  - The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. Instead of heuristic filtering or pruning steps to reduce the complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information, bounding the search space to the number of haplotypes so that a combinatorial explosion is prevented. We show that CHOP can be applied to large and complex datasets, by applying it on a graph-based representation of the human genome encoding all 80 million variants reported by the 1000 Genomes Project.
KW  - Graph-based reference genomes
KW  - Haplotype-aware graph indexes
KW  - Read alignment
UR  - http://www.scopus.com/inward/record.url?scp=85081655139&partnerID=8YFLogxK
U2  - https://doi.org/10.1186/s13059-020-01963-y
DO  - 10.1186/s13059-020-01963-y
M3  - Article
C2  - 32160922
SN  - 1465-6906
VL  - 21
JO  - Genome Biology
JF  - Genome Biology
IS  - 1
M1  - 65
ER  - 
