Computational pan-genomics: Status, promises and challenges

Tobias Marschall; Manja Marz; TEPMF Abeel; Louis Dijkstra; Bas E. Dutilh; Ali Ghaffaari; Paul Kersey; Wigard P. Kloosterman; Jeroen de Ridder; Lodewyk Wessels; More Authors

doi:10.1093/bib/bbw089

Pattern Recognition and Bioinformatics

Research output: Contribution to journal › Article › Scientific › peer-review

18 Citations (Scopus)

Abstract

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

Original language	English
Pages (from-to)	1-18
Number of pages	18
Journal	Briefings in Bioinformatics
DOIs	https://doi.org/10.1093/bib/bbw089
Publication status	Published - 2016

Keywords

pan-genome
sequence graph
read mapping
haplotypes
data structures

Access to Document

10.1093/bib/bbw089

Cite this

@article{28d9920cc1284d898abacb6f429fe088,

title = "Computational pan-genomics: Status, promises and challenges",

abstract = "Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.",

keywords = "pan-genome, sequence graph, read mapping, haplotypes, data structures",

author = "Tobias Marschall and Manja Marz and TEPMF Abeel and Louis Dijkstra and Dutilh, {Bas E.} and Ali Ghaffaari and Paul Kersey and Kloosterman, {Wigard P.} and {de Ridder}, Jeroen and Lodewyk Wessels and {More Authors}",

year = "2016",

doi = "10.1093/bib/bbw089",

language = "English",

pages = "1--18",

journal = "Briefings in Bioinformatics",

issn = "1467-5463",

publisher = "Oxford University Press",

}

TY - JOUR

T1 - Computational pan-genomics

T2 - Status, promises and challenges

AU - Marschall, Tobias

AU - Marz, Manja

AU - Abeel, TEPMF

AU - Dijkstra, Louis

AU - Dutilh, Bas E.

AU - Ghaffaari, Ali

AU - Kersey, Paul

AU - Kloosterman, Wigard P.

AU - de Ridder, Jeroen

AU - Wessels, Lodewyk

AU - More Authors

PY - 2016

Y1 - 2016

N2 - Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

AB - Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

KW - pan-genome

KW - sequence graph

KW - read mapping

KW - haplotypes

KW - data structures

U2 - 10.1093/bib/bbw089

DO - 10.1093/bib/bbw089

M3 - Article

SN - 1467-5463

SP - 1

EP - 18

JO - Briefings in Bioinformatics

JF - Briefings in Bioinformatics

ER -
