Accelerating DNA Variant Calling Algorithms on High Performance Computing Systems

Shanshan Ren

doi:10.4233/uuid:1752b8ce-631b-4127-91c9-92538e34a13b

Computer Engineering

Research output: Thesis › Dissertation (TU Delft)

Abstract

Next generation sequencing (NGS) technologies have transformed the landscape of genomic research. With the significant advances in NGS technologies, DNA sequencing is more affordable and accessible than ever before. Meanwhile, many DNA sequence analysis tools have been developed to derive useful information from the raw sequencing data produced by NGS platforms. However, the massive amount of generated sequencing data poses a great computational challenge, shifting the bottleneck towards the efficiency of the DNA sequence analysis tools. Because of these high computational demands, high performance systems play an important role in DNA sequence analysis. Moreover, dedicated hardware, including graphics processing units (GPUs) and field programmable gate arrays (FPGAs), has become an important computational resource in many high performance systems.
In this thesis, we use GPUs and FPGAs to accelerate the most computationally intensive algorithms of the GATK HaplotypeCaller (HC), a widely used DNA sequence analysis tool, in order to improve its performance. By investigating GATK HC, we select three computationally intensive algorithms: the de Bruijn graph (DBG) construction algorithm for micro-assembly, the pair-HMMs forward algorithm, and the semi-global pairwise alignment algorithm. We first propose a novel GPU-based implementation of the DBG construction algorithm for micro-assembly. Compared with the software-only implementation, it achieves a speedup of up to 3x using synthetic datasets and a speedup of up to 2.66x using human genome datasets. We then propose a systolic array design to accelerate the pair-HMMs forward algorithm on FPGAs. Experimental results show that the FPGA-based implementation is up to 67x faster than the software-only implementation. In order to fully utilize the computing resources on FPGAs, we present a model that describes the performance characteristics of the systolic array design. Based on this analysis, we propose a novel architecture that better utilizes the computing resources on FPGAs. The implementation achieves up to 90% of the theoretical throughput for a real dataset. Next, we propose several GPU-based implementations of the pair-HMMs forward algorithm. Experimental results show that these implementations achieve a speedup of up to 5.47x over existing GPU-based implementations. Finally, we propose a GPU-based acceleration of the semi-global pairwise sequence alignment algorithm with traceback, which is used to obtain the optimal alignment. Experimental results show that the GPU-based implementation is up to 14.14x faster than the software-only implementation.
After accelerating these algorithms on GPUs and FPGAs, we integrate two GPU-based implementations into GATK HC. We first integrate the GPU-based implementation of the pair-HMMs forward algorithm into GATK HC. In single-threaded mode, the GPU-based GATK HC implementation is 1.71x faster than the baseline GATK HC implementation. For multi-process mode, we propose a load-balanced multi-process optimization that distributes the computation load more evenly between processes. The GPU-based GATK HC implementation in load-balanced multi-process mode achieves a speedup of up to 2.04x over the baseline GATK HC implementation in non-load-balanced multi-process mode. Next, we additionally integrate the GPU-based implementation of the semi-global alignment algorithm into GATK HC. Experimental results show that this implementation is 2.3x faster than the baseline GATK HC implementation in single-threaded mode.
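As a point of reference for the first of the three algorithms, the following is a minimal CPU-only sketch of de Bruijn graph construction from a set of reads. It is not the GPU implementation proposed in the thesis, and it does not reproduce GATK HC's micro-assembly code; the function name build_de_bruijn_graph, the dictionary-based graph representation, and the example reads are assumptions made for illustration only.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Build a de Bruijn graph whose nodes are k-mers.

    Hypothetical helper for illustration. Returns a dict that maps each
    k-mer with at least one outgoing edge to a dict of
    {successor k-mer: number of times that edge was observed}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        # Two consecutive k-mers overlap by k - 1 bases and form one edge.
        # A read shorter than k + 1 bases contributes no edge.
        for i in range(len(read) - k):
            src = read[i:i + k]
            dst = read[i + 1:i + k + 1]
            graph[src][dst] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {src: dict(dsts) for src, dsts in graph.items()}


# Two overlapping example reads (made-up data).
reads = ["ACGTAC", "CGTACG"]
graph = build_de_bruijn_graph(reads, k=3)
for src, dsts in sorted(graph.items()):
    print(src, dsts)
```

Edge counts record how many reads support each transition between k-mers, which is the kind of information an assembler can later use to weigh candidate paths through the graph.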
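The semi-global alignment with traceback mentioned above can be illustrated with a small CPU-only sketch. It assumes one common definition of semi-global alignment, in which the whole query is aligned and unaligned reference bases before and after it are free, and it uses a linear gap penalty with made-up scoring values. The function name semi_global_align and all parameters are hypothetical; this is neither the GPU implementation from the thesis nor the scoring scheme used by GATK HC.

```python
def semi_global_align(query, ref, match=2, mismatch=-1, gap=-2):
    """Align the whole query against a substring of ref.

    Hypothetical helper for illustration. Leading and trailing reference
    bases are not penalized. Returns (score, ref_start, aligned_query,
    aligned_ref), where ref_start is the 0-based reference offset at
    which the alignment begins.
    """
    m, n = len(query), len(ref)
    # score[i][j]: best score aligning query[:i] so that it ends at ref[:j].
    # Row 0 stays at zero, which makes leading reference bases free.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if query[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trailing reference bases are free: take the best cell in the last row.
    end = max(range(n + 1), key=lambda j: score[m][j])

    # Traceback from the best end cell until the query is consumed.
    i, j = m, end
    aligned_query, aligned_ref = [], []
    while i > 0:
        sub = match if j > 0 and query[i - 1] == ref[j - 1] else mismatch
        if j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_query.append(query[i - 1])
            aligned_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            # Query base aligned against a gap in the reference.
            aligned_query.append(query[i - 1])
            aligned_ref.append("-")
            i -= 1
        else:
            # Reference base aligned against a gap in the query.
            aligned_query.append("-")
            aligned_ref.append(ref[j - 1])
            j -= 1
    return (score[m][end], j,
            "".join(reversed(aligned_query)),
            "".join(reversed(aligned_ref)))


result = semi_global_align("GTA", "ACGTAC")
print(result)
```

The traceback step is what distinguishes this from a score-only computation: it needs the full score matrix (or equivalent direction information) to be kept, which is one reason traceback is a relevant consideration when moving such an algorithm to an accelerator.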

Original language	English
Awarding Institution	Delft University of Technology
Supervisors/Advisors	Al-Ars, Z. (Supervisor); Bertels, Koen (Supervisor)
Award date	17 Dec 2018
Print ISBNs	978-94-028-1318-0
DOIs	https://doi.org/10.4233/uuid:1752b8ce-631b-4127-91c9-92538e34a13b
Publication status	Published - 2018

Keywords

Pair-HMMs forward
sequence alignment with traceback
de Bruijn graph construction
GPU acceleration
FPGA acceleration

Access to Document

10.4233/uuid:1752b8ce-631b-4127-91c9-92538e34a13b

Thesis: Final published version, 5.61 MB

Cite this

@phdthesis{1752b8ce631b412791c992538e34a13b,

title = "Accelerating DNA Variant Calling Algorithms on High Performance Computing Systems",

keywords = "Pair-HMMs forward, sequence alignment with traceback, de Bruijn graph construction, GPU acceleration, FPGA acceleration",

author = "Shanshan Ren",

year = "2018",

doi = "10.4233/uuid:1752b8ce-631b-4127-91c9-92538e34a13b",

language = "English",

isbn = "978-94-028-1318-0",

type = "Dissertation (TU Delft)",

school = "Delft University of Technology",

}

TY - THES

T1 - Accelerating DNA Variant Calling Algorithms on High Performance Computing Systems

AU - Ren, Shanshan

PY - 2018

Y1 - 2018

KW - Pair-HMMs forward

KW - sequence alignment with traceback

KW - de Bruijn graph construction

KW - GPU acceleration

KW - FPGA acceleration

U2 - 10.4233/uuid:1752b8ce-631b-4127-91c9-92538e34a13b

DO - 10.4233/uuid:1752b8ce-631b-4127-91c9-92538e34a13b

M3 - Dissertation (TU Delft)

SN - 978-94-028-1318-0

ER -