

DNA read alignment is a major step in genome analysis. As DNA reads continue to grow longer, however, new approaches are needed to use these longer reads effectively during alignment. Modern aligners commonly use a two-step approach: (1) seeding and (2) extension. In this paper, we investigate various seeding and extension techniques used in modern DNA read alignment algorithms to find the best seeding-extension combination. We developed an open-source, generic DNA read aligner that can be used to compare the alignment accuracy and total execution time of different combinations of seeding and extension algorithms. For extension, our results show that local alignment is the best approach, achieving up to 3.6x higher accuracy than other extension techniques for longer reads. For seeding, if BLAST-like seed extension is used, the best approach is to identify all supermaximal exact matches (SMEMs) in the DNA read (e.g., the approach used by BWA-MEM); this combination is up to 6x more accurate than other seeding techniques for longer reads. With local alignment, we observed that the choice of seeding technique does not affect alignment accuracy. Furthermore, we show that an optimized implementation of local alignment using vector instructions achieves a 4.5x speedup, making it the fastest of all extension techniques. Overall, using local alignment with non-overlapping maximal exact matching seeds is the best seeding-extension combination, owing to its high accuracy and greater potential for optimization and acceleration for future DNA reads.
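To make the seed-and-extend pipeline described above concrete, the following is a toy sketch and not the authors' aligner. It pairs greedy, non-overlapping exact-match seeds with Smith-Waterman local alignment. Several simplifications are assumptions of this sketch: seeds are found by brute-force string comparison rather than an FM-index (as BWA-MEM uses), the greedy longest match from each read position only approximates maximal exact matches, the scoring parameters (match = 2, mismatch = -1, gap = -2) are arbitrary, gaps use a linear rather than affine penalty, and no vector instructions are used. The function names `nonoverlapping_seeds`, `smith_waterman`, and `seed_and_extend` are hypothetical.

```python
from typing import List, Tuple


def nonoverlapping_seeds(read: str, ref: str, min_len: int = 4) -> List[Tuple[int, int, int]]:
    """Greedy seeding (brute force, illustrative only): from each read position,
    take the longest exact match found anywhere in ref, then jump past it so
    seeds never overlap. Returns (read_pos, ref_pos, length) triples."""
    seeds = []
    i = 0
    while i < len(read):
        best_len, best_ref = 0, -1
        for j in range(len(ref)):
            k = 0
            while i + k < len(read) and j + k < len(ref) and read[i + k] == ref[j + k]:
                k += 1
            if k > best_len:
                best_len, best_ref = k, j
        if best_len >= min_len:
            seeds.append((i, best_ref, best_len))
            i += best_len  # continue after this seed: non-overlapping
        else:
            i += 1  # no usable seed here; try the next read position
    return seeds


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a vs. b (linear gap penalty, scalar code)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


def seed_and_extend(read: str, ref: str, min_len: int = 4, pad: int = 4) -> int:
    """For each seed, locally align the whole read against the reference window
    implied by the seed's diagonal (padded by `pad`); return the best score."""
    best = 0
    for read_pos, ref_pos, _ in nonoverlapping_seeds(read, ref, min_len):
        start = max(0, ref_pos - read_pos - pad)
        end = min(len(ref), ref_pos - read_pos + len(read) + pad)
        best = max(best, smith_waterman(read, ref[start:end]))
    return best


if __name__ == "__main__":
    print(nonoverlapping_seeds("ACGTACGT", "TTACGTACGTTT"))
    print(seed_and_extend("ACGTACGT", "TTACGTACGTTT"))
```

Running the example prints `[(0, 2, 8)]` (one 8-base seed at reference offset 2) and `16` (eight matches at 2 points each). Replacing `smith_waterman` with an ungapped, BLAST-like extension of each seed would reproduce the other extension strategy the abstract compares; the local-alignment inner loop is the part that vector instructions accelerate.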
Original language: English
Title of host publication: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Editors: Tianhai Tian, Qinghua Jiang, Yunlong Liu, Kevin Burrage, Jiangning Song, Yadong Wang, Xiaohua Hu, Shinichi Morishita, Qian Zhu, Guohua Wang
Place of publication: Piscataway, NJ
Publisher: IEEE
Pages: 1421-1428
Number of pages: 8
ISBN (Electronic): 978-1-5090-1611-2
ISBN (Print): 978-1-5090-1612-9
Publication status: Published, Dec 2016
Event: IEEE International Conference on Bioinformatics and Biomedicine 2016, Kylin Villa Hotel, Shenzhen, China
Duration: 15 Dec 2016 to 18 Dec 2016
https://cci.drexel.edu/ieeebibm/bibm2016/

Conference

Conference: IEEE International Conference on Bioinformatics and Biomedicine 2016
Abbreviated title: BIBM 2016
Country: China
City: Shenzhen
Period: 15/12/16 to 18/12/16

    Research areas

  • DNA, Computers, Indexes, Genomics, Bioinformatics, Irrigation

ID: 13682526