
Much research has been dedicated to reducing the computational time of genome data analysis, which has shifted the bottleneck from the computational analysis itself to the time needed to sequence the DNA. DNA sequencing is a time-consuming process, and all existing DNA analysis methods must wait for sequencing to finish completely before the analysis can start. In this paper, we propose a new DNA analysis approach in which genome analysis starts while the DNA reads are still being sequenced. We use algorithms to predict the unknown bases of each incomplete read and their corresponding base quality scores. Results show that our method of predicting the unknown bases and quality scores achieves more than 90% similarity with the full dataset for 50 unknown bases (saving more than a day of sequencing time). We also show that our base quality value prediction scheme is highly accurate, reducing the similarity of the detected variants by only 0.45%. However, there is still room for more accurate prediction schemes for the unknown bases, which could increase the effectiveness of the analysis by up to 5.8%.
Original language: English
Title of host publication: 2017 IEEE 17th International Conference on BioInformatics and BioEngineering (BIBE)
Place of Publication: Piscataway
Publisher: IEEE
Pages: 119-124
Number of pages: 6
ISBN (Electronic): 978-1-5386-1324-5
ISBN (Print): 978-1-5386-1325-2
Publication status: Published - 2017
Event: BIBE 2017: 17th IEEE International Conference on BioInformatics and BioEngineering - Washington DC, United States
Duration: 23 Oct 2017 to 25 Oct 2017
http://bibe2017.com/index.html

Conference

Conference: BIBE 2017
Abbreviated title: BIBE 2017
Country: United States
City: Washington DC
Period: 23/10/17 to 25/10/17

    Research areas

  • DNA Sequencing delay, Prediction, GATK

ID: 28679020