HaShRECA: Hadoop Based Short Read Error Correction Algorithm for Genome Assembly

Muhammad Tahir; Muhammad Sardaraz; Ataul Aziz Ikram; Hassan Bajwa

Current Bioinformatics

DOI: 10.2174/157489361004150922151409

Pages: 469-475 (7 pages)

Abstract

Next-generation high-throughput sequencing technologies have opened up new and challenging research opportunities. In particular, next-generation sequencers produce a massive amount of short-read data in a single run. However, this data is more susceptible to errors than data from shotgun sequencing. There is therefore a pressing need for faster and more accurate statistical and computational tools to analyze it. We present HaShRECA, a new short-read error correction algorithm that is based on probabilistic analysis of potential read errors and runs on the Hadoop MapReduce framework. Experimental results show that HaShRECA is more accurate, and more time- and space-efficient, than previous algorithms.

Keywords: Algorithm, genome, MapReduce, next-generation sequencing, short read errors.
