skip to main content
10.1145/2506583.2506644acmconferencesArticle/Chapter ViewAbstractPublication PagesbcbConference Proceedingsconference-collections
tutorial
Open Access

Suffix-Tree Based Error Correction of NGS Reads Using Multiple Manifestations of an Error

Authors Info & Claims
Published:22 September 2013Publication History

ABSTRACT

Next Generation Sequencing (NGS) technologies produce large quantities of short length reads with higher error rates. Erroneous reads that cannot be aligned, are either ignored during de-novo sequencing, or must be suitably corrected. Such reads pose problems for mapping as well, since it is difficult to distinguish errors from true variants. Methods for detection and correction of errors typically rely on frequencies of substrings of the reads. Suffix trees are often utilized for this purpose, since they can be used to index and count the frequencies of substrings of all lengths. Existing suffix-tree based methods detect errors by identifying statistically under-represented branches (suffixes) and fix them. However, they do not refer back to the reads to put the correction in context. Since an error in a single read manifests itself at multiple nodes of a suffix tree, a read-driven approach that relies on its multiple manifestations is expected to perform better. Based on this observation, we develop an algorithm, PLURIBUS, which reconciles corrections suggested by multiple manifestations of an error using a voting scheme. We compare the accuracy of PLURIBUS in detecting and correcting errors against existing error correction techniques using simulated sequencing data. We also assess the impact of error correction on the performance of sequence assembly. Our results show that PLURIBUS corrects errors with improved precision and enables the assembler to generate longer contigs, particularly when the genome is longer, or coverage is lower. PLURIBUS is freely available at http://compbio.case.edu/pluribus/.

References

  1. Keith R Bradnam, Joseph N Fass, Anton Alexandrov, Paul Baranay, Michael Bechner, İnanç Birol, Sébastien Boisvert10, Jarrod A Chapman, Guillaume Chapuis, Rayan Chikhi, et al.Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. arXiv preprint arXiv:1301.5406, 2013.Google ScholarGoogle Scholar
  2. Weichun Huang, Leping Li, Jason R. Myers, and Gabor T. Marth. Art: a next-generation sequencing read simulator. Bioinformatics, 28(4):593--594, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Lucian Ilie, Farideh Fazayeli, and Silvana Ilie. Hitec: accurate error correction in high-throughput sequencing data. Bioinformatics, 27(3):295--302, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. David Kelley, Michael Schatz, and Steven Salzberg. Quake: quality-aware detection and correction of sequencing errors. Genome Biology, 11(11):R116, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  5. Pavel A. Pevzner, Haixu Tang, and Michael S. Waterman. An eulerian path approach to dna fragment assembly. Proceedings of the National Academy of Sciences, 98(17):9748--9753, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  6. Matthew Ruffalo, Thomas LaFramboise, and Mehmet Koyutürk. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics, 27(20):2790--2796, October 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Leena Salmela. Correction of sequencing errors in a mixed set of reads. Bioinformatics, 26(10):1284--1290, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Jan Schröder, Heiko Schröder, Simon J. Puglisi, Ranjan Sinha, and Bertil Schmidt. Shrec: a short-read error correction method. Bioinformatics, 25(17):2157--2163, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Jay Shendure and Hanlee Ji. Next-generation dna sequencing. Nature biotechnology, 26(10):1135--1145, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  10. Xiao Yang, Karin S. Dorman, and Srinivas Aluru. Reptile: representative tiling for short read error correction. Bioinformatics, 26(20):2526--2533, October 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Daniel R Zerbino and Ewan Birney. Velvet: algorithms for de novo short read assembly using de bruijn graphs. Genome research, 18(5):821--829, 2008.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Suffix-Tree Based Error Correction of NGS Reads Using Multiple Manifestations of an Error

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
        September 2013
        987 pages
        ISBN:9781450324342
        DOI:10.1145/2506583

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 22 September 2013

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • tutorial
        • Research
        • Refereed limited

        Acceptance Rates

        BCB'13 Paper Acceptance Rate43of148submissions,29%Overall Acceptance Rate254of885submissions,29%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader