DOI: 10.1145/2382936.2382983
short-paper

An integer programming approach to novel transcript reconstruction from paired-end RNA-Seq reads

Published: 07 October 2012

ABSTRACT

Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, has become the technology of choice for performing gene expression profiling. However, reconstruction of full-length novel transcripts from RNA-Seq data remains challenging due to the short read length delivered by most existing sequencing technologies. We propose a novel statistical genome-guided method called "Transcriptome Reconstruction using Integer Programming" (TRIP) that incorporates fragment length distribution into novel transcript reconstruction from paired-end RNA-Seq reads. TRIP creates a splice graph based on aligned RNA-Seq reads and enumerates all maximal paths corresponding to putative transcripts. The problem of selecting true transcripts is formulated as an integer program (IP) which minimizes the set of selected transcripts yielding a good statistical fit between the fragment length distribution (empirically determined during library preparation) and fragment lengths implied by mapped read pairs. Experimental results on both real and synthetic datasets show that TRIP is more accurate than methods ignoring fragment length distribution information. The software is available at: http://www.cs.gsu.edu/serghei/?q=trip
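
The pipeline outlined in the abstract (build a splice graph, enumerate maximal paths, then pick a small set of paths whose implied fragment lengths agree with the expected distribution) can be illustrated with a minimal Python sketch. This is not the TRIP implementation, and the abstract does not give the actual integer program. The data model (exons as graph nodes, a read pair as `(first_exon, start_offset, last_exon, end_offset)`), the function names, the acceptance window of `mean ± k·std`, and the brute-force subset search standing in for an IP solver are all assumptions made for illustration.

```python
from itertools import combinations


def maximal_paths(graph):
    """Enumerate every source-to-sink path of a DAG given as {node: [successors]}."""
    with_pred = {v for succ in graph.values() for v in succ}
    nodes = set(graph) | with_pred
    sources = sorted(nodes - with_pred)
    paths = []

    def extend(path):
        successors = graph.get(path[-1], [])
        if not successors:  # sink reached, so the path cannot be extended
            paths.append(tuple(path))
            return
        for nxt in successors:
            extend(path + [nxt])

    for source in sources:
        extend([source])
    return paths


def implied_length(transcript, exon_len, pair):
    """Fragment length a read pair implies on one candidate transcript.

    Returns None when the transcript does not contain both exons in order.
    """
    first, start_off, last, end_off = pair
    if first not in transcript or last not in transcript:
        return None
    i, j = transcript.index(first), transcript.index(last)
    if i > j:
        return None
    if i == j:
        return end_off - start_off
    inner = sum(exon_len[e] for e in transcript[i + 1:j])
    return (exon_len[first] - start_off) + inner + end_off


def select_transcripts(candidates, exon_len, pairs, mean, std, k=2.0):
    """Smallest candidate subset in which every read pair fits some transcript.

    A pair "fits" when its implied length lies within k standard deviations
    of the mean (an assumed criterion). Exhaustive search by increasing
    subset size replaces an IP solver and is only viable for tiny inputs.
    """
    def fits(transcript, pair):
        length = implied_length(transcript, exon_len, pair)
        return length is not None and abs(length - mean) <= k * std

    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(any(fits(t, p) for t in subset) for p in pairs):
                return list(subset)
    return None


# Hypothetical toy gene: exon B is optional (A-B-C or A-C).
graph = {"A": ["B", "C"], "B": ["C"], "C": []}
exon_len = {"A": 100, "B": 300, "C": 100}
pairs = [
    ("A", 50, "C", 50),   # 100 bp on A-C, 400 bp on A-B-C
    ("A", 80, "C", 90),   # 110 bp on A-C, 410 bp on A-B-C
    ("B", 250, "C", 40),  # 90 bp on A-B-C, incompatible with A-C
]
candidates = maximal_paths(graph)
selected = select_transcripts(candidates, exon_len, pairs, mean=100, std=10)
```

In this toy input the first two read pairs are only plausible on the exon-skipping path, while the third requires exon B, so both candidate paths are selected. With only the first two pairs, the single path A-C suffices, which shows how fragment lengths can rule out a path that read alignment alone would allow.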

Published in

BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
October 2012
725 pages
ISBN: 9781450316705
DOI: 10.1145/2382936

Copyright © 2012 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

BCB '12 paper acceptance rate: 33 of 159 submissions, 21%
Overall acceptance rate: 254 of 885 submissions, 29%
