skip to main content
article

Fast and Practical Algorithms for Planted (l, d) Motif Search

Published:01 October 2007Publication History
Skip Abstract Section

Abstract

We consider the planted (l, d) motif search problem, which consists of finding a substring of length l that occurs in a set of input sequences {s1, . . . , sn} with up to d errors, a problem that arises from the need to find transcription factor-binding sites in genomic information. We propose a sequence of practical algorithms, which start based on the ideas considered in PMS1. These algorithms are exact, have little space requirements, and are able to tackle challenging instances with bigger d, taking less time in the instances reported solved by exact algorithms. In particular, one of the proposed algorithms, PMSprune, is able to solve the challenging instances, such as (17, 6) and (19, 7), which were not previously reported as solved in the literature.

References

  1. M. Blanchette, “Algorithms for Phylogenetic Footprinting,” Proc. Fifth Ann. Int'l Conf. Computational Molecular Biology (RECOMB '01), Apr. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. J. Buhler and M. Tompa, “Finding Motifs Using Random Projections,” Proc. Fifth Ann. Int'l Conf. Computational Molecular Biology (RECOMB '01), Apr. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A.M. Carvalho, A.T. Freitas, A.L. Oliveira, and M.-F. Sagot, “A Highly Scalable Algorithm for the Extraction of CIS-Regulatory Regions,” Proc. Third Asia Pacific Bioinformatics Conf., Y.-P.P. Chen and L. Wong, eds., vol. 1, pp. 273-282, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  4. F.Y.L. Chin and C.M. Leung, “Voting Algorithms for Discovering Long Motifs,” Proc. Third Asia-Pacific Bioinformatics Conf. (APBC '05), pp. 261-271, Jan. 2005.Google ScholarGoogle ScholarCross RefCross Ref
  5. F.Y.L. Chin and C.M. Leung, “An Efficient Algorithm for String Motif Discovery,” Proc. Fourth Asia-Pacific Bioinformatics Conf. (APBC '06), pp. 79-88, Feb. 2006.Google ScholarGoogle Scholar
  6. J. Davila, S. Balla, and S. Rajasekaran, “Space and Time Efficient Algorithms for Planted Motif Search,” Proc. Second Int'l Workshop Bioinformatics Research and Applications (IWBRA '06), May 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. E. Eskin and P. Pevzner, “Finding Composite Regulatory Patterns in DNA Sequences,” Bioinformatics, vol. S1, pp. 354-363, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  8. P.A. Evans and A. Smith, “Toward Optimal Motif Enumeration,” Proc. Eighth Int'l Workshop Algorithms and Data Structures (WADS '03), pp. 47-58, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  9. P.A. Evans, A. Smith, and H.T. Wareham, “On the Complexity of Finding Common Approximate Substrings,” Theoretical Computer Science, vol. 306, pp. 407-430, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. N. Li and M. Tompa, “Analysis of Computational Approaches for Motif Discovery,” Algorithms for Molecular Biology, vol. 1, no. 8, May 2006.Google ScholarGoogle Scholar
  11. M. Li, B. Ma, and L. Wang, “On the Closest String and Substring Problems,” J. ACM, vol. 49, no. 2, pp. 157-171, Mar. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. Marsan and M.-F. Sagot, “Extracting Structured Motifs Using a Suffix-Tree—Algorithms and Application to Promoter Consensus Identification,” Proc. Int'l Conf. Computational Molecular Biology (RECOMB '00). 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. Pavesi, G. Mauri, and G. Pesole, “An Algorithm for Finding Signals of Unknown Length in DNA Sequences,” Bioinformatics, vol. 17, supplement 1, pp. 207-214, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  14. P. Pevzner and S.-H. Sze, “Combinatorial Approaches to Finding Subtle Signals in DNA Sequences,” Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology, pp. 269-278, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. N. Pisanti, A.M. Carvalho, L. Marsan, and M.-F. Sagot, “RISOTTO: Fast Extraction of Motifs with Mismatches,” Proc. Seventh Latin Am. Theoretical Informatics Symp., J.R. Correa, A. Hevia, and M.Kiwi, eds., pp. 757-768, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Price, S. Ramabhadran, and P. Pevzner, “Finding Subtle Motifs by Branching from Sample Strings,” Proc. Second European Conf. Computational Biology (ECCB '03), Bioinformatics, supplementary ed., 2003.Google ScholarGoogle ScholarCross RefCross Ref
  17. S. Rajasekaran, S. Balla, and C.-H. Huang, “Exact Algorithms for the Planted Motif Problem,” J. Computational Biology, vol. 12, no. 8, pp. 1117-1128, Oct. 2005.Google ScholarGoogle ScholarCross RefCross Ref
  18. S. Rajasekaran, “Motif Search Algorithms,” Handbook of Computational Molecular Biology, CRC Press, 2005.Google ScholarGoogle Scholar
  19. M.F. Sagot, “Spelling Approximate Repeated or Common Motifs Using a Suffix Tree,” Proc. Latin American Symp. Theoretical Informatics (LATIN '98), C.L. Lucchesi and A.V. Moura, eds., pp.111-127, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Sinha and M. Tompa, “A Statistical Method for Finding Transcription Factor Binding Sites,” Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology, pp. 344-354, Aug. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Sze, S. Lu, and J. Chen, “Integrating Sample-Driven and Pattern-Driven Approaches in Motif Finding,” Proc. Fourth Int'l Workshop Algorithms in Bioinformatics (WABI '04), 2004.Google ScholarGoogle ScholarCross RefCross Ref
  22. M. Tompa, N. Li, T.L. Bailey, G.M. Church, B. De Moor, E. Eskin, A.V. Favorov, M.C. Frith, Y. Fu, W.J. Kent, V.J. Makeev, A.A. Mironov, W.S. Noble, G. Pavesi, G. Pesole, M. Regnier, N. Simonis, S. Sinha, G. Thijs, J. van Helden, M. Vandenbogaert, Z. Weng, C. Workman, C. Ye, and Z. Zhu, “Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites,” Nature Biotechnology, vol. 23, no. 1, pp. 137-144, Jan. 2005.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Fast and Practical Algorithms for Planted (l, d) Motif Search

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              Full Access

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader