Abstract
We consider the planted (l, d) motif search problem, which consists of finding a substring of length l that occurs in a set of input sequences {s1, . . . , sn} with up to d errors, a problem that arises from the need to find transcription factor-binding sites in genomic information. We propose a sequence of practical algorithms, which start based on the ideas considered in PMS1. These algorithms are exact, have little space requirements, and are able to tackle challenging instances with bigger d, taking less time in the instances reported solved by exact algorithms. In particular, one of the proposed algorithms, PMSprune, is able to solve the challenging instances, such as (17, 6) and (19, 7), which were not previously reported as solved in the literature.
- M. Blanchette, “Algorithms for Phylogenetic Footprinting,” Proc. Fifth Ann. Int'l Conf. Computational Molecular Biology (RECOMB '01), Apr. 2001. Google ScholarDigital Library
- J. Buhler and M. Tompa, “Finding Motifs Using Random Projections,” Proc. Fifth Ann. Int'l Conf. Computational Molecular Biology (RECOMB '01), Apr. 2001. Google ScholarDigital Library
- A.M. Carvalho, A.T. Freitas, A.L. Oliveira, and M.-F. Sagot, “A Highly Scalable Algorithm for the Extraction of CIS-Regulatory Regions,” Proc. Third Asia Pacific Bioinformatics Conf., Y.-P.P. Chen and L. Wong, eds., vol. 1, pp. 273-282, 2005.Google ScholarCross Ref
- F.Y.L. Chin and C.M. Leung, “Voting Algorithms for Discovering Long Motifs,” Proc. Third Asia-Pacific Bioinformatics Conf. (APBC '05), pp. 261-271, Jan. 2005.Google ScholarCross Ref
- F.Y.L. Chin and C.M. Leung, “An Efficient Algorithm for String Motif Discovery,” Proc. Fourth Asia-Pacific Bioinformatics Conf. (APBC '06), pp. 79-88, Feb. 2006.Google Scholar
- J. Davila, S. Balla, and S. Rajasekaran, “Space and Time Efficient Algorithms for Planted Motif Search,” Proc. Second Int'l Workshop Bioinformatics Research and Applications (IWBRA '06), May 2006.Google ScholarDigital Library
- E. Eskin and P. Pevzner, “Finding Composite Regulatory Patterns in DNA Sequences,” Bioinformatics, vol. S1, pp. 354-363, 2002.Google ScholarCross Ref
- P.A. Evans and A. Smith, “Toward Optimal Motif Enumeration,” Proc. Eighth Int'l Workshop Algorithms and Data Structures (WADS '03), pp. 47-58, 2003.Google ScholarCross Ref
- P.A. Evans, A. Smith, and H.T. Wareham, “On the Complexity of Finding Common Approximate Substrings,” Theoretical Computer Science, vol. 306, pp. 407-430, 2003. Google ScholarDigital Library
- N. Li and M. Tompa, “Analysis of Computational Approaches for Motif Discovery,” Algorithms for Molecular Biology, vol. 1, no. 8, May 2006.Google Scholar
- M. Li, B. Ma, and L. Wang, “On the Closest String and Substring Problems,” J. ACM, vol. 49, no. 2, pp. 157-171, Mar. 2002. Google ScholarDigital Library
- L. Marsan and M.-F. Sagot, “Extracting Structured Motifs Using a Suffix-Tree—Algorithms and Application to Promoter Consensus Identification,” Proc. Int'l Conf. Computational Molecular Biology (RECOMB '00). 2000. Google ScholarDigital Library
- G. Pavesi, G. Mauri, and G. Pesole, “An Algorithm for Finding Signals of Unknown Length in DNA Sequences,” Bioinformatics, vol. 17, supplement 1, pp. 207-214, 2001.Google ScholarCross Ref
- P. Pevzner and S.-H. Sze, “Combinatorial Approaches to Finding Subtle Signals in DNA Sequences,” Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology, pp. 269-278, 2000. Google ScholarDigital Library
- N. Pisanti, A.M. Carvalho, L. Marsan, and M.-F. Sagot, “RISOTTO: Fast Extraction of Motifs with Mismatches,” Proc. Seventh Latin Am. Theoretical Informatics Symp., J.R. Correa, A. Hevia, and M.Kiwi, eds., pp. 757-768, 2006. Google ScholarDigital Library
- A. Price, S. Ramabhadran, and P. Pevzner, “Finding Subtle Motifs by Branching from Sample Strings,” Proc. Second European Conf. Computational Biology (ECCB '03), Bioinformatics, supplementary ed., 2003.Google ScholarCross Ref
- S. Rajasekaran, S. Balla, and C.-H. Huang, “Exact Algorithms for the Planted Motif Problem,” J. Computational Biology, vol. 12, no. 8, pp. 1117-1128, Oct. 2005.Google ScholarCross Ref
- S. Rajasekaran, “Motif Search Algorithms,” Handbook of Computational Molecular Biology, CRC Press, 2005.Google Scholar
- M.F. Sagot, “Spelling Approximate Repeated or Common Motifs Using a Suffix Tree,” Proc. Latin American Symp. Theoretical Informatics (LATIN '98), C.L. Lucchesi and A.V. Moura, eds., pp.111-127, 1998. Google ScholarDigital Library
- S. Sinha and M. Tompa, “A Statistical Method for Finding Transcription Factor Binding Sites,” Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology, pp. 344-354, Aug. 2000. Google ScholarDigital Library
- S. Sze, S. Lu, and J. Chen, “Integrating Sample-Driven and Pattern-Driven Approaches in Motif Finding,” Proc. Fourth Int'l Workshop Algorithms in Bioinformatics (WABI '04), 2004.Google ScholarCross Ref
- M. Tompa, N. Li, T.L. Bailey, G.M. Church, B. De Moor, E. Eskin, A.V. Favorov, M.C. Frith, Y. Fu, W.J. Kent, V.J. Makeev, A.A. Mironov, W.S. Noble, G. Pavesi, G. Pesole, M. Regnier, N. Simonis, S. Sinha, G. Thijs, J. van Helden, M. Vandenbogaert, Z. Weng, C. Workman, C. Ye, and Z. Zhu, “Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites,” Nature Biotechnology, vol. 23, no. 1, pp. 137-144, Jan. 2005.Google ScholarCross Ref
Index Terms
- Fast and Practical Algorithms for Planted (l, d) Motif Search
Recommendations
An improved voting algorithm for planted (l,d) motif search
The planted motif search problem is a classical problem in bioinformatics that seeks to identify meaningful patterns in biological sequences. As an NP-complete problem, current algorithms focus on improving the average time complexity and solving ...
An Improved Heuristic Algorithm for Finding Motif Signals in DNA Sequences
The planted (l,d)-motif search problem is a mathematical abstraction of the DNA functional site discovery task. In this paper, we propose a heuristic algorithm that can find planted (l,d)-signals in a given set of DNA sequences. Evaluations on simulated ...
Comments