Computational approaches for motif-finding in DNA sequences
Publisher:
  • University of South Dakota
  • 414 East Clark St. Vermillion, SD
  • United States
ISBN: 978-0-549-95285-5
Order Number: AAI3340611
Pages: 89
Abstract

Identification of DNA motifs is one of the core problems in computational biology. Significant advances have been made, and many different motif discovery approaches have been developed. However, these approaches still suffer from various problems.

In this dissertation, we investigate, develop, implement, and analyze new computational approaches for motif-finding in DNA sequences from three different perspectives. First, we design a single motif model for identifying and refining individual regulatory elements; second, we develop a composite motif model to study pairs of motifs; finally, we develop a computational algorithm for identifying motif clusters.

In the single motif model, we formulate several new statistical measures for the characterization of motifs with high density and positional bias. We develop, implement, and test a workflow that integrates multiple algorithms and additional sources of information for motif-finding, motif-selection, validation, and clustering. By applying this workflow to human promoter sequences, we obtain 12 candidate motifs with high density and positional bias, including 6 previously known transcription factor binding sites and 6 new motifs.

In the composite motif model, we study the occurrence patterns of motifs using a binary matrix representation and several new concepts that describe relationships between motifs. Although the non-coexistence of motifs has a contextual effect as important as that of their co-occurrence, it has not been studied thus far. Our model considers both the coexistence and the non-coexistence of motifs. Our results provide a genomic view of the relationships among pairs of motifs.
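
As a rough illustration of a binary occurrence matrix for motif pairs, the sketch below marks which motifs appear in which sequences and then tallies, for every pair, how often both, exactly one, or neither motif is present. It is not the dissertation's model: the function names, the exact-substring matching, the toy sequences, and the reading of "non-coexistence" as one motif appearing without the other are all assumptions made for this example.

```python
from itertools import combinations

def occurrence_matrix(sequences, motifs):
    """Row i, column j is 1 if motifs[j] occurs in sequences[i], else 0.

    Assumption: a motif "occurs" when it matches as an exact substring.
    """
    return [[1 if m in seq else 0 for m in motifs] for seq in sequences]

def pair_counts(matrix, motifs):
    """For each motif pair, count sequences by joint presence/absence.

    Returns {(motif_a, motif_b): (both, only_a, only_b, neither)}.
    """
    counts = {}
    for a, b in combinations(range(len(motifs)), 2):
        both = sum(1 for row in matrix if row[a] and row[b])
        only_a = sum(1 for row in matrix if row[a] and not row[b])
        only_b = sum(1 for row in matrix if not row[a] and row[b])
        neither = sum(1 for row in matrix if not row[a] and not row[b])
        counts[(motifs[a], motifs[b])] = (both, only_a, only_b, neither)
    return counts

# Toy data, invented for illustration only.
sequences = ["TATAAAGGCGCC", "CCAATTATAAA", "GGCGCCCCAAT", "ACGTACGT"]
motifs = ["TATAAA", "CCAAT", "GGCGCC"]

matrix = occurrence_matrix(sequences, motifs)
counts = pair_counts(matrix, motifs)
for pair, tally in counts.items():
    print(pair, tally)
```

A pair with a high `both` count suggests co-occurrence, while a pair dominated by `only_a` and `only_b` suggests the two motifs tend not to coexist. Any real analysis would also need a significance test against a background model.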

In the study of clusters of motifs, we choose to explore CpG islands (CGIs). CGIs play a fundamental role in genome analysis and annotation, and contribute to improving the accuracy of promoter prediction. We propose a novel algorithm called CpG Island Finder (CpGIF). We compare CpGIF with five existing public CpG island search tools to assess its accuracy and computational efficiency. Judged by the length and accuracy of the predicted CGIs and by the running time required, our algorithm is superior to the other existing CpG island detection programs.
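
For readers unfamiliar with CpG island detection, the following is a minimal sketch of a classic sliding-window baseline, not of CpGIF, whose details the abstract does not give. It assumes the widely used Gardiner-Garden and Frommer style criteria (window of at least 200 bp, GC content of at least 50%, observed/expected CpG ratio of at least 0.6). The function names, default thresholds, and toy sequence are illustrative choices.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for an uppercase DNA string."""
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Obs/Exp CpG = (#CpG * N) / (#C * #G); defined as 0 when C or G is absent.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def find_islands(seq, size=200, step=1, min_gc=0.5, min_oe=0.6):
    """Slide a fixed window over seq and merge passing windows.

    Returns a list of half-open (start, end) intervals.
    Assumption: overlapping or adjacent passing windows form one island.
    """
    seq = seq.upper()
    islands = []
    for start in range(0, len(seq) - size + 1, step):
        gc, oe = cpg_stats(seq[start:start + size])
        if gc >= min_gc and oe >= min_oe:
            end = start + size
            if islands and start <= islands[-1][1]:
                # Extend the previous island instead of opening a new one.
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands

# Toy sequence, invented for illustration: a CG-rich core flanked by AT repeats.
toy = "AT" * 150 + "CG" * 150 + "AT" * 150
islands = find_islands(toy)
print(islands)
```

Recomputing the counts for every window keeps the sketch short but costs time proportional to the window size at each step. Because a window passes as soon as half of it is GC-rich, this baseline tends to report boundaries that extend past the GC-rich core, which is one reason the length and accuracy of predicted CGIs are natural criteria for comparing tools.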

Contributors
  • University of South Dakota
