ABSTRACT
The exponential growth of sequence data produced by genome projects motivates the development of better ways to infer structural and functional information about newly sequenced proteins. Searching for homologies between these probe protein sequences and other protein sequences in a database has proved to be one of the most useful current techniques. This procedure, known as sequence comparison, relies on an appropriate score function that discriminates homologs from non-homologs. Current score functions have difficulty identifying distantly related homologs with low sequence similarity. As a result, there is growing demand for a new score function that yields statistically significant, higher scores for all pairs of homologous protein sequences, including such distantly related homologs. We present a new method for generating a score function by optimizing it for successful discrimination between homologous and unrelated proteins. The new score function (OPTIMA) outperforms other commonly used substitution matrices in detecting distantly related protein sequences.
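To make the role of the score function concrete, the following is a minimal sketch of how a pairwise score function enters sequence comparison, here through a Smith-Waterman style local alignment with a linear gap penalty. It is not the OPTIMA method: the names `local_alignment_score` and `toy_score`, the gap penalty of -4, and the +2/-1 identity scoring are all assumptions chosen for illustration. In practice `toy_score` would be replaced by a lookup in a substitution matrix.

```python
def local_alignment_score(a, b, score, gap=-4):
    """Best local alignment score of sequences a and b.

    `score(x, y)` is the pairwise score function (hypothetical interface);
    `gap` is an assumed linear gap penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for x in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, y in enumerate(b, start=1):
            cell = max(
                0,                          # start a new local alignment here
                prev[j - 1] + score(x, y),  # align residue x with residue y
                prev[j] + gap,              # x aligned to a gap
                curr[j - 1] + gap,          # y aligned to a gap
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def toy_score(x, y):
    # Illustrative identity scoring only: +2 for a match, -1 for a mismatch.
    return 2 if x == y else -1


if __name__ == "__main__":
    # A shared segment embedded in unrelated flanking residues still scores highly.
    print(local_alignment_score("WWACDEWW", "KKACDEKK", toy_score))
```

A better score function, in the sense of the abstract, is one for which scores of this kind separate homologous pairs from unrelated pairs more cleanly, particularly when sequence identity is low.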
- Optimizing for success: a new score function for distantly related protein sequence comparison