Optimizing for success: a new score function for distantly related protein sequence comparison

Authors:
Maricel Kann

Department of Chemistry, University of Michigan, Ann Arbor, MI

Richard A. Goldstein

Biophysics Research Division, University of Michigan, Ann Arbor, MI

RECOMB '00: Proceedings of the fourth annual international conference on Computational molecular biology, April 2000, Pages 177–182, https://doi.org/10.1145/332306.332536

Published: 08 April 2000

ABSTRACT

The exponential growth of the sequence data produced by the genome projects motivates the development of better ways of inferring structural and functional information about those newly sequenced proteins. Looking for homologies between these probe protein sequences and other protein sequences in the database has proved to be one of the most useful current techniques. This procedure, known as sequence comparison, relies on the use of an appropriate score function that discriminates homologs from non-homologs. Current score functions have difficulty identifying distantly-related homologs with low sequence similarity. As a result, there is an increased demand for a new score function that yields statistically-significant higher scores for all the pairs of homologous protein sequences including such distantly-related homologs. We present a new method for generating a score function by optimizing it for successful discrimination between homologous and unrelated proteins. The new score function (OPTIMA) out-performs other commonly used substitution matrices for the detection of distantly related protein sequences.
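The abstract does not say how OPTIMA is parameterized or optimized, so the following is only a minimal sketch of the general idea it describes: a score function assigns each sequence pair a score, and its quality can be judged by how well homologous pairs are separated from unrelated ones. The local alignment routine follows the Smith-Waterman recurrence with a linear gap penalty. The names `local_alignment_score`, `toy_matrix`, and `discrimination`, the match/mismatch values, the gap penalty, and the separation measure are all illustrative assumptions, not the authors' method.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def local_alignment_score(a, b, sub, gap=-4):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    `sub` maps residue pairs (x, y) to substitution scores; `gap` is an
    assumed per-residue gap penalty, not a value taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            s = max(
                0,                          # start a new local alignment
                prev[j - 1] + sub[(x, y)],  # align x with y
                prev[j] + gap,              # x aligned to a gap
                cur[j - 1] + gap,           # y aligned to a gap
            )
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best


def toy_matrix(alphabet=AMINO_ACIDS, match=2, mismatch=-1):
    """Hypothetical identity-style matrix standing in for a real one."""
    return {(x, y): (match if x == y else mismatch)
            for x, y in product(alphabet, repeat=2)}


def discrimination(homolog_pairs, unrelated_pairs, sub, gap=-4):
    """Fraction of homolog pairs that outscore every unrelated pair.

    One simple (assumed) way to quantify 'successful discrimination'.
    """
    threshold = max(local_alignment_score(a, b, sub, gap)
                    for a, b in unrelated_pairs)
    hits = sum(local_alignment_score(a, b, sub, gap) > threshold
               for a, b in homolog_pairs)
    return hits / len(homolog_pairs)


if __name__ == "__main__":
    sub = toy_matrix()
    homologs = [("ACDEFG", "ACDEFG"), ("ACDEFG", "ACDQFG")]
    unrelated = [("ACDEFG", "WWYYWW"), ("KLMN", "ACDE")]
    print(discrimination(homologs, unrelated, sub))
```

In a setting like the one the abstract describes, the entries of `sub` would be the adjustable parameters, and a measure such as `discrimination` (or a smoother variant of it) would be the quantity to improve over a training set of known homologous and unrelated pairs.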


Published in
RECOMB '00: Proceedings of the fourth annual international conference on Computational molecular biology
April 2000
329 pages
ISBN: 1581131860
DOI: 10.1145/332306
Editors: Ron Shamir (Tel-Aviv Univ., Israel), Satoru Miyano (Univ. of Tokyo, Tokyo, Japan), Sorin Istrail (Sandia National Labs), Pavel Pevzner (Univ. of Southern California), Michael Waterman (Univ. of Southern California)
Copyright © 2000 ACM
Publisher: Association for Computing Machinery, New York, NY, United States