DOI: 10.1145/332306.332536

Optimizing for success: a new score function for distantly related protein sequence comparison

Published: 08 April 2000

ABSTRACT

The exponential growth of sequence data produced by the genome projects motivates the development of better ways of inferring structural and functional information about newly sequenced proteins. Searching for homologies between these probe protein sequences and other protein sequences in the database has proved to be one of the most useful current techniques. This procedure, known as sequence comparison, relies on a score function that discriminates homologs from non-homologs. Current score functions have difficulty identifying distantly related homologs with low sequence similarity. As a result, there is an increased demand for a new score function that yields statistically significant, higher scores for all pairs of homologous protein sequences, including such distantly related homologs. We present a new method for generating a score function by optimizing it for successful discrimination between homologous and unrelated proteins. The new score function (OPTIMA) outperforms other commonly used substitution matrices for the detection of distantly related protein sequences.
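
To make concrete how a score function enters sequence comparison, here is a minimal sketch of Smith-Waterman local alignment scoring. It is only an illustration: `toy_score` is a hypothetical match/mismatch function, not OPTIMA or any published substitution matrix, and the linear gap penalty of 4 is an assumption chosen for the example. Replacing `toy_score` with a different score function changes which sequence pairs score highly, and that is the discrimination the abstract is concerned with.

```python
from typing import Callable


def local_alignment_score(
    a: str, b: str, score: Callable[[str, str], int], gap: int = 4
) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for x in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, y in enumerate(b, start=1):
            cell = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(x, y),  # align residue x with residue y
                prev[j] - gap,              # gap in b
                curr[j - 1] - gap,          # gap in a
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def toy_score(x: str, y: str) -> int:
    """Hypothetical score function: +2 for identical residues, -1 otherwise."""
    return 2 if x == y else -1


if __name__ == "__main__":
    print(local_alignment_score("ACDEF", "XXCDEXX", toy_score))
```

In a database search, a score like this would be computed between the probe and every database sequence, and the top-scoring hits examined as candidate homologs.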

Published in

RECOMB '00: Proceedings of the fourth annual international conference on Computational molecular biology
April 2000
329 pages
ISBN: 1581131860
DOI: 10.1145/332306

Copyright © 2000 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 148 of 538 submissions, 28%