DOI: 10.5555/976520.976566 · Article, APBC Conference Proceedings

Classification comparison of prediction of solvent accessibility from protein sequences

Published: 01 January 2004

ABSTRACT

The prediction of residue solvent accessibility from protein sequences has been studied with various methods. Direct comparison of these methods is impossible because they use different datasets and different structure definitions. In this paper we choose five classification approaches, decision tree (DT), Support Vector Machine (SVM), Bayesian Statistics (BS), Neural Network (NN) and Multiple Linear Regression (MLR), and use them to predict solvent accessibility on the same dataset with the same structure definition, so that the methods can be compared directly. We evaluate these methods in a cross-validation test on 2148 unique proteins, using both single-sequence and multiple-sequence approaches, with a 20% cutoff for the two-state definition of solvent accessibility. In our experiments, SVM and NN are the best predictors on multiple-sequence prediction, each reaching 79% accuracy and a correlation coefficient of 0.59, which is 2-4% better than the other three methods. A further test on a blind test set from the fifth Critical Assessment of Techniques for Protein Structure Prediction experiment (CASP5) is consistent with this result. On single-sequence prediction, DT, BS and MLR perform about the same, at 71-72% accuracy with a correlation coefficient of 0.43. The improvement over a baseline model that uses only the identity of the target residue is small. Local sequence appears to carry very little information about accessibility. Training separately according to protein size improves the prediction when a sufficiently large dataset is available. A consensus prediction combining the five approaches is not significantly better than the best single method.
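The two-state labeling, the evaluation measures, the residue-identity baseline and the consensus described above can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions. It assumes relative solvent accessibility (RSA) values are already computed as fractions, and it treats residues at or above the 20% cutoff as exposed. It assumes the reported "correlation coefficient" is the Matthews correlation coefficient on the two-state labels. The identity-only baseline is implemented as the majority state per amino-acid type, and consensus is implemented as a simple per-residue majority vote. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

CUTOFF = 0.20  # two-state threshold on relative solvent accessibility (RSA)


def two_state(rsa_values, cutoff=CUTOFF):
    """Map RSA fractions to 'b' (buried) or 'e' (exposed).

    Assumption: a residue at or above the cutoff counts as exposed.
    """
    return ["e" if r >= cutoff else "b" for r in rsa_values]


def accuracy(pred, true):
    """Fraction of residues whose predicted state matches the observed one."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def mcc(pred, true, positive="e"):
    """Matthews correlation coefficient for two-state labels (assumed metric)."""
    pairs = list(zip(pred, true))
    tp = sum(p == positive and t == positive for p, t in pairs)
    tn = sum(p != positive and t != positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def fit_identity_baseline(seqs, labels):
    """Learn the majority state for each amino-acid type (residue identity only)."""
    counts = defaultdict(Counter)
    for seq, states in zip(seqs, labels):
        for aa, state in zip(seq, states):
            counts[aa][state] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def predict_identity_baseline(model, seq, default="b"):
    """Predict each residue's state from its type alone; unseen types get `default`."""
    return [model.get(aa, default) for aa in seq]


def consensus(*predictions):
    """Per-residue majority vote; an odd number of binary voters cannot tie."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


if __name__ == "__main__":
    true = two_state([0.05, 0.35, 0.10, 0.60, 0.22, 0.01])
    # Outputs of five hypothetical predictors for the same six residues.
    preds = [
        ["b", "e", "b", "e", "b", "b"],
        ["b", "e", "e", "e", "e", "b"],
        ["e", "e", "b", "e", "e", "b"],
        ["b", "b", "b", "e", "e", "b"],
        ["b", "e", "b", "b", "e", "e"],
    ]
    vote = consensus(*preds)
    print(round(accuracy(preds[0], true), 3), round(mcc(preds[0], true), 3))  # 0.833 0.707
    print(accuracy(vote, true), mcc(vote, true))  # 1.0 1.0
```

On real data, the five prediction lists would come from trained DT, SVM, BS, NN and MLR models evaluated under cross-validation. Majority voting is only one way to combine them, and the abstract does not say how its consensus was formed.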
