ABSTRACT
The prediction of residue solvent accessibility from protein sequences has been studied by various methods. A direct comparison of these methods is impossible because they use different datasets and different structure definitions. In this paper we choose five classification approaches (Decision Tree (DT), Support Vector Machine (SVM), Bayesian Statistics (BS), Neural Network (NN) and Multiple Linear Regression (MLR)) and apply them to solvent accessibility prediction on the same dataset with the same structure definition, so that the methods can be compared directly. We evaluate these methods in a cross-validation test on 2148 unique proteins, using both single-sequence and multiple-sequence approaches, with a cutoff of 20% for the two-state definition of solvent accessibility. According to the experimental results, SVM and NN are the best predictors on multiple-sequence prediction, with an accuracy of 79% and a correlation coefficient of 0.59, which is 2 to 4% better than the other three methods. A further test on a blind test set from the Critical Assessment of Techniques for Protein Structure Prediction experiment (CASP5) is consistent with this result. On single-sequence prediction, DT, BS and MLR perform about the same, at 71 to 72% accuracy with a correlation coefficient of 0.43. The improvement over the baseline model that uses only the identity of the target residue is small. Local sequence seems to embed very little information on accessibility. Separate training according to protein size improves the prediction when a sufficiently large dataset is available. The consensus prediction combining the five approaches is not significantly better than the best single method.
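As a rough illustration of the evaluation described in the abstract, the sketch below assigns two-state labels at the 20% cutoff and computes accuracy together with a correlation coefficient. Several details are assumptions rather than facts from the paper: the correlation coefficient is taken to be the Matthews correlation coefficient, a residue exactly at the cutoff is counted as exposed, and the function names and toy values are hypothetical.

```python
from math import sqrt

CUTOFF = 20.0  # percent relative accessibility; the 20% value is from the abstract


def two_state(rel_acc, cutoff=CUTOFF):
    """Map relative accessibility (%) to labels: 1 = exposed, 0 = buried.

    Assumption: values equal to the cutoff count as exposed.
    """
    return [1 if a >= cutoff else 0 for a in rel_acc]


def accuracy_and_mcc(true, pred):
    """Return (accuracy, Matthews correlation coefficient) for binary labels."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(true, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(true)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0.0 when the coefficient is undefined.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Toy values for illustration only; these are not data from the paper.
observed = two_state([5.0, 35.0, 20.0, 60.0, 10.0, 0.0])
predicted = [0, 1, 0, 1, 0, 1]
acc, mcc = accuracy_and_mcc(observed, predicted)
```

Reporting both numbers is useful because accuracy alone can look good when one class dominates, while the correlation coefficient accounts for all four cells of the confusion matrix.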