Classification comparison of prediction of solvent accessibility from protein sequences

Authors:
Huiling Chen, Drexel University, Philadelphia, Pennsylvania
Huan-Xiang Zhou, Florida State University, Tallahassee, Florida
Xiaohua Hu, Drexel University, Philadelphia, Pennsylvania
Illhoi Yoo, Drexel University, Philadelphia, Pennsylvania

APBC '04: Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29, January 2004, Pages 333–338

Published: 01 January 2004

ABSTRACT

The prediction of residue solvent accessibility from protein sequences has been studied with various methods. These methods cannot be compared directly because they were evaluated on different datasets and with different structure definitions. In this paper we choose five classification approaches (decision tree (DT), Support Vector Machine (SVM), Bayesian Statistics (BS), Neural Network (NN) and Multiple Linear Regression (MLR)) and apply them to the same dataset with the same structure definition, so that the methods can be compared directly. We evaluate them in a cross-validation test on 2148 unique proteins, using both single-sequence and multiple-sequence approaches, with a cutoff of 20% for the two-state definition of solvent accessibility. In the experiments, SVM and NN are jointly the best predictors on multiple-sequence prediction, with an accuracy of 79% and a correlation coefficient of 0.59, which is 2~4% better than the other three methods. A further test on a blind test set from the Critical Assessment of Techniques for Protein Structure Prediction experiment (CASP5) is consistent with this result. On single-sequence prediction, DT, BS and MLR perform about the same, at 71~72% accuracy with a correlation coefficient of 0.43. The improvement over a baseline model that uses only the identity of the target residue is small, so the local sequence seems to embed very little information on accessibility. Separate training according to protein size improves the prediction when a sufficiently large dataset is available. The consensus prediction combining the five approaches is not significantly better than the best single method.
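
The evaluation quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes that accessibility is expressed as a relative value between 0 and 1, that a residue exactly at the 20% cutoff counts as exposed, that the "correlation coefficient" is the Matthews correlation coefficient, and that the consensus is a simple majority vote. The function names are hypothetical.

```python
from math import sqrt

def two_state(relative_accessibility, cutoff=0.20):
    """Label each residue exposed (1) or buried (0).

    Assumption: values are relative accessibilities in [0, 1], and a
    residue exactly at the cutoff is treated as exposed.
    """
    return [1 if ra >= cutoff else 0 for ra in relative_accessibility]

def accuracy(truth, pred):
    """Fraction of residues whose predicted state matches the true state."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

def matthews_cc(truth, pred):
    """Matthews correlation coefficient for two-state labels.

    Assumption: this is the correlation coefficient meant in the abstract.
    Returns 0.0 when the denominator vanishes.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def consensus(predictions):
    """Majority vote per residue over several predictors' label lists.

    Assumption: plain unweighted voting; ties (even number of predictors)
    fall to the buried state.
    """
    return [1 if 2 * sum(votes) > len(votes) else 0
            for votes in zip(*predictions)]

if __name__ == "__main__":
    truth = two_state([0.05, 0.50, 0.20, 0.10, 0.75, 0.19])
    pred = [0, 1, 0, 0, 1, 1]  # made-up predictions for illustration
    print(truth, accuracy(truth, pred), matthews_cc(truth, pred))
    print(consensus([[0, 1, 1], [0, 1, 0], [1, 1, 0]]))
```

The toy inputs are invented for illustration only; with five predictors, as in the paper, a majority vote never ties.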

Published in
APBC '04: Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29 (340 pages)
Editor: Yi-Ping Phoebe Chen
Publisher: Australian Computer Society, Inc., Australia
Author Tags: classification comparison, ensemble prediction, protein structure prediction, solvent accessibility