Fast learning optimized prediction methodology for protein secondary structure prediction, relative solvent accessibility prediction and phosphorylation prediction
Publisher:
  • Iowa State University
  • 2121 S. State Ave. Ames, IA
  • United States
ISBN:978-1-124-70120-2
Order Number:AAI3458335
Pages: 218
Abstract

Several aspects of secondary structure predictions and other protein structure-related predictions are investigated using different types of information such as data obtained from knowledge-based potentials derived from amino acids in protein sequences, physicochemical properties of amino acids and propensities of amino acids to appear at the ends of secondary structures.

Protein secondary structures and other protein features are predicted efficiently, reliably, at lower cost and with higher accuracy. A novel method called the Fast Learning Optimized PREDiction (FLOPRED) methodology is proposed for predicting protein secondary structures and other features, using knowledge-based potentials, a neural-network-based Extreme Learning Machine (ELM) and advanced Particle Swarm Optimization (PSO) techniques that yield better and faster convergence to produce more accurate results. These techniques yield superior classification of secondary structures, with a training accuracy of 93.33% and a testing accuracy of 92.24% with a standard deviation of 0.48% obtained for a small group of 84 proteins. The Matthews correlation coefficient ranges between 80.58% and 84.30% for these secondary structures. Accuracies for individual amino acids range between 83% and 92%, with an average standard deviation between 0.3% and 2.9% for the 20 amino acids. On a larger set of 415 proteins, we obtain a testing accuracy of 86.5% with a standard deviation of 1.38%. These results are significantly higher than those found in the literature.
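For readers unfamiliar with the ELM component, the following is a minimal sketch of a generic Extreme Learning Machine classifier: hidden-layer weights are drawn at random and left untrained, and only the output weights are solved in closed form with a pseudo-inverse. It is not the FLOPRED implementation. The class name `SimpleELM`, the `tanh` activation, the hidden-layer size and the synthetic two-cluster data are all assumptions made for illustration, and the PSO tuning described above is omitted.

```python
import numpy as np


class SimpleELM:
    """Generic ELM sketch (hypothetical name; not the thesis code)."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random, fixed projection followed by a nonlinearity (tanh is an assumption).
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Hidden weights and biases are sampled once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot class targets
        # Output weights: least-squares solution via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def demo_accuracy():
    """Train on two synthetic, well-separated clusters and return training accuracy."""
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
                   rng.normal(2.0, 1.0, size=(100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    model = SimpleELM(n_hidden=20).fit(X, y)
    return float(np.mean(model.predict(X) == y))
```

Because training reduces to a single linear solve, an ELM of this kind avoids iterative weight updates, which is the usual source of its speed advantage.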

Prediction of protein secondary structure from the amino acid sequence is a common step in predicting a protein's 3-D structure. Additional information, such as the biophysical properties of the amino acids, can help improve the results of secondary structure prediction. A database of protein physicochemical properties is used as features to encode protein sequences, and these data are used for secondary structure prediction with FLOPRED. Preliminary studies using a Genetic Algorithm (GA) for feature selection, Principal Component Analysis (PCA) for feature reduction and FLOPRED for classification give promising results.

Some amino acids appear more often at the ends of secondary structures than others. A preliminary study has indicated that secondary structure accuracy can be improved by as much as 6% by including these effects for residues present at the ends of α-helix, β-strand and coil.

A study on relative solvent accessibility (RSA) prediction using ELM shows large gains in processing speed compared to using support vector machines for classification. This indicates that ELM yields a distinct advantage in terms of processing speed and performance for RSA. Additional gains in accuracy are possible when the more advanced FLOPRED algorithm and PSO optimization are implemented.

Phosphorylation is a post-translational modification of proteins that often controls and regulates their activities. It is an important mechanism for regulation. Phosphorylated sites are known to be present often in intrinsically disordered regions of proteins lacking unique tertiary structures, and thus less information is available about the structures of phosphorylated sites. It is important to be able to computationally predict phosphorylation sites in protein sequences obtained from mass-scale sequencing of genomes. Phosphorylation sites may aid in determining the functions of a protein and in better understanding the mechanisms of protein function in healthy and diseased states. FLOPRED is used to model and predict experimentally determined phosphorylation sites in protein sequences. Our new PSO optimization included in FLOPRED enables the prediction of phosphorylation sites with higher accuracy and with better generalization. Our preliminary studies on 984 sequences demonstrate that this model can predict phosphorylation sites with a training accuracy of 92.53%, a testing accuracy of 91.42% and a Matthews correlation coefficient of 83.9%. (Abstract shortened by UMI.)
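The Matthews correlation coefficient quoted in this abstract can be computed from the four counts of a binary confusion matrix. The sketch below uses the standard definition; the function name `matthews_corrcoef_binary` and the toy labels are hypothetical, and the abstract reports the value as a percentage (for example, 83.9% corresponds to 0.839 on the usual scale from -1 to 1).

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels: 1 = phosphorylated site, 0 = non-phosphorylated site (made-up data).
toy_true = [1, 1, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 0]
toy_mcc = matthews_corrcoef_binary(toy_true, toy_pred)
```

Unlike plain accuracy, this coefficient stays informative when the two classes are unbalanced, which is why it is commonly reported alongside accuracy for site-prediction tasks.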

Contributors
  • Iowa State University
