ABSTRACT
Structural alignments are the most widely used tools for comparing proteins with low sequence similarity. The main contribution of this paper is to derive several kernels on proteins from structural alignments; these kernels do not use sequence information. Central to the kernels is a novel alignment algorithm that matches substructures of fixed size using spectral graph matching techniques. We derive positive semi-definite kernels that capture the notion of similarity between substructures. Using these as a base, we propose more sophisticated kernels on protein structures. To evaluate the kernels empirically, we used a 40% sequence non-redundant set of structures from 15 different SCOP superfamilies. When used with SVMs, the kernels show performance competitive with CE, a state-of-the-art structure comparison program.
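As a rough illustration of what spectral graph matching of fixed-size substructures can look like, the sketch below uses a classical eigendecomposition-based approach to weighted graph matching. It is not the alignment algorithm of this paper, whose details the abstract does not give. The function names `distance_matrix` and `spectral_match` are hypothetical, and representing a substructure by the pairwise distance matrix of its residue coordinates is an assumption made for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def distance_matrix(coords):
    """Pairwise Euclidean distances between the rows of an (n, 3) array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def spectral_match(A, B):
    """Match the nodes of two equally sized weighted graphs (illustrative sketch).

    A and B are symmetric (n, n) matrices. Returns an integer array `match`
    where node i of A is paired with node match[i] of B.
    """
    # eigh returns eigenvalues in ascending order, so the eigenvector
    # columns of the two graphs are compared in the same spectral order.
    _, U_A = np.linalg.eigh(A)
    _, U_B = np.linalg.eigh(B)
    # Absolute values remove the sign ambiguity of eigenvectors.
    S = np.abs(U_A) @ np.abs(U_B).T
    # One-to-one assignment that maximizes the total similarity.
    _, cols = linear_sum_assignment(S, maximize=True)
    return cols


# Toy usage: a substructure of 6 points and a reordered copy of it.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # stand-in for residue coordinates (assumption)
p = rng.permutation(6)
match = spectral_match(distance_matrix(X), distance_matrix(X[p]))
```

In this toy case the second substructure is an exact reordering of the first, so the matching should undo the permutation. Real substructures differ by more than a relabeling, and the quality of a spectral matching then depends on how well separated the eigenvalues are.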
- Structural alignment based kernels for protein structure classification