ABSTRACT
The function of a protein depends on its three-dimensional structure. Current homology-based approaches to predicting a protein's function do not scale well. In this work, we propose a representation of proteins that explicitly encodes secondary and tertiary structure into fixed-size images. In addition, we present a neural network architecture that exploits this representation to perform protein function prediction. We validate the effectiveness of our encoding method and the strength of our neural network architecture through 5-fold cross-validation over roughly 63 thousand images, achieving an accuracy of 80% across 8 distinct classes. Our approach to encoding and classifying proteins is suitable for real-time processing, enabling high-throughput analysis.
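For readers unfamiliar with the evaluation protocol named above, the following is a minimal sketch of 5-fold cross-validation in plain Python. It is not the authors' pipeline: the unstratified shuffle, the function names, and the `train_and_score` callback are all assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_score, k=5, seed=0):
    """Return the mean held-out accuracy over k folds.

    train_and_score(train_idx, test_idx) is a hypothetical caller-supplied
    function that trains a model on train_idx and returns its accuracy
    on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k
```

Each image is held out exactly once, so the reported accuracy is an average over five models that never saw their own test fold during training.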
Index Terms
- Graphic Encoding of Macromolecules for Efficient High-Throughput Analysis