DOI: 10.1145/3233547.3233607
research-article
Public Access

Graphic Encoding of Macromolecules for Efficient High-Throughput Analysis

Published: 15 August 2018

ABSTRACT

The function of a protein depends on its three-dimensional structure. Current homology-based approaches to predicting a protein's function do not scale well. In this work, we propose a representation of proteins that explicitly encodes secondary and tertiary structure into fixed-size images. We also present a neural network architecture that exploits this representation to perform protein function prediction. We validate the effectiveness of our encoding method and our neural network architecture through 5-fold cross-validation over roughly 63 thousand images, achieving an accuracy of 80% across 8 distinct classes. Our approach to encoding and classifying proteins is suitable for real-time processing, enabling high-throughput analysis.
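The abstract does not spell out how structure is mapped to pixels. Purely as an illustration of what a fixed-size structural image can look like, here is a minimal sketch that assumes one common choice, which is not necessarily the authors' encoding: a pairwise Cα distance map resampled onto a fixed `size × size` grid, so that proteins of any length produce images of identical shape. The function name `distance_map_image`, the nearest-neighbour resampling, and the 20 Å cutoff are hypothetical. The sketch covers only tertiary geometry and omits the secondary-structure component the abstract mentions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_map_image(
    ca_coords: Sequence[Coord],
    size: int = 64,
    max_dist: float = 20.0,
) -> List[List[int]]:
    """Encode a chain of C-alpha coordinates as a size x size grayscale image.

    Hypothetical encoding for illustration only (not the paper's method):
    pixel (r, c) holds the Euclidean distance between the residues that row r
    and column c map to, clipped at max_dist and scaled so that 255 means
    distance 0 and 0 means max_dist or farther.
    """
    n = len(ca_coords)
    if n == 0:
        raise ValueError("need at least one residue")
    if size <= 0 or max_dist <= 0:
        raise ValueError("size and max_dist must be positive")

    # Nearest-neighbour resampling: each pixel index maps to one residue index,
    # so chains of any length yield an image of the same fixed shape.
    # (k * n) // size is always <= n - 1 for k < size.
    index = [(k * n) // size for k in range(size)]

    image: List[List[int]] = []
    for r in range(size):
        a = ca_coords[index[r]]
        row: List[int] = []
        for c in range(size):
            b = ca_coords[index[c]]
            d = min(math.dist(a, b), max_dist)  # clip long-range distances
            row.append(round(255 * (1.0 - d / max_dist)))  # near -> bright
        image.append(row)
    return image


if __name__ == "__main__":
    # Made-up toy chain: three residues spaced 3.8 angstroms apart on the x-axis.
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    for pixel_row in distance_map_image(chain, size=4):
        print(pixel_row)
```

The resulting grid is symmetric with 255 on the diagonal, and a 3-residue chain and a 300-residue chain both come out as `size × size`. A fixed input shape of this kind is the property an image classifier with a fixed input layer needs. A real pipeline would read coordinates from PDB files and add further channels for secondary structure.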


Published in

BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2018
727 pages
ISBN: 9781450357944
DOI: 10.1145/3233547

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

BCB '18 Paper Acceptance Rate: 46 of 148 submissions, 31%
Overall Acceptance Rate: 254 of 885 submissions, 29%
