DOI: 10.1145/3292500.3330897
Research article · Public Access

Graph Representation Learning via Hard and Channel-Wise Attention Networks

Published: 25 July 2019

ABSTRACT

Attention operators have been widely applied in various fields, including computer vision, natural language processing, and network embedding learning. Attention operators on graph data enable learnable weights when aggregating information from neighboring nodes. However, graph attention operators (GAOs) consume excessive computational resources, preventing their application to large graphs. In addition, GAOs belong to the family of soft attention, instead of hard attention, which has been shown to yield better performance. In this work, we propose a novel hard graph attention operator (hGAO) and a channel-wise graph attention operator (cGAO). hGAO uses a hard attention mechanism that attends only to important nodes; compared to GAO, this improves performance and reduces computational cost. To further reduce the requirements on computational resources, we propose cGAO, which performs attention operations along channels. cGAO avoids the dependency on the adjacency matrix, leading to dramatic reductions in computational resource requirements. Experimental results demonstrate that our proposed deep models with the new operators achieve consistently better performance. Comparison results also indicate that hGAO achieves significantly better performance than GAO on both node and graph embedding tasks. Efficiency comparison shows that our cGAO leads to dramatic savings in computational resources, making it applicable to large graphs.
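As a rough illustration of the two ideas sketched in the abstract, the following minimal NumPy example shows (a) a hard attention step that restricts each node's aggregation to its top-k highest-scoring neighbors, and (b) a channel-wise attention step that computes attention among feature channels and therefore needs no adjacency matrix. This is a hedged sketch for intuition only: the function names, the projection-vector scoring, the dot-product attention, and the value of k are illustrative assumptions and do not reproduce the paper's exact hGAO/cGAO formulations.

```python
import numpy as np

def hard_graph_attention(X, A, p, k=4):
    """Illustrative hard attention on a graph (NOT the paper's exact hGAO):
    for each node, keep only the k neighbors with the largest projection
    scores and attend to them.
    X: (N, C) node features, A: (N, N) binary adjacency, p: (C,) projection vector."""
    N, C = X.shape
    scores = X @ p                      # per-node importance score
    out = np.zeros_like(X)
    for i in range(N):
        nbrs = np.nonzero(A[i])[0]
        if nbrs.size == 0:
            out[i] = X[i]               # isolated node: pass features through
            continue
        # hard selection: top-k scoring neighbors only
        top = nbrs[np.argsort(scores[nbrs])[::-1][:k]]
        logits = X[top] @ X[i]          # simple dot-product attention (illustrative choice)
        w = np.exp(logits - logits.max())
        w /= w.sum()
        out[i] = w @ X[top]             # weighted aggregation over selected neighbors
    return out

def channel_wise_attention(X):
    """Illustrative channel-wise attention (NOT the paper's exact cGAO):
    attention coefficients are computed among feature channels, so no
    adjacency matrix is needed.  X: (N, C) -> (N, C)."""
    S = X.T @ X                         # (C, C) channel-to-channel similarity
    S = S - S.max(axis=1, keepdims=True)
    W = np.exp(S)
    W /= W.sum(axis=1, keepdims=True)   # softmax over channels
    return X @ W.T                      # re-weight each node's channels

# toy usage with random data
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
A = (rng.random((6, 6)) < 0.5).astype(float)
np.fill_diagonal(A, 0)
p = rng.normal(size=8)
print(hard_graph_attention(X, A, p, k=2).shape)   # (6, 8)
print(channel_wise_attention(X).shape)            # (6, 8)
```

Note that in the channel-wise step the attention map is C x C rather than N x N, which illustrates (under the assumptions above) why dropping node-to-node, adjacency-based attention can save resources when a graph has far more nodes than feature channels.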


Published in
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019, 3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500
Copyright © 2019 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

          Acceptance Rates

KDD '19 paper acceptance rate: 110 of 1,200 submissions (9%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
