Graph Representation Learning via Hard and Channel-Wise Attention Networks

ABSTRACT
Attention operators have been widely applied in various fields, including computer vision, natural language processing, and network embedding learning. Attention operators on graph data enable learnable weights when aggregating information from neighboring nodes. However, graph attention operators (GAOs) consume excessive computational resources, preventing their application to large graphs. In addition, GAOs belong to the family of soft attention, rather than hard attention, which has been shown to yield better performance. In this work, we propose a novel hard graph attention operator (hGAO) and a channel-wise graph attention operator (cGAO). hGAO uses a hard attention mechanism by attending only to important nodes. Compared to GAO, hGAO improves performance and saves computational cost by restricting attention to those nodes. To further reduce the requirements on computational resources, we propose cGAO, which performs attention operations along channels. cGAO avoids the dependency on the adjacency matrix, leading to dramatic reductions in computational resource requirements. Experimental results demonstrate that our proposed deep models with the new operators achieve consistently better performance. Comparison results also indicate that hGAO achieves significantly better performance than GAO on both node and graph embedding tasks. Efficiency comparisons show that cGAO leads to dramatic savings in computational resources, making it applicable to large graphs.
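The two mechanisms described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dot-product scoring, the value of k, and the plain softmax normalization are all assumptions for illustration. It shows only the structural difference the abstract describes — hGAO selects a small set of high-scoring neighbors before normalizing (hard attention), while cGAO computes attention among feature channels and never touches the adjacency matrix.

```python
import numpy as np

def hard_graph_attention(X, A, k=2):
    """hGAO-style sketch: each node attends only to its k highest-scoring
    neighbors. Scores are simple dot products (an assumption; the paper's
    scoring function may differ)."""
    n, _ = X.shape
    scores = X @ X.T                            # pairwise attention scores
    scores = np.where(A > 0, scores, -np.inf)   # restrict to graph neighbors
    out = np.zeros_like(X)
    for i in range(n):
        neigh = np.where(np.isfinite(scores[i]))[0]
        top = neigh[np.argsort(scores[i, neigh])[-k:]]   # hard top-k selection
        w = np.exp(scores[i, top] - scores[i, top].max())
        w /= w.sum()                            # softmax over selected nodes only
        out[i] = w @ X[top]
    return out

def channel_wise_attention(X):
    """cGAO-style sketch: attention is computed between feature channels,
    so no adjacency matrix is required at all."""
    C = X.T                                     # channels x nodes
    scores = C @ C.T                            # channel-to-channel similarity
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)           # softmax per channel
    return (w @ C).T                            # back to nodes x channels
```

Note how `channel_wise_attention` takes only the feature matrix: its cost scales with the number of channels rather than with the number of node pairs, which is the source of the resource savings claimed for cGAO.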