research-article

MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video

Authors:
Yinwei Wei, Shandong University, Qingdao, China
Xiang Wang, National University of Singapore, Singapore
Liqiang Nie, Shandong University, Qingdao, China
Xiangnan He, University of Science and Technology of China, Hefei, China
Richang Hong, Hefei University of Technology, Hefei, China
Tat-Seng Chua, National University of Singapore, Singapore

MM '19: Proceedings of the 27th ACM International Conference on Multimedia, October 2019, Pages 1437–1445
https://doi.org/10.1145/3343031.3351034

Published: 15 October 2019

ABSTRACT

Personalized recommendation plays a central role in many online content sharing platforms. To provide a quality micro-video recommendation service, it is crucial to consider the interactions between users and items (i.e., micro-videos) as well as the item contents from various modalities (e.g., visual, acoustic, and textual). Existing works on multimedia recommendation largely exploit multi-modal contents to enrich item representations, while less effort is made to leverage the information interchange between users and items to enhance user representations and to capture users' fine-grained preferences on different modalities. In this paper, we propose to exploit user-item interactions to guide the representation learning in each modality and, in turn, to advance personalized micro-video recommendation. We design a Multi-modal Graph Convolution Network (MMGCN) framework built upon the message-passing idea of graph neural networks, which yields modal-specific representations of users and micro-videos to better capture user preferences. Specifically, we construct a user-item bipartite graph in each modality and enrich the representation of each node with the topological structure and features of its neighbors. Through extensive experiments on three publicly available datasets (Tiktok, Kwai, and MovieLens), we demonstrate that our proposed model significantly outperforms state-of-the-art multi-modal recommendation methods.
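The idea of per-modality message passing on a user-item bipartite graph can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the aggregation, combination, or fusion functions, so the sketch assumes mean aggregation over neighbors, an additive self-connection, summation across modalities, and dot-product scoring. All names (`build_bipartite`, `propagate`, `fuse`, `score`), the toy interactions, and the 2-dimensional features are hypothetical.

```python
from collections import defaultdict


def build_bipartite(interactions):
    """Build an undirected user-item adjacency list from (user, item) pairs."""
    neighbors = defaultdict(list)
    for user, item in interactions:
        neighbors[user].append(item)
        neighbors[item].append(user)
    return dict(neighbors)


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def propagate(features, neighbors):
    """One message-passing step within a single modality.

    Assumption: new vector = own vector + mean of neighbors' vectors.
    """
    updated = {}
    for node, vec in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            updated[node] = add(vec, mean([features[n] for n in nbrs]))
        else:
            updated[node] = list(vec)
    return updated


def fuse(per_modality, node):
    """Assumption: fuse modal-specific vectors of a node by summation."""
    vectors = [feats[node] for feats in per_modality.values()]
    return [sum(col) for col in zip(*vectors)]


def score(u_vec, i_vec):
    """Assumption: preference score is the dot product."""
    return sum(x * y for x, y in zip(u_vec, i_vec))


# Toy data (hypothetical): two users, two micro-videos, three modalities.
interactions = [("u1", "v1"), ("u1", "v2"), ("u2", "v2")]
graph = build_bipartite(interactions)

zero = [0.0, 0.0]  # users start with zero vectors in every modality
raw = {
    "visual":   {"u1": zero, "u2": zero, "v1": [1.0, 0.0], "v2": [0.0, 1.0]},
    "acoustic": {"u1": zero, "u2": zero, "v1": [2.0, 2.0], "v2": [0.0, 0.0]},
    "textual":  {"u1": zero, "u2": zero, "v1": [0.0, 0.0], "v2": [4.0, 0.0]},
}

# The same interaction graph is used in each modality; only features differ.
per_modality = {m: propagate(feats, graph) for m, feats in raw.items()}

u2 = fuse(per_modality, "u2")
v1 = fuse(per_modality, "v1")
print(per_modality["visual"]["u1"])  # modal-specific user vector
print(score(u2, v1))                 # score for an unseen (user, item) pair
```

After one step, each user holds a separate vector per modality that summarizes the micro-videos the user interacted with in that modality, which is the sense in which the representations are modal-specific. A trained model would replace the fixed mean and sum with learned transformations.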

Index Terms

MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems
    2. Specialized information retrieval
      1. Multimedia and multimodal retrieval
  2. World Wide Web
    1. Web searching and information discovery
      1. Personalization

Published in
MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN: 9781450368896
DOI: 10.1145/3343031
General Chairs: Laurent Amsaleg (CNRS-IRISA, France), Benoit Huet (EURECOM, France), Martha Larson (Radboud University and TU Delft, Netherlands)
Program Chairs: Guillaume Gravier (CNRS-IRISA, France), Hayley Hung (Delft University of Technology, Netherlands), Chong-Wah Ngo (City University of Hong Kong, Hong Kong), Wei Tsang Ooi (National University of Singapore, Singapore)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph convolution network
micro-video understanding
multi-modal recommendation
Qualifiers
- research-article
Acceptance Rates
MM '19 Paper Acceptance Rate: 252 of 936 submissions, 27%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
Article Metrics
- Total Citations: 274
- Total Downloads: 3,373
- Downloads (Last 12 months): 687
- Downloads (Last 6 weeks): 101