DOI: 10.1145/3343031.3351034
research-article

MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video

Published: 15 October 2019

ABSTRACT

Personalized recommendation plays a central role in many online content sharing platforms. To provide a quality micro-video recommendation service, it is crucial to consider the interactions between users and items (i.e., micro-videos) as well as the item contents from various modalities (e.g., visual, acoustic, and textual). Existing work on multimedia recommendation largely exploits multi-modal contents to enrich item representations, while less effort has been made to leverage the information interchange between users and items to enhance user representations and to capture users' fine-grained preferences on different modalities. In this paper, we propose to exploit user-item interactions to guide representation learning in each modality, and thereby to improve personalized micro-video recommendation. We design a Multi-modal Graph Convolution Network (MMGCN) framework built upon the message-passing idea of graph neural networks, which yields modal-specific representations of users and micro-videos to better capture user preferences. Specifically, we construct a user-item bipartite graph in each modality and enrich the representation of each node with the topological structure and features of its neighbors. Through extensive experiments on three publicly available datasets, Tiktok, Kwai, and MovieLens, we demonstrate that the proposed model significantly outperforms state-of-the-art multi-modal recommendation methods.
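The per-modality message passing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mean aggregation, the additive combination, the random initialization of user features, and the names `propagate` and `modal_representations` are all assumptions chosen for illustration, and the layers used in MMGCN itself may differ.

```python
import numpy as np


def propagate(interactions, user_feats, item_feats):
    """One round of mean-aggregation message passing on a user-item bipartite graph.

    interactions: (n_users, n_items) binary matrix, 1 where a user interacted with an item.
    user_feats:   (n_users, d) node features of users in one modality.
    item_feats:   (n_items, d) node features of items in the same modality.
    """
    # Node degrees, clipped to 1 so isolated nodes do not cause division by zero.
    user_deg = np.maximum(interactions.sum(axis=1, keepdims=True), 1)
    item_deg = np.maximum(interactions.sum(axis=0, keepdims=True).T, 1)

    # Each user receives the mean feature of the items it interacted with,
    # and each item receives the mean feature of its users.
    user_msg = interactions @ item_feats / user_deg
    item_msg = interactions.T @ user_feats / item_deg

    # Assumed combination step: add the neighbor message to the node's own feature.
    return user_feats + user_msg, item_feats + item_msg


def modal_representations(interactions, item_feats_by_modality, seed=0):
    """Run one propagation step separately in each modality (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_users = interactions.shape[0]
    result = {}
    for name, item_feats in item_feats_by_modality.items():
        # Assumption: users have no raw content, so their features start as small noise.
        user_feats = rng.normal(scale=0.1, size=(n_users, item_feats.shape[1]))
        result[name] = propagate(interactions, user_feats, item_feats)
    return result
```

Calling `modal_representations(R, {"visual": V, "acoustic": A, "textual": T})` with an interaction matrix `R` and per-modality item feature matrices returns one `(user_representations, item_representations)` pair per modality, so a user's preference can be inspected modality by modality.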


            Published in

            MM '19: Proceedings of the 27th ACM International Conference on Multimedia
            October 2019
            2794 pages
            ISBN: 9781450368896
            DOI: 10.1145/3343031

            Copyright © 2019 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            MM '19 Paper Acceptance Rate: 252 of 936 submissions, 27%
            Overall Acceptance Rate: 995 of 4,171 submissions, 24%
