DOI: 10.1145/3240508.3240541

Multi-modal Preference Modeling for Product Search

Published: 15 October 2018

ABSTRACT

The visual preferences of users for products have been largely ignored by existing product search methods. In this work, we propose a multi-modal personalized product search method, which aims to retrieve products that are not only relevant to the submitted textual query but also match the user's preferences in both the textual and visual modalities. To achieve this goal, we first leverage the also_view and buy_after_viewing products to construct visual and textual latent spaces, which are expected to preserve the visual similarity and the semantic similarity of products, respectively. We then propose a translation-based search model (TranSearch) to 1) learn a multi-modal latent space on top of the pre-trained visual and textual latent spaces, and 2) map users, queries and products into this space for direct matching. TranSearch is trained with a comparative learning strategy, so that the multi-modal latent space is oriented toward personalized ranking during training. Experiments conducted on real-world datasets validate the effectiveness of our method: it outperforms the state-of-the-art method by a large margin.
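To make the model description above concrete, the sketch below shows, in PyTorch, one way the translation-based matching and the comparative (pairwise) training objective could be realized. It is a minimal illustration under assumptions: the class and parameter names (TranSearchSketch, joint_dim, the tanh projections) and the BPR-style log-sigmoid loss are illustrative, not the authors' implementation, and the pre-trained visual and textual representations from the also_view/buy_after_viewing spaces are stood in for by raw feature tensors.

```python
# Hypothetical sketch of a TranSearch-style model (not the authors' code).
# Assumptions: pre-trained textual/visual product features are given as
# dense tensors; the "translation" treats user + query as a point that
# should land near relevant products in the joint multi-modal space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TranSearchSketch(nn.Module):
    def __init__(self, n_users, text_dim=512, visual_dim=4096, joint_dim=128):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, joint_dim)     # per-user preference vector
        self.text_proj = nn.Linear(text_dim, joint_dim)      # textual latent space -> joint space
        self.visual_proj = nn.Linear(visual_dim, joint_dim)  # visual latent space -> joint space
        self.query_proj = nn.Linear(text_dim, joint_dim)     # query representation -> joint space
        self.fuse = nn.Linear(2 * joint_dim, joint_dim)      # fuse the two product modalities

    def embed_product(self, text_feat, visual_feat):
        t = torch.tanh(self.text_proj(text_feat))
        v = torch.tanh(self.visual_proj(visual_feat))
        return torch.tanh(self.fuse(torch.cat([t, v], dim=-1)))

    def score(self, user_ids, query_feat, text_feat, visual_feat):
        # Translation step: user vector + query vector should land close to
        # the embedding of a relevant product in the joint space.
        target = self.user_emb(user_ids) + torch.tanh(self.query_proj(query_feat))
        product = self.embed_product(text_feat, visual_feat)
        return -((target - product) ** 2).sum(dim=-1)  # higher = better match

def pairwise_loss(model, user_ids, query_feat, pos, neg):
    # Comparative objective: a purchased product should outscore a sampled
    # negative for the same (user, query) pair (BPR-style log-sigmoid loss).
    s_pos = model.score(user_ids, query_feat, *pos)
    s_neg = model.score(user_ids, query_feat, *neg)
    return -F.logsigmoid(s_pos - s_neg).mean()

if __name__ == "__main__":
    # Smoke test with random features standing in for pre-trained embeddings.
    model = TranSearchSketch(n_users=1000)
    users = torch.randint(0, 1000, (32,))
    query = torch.randn(32, 512)
    pos = (torch.randn(32, 512), torch.randn(32, 4096))
    neg = (torch.randn(32, 512), torch.randn(32, 4096))
    pairwise_loss(model, users, query, pos, neg).backward()
```

Under this reading, ranking at inference time reduces to a nearest-neighbor search: embed every product once, compute the user-plus-query target vector, and return the products with the smallest distance in the joint space.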



Published in

MM '18: Proceedings of the 26th ACM International Conference on Multimedia
October 2018
2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Qualifiers

          • research-article

          Acceptance Rates

MM '18 Paper Acceptance Rate: 209 of 757 submissions, 28%. Overall Acceptance Rate: 995 of 4,171 submissions, 24%.

