ABSTRACT
The multi-armed bandit is an important framework for balancing exploration with exploitation in recommendation. Exploitation recommends content (e.g., products, movies, music playlists) with the highest predicted user engagement and has traditionally been the focus of recommender systems. Exploration recommends content with uncertain predicted user engagement for the purpose of gathering more information. The importance of exploration has been recognized in recent years, particularly in settings with new users, new items, and non-stationary preferences and item attributes. In parallel, explaining recommendations ("recsplanations") is crucial if users are to understand their recommendations. Existing work has looked at bandits and explanations independently. We provide the first method that combines both in a principled manner. In particular, our method is able to jointly (1) learn which explanations each user responds to; (2) learn the best content to recommend for each user; and (3) balance exploration with exploitation to deal with uncertainty. Experiments with historical log data and tests with live production traffic in a large-scale music recommendation service show a significant improvement in user engagement.
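To make the joint recommendation-and-explanation bandit concrete, the sketch below treats each (item, explanation) pair as an arm and uses a simple epsilon-greedy policy. This is an illustrative assumption, not the paper's actual algorithm: the class name `ExplainableBandit`, the flat arm enumeration, and the empirical-mean reward estimates are all simplifications introduced here for exposition.

```python
import random
from collections import defaultdict

class ExplainableBandit:
    """Illustrative epsilon-greedy sketch (NOT the paper's exact method).

    Each arm is an (item, explanation) pair, so the policy jointly learns
    which content AND which explanation a user responds to, while epsilon
    controls the exploration/exploitation trade-off."""

    def __init__(self, items, explanations, epsilon=0.1):
        self.arms = [(i, e) for i in items for e in explanations]
        self.epsilon = epsilon
        self.counts = defaultdict(int)     # number of plays per arm
        self.rewards = defaultdict(float)  # cumulative reward per arm

    def _mean(self, arm):
        # Untried arms get +inf so they are always tried at least once.
        if self.counts[arm] == 0:
            return float("inf")
        return self.rewards[arm] / self.counts[arm]

    def select(self):
        # Explore with probability epsilon; otherwise exploit the
        # arm with the highest estimated engagement.
        if random.random() < self.epsilon:
            return random.choice(self.arms)
        return max(self.arms, key=self._mean)

    def update(self, arm, reward):
        # Record observed engagement (e.g., a stream or click) for the arm.
        self.counts[arm] += 1
        self.rewards[arm] += reward
```

In practice, the empirical-mean estimate would be replaced by a model that generalizes across users, items, and explanations (e.g., a contextual reward model), and the exploration term would be calibrated against production traffic rather than a fixed epsilon.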