ABSTRACT
Sorting items by user rating is a fundamental interaction pattern of the modern Web, used to rank products (Amazon), posts (Reddit), businesses (Yelp), movies (YouTube), and more. To implement this pattern, designers must take in a distribution of ratings for each item and define a sensible total ordering over them. This is a challenging problem, since each distribution is drawn from a distinct sample population, rendering the most straightforward method of sorting (comparing averages) unreliable when the samples are small or of different sizes. Several statistical orderings for binary ratings have been proposed in the literature (e.g., based on the Wilson score, or Laplace smoothing), each attempting to account for the uncertainty introduced by sampling. In this paper, we study this uncertainty through the lens of human perception, and ask "How do people sort by ratings?" In an online study, we collected 48,000 item-ranking pairs from 4,000 crowd workers along with 4,800 rationales, and analyzed the results to understand how users make decisions when comparing rated items. Our results shed light on the cognitive models users employ to choose between rating distributions, which sorts of comparisons are most contentious, and how the presentation of rating information affects users' preferences.
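To make the contrast between sorting by averages and the statistical orderings named above concrete, the following is a minimal sketch, not the procedure used in the study. It compares the raw average with a Laplace-smoothed estimate and the lower bound of the Wilson score interval for binary (up/down) ratings. The function names, the smoothing constant `alpha`, the critical value `z = 1.96`, and the two example items are all assumptions chosen for illustration.

```python
import math


def average(up, down):
    """Raw fraction of positive ratings (0.0 when there are no ratings)."""
    n = up + down
    return up / n if n else 0.0


def laplace(up, down, alpha=1.0):
    """Add-alpha (Laplace) smoothing: pretend alpha extra ups and alpha extra downs."""
    return (up + alpha) / (up + down + 2 * alpha)


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval (z = 1.96 is roughly 95% confidence)."""
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


# Hypothetical items: A has 3 positive ratings and none negative,
# B has 90 positive and 10 negative.
items = {"A": (3, 0), "B": (90, 10)}


def ranking(score):
    """Item names sorted from best to worst under the given scoring function."""
    return sorted(items, key=lambda name: score(*items[name]), reverse=True)


for score in (average, laplace, wilson_lower_bound):
    print(score.__name__, ranking(score))
```

Under these assumed counts, the raw average places the three-rating item A first, while both the smoothed estimate and the Wilson lower bound place the hundred-rating item B first. This is the small-sample effect that makes comparing averages unreliable.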
CHI 2019, May 4-9, 2019, Glasgow, Scotland UK. J. Talton et al.
Index Terms
- How do People Sort by Ratings?