ABSTRACT
Opinionated social media such as product reviews are now widely used by individuals and organizations for decision making. However, for profit or fame, people try to game the system through opinion spamming (e.g., writing fake reviews) to promote or demote target products. In recent years, fake review detection has attracted significant attention from both the business and research communities. However, because the human labeling needed for supervised learning and evaluation is difficult, the problem remains highly challenging. This work proposes a novel angle on the problem by modeling spamicity as latent. An unsupervised model, called the Author Spamicity Model (ASM), is proposed. It works in the Bayesian setting, which facilitates modeling the spamicity of authors as latent and allows us to exploit various observed behavioral footprints of reviewers. The intuition is that opinion spammers have different behavioral distributions than non-spammers. This creates a distributional divergence between the latent population distributions of two clusters: spammers and non-spammers. Model inference results in learning the population distributions of the two clusters. Several extensions of ASM that leverage different priors are also considered. Experiments on a real-life Amazon review dataset demonstrate the effectiveness of the proposed models, which significantly outperform state-of-the-art competitors.
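The two-cluster intuition can be illustrated with a small sketch. This is not ASM itself: it substitutes a plain two-component Bernoulli mixture fitted by EM (with add-one style smoothing) for the Bayesian model described above, and the four binary behavioral flags, the toy rows, and the function name fit_two_cluster_mixture are all hypothetical, chosen only to show how a latent cluster label can be recovered from behavior without any labeled reviewers.

```python
import math

def fit_two_cluster_mixture(X, n_iter=50, smooth=1.0):
    """EM for a 2-cluster Bernoulli mixture over binary behavioral flags.

    X is a list of rows, each a list of 0/1 values (one row per reviewer).
    Returns (resp, theta, pi), where resp[i] is the posterior probability
    that reviewer i belongs to cluster 1 (the high-flag-rate cluster).
    Simplified stand-in for illustration, not the ASM inference procedure.
    """
    n, d = len(X), len(X[0])
    # Asymmetric start so cluster 1 is the one with higher flag rates.
    theta = [[0.3] * d, [0.7] * d]
    pi = [0.5, 0.5]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior of cluster 1 for each reviewer (log domain).
        resp = []
        for x in X:
            logp = []
            for k in (0, 1):
                lp = math.log(pi[k])
                for j in range(d):
                    p = theta[k][j]
                    lp += math.log(p if x[j] else 1.0 - p)
                logp.append(lp)
            m = max(logp)
            w0, w1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
            resp.append(w1 / (w0 + w1))
        # M-step: smoothed estimates keep every parameter strictly in (0, 1).
        n1 = sum(resp)
        n0 = n - n1
        pi = [(n0 + smooth) / (n + 2 * smooth),
              (n1 + smooth) / (n + 2 * smooth)]
        for j in range(d):
            s1 = sum(r * x[j] for r, x in zip(resp, X))
            s0 = sum((1.0 - r) * x[j] for r, x in zip(resp, X))
            theta[0][j] = (s0 + smooth) / (n0 + 2 * smooth)
            theta[1][j] = (s1 + smooth) / (n1 + 2 * smooth)
    return resp, theta, pi

# Hypothetical toy data: the first 30 rows trip most flags (spammer-like),
# the remaining 90 rows trip at most one flag.
X = ([[1, 1, 1, 1]] * 10 + [[1, 1, 1, 0]] * 10 + [[1, 0, 1, 1]] * 10
     + [[0, 0, 0, 0]] * 40 + [[0, 1, 0, 0]] * 20
     + [[0, 0, 0, 1]] * 20 + [[1, 0, 0, 0]] * 10)

resp, theta, pi = fit_two_cluster_mixture(X)

# Rank reviewers by posterior probability of the high-flag cluster.
ranking = sorted(range(len(X)), key=lambda i: resp[i], reverse=True)
```

Sorting reviewers by resp gives a ranking by estimated cluster membership, which is one simple way an unsupervised score of this kind could be inspected. The actual model in the paper infers its latent variables in a Bayesian setting rather than by this EM procedure.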
Index Terms
- Spotting opinion spammers using behavioral footprints