DOI: 10.1145/2487575.2487580
poster

Spotting opinion spammers using behavioral footprints

Published: 11 August 2013

ABSTRACT

Opinionated social media such as product reviews are now widely used by individuals and organizations for decision making. However, for profit or fame, people try to game the system by opinion spamming (e.g., writing fake reviews) to promote or demote target products. In recent years, fake review detection has attracted significant attention from both the business and research communities. However, because the human labeling needed for supervised learning and evaluation is difficult, the problem remains highly challenging. This work approaches the problem from a novel angle by modeling spamicity as latent. An unsupervised model, called the Author Spamicity Model (ASM), is proposed. It works in a Bayesian setting, which allows the spamicity of authors to be modeled as latent and lets the model exploit various observed behavioral footprints of reviewers. The intuition is that opinion spammers have different behavioral distributions than non-spammers. This creates a distributional divergence between the latent population distributions of two clusters: spammers and non-spammers. Model inference learns the population distributions of the two clusters. Several extensions of ASM that use different priors are also considered. Experiments on a real-life Amazon review dataset demonstrate the effectiveness of the proposed models, which significantly outperform state-of-the-art competitors.
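The core intuition above can be illustrated with a much simpler model than ASM. In this intuition, spammers and non-spammers form two clusters with different behavioral distributions, and each author's cluster membership is latent. The sketch below is not the paper's model or its inference procedure. It swaps in expectation-maximization (EM) for a two-component mixture of independent Bernoulli features. It assumes each author is described by hypothetical binary behavioral indicators (for example, "posted several reviews on one day" or "gave only extreme ratings"), oriented so that 1 is the more suspicious value. The function name `fit_bernoulli_mixture` and all of its parameters are illustrative. The posterior probability of the "spam" cluster plays the role of a spamicity score.

```python
import numpy as np


def fit_bernoulli_mixture(X, n_iter=200, seed=0, eps=1e-9):
    """Fit a 2-cluster mixture of independent Bernoulli features with EM.

    Illustrative sketch only, NOT the ASM model from the paper.
    X: (n_authors, n_features) array of 0/1 behavioral indicators
       (hypothetical features; 1 is assumed to be the suspicious value).
    Returns (spamicity, theta_spam, pi_spam), where spamicity[i] is the
    posterior probability that author i belongs to the "spam" cluster.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(2, 0.5)                          # cluster priors
    theta = rng.uniform(0.25, 0.75, size=(2, d))  # P(feature = 1 | cluster)

    for _ in range(n_iter):
        # E-step: per-author log-likelihood under each cluster, plus log prior
        log_post = (X @ np.log(theta + eps).T
                    + (1.0 - X) @ np.log(1.0 - theta + eps).T
                    + np.log(pi + eps))
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)          # posterior P(cluster | x)

        # M-step: re-estimate priors and per-cluster feature probabilities
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = (resp.T @ X) / (nk[:, None] + eps)

    # Clusters are unlabeled; assume the one with higher mean activation is "spam"
    spam = int(np.argmax(theta.mean(axis=1)))
    return resp[:, spam], theta[spam], pi[spam]


if __name__ == "__main__":
    # Synthetic data: 50 "spammer-like" authors, 50 "normal" authors, 6 indicators
    rng = np.random.default_rng(1)
    X = np.vstack([(rng.random((50, 6)) < 0.85).astype(int),
                   (rng.random((50, 6)) < 0.10).astype(int)])
    spamicity, theta_spam, pi_spam = fit_bernoulli_mixture(X)
    print("mean spamicity, first 50:", spamicity[:50].mean().round(3))
    print("mean spamicity, last 50:", spamicity[50:].mean().round(3))
```

Unsupervised clusters carry no names, so the sketch assumes that the cluster with the higher average indicator rate is the spammer cluster. The divergence between the two learned per-cluster feature distributions (`theta`) is what lets the posterior separate the authors, which mirrors, in simplified form, the distributional divergence described in the abstract.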

      Published in

        ACM Conferences
        KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2013
        1534 pages
        ISBN:9781450321747
        DOI:10.1145/2487575

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        KDD '13 paper acceptance rate: 125 of 726 submissions, 17%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.
