poster

Spotting opinion spammers using behavioral footprints

Authors:
Arjun Mukherjee, University of Illinois at Chicago, Chicago, IL, USA
Abhinav Kumar, University of Illinois at Chicago, Chicago, IL, USA
Bing Liu, University of Illinois at Chicago, Chicago, IL, USA
Junhui Wang, University of Illinois at Chicago, Chicago, IL, USA
Meichun Hsu, HP Labs, Palo Alto, CA, USA
Malu Castellanos, HP Labs, Palo Alto, CA, USA
Riddhiman Ghosh, HP Labs, Palo Alto, CA, USA

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2013, Pages 632–640, https://doi.org/10.1145/2487575.2487580

Published: 11 August 2013

ABSTRACT

Opinionated social media such as product reviews are now widely used by individuals and organizations for decision making. However, for profit or fame, people try to game the system by opinion spamming (e.g., writing fake reviews) to promote or demote target products. In recent years, fake review detection has attracted significant attention from both the business and research communities. However, because the human labeling needed for supervised learning and evaluation is difficult, the problem remains highly challenging. This work proposes a novel angle on the problem by modeling spamicity as latent. An unsupervised model, called Author Spamicity Model (ASM), is proposed. It works in the Bayesian setting, which facilitates modeling the spamicity of authors as latent and allows us to exploit various observed behavioral footprints of reviewers. The intuition is that opinion spammers have different behavioral distributions than non-spammers. This creates a distributional divergence between the latent population distributions of two clusters: spammers and non-spammers. Model inference learns the population distributions of the two clusters. Several extensions of ASM that leverage different priors are also considered. Experiments on a real-life Amazon review dataset demonstrate the effectiveness of the proposed models, which significantly outperform state-of-the-art competitors.
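The abstract's core idea is that spamicity is latent and is recovered, without labels, by separating authors into two clusters whose behavioral distributions diverge. ASM itself is a Bayesian model; the sketch below is not ASM but a much simpler stand-in that illustrates the same two-cluster idea: a mixture of two Bernoulli distributions over binary behavioral flags, fitted with EM. The 0/1 feature matrix, the add-one smoothing, the 0.3/0.7 initialization, and the rule that the cluster whose flags fire most often is the spammer cluster are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np


def fit_two_cluster_bernoulli(X, n_iter=200, tol=1e-8):
    """MAP-EM for a two-component Bernoulli mixture over binary behavior flags.

    X: (n_authors, n_features) array of 0/1 indicators, where 1 means the
    author exhibits a (hypothetical) suspicious behavior.
    Returns (posterior, pi, theta): posterior[i, k] = P(cluster k | X[i]),
    pi = cluster weights, theta[k, j] = P(feature j = 1 | cluster k).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    pi = np.array([0.5, 0.5])
    # Asymmetric start (an assumption): cluster 1 begins as the "more suspicious" one.
    theta = np.vstack([np.full(d, 0.3), np.full(d, 0.7)])
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: per-author log joint under each cluster, normalized in log space.
        log_joint = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
        log_norm = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        post = np.exp(log_joint - log_norm[:, None])
        # M-step with add-one smoothing so no probability reaches exactly 0 or 1.
        nk = post.sum(axis=0)
        pi = (nk + 1.0) / (n + 2.0)
        theta = (post.T @ X + 1.0) / (nk[:, None] + 2.0)
        # Stop once the data log-likelihood (under the previous parameters) stabilizes.
        cur = log_norm.sum()
        if abs(cur - prev) < tol:
            break
        prev = cur
    return post, pi, theta


def author_spamicity(X):
    """Posterior probability that each author belongs to the cluster whose
    behavior flags fire most often (treated here as the spammer cluster)."""
    post, _, theta = fit_two_cluster_bernoulli(X)
    spam_k = int(np.argmax(theta.mean(axis=1)))
    return post[:, spam_k]


# Synthetic demo: 200 "honest" authors rarely trip the 5 flags, 50 "spammy" ones often do.
rng = np.random.default_rng(0)
honest = (rng.random((200, 5)) < 0.1).astype(int)
spammy = (rng.random((50, 5)) < 0.8).astype(int)
X = np.vstack([honest, spammy])
s = author_spamicity(X)
print("mean spamicity, honest:", s[:200].mean(), "spammy:", s[200:].mean())
```

No labels enter the fit; the split emerges only from the divergence between the two clusters' flag rates, which is the property the abstract relies on. A fuller Bayesian treatment would place priors on per-author spamicity and on the cluster parameters, and would model continuous behavioral scores rather than hard 0/1 flags.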


Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction


Published in
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013
1534 pages
ISBN: 9781450321747
DOI: 10.1145/2487575
Editors: Rayid Ghani (University of Chicago), Ted E. Senator (SAIC), Paul Bradley (MethodCare, Inc.), Rajesh Parekh (Groupon), Jingrui He (Stevens Institute of Technology)
General Chairs: Robert L. Grossman (University of Chicago and Open Data Group), Ramasamy Uthurusamy (General Motors Corporation, retired)
Program Chairs: Inderjit S. Dhillon (University of Texas), Yehuda Koren (Google)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
abuse
deceptive and fake reviewer detection
opinion spam
Acceptance Rates
KDD '13 paper acceptance rate: 125 of 726 submissions, 17%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%
Article Metrics
- 262 total citations
- 1,774 total downloads
- Downloads (last 12 months): 64
- Downloads (last 6 weeks): 8
