DOI: 10.1145/2396761.2398472
short-paper

Mining coherent anomaly collections on web data

Published: 29 October 2012

ABSTRACT

The recent boom in weblogs and social media has made it increasingly important to identify suspicious users with unusual behavior, such as spammers or fraudulent reviewers. A typical spamming strategy is to employ multiple dummy accounts to collectively promote a target, be it a URL or a product. Consequently, these suspicious accounts exhibit coherent anomalous behavior that is identifiable as a collection. In this paper, we propose the concept of Coherent Anomaly Collection (CAC) to capture such collections, and put forward an efficient algorithm that simultaneously finds the top-K disjoint CACs together with their anomalous behavior patterns. Compared with existing approaches, our algorithm finds disjoint anomaly collections with coherent extreme behavior without requiring either their number or their sizes to be specified. Results on real Twitter data show that our approach discovers meaningful and informative hashtag spammer groups of various sizes that are hard to detect with clustering-based methods.

    Published in

      CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
      October 2012
      2840 pages
      ISBN: 9781450311564
      DOI: 10.1145/2396761

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 1,861 of 8,427 submissions, 22%
