poster

Comparative study of clustering techniques for short text documents

Authors:
Aniket Rangrej, IIT Madras, Chennai, India
Sayali Kulkarni, Self, Pune, India
Ashish V. Tendulkar, IIT Madras, Chennai, India

WWW '11: Proceedings of the 20th international conference companion on World wide web, March 2011, Pages 111–112, https://doi.org/10.1145/1963192.1963249

Published: 28 March 2011

ABSTRACT

We compare several document clustering techniques, including K-means, an SVD-based method, and a graph-based approach, and evaluate their performance on short text data collected from Twitter. We define a measure for evaluating the cluster error of these techniques. Our observations show that the graph-based approach using affinity propagation performs best on short text data, with minimal cluster error.
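
As a rough illustration of the graph-based approach, the following is a minimal sketch of affinity propagation applied to a handful of made-up short texts. The Jaccard token similarity, the median preference, the damping factor, the iteration count, and the sample tweets are all assumptions chosen for illustration; the abstract does not say how the authors built their similarity graph or how their cluster error measure is defined.

```python
import re
from statistics import median


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(docs):
    """Pairwise Jaccard similarities; needs at least two documents."""
    tokens = [set(re.findall(r"[a-z0-9#@']+", d.lower())) for d in docs]
    n = len(tokens)
    S = [[jaccard(tokens[i], tokens[j]) for j in range(n)] for i in range(n)]
    # Assumption: every point gets the same preference (diagonal entry),
    # set to the median off-diagonal similarity.
    pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it is assigned to."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            row = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=row.__getitem__)
            max1 = row[first]
            max2 = max(row[k] for k in range(n) if k != first)
            for k in range(n):
                best_other = max2 if k == first else max1
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) <- min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) <- sum of positive r(i',k) for i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    if not exemplars:
        # Fallback: no point qualified, so use the single best-scoring one.
        exemplars = [max(range(n), key=score.__getitem__)]
    labels = [max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k  # an exemplar always represents itself
    return labels


# Hypothetical sample data, not taken from the paper.
sample_tweets = [
    "new phone battery lasts all day",
    "phone battery dies so fast",
    "battery life on this phone is great",
    "rain delays the cricket match",
    "cricket match abandoned due to rain",
    "what a cricket match today",
]
labels = affinity_propagation(similarity_matrix(sample_tweets))
```

Here `labels[i]` is the index of the tweet chosen as the exemplar for tweet `i`, so tweets sharing a label form one cluster. Unlike K-means, the number of clusters is not fixed in advance; it emerges from the preference value, which is one reason the choice of preference matters on sparse short-text similarities.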

Index Terms

Comparative study of clustering techniques for short text documents
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
WWW '11: Proceedings of the 20th international conference companion on World wide web
March 2011
552 pages
ISBN: 9781450306379
DOI: 10.1145/1963192
General Chairs: S. Sadagopan (IIIT-Bangalore, India), Krithi Ramamritham (IIT-Bombay, India), Arun Kumar (IBM Research, India), M. P. Ravindra (Infosys E & R, India)
Program Chairs: Elisa Bertino (Purdue University, USA), Ravi Kumar (Yahoo! Research, USA)
Copyright © 2011 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
K-means
SVD
affinity propagation
clustering
short text
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%