k-means++: the advantages of careful seeding

Authors:
David Arthur, Stanford University
Sergei Vassilvitskii, Stanford University

SODA '07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, January 2007, Pages 1027–1035

Published: 07 January 2007

ABSTRACT

The k-means method is a widely used clustering technique that seeks to minimize the average squared distance between points in the same cluster. Although it offers no accuracy guarantees, its simplicity and speed are very appealing in practice. By augmenting k-means with a very simple, randomized seeding technique, we obtain an algorithm that is Θ(log k)-competitive with the optimal clustering. Preliminary experiments show that our augmentation improves both the speed and the accuracy of k-means, often quite dramatically.
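
The abstract does not describe the seeding step itself. As a minimal sketch, assuming the D² sampling rule commonly associated with k-means++ (the first center is drawn uniformly at random, and each later center is drawn with probability proportional to its squared distance to the nearest center chosen so far), the seeding could look like the following. The names `squared_distance` and `kmeanspp_seed` are hypothetical, and this is an illustration under those assumptions rather than the authors' implementation.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (hypothetical helper name)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = rng or random.Random()

    # First center: chosen uniformly at random from the data.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample index i with probability d2[i] / total.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Only the newest center can shrink a point's nearest-center distance.
        d2 = [min(old, squared_distance(p, centers[-1]))
              for old, p in zip(d2, points)]
    return centers
```

The returned centers would then serve as the starting configuration for the usual k-means iterations, for example `kmeanspp_seed(data, k=3, rng=random.Random(0))` for a reproducible run.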

Published in
SODA '07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
January 2007
1322 pages
ISBN: 9780898716245
Conference Chair: Harold Gabow, University of Colorado, Boulder
Publisher: Society for Industrial and Applied Mathematics, United States

Acceptance Rates
SODA '07 Paper Acceptance Rate: 139 of 382 submissions, 36%
Overall Acceptance Rate: 411 of 1,322 submissions, 31%

Article Metrics
- Total Citations: 585
- Total Downloads: 8,827
- Downloads (Last 12 months): 797
- Downloads (Last 6 weeks): 74