Distributed data clustering can be efficient and exact

Authors:
George Forman, Hewlett-Packard Research Labs., 1501 Page Mill, Palo Alto, CA
Bin Zhang, Hewlett-Packard Research Labs., 1501 Page Mill, Palo Alto, CA

ACM SIGKDD Explorations Newsletter, Volume 2, Issue 2, Dec. 2000, pp. 34–38. https://doi.org/10.1145/380995.381010

Published: 01 December 2000

Published in
ACM SIGKDD Explorations Newsletter Volume 2, Issue 2
Special issue on “Scalable data mining algorithms”
Dec. 2000
114 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/380995
Editors:
Usama Fayyad, digiMine, Inc.
Kyuseok Shim, Korea Advanced Institute of Science and Technology, Korea
P. S. Bradley, Disimine, Inc
S. Sarawagi, School of Information Technology, Powai-Mumbai, India
Copyright © 2000 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data mining
distributed computing
multidimensional data clustering
parallel algorithms
very large databases
Qualifiers
- article