
Distributed data clustering can be efficient and exact

Published: 01 December 2000

References

  1. {BF98} Bradley, P., and Fayyad, U. M., "Refining Initial Points for KM Clustering," Microsoft Technical Report 98-36, May 1998.
  2. {BFR98} Bradley, P., Fayyad, U. M., and Reina, C. A., "Scaling EM Clustering to Large Databases," Microsoft Technical Report, 1998.
  3. {BFR98a} Bradley, P., Fayyad, U. M., and Reina, C. A., "Scaling Clustering to Large Databases," KDD98, 1998.
  4. {DLR77} Dempster, A. P., Laird, N. M., and Rubin, D. B., "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
  5. {DM99} Dhillon, I. S. and Modha, D. S., "A data clustering algorithm on distributed memory machines," ACM SIGKDD Workshop on Large-Scale Parallel KDD Systems (with KDD99), August 1999.
  6. {GG92} Gersho & Gray, "Vector Quantization and Signal Compression," KAP, 1992.
  7. {JD77} Anil K. Jain, Richard C. Dubes, "Algorithms for Clustering Data (Prentice Hall Advanced Reference Series: Computer Science)," Prentice Hall, 1977.
  8. {KC99} Kantabutra, S. and Couch, A. L., "Parallel K-Means Clustering Algorithm on NOWs," NECTEC Technical Journal, Vol. 1, No. 1, March 1999.
  9. {KR90} Kaufman, L. and Rousseeuw, P. J., "Finding Groups in Data: An Introduction to Cluster Analysis," John Wiley & Sons, 1990.
  10. {M67} MacQueen, J., "Some methods for classification and analysis of multivariate observations," pp. 281-297 in: L. M. Le Cam & J. Neyman {eds.} Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1. University of California Press, Berkeley. xvii + 666 p. 1967.
  11. {MK97} McLachlan, G. J. and Krishnan, T., "The EM Algorithm and Extensions," John Wiley & Sons, Inc., 1997.
  12. {NetPerception} A commercial recommender system, http://www.netperceptions.com
  13. {RF97} Ruocco A. and Frieder O., "Clustering and Classification of Large Document Bases in a Parallel Environment," Journal of the American Society of Information Science, 48(10), October 1997.
  14. {S99} Snyder, L., "A Programmer's Guide to ZPL," Scientific and Engineering Computation Series, MIT Press; ISBN: 0262692171, 1999. See also: http://www.cs.washington.edu/research/zpl
  15. {ZHD00a} Zhang, B., Hsu, M. and Dayal, U. (2000). "K-Harmonic Means: A Spatial Clustering Algorithm with Boosting." In Proc. International Workshop on Temporal, Spatial and Spatio-Temporal Data Mining, TSDM2000, Lyon, France, Lecture Notes in Artificial Intelligence, 2007. Roddick, J. F. and Hornsby, K., Eds., Springer.
  16. {Z00b} Zhang, B., "Generalized K-Harmonic Means - Boosting in Unsupervised Learning," Hewlett-Packard Laboratories Technical Report: http://www.hpl.hp.com/techreports/2000/HPL-2000-137.html.
  17. {ZHF00} Zhang, B., Hsu, M., and Forman, G., "Accurate Recasting of Parameter Estimation Algorithms using Sufficient Statistics for Efficient Parallel Speed-up: Demonstrated for Center-Based Data Clustering Algorithms," 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), September 13-16, 2000. Also available as Hewlett-Packard Labs Technical Report HPL-2000-6.
  18. {ZRL96} Zhang, T., Ramakrishnan, R., and Livny, M., "BIRCH: an efficient data clustering method for very large databases," ACM SIGMOD Record, Vol. 25, No. 2, pages 103-114, June 1996.
