Abstract
We argue that there are many clustering algorithms, because the notion of "cluster" cannot be precisely defined. Clustering is in the eye of the beholder, and as such, researchers have proposed many induction principles and models whose corresponding optimization problem can only be approximately solved by an even larger number of algorithms. Therefore, comparing clustering algorithms, must take into account a careful understanding of the inductive principles involved.
- C. Aggarwal. A human-computer cooperative system for effective high dimensional clustering. In Proceedings of the KDD Conference, pages 221-226, San Francisco, CA, August 26-29 2001. ACM-SIGKDD, ACM Press.]] Google ScholarDigital Library
- R. Agrawal, R. Bayardo, and R. Srikant. Athena: Mining-based interactive management of text databases. In C. Zaniolo, P. Lockemann, M. Scholl, and T. Grust, editors, Extending Database Technology EDBT, 7th International Conference, volume 1777 of Lecture Notes in Computer Science, Konstanz, Germany, March 27-31, 2000. Springer.]] Google ScholarDigital Library
- M. Aldenderfer and R. Blashfield. Cluster Analysis. Sage Publications, Beverly Hills, USA, 1984.]]Google ScholarCross Ref
- C. Bajaj. Proving geometric algorithm non-solvability: An application of factoring polynomials. Journal of Symbolic Computation, 2:99-102, 1986.]] Google ScholarDigital Library
- J. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 1981.]] Google ScholarDigital Library
- J. Bezdek and N. Pal. Some new indexes of cluster validity. IEEE Transactions on System, Man and Cybernetics, Part B, 28:301-315, 1998.]]Google ScholarDigital Library
- R. Bonner. On some clustering techniques. IBM Journal of Research and Development, 8:22-32, 1964.]]Google ScholarDigital Library
- P. Brucker. On the complexity of clustering problems. In R. Henn, B. Korte, and W. Oetti, editors, Optimization and Operations Research: Proceedings of the workshop held at the University of Bonn, pages 45-54, Berlin, 1978. Springer Verlag Lecture Notes in Economics and Mathematical Systems 157.]]Google ScholarCross Ref
- A. Dempster, N. Laird, and D. Rubin. Maximum likehood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39:1-38, 1977.]]Google Scholar
- B. Dom. An information-theoretic external cluster-validity measure. IBM Research Report RJ 10219, IBM's Almaden Research Center, San Jose, CA, October 5th 2001.]]Google Scholar
- R. Dubes. Cluster analysis and reated issues. In C. Chen, L. Pau, and P. Wnag, editors, Handbook of Pattern Recognition and Computer Vision, pages 3-32, River Edge, NJ, 1993. World Scientific Publiching Co. Chapter 1.1.]] Google ScholarDigital Library
- R. Duda and P. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, NY, USA, 1973.]]Google Scholar
- J. Dunn. Well separated clusters and optimal fuzzy partitions. Journal of Cybernetics, 4:95-104, 1974.]]Google ScholarCross Ref
- U. Elsner. Graph partitioning: A survey. Technical Report 97-27, Technische Universit" at Chemnitz, December 1997.]]Google Scholar
- M. Ester, H. Kriegel, S. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han, and U. Fayyad, editors, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), pages 226-231, Menlo Park, CA, 1996. AAAI, AAAI Press.]]Google Scholar
- V. Estivill-Castro. Hybrid genetic algorithms are better for spatial clustering. In R. Mizoguchi and J. Slaney, editors, Proceedings Sixth Pacific Rim International Conference on Artificial Intelligence PRICAI 2000, pages 424-434, Melbourne, Australia, 2000. Springer-Verlag Lecture Notes in Artificial Intelligence 1886.]]Google Scholar
- V. Estivill-Castro and M. Houle. Data structures for minimization of total within-group distance for spatio-temporal clustering. In L. De Raedt and A. Siebes, editors, 5th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'01), pages 91-102, Freiburg, Germany, September 3-7 2001. Springer Verlag Lecture Notes in Artificial Intelligence 2168.]] Google ScholarDigital Library
- V. Estivill-Castro and M. Houle. Fast minimization of total within-group distance. In J. Fong and M. Ng, editors, Proceedings of the International Workshop on Mining Spatial and Temporal Data in conjunction with the fifth Pacific-Asia Conference on Knowledge Discovery and Data Mining PAKDD-2001, pages 72-81, Hong Kong, April 15-18 2001. City University of Hong Kong.]]Google Scholar
- V. Estivill-Castro and M. Houle. Robust distance-based clustering with applications to spatial data mining. Algorithmica, 30(2):216-242, June 2001.]]Google ScholarDigital Library
- V. Estivill-Castro and A. Murray. Discovering associations in spatial data - an efficient medoid based approach. In X. Wu, R. Kotagiri, and K. Korb, editors, Proceedings of the 2nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), pages 110-121, Melbourne, Australia, 1998. Springer-Verlag Lecture Notes in Artificial Intelligence 1394.]] Google ScholarDigital Library
- V. Estivill-Castro and J. Yang. A fast and robust general purpose clustering algorithm. In R. Mizoguchi and J. Slaney, editors, Proceedings Sixth Pacific Rim International Conference on Artificial Intelligence PRICAI 2000, pages 208-218, Melbourne, Australia, 2000. Springer-Verlag Lecture Notes in Artificial Intelligence 1886.]]Google Scholar
- V. Estivill-Castro and J. Yang. Non-crisp clustering web visitors by fast, convergent and robust algorithms on access logs. In L. De Raedt and A. Siebes, editors, 5th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'01), pages 103-114, Freiburg, Germany, September 3-7 2001. Springer Verlag Lecture Notes in Artificial Intelligence 2168.]] Google ScholarDigital Library
- B. Everitt. Cluster Analysis. Halsted Press, New York, USA, 2nd. edition, 1980.]]Google Scholar
- U. Fayyad and R. Uthurusamy. Data mining and knowledge discovery in databases. Communications of the ACM, 39(11):24-26, Nov. 1996. Special issue on Data Mining.]] Google ScholarDigital Library
- M. Garey and D. Johnson. Computers and Intractability --- A guide to the Theory of NP-Completeness. Freeman, NY, 1979.]] Google ScholarDigital Library
- S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. In Proceedings of ACM SIGMOD International Conference on Management of Data, volume 27, pages 73-84, New York, 1998. ACM, ACM Press.]] Google ScholarDigital Library
- M. Halkidi, Y. Batistakis, and M. Vazirgiannis. On clustering validation techniques. KDnuggets:News, page Numver 19 item 16, September 2001. www.db-net.aueb.gr/mhalk/papers/validity_survey.pdf.]]Google Scholar
- M. Halkidi, M. Vazirgianis, and Y. Batistakis. Quality scheme assessment in the clustering process. In H. Zighed, D. A. Komorowski and J. Zytkow, editors, Principles of Data Mining and Knowledge Discovery, 4th European Conference, PKDD, pages 265-276, Lyon, France, September, 13-16 2000. Springer Verlag Lecture Notes in Computer Science 1920.]] Google ScholarDigital Library
- I. Hall, L. O. Özyurt and J. Bezdek. Clustering with a genetically optimized approach. IEEE Transactions on Evolutionary Computation, 3(2):103-112, July 1999.]]Google ScholarDigital Library
- J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Mateo, CA, 2000.]] Google ScholarDigital Library
- A. Hinneburg and D. Keim. An efficient approach to clustering in large multimedia databases with noise. In Proc. 4rd Int. Conf. on Knowledge Discovery and Data Mining, pages 58-65, New York, August 1998. AAAI Press.]]Google Scholar
- A. Jain and R. Dubes. Algorithms for Clustering Data. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1988. Advanced Reference Series: Computer Science.]] Google ScholarDigital Library
- M. Jain, A. K. nd Murty and P. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264-320, September 1999.]] Google ScholarDigital Library
- J. Kalbfleisch. Probability and Statistical Inference --- Volume 2: Statistical Inference. Springer-Verlag, NY, US., second edition, 1985.]]Google Scholar
- G. Karypis, E.-H. Han, and V. Kumar. Chameleon: Hierachical clustering using dynamic modeling. Computer, 32(8):68-75, August 1999.]] Google ScholarDigital Library
- L. Kaufman and P. Rousseuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, NY, USA, 1990.]]Google Scholar
- W. Kloesgen and J. Zytkow. Machine discovery terminology. KDnuggets Publicatiosn and References http://www.kdnuggets.com/publications/index.html. http://orgwis.gmd.de/projects/explora/terms.html.]]Google Scholar
- H. Kuhn. A note on Fermat's problem. Mathematical Programming, 4(1):98-107, 1973.]]Google ScholarCross Ref
- H. Kuhn and E. Kuenne. An efficient algorithm for the numerical solution of the generalized Weber problem in spatial economics. Journal of Regional Science, 4(2):21-33,1962.]]Google ScholarCross Ref
- J. MacQueen. Some methods for classification and analysis of multivariate observations. In L. Le Cam and J. Neyman, editors, 5th Berkley Symposium on Mathematical Statistics and Probability, pages 281-297, 1967. Volume 1.]]Google Scholar
- R. Ng and J. Han. Efficient and effective clustering methods for spatial data mining. In J. Bocca, M. Jarke, and C. Zaniolo, editors, Proceedings of the 20th Conference on Very Large Data Bases (VLDB), pages 144-155, San Francisco, CA, 1994. Santiago, Chile, Morgan Kaufmann Publishers.]] Google ScholarDigital Library
- N. Pal and J. Bezdel. On cluster validity for the fuzzy c-means model. IEEE Transactions on Fuzzy Systems, 3(3):370-379, August 1995.]]Google ScholarDigital Library
- S. Ray and R. Turi. Determination of number of clusters in k-means clustering and application in colour image segmentation. In N. Pal, D, A. K., and J. Das, editors, Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques (ICAPRDT'99), pages 137-143, New Delhi, India, December 27-29 1999. Narosa Publishing House.]]Google Scholar
- R. Rezaee, B. Lelieveldt, and J. Reiber. A new cluster validity index for the fuzzy c-mean. Pattern Recognition Letters, 19:237-246, 1998.]] Google ScholarDigital Library
- J. Rissanen. Stochastic complexity. Journal of the Royal Statistical Society, Series B, 49(3):223-239, 1987.]]Google Scholar
- P. Rousseeuw and A. Leroy. Robust regression and outlier detection. John Wiley & Sons, NY, USA, 1987.]] Google ScholarDigital Library
- M. Tanner. Tools for Statistical Inference. Springer-Verlag, NY, US., 1993.]]Google Scholar
- M. Teitz and P. Bart. Heuristic methods for estimating the generalized vertex median of a weighted graph. Operations Research, 16:955-961, 1968.]]Google ScholarDigital Library
- S. Theodoridis and K. Koutroumbas. Pattern Recognition. Academic Press, NY, USA, 1999.]] Google ScholarDigital Library
- D. Titterington, A. Smith, and U. Makov. Statistical Analysis of Finite Mixture Distributions. John Wiley & sons, UK, 1985.]]Google Scholar
- C. Wallace and D. Boulton. An information measure for classification. Computer Journal, 11:185-195, 1968.]]Google ScholarCross Ref
- R. Wilcox. Introduction to Robust Estimation and Hypothesis Testing. Academic Press, San Diego, CA, 1997.]]Google Scholar
- M. Windham. Cluster validity for the fuzzy c-means clustering algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 4(4):357-363, 1982.]]Google ScholarDigital Library
- I. Witten and E. Frank. Data Mining --- Practical Machine Learning Tools and Technologies with JAVA implementations. Morgan Kaufmann Publishers, San Mateo, CA, 2000.]] Google ScholarDigital Library
- O. Zamir and O. Etzioni. Web document clustering: a feasibility demonstration. In Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR'98), pages 46-54, 1998.]] Google ScholarDigital Library
- T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. SIGMOD Record, 25(2):103-114, June 1996. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data.]] Google ScholarDigital Library
Recommendations
Ant clustering algorithm with K-harmonic means clustering
Clustering is an unsupervised learning procedure and there is no a prior knowledge of data distribution. It organizes a set of objects/data into similar groups called clusters, and the objects within one cluster are highly similar and dissimilar with ...
Many-objective fuzzy centroids clustering algorithm for categorical data
We propose a novel many-objective clustering algorithm for categorical data.Our method can take advantage of different cluster validity indices simultaneously.Two versions of the proposed algorithm are presented with and without cluster number.The ...
Self-Organizing-Map Based Clustering Using a Local Clustering Validity Index
Classical clustering methods, such as partitioning and hierarchical clustering algorithms, often fail to deliver satisfactory results, given clusters of arbitrary shapes. Motivated by a clustering validity index based on inter-cluster and intra-cluster ...
Comments