skip to main content
article
Free Access

BIRCH: an efficient data clustering method for very large databases

Published:01 June 1996Publication History
Skip Abstract Section

Abstract

Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs.This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively.We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparisons of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.

References

  1. CKS88 Peter C, heeselnan, James Kelly, Matthew Self, et al., Auto CIass : A Bayesian Classification System, Proc. of the 5th Int'l Conf. on Machine Learning, Morgan Kaufman, 3un. 1988.Google ScholarGoogle Scholar
  2. DH73 Richard Duds, and Peter E. Hart, Pattern Classification and Scene Analysis, Wiley, 1973.Google ScholarGoogle Scholar
  3. DJ80 R. Dubes, and A.K. Jain, Clustering Methodologies in Exploratory Data Analysis Advances in C, omputers, Edited by M.C. Yovits, Vol. 19, Academic Press, New York, 1980.Google ScholarGoogle Scholar
  4. EKX95a Martin Ester, Hans-Peter Kriegel, and Xiaowei Xu, A Database Interface .for Clustering in Large Spatial Databases, Proc. of 1st {nt'l Conf. on Knowledge Discovery and Data Mining, 1995.Google ScholarGoogle Scholar
  5. EKX95b Martin Ester, Hans-Peter Kriegel, and Xiaowei Xu, Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification, Proc. of 4th Int'l Symposium on Large Spatial Databases, Portland, Maine, U.S.A., 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Fis87 Douglas H. Fisher, Knowledge Acquisition via Incremental Conceptual Clustering, Machine Learning, 2(2), 1987 Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Fis95 Douglas H. Fisher, Iterative Optimization and Simplification of Hierarchical CIuaterings, Technical Report CS-95-01, Dept. of Computer Science, Vanderbilt l lniversity, Nashville, TN 37235.Google ScholarGoogle Scholar
  8. GG92 A. Gersho and P~. Gray, Vector quantization and signal compression, Boston, Ms.: Kluwer Academic Publishers, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. KR90 Leonard Kaufman, and Peter J. Rousseeuw, Finding Groups in Data - An Introduction to Cluster Analysis, Wiley Series in Probability and Mathematical Statistics, 1990.Google ScholarGoogle Scholar
  10. Leb87 Michael Lebowitz, Experiments with Incremental Concept Formation : UNIMEM, Machine Learning, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Lee81 R.C.T.Lee, Clustering analysis and its applications, Advances in Information Systems Science, Edited by J .T.Toum, Vol. 8, pp. 169-292, Plenum Press, New York, 1981.Google ScholarGoogle Scholar
  12. Mur83 F. Murtagh, A Survey of _Recent Advances in Hier'archical Clustering Algorithms, The Computer Journal, 1933.Google ScholarGoogle Scholar
  13. NH94 Raymond T. Ng and Jiawei Hart, Efficient and Effective Clustering Methods for ,Spatial Data Mining, Proc. of VLDB, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Ols93 (:lark F. Olson, Parallel Algorithms for Hierarchical Clustering, Technical Report, Computer Science Division, l.lniv, of California at Berkeley, Dec.,1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. ZRL95 Tian Zhang, Raghu Ramakrishnan, and Miron Livl~y, BIRCH: An Efficient Data Clustering Method .for Very Large Databases, Technical Report, Computer Sciences Dept., Univ. of Wisconsin-Madison, 1995.Google ScholarGoogle Scholar

Index Terms

  1. BIRCH: an efficient data clustering method for very large databases

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGMOD Record
        ACM SIGMOD Record  Volume 25, Issue 2
        June 1996
        557 pages
        ISSN:0163-5808
        DOI:10.1145/235968
        Issue’s Table of Contents
        • cover image ACM Conferences
          SIGMOD '96: Proceedings of the 1996 ACM SIGMOD international conference on Management of data
          June 1996
          560 pages
          ISBN:0897917944
          DOI:10.1145/233269

        Copyright © 1996 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 June 1996

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader