skip to main content
article
Free Access

OPTICS: ordering points to identify the clustering structure

Authors Info & Claims
Published:01 June 1999Publication History
Skip Abstract Section

Abstract

Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g. representative points, arbitrary shaped clusters), but also the intrinsic clustering structure. For medium sized data sets, the cluster-ordering can be represented graphically and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure offering additional insights into the distribution and correlation of the data.

References

  1. AGG+ 98 Agrawal R., Gehrke J., Gunopulos D., Raghavan P.: "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications", Proc. ACM SIGMOD'98 Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 94- 105. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. AKK 96 Ankerst M., Keim D. A., Kriegel H.-P.: "'Circle Segments': A Technique for Visually Exploring Large Multidimensional Data Sets", Proc. Visualization'96, Hot Topic Session, San Francisco, CA, 1996.Google ScholarGoogle Scholar
  3. BKK 96 Berchthold S., Keim D., Kriegel H.-P.: "The X-Tree: An Index Structure for High-Dimensional Data", 22nd Conf. on Very Large Data Bases, Bombay, India, 1996, pp. 28-39. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. BKSS 90 Beckmann N., Kriegel H.-P., Schneider R., Seeger B.: "The R*-tree: An ti',fficient and Robust Access Method for Points and Rectangles", Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, ACM Press, New York, 1990, pp. 322-331. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. CPZ 97 Ciaccia P., Patella M., Zezula P.: "M-tree: An Efficient Access Method for Similarity Search in Metric Spaces ", Proc. 23rd Int. Conf. on Very Large Data Bases, Athens, Greece, 1997, pp. 426-435. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. EKSX 96 Ester M., Kriegel H.-P., Sander J., Xu X.: "A Density- Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.Google ScholarGoogle Scholar
  7. EKS+ 98 Ester M., Kriegel H.-P., Sander J., Wimmer M., Xu X.: "Incremental Clustering for Mining in a Data Warehousing Environment", Proc. 24th Int. Conf. on Very I.,arge Data Bases, New York, NY, 1998, pp. 323-333. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. EKX 95 Ester M., Kriegel H.-P., Xu X.: "Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification", Proc. 4th Int. Symp. on Laxge Spatial Databases, Portland, ME, 1995, in: Lecture Notes in Computer Science, Vol. 951, Springer, 1995, pp. 6'7-82. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. GM 85 Grossman A., Morlet J.: "Decomposition oj'functions into wavelets of constant shapes and related tr~msforms". Mathematics and Physics: Lectures on Recent Restdts, World Scientific, Singapore, 1985.Google ScholarGoogle Scholar
  10. GRS 98 Guha S., Rastogi R., Shim K.: "CURE: A~ Efficient Clustering Algorithms for Large Databases", P-oc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 73-84. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. HK 98 Hinneburg A., Keim D.: "An Efficient Approa~:h to Clustering in Large Multimedia Databases with Noise"~, Proc. 4th Int. Conf. on Knowledge Discovery & Data Milling, New York City, NY, 1998.Google ScholarGoogle Scholar
  12. HT 93 Hattori K., Torii Y.: "Effective algorithms for Jhe nearest neighbor method in the clustering problem", Patt~.,rn Recognition, 1993, Vol. 26, No. 5, pp. 741-746.Google ScholarGoogle ScholarCross RefCross Ref
  13. Hua 97 Huang Z.: "A Fast Clustering Algorithm to C,'uster Very Large Categorical Data Sets in Data Mining", 1)roc. SIG- OD Workshop on Research Issues on Data Mining and Knowledge Discovery, Tech. Report 97-07, UBC, Dept. of CS, 1997.Google ScholarGoogle Scholar
  14. JD 88 Jain A. K., Dubes R. C.: "Algorithms for Clustering Data," Prentice-Hall, Inc., 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Kei 96a Keim D. A.: "Pixel-oriented Database Visualizations", in: SIGMOD RECORD, Special Issue on Inform~tion Visualization, Dezember 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Kei 96b Keim D. A.: "Databases and Visualization", t'roc. Tutorial ACM SIGMOD Int. Conf. on Managemen: of Data, Montreal, Canada, 1996, p. 543. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. KN 96 Knorr E. M., Ng R.T.: "Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining," IEEE Trans. on Knowledge and Data Engineeri}lg, Vol. 8, No. 6, December 1996, pp. 884-897. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. KR 90 Kaufman L., Rousseeuw E J.: "Finding GrouFs in Data: An Introduction to Cluster Analysis", John Wiley & Sons, 1990.Google ScholarGoogle Scholar
  19. Mac 67 MacQueen, J.: "Some Methods for Classification and Analysis of Multivariate Observations", 5th Berkeley Synap. Math. Statist. Prob., Vol. 1, pp. 281-297.Google ScholarGoogle Scholar
  20. NH 94 Ng R. T., Han J.: "Efficient and Effective Clustering Methods for Spatial Data Mining", Proc. 20th Int. Conf. on Very Large Data Bases, Santiago, Chile, Morgan Kaufmann Publishers, San Francisco, CA, 1994, pp. 144-155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. PTVF 92 Press W. H.,Teukolsky S. A., Vetterling W. T., Flannery B. E: "Numerical Recipes in C", 2nd ed., Cambridl,ye University Press, 1992.Google ScholarGoogle Scholar
  22. Ric 83 Richards A. J.: "Remote Sensing Digital hnag~: Analysis. An Introduction", 1983, Berlin, Springer Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Sch 96 Schikuta E.: "'Grid clustering: An efficient hierarchical clustering method for very large data sets". Proc. 13th Int. Conf. on Pattern Recognition, Vol 2, 1996, pp. 101-105. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. SE 97 Schikuta E., Erhart M.: "The bang-clustering system: Grid-based data analysis". Proc. Sec. Int. Symp. IDA-97, Vol. 1280 LNCS, London, UK, Springer-Verlag, 1 c,'97. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. SCZ 98 Sheikholeslami G., Chatterjee S., Zhang A.: "'lCaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases", Proc. 24th Int. Conf. on Very I,arge Data Bases, New York, NY, 1998, pp. 428 - 439. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Sib 73 Sibson R.: "SLINK: an optimally efficient alggrithm for the single-link cluster method".The Comp. Journal, Vol. }'~ 6, No. 1, 1973, pp. 30-34.Google ScholarGoogle Scholar
  27. ZRL 96 Zhang T., Ramakrishnan R., Linvy M.: "BIRCH: An Efficient Data Clustering Method for Very Large Dtttabase,s". Proc. ACM SIGMOD Int. Conf. on Management of Data, ACM Press, New York, 1996, pp. 103- i 14. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. OPTICS: ordering points to identify the clustering structure

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM SIGMOD Record
          ACM SIGMOD Record  Volume 28, Issue 2
          June 1999
          599 pages
          ISSN:0163-5808
          DOI:10.1145/304181
          Issue’s Table of Contents
          • cover image ACM Conferences
            SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
            June 1999
            604 pages
            ISBN:1581130848
            DOI:10.1145/304182

          Copyright © 1999 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 June 1999

          Check for updates

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader