OPTICS: ordering points to identify the clustering structure

Authors:
Mihael Ankerst

Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany

Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany
View Profile

,
Markus M. Breunig

Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany

Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany
View Profile

,
Hans-Peter Kriegel

Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany

Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany
View Profile

,
Jörg Sander

Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany

Institute for Computer Science, University of Munich, Oettingenstr, 67, D-80538 Munich, Germany
View Profile

Authors Info & Claims

ACM SIGMOD Record Volume 28 Issue 2June 1999pp 49–60https://doi.org/10.1145/304181.304187

Published:01 June 1999Publication History

ACM SIGMOD Record

Abstract

Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real-data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for the purpose of cluster analysis which does not produce a clustering of a data set explicitly; but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only 'traditional' clustering information (e.g. representative points, arbitrary shaped clusters), but also the intrinsic clustering structure. For medium sized data sets, the cluster-ordering can be represented graphically and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure offering additional insights into the distribution and correlation of the data.

References

AGG+ 98 Agrawal R., Gehrke J., Gunopulos D., Raghavan P.: "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications", Proc. ACM SIGMOD'98 Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 94- 105. Google ScholarDigital Library
AKK 96 Ankerst M., Keim D. A., Kriegel H.-P.: "'Circle Segments': A Technique for Visually Exploring Large Multidimensional Data Sets", Proc. Visualization'96, Hot Topic Session, San Francisco, CA, 1996.Google Scholar
BKK 96 Berchthold S., Keim D., Kriegel H.-P.: "The X-Tree: An Index Structure for High-Dimensional Data", 22nd Conf. on Very Large Data Bases, Bombay, India, 1996, pp. 28-39. Google ScholarDigital Library
BKSS 90 Beckmann N., Kriegel H.-P., Schneider R., Seeger B.: "The R*-tree: An ti',fficient and Robust Access Method for Points and Rectangles", Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, ACM Press, New York, 1990, pp. 322-331. Google ScholarDigital Library
CPZ 97 Ciaccia P., Patella M., Zezula P.: "M-tree: An Efficient Access Method for Similarity Search in Metric Spaces ", Proc. 23rd Int. Conf. on Very Large Data Bases, Athens, Greece, 1997, pp. 426-435. Google ScholarDigital Library
EKSX 96 Ester M., Kriegel H.-P., Sander J., Xu X.: "A Density- Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226-231.Google Scholar
EKS+ 98 Ester M., Kriegel H.-P., Sander J., Wimmer M., Xu X.: "Incremental Clustering for Mining in a Data Warehousing Environment", Proc. 24th Int. Conf. on Very I.,arge Data Bases, New York, NY, 1998, pp. 323-333. Google ScholarDigital Library
EKX 95 Ester M., Kriegel H.-P., Xu X.: "Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification", Proc. 4th Int. Symp. on Laxge Spatial Databases, Portland, ME, 1995, in: Lecture Notes in Computer Science, Vol. 951, Springer, 1995, pp. 6'7-82. Google ScholarDigital Library
GM 85 Grossman A., Morlet J.: "Decomposition oj'functions into wavelets of constant shapes and related tr~msforms". Mathematics and Physics: Lectures on Recent Restdts, World Scientific, Singapore, 1985.Google Scholar
GRS 98 Guha S., Rastogi R., Shim K.: "CURE: A~ Efficient Clustering Algorithms for Large Databases", P-oc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998, pp. 73-84. Google ScholarDigital Library
HK 98 Hinneburg A., Keim D.: "An Efficient Approa~:h to Clustering in Large Multimedia Databases with Noise"~, Proc. 4th Int. Conf. on Knowledge Discovery & Data Milling, New York City, NY, 1998.Google Scholar
HT 93 Hattori K., Torii Y.: "Effective algorithms for Jhe nearest neighbor method in the clustering problem", Patt~.,rn Recognition, 1993, Vol. 26, No. 5, pp. 741-746.Google ScholarCross Ref
Hua 97 Huang Z.: "A Fast Clustering Algorithm to C,'uster Very Large Categorical Data Sets in Data Mining", 1)roc. SIG- OD Workshop on Research Issues on Data Mining and Knowledge Discovery, Tech. Report 97-07, UBC, Dept. of CS, 1997.Google Scholar
JD 88 Jain A. K., Dubes R. C.: "Algorithms for Clustering Data," Prentice-Hall, Inc., 1988. Google ScholarDigital Library
Kei 96a Keim D. A.: "Pixel-oriented Database Visualizations", in: SIGMOD RECORD, Special Issue on Inform~tion Visualization, Dezember 1996. Google ScholarDigital Library
Kei 96b Keim D. A.: "Databases and Visualization", t'roc. Tutorial ACM SIGMOD Int. Conf. on Managemen: of Data, Montreal, Canada, 1996, p. 543. Google ScholarDigital Library
KN 96 Knorr E. M., Ng R.T.: "Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining," IEEE Trans. on Knowledge and Data Engineeri}lg, Vol. 8, No. 6, December 1996, pp. 884-897. Google ScholarDigital Library
KR 90 Kaufman L., Rousseeuw E J.: "Finding GrouFs in Data: An Introduction to Cluster Analysis", John Wiley & Sons, 1990.Google Scholar
Mac 67 MacQueen, J.: "Some Methods for Classification and Analysis of Multivariate Observations", 5th Berkeley Synap. Math. Statist. Prob., Vol. 1, pp. 281-297.Google Scholar
NH 94 Ng R. T., Han J.: "Efficient and Effective Clustering Methods for Spatial Data Mining", Proc. 20th Int. Conf. on Very Large Data Bases, Santiago, Chile, Morgan Kaufmann Publishers, San Francisco, CA, 1994, pp. 144-155. Google ScholarDigital Library
PTVF 92 Press W. H.,Teukolsky S. A., Vetterling W. T., Flannery B. E: "Numerical Recipes in C", 2nd ed., Cambridl,ye University Press, 1992.Google Scholar
Ric 83 Richards A. J.: "Remote Sensing Digital hnag~: Analysis. An Introduction", 1983, Berlin, Springer Verlag. Google ScholarDigital Library
Sch 96 Schikuta E.: "'Grid clustering: An efficient hierarchical clustering method for very large data sets". Proc. 13th Int. Conf. on Pattern Recognition, Vol 2, 1996, pp. 101-105. Google ScholarDigital Library
SE 97 Schikuta E., Erhart M.: "The bang-clustering system: Grid-based data analysis". Proc. Sec. Int. Symp. IDA-97, Vol. 1280 LNCS, London, UK, Springer-Verlag, 1 c,'97. Google ScholarDigital Library
SCZ 98 Sheikholeslami G., Chatterjee S., Zhang A.: "'lCaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases", Proc. 24th Int. Conf. on Very I,arge Data Bases, New York, NY, 1998, pp. 428 - 439. Google ScholarDigital Library
Sib 73 Sibson R.: "SLINK: an optimally efficient alggrithm for the single-link cluster method".The Comp. Journal, Vol. }'~ 6, No. 1, 1973, pp. 30-34.Google Scholar
ZRL 96 Zhang T., Ramakrishnan R., Linvy M.: "BIRCH: An Efficient Data Clustering Method for Very Large Dtttabase,s". Proc. ACM SIGMOD Int. Conf. on Management of Data, ACM Press, New York, 1996, pp. 103- i 14. Google ScholarDigital Library

Index Terms

OPTICS: ordering points to identify the clustering structure
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering
2. Software and its engineering
  1. Software notations and tools
    1. Context specific languages
      1. Visual languages

Recommendations

OPTICS: ordering points to identify the clustering structure
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data

Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms ...
Read More
VizOPTICS: Getting insights into OPTICS via interactive visual analysis
Abstract
Ordering points to identify the clustering structure (OPTICS) is a density-based clustering algorithm that allows the exploration of the cluster structure in the dataset by outputting an ordered queue called cluster ordering. However, ...
Highlights
- The challenges for visual cluster analysis are formulated by a pilot user study.
Read More
Proficient Normalised Fuzzy K-Means With Initial Centroids Methodology

This article describes how data is relevant and if it can be organized, linked with other data and grouped into a cluster. Clustering is the process of organizing a given set of objects into a set of disjoint groups called clusters. There are a number ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGMOD Record Volume 28, Issue 2
June 1999
599 pages
ISSN:0163-5808
DOI:10.1145/304181
Chairmen:
Susan Davidson
Univ. of Pennsylvania
,
Christos Faloutsos
Carnegie Mellon Univ.
Issue’s Table of Contents
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
June 1999
604 pages
ISBN:1581130848
DOI:10.1145/304182
Chairmen:
Susan B. Davidson
Univ. of Pennsylvania, Philidelphia
,
Christos Faloutsos
Carnegie Mellon Univ., Pittsburgh
Copyright © 1999 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 1999
Check for updates
Author Tags
cluster analysis
database mining
visualization
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2,911
  Total Citations
  View Citations
- 13,967
  Total Downloads
- Downloads (Last 12 months)2,882
- Downloads (Last 6 weeks)399
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

OPTICS: ordering points to identify the clustering structure

ACM SIGMOD Record

Abstract

References

Cited By

Index Terms

Recommendations

OPTICS: ordering points to identify the clustering structure

VizOPTICS: Getting insights into OPTICS via interactive visual analysis

Proficient Normalised Fuzzy K-Means With Initial Centroids Methodology