Hierarchical clustering is a common method for finding clusters of similar data points in multi-dimensional spaces. Sequential $O(n^2)$ algorithms, where $n$ is the number of points to cluster, have long been known for this problem. This paper discusses parallel algorithms that perform hierarchical clustering using various distance metrics. I describe $O(n)$ time algorithms for clustering using the single link, average link, complete link, centroid, median, and minimum variance metrics on an $n$-node CRCW PRAM, and $O(n \log n)$ time algorithms for these metrics (except average link and complete link) on $\frac{n}{\log n}$-node butterfly networks or trees. Thus, optimal efficiency is achieved for a significant number of processors using these distance metrics. A general algorithm is also given that can be used to perform clustering with the complete link and average link metrics on a butterfly. While this algorithm achieves optimal efficiency for the general class of metrics, it is not optimal for the specific cases of complete link and average link clustering.
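The efficiency claim can be checked with a short processor-time calculation. This is a sketch that assumes the $O(n^2)$ sequential bound quoted above is the baseline and that "optimal efficiency" means the processor-time product matches that bound up to a constant factor; the symbols $p$ (number of processors) and $T_p$ (parallel running time) are introduced here for illustration and do not appear in the abstract.

$$
\begin{aligned}
\text{CRCW PRAM:} \quad & p \cdot T_p = n \cdot O(n) = O(n^2), \\
\text{butterfly or tree:} \quad & p \cdot T_p = \frac{n}{\log n} \cdot O(n \log n) = O(n^2).
\end{aligned}
$$

In both cases the total work is within a constant factor of the sequential $O(n^2)$ algorithms, which is the sense in which the parallel algorithms lose no efficiency under this assumption.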