Performance criteria for graph clustering and Markov cluster experiments

May 2000

2000 Technical Report

Author:
Stijn van Dongen

Publisher:

CWI (Centre for Mathematics and Computer Science)
P. O. Box 94079 NL-1090 GB Amsterdam
Netherlands

Published: 31 May 2000

Abstract

In [1] a cluster algorithm for graphs was introduced called the Markov cluster algorithm or MCL algorithm. The algorithm is based on simulation of (stochastic) flow in graphs by means of alternation of two operators, expansion and inflation. The results in [2] establish an intrinsic relationship between the corresponding algebraic process (MCL process) and cluster structure in the iterands and the limits of the process. Several kinds of experiments conducted with the MCL algorithm are described here. Test cases with varying homogeneity characteristics are used to establish some of the particular strengths and weaknesses of the algorithm. In general the algorithm performs well, except for graphs which are very homogeneous (such as weakly connected grids) and for which the natural cluster diameter (i.e. the diameter of a subgraph induced by a natural cluster) is large. This can be understood in terms of the flow characteristics of the MCL algorithm and the heuristic on which the algorithm is grounded. A generic performance criterion for clusterings of weighted graphs is derived, by a stepwise refinement of a simple and appealing criterion for simple graphs. The most refined criterion uses a particular Schur convex function, several properties of which are established. A metric is defined on the space of partitions, which is useful for comparing different clusterings of the same graph. The metric is compared with the metric known as the equivalence mismatch coefficient. The performance criterion and the metric are used for the quantitative measurement of experiments conducted with the MCL algorithm on randomly generated test graphs with 10000 nodes. Scaling the MCL algorithm requires a regime of pruning the stochastic matrices which need to be computed. The effect of pruning on the quality of the retrieved clusterings is also investigated.

[1] A cluster algorithm for graphs. Technical report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000.

[2] A stochastic uncoupling process for graphs. Technical report INS-R0011, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000.
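
The alternation of expansion and inflation mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the report: the function names, the added self-loops, the inflation exponent of 2, the iteration cap, the tolerances, and the rule for reading clusters off the limit matrix are all assumptions made for illustration, and no pruning is applied.

```python
def normalize_columns(m):
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: square the matrix, i.e. let flow take two-step walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise each entry to the power r, then rescale columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iterations=50, tol=1e-9):
    """Toy MCL loop on a dense adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Assumption: add a self-loop to every node before normalizing.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iterations):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Assumption: a row with non-negligible entries marks an attractor,
    # and the columns it covers are read as one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical test graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
graph = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    graph[a][b] = graph[b][a] = 1.0

print(mcl(graph))
```

The dense list-of-lists representation is chosen only to keep the sketch dependency-free. For graphs of the size mentioned in the abstract (10000 nodes) a sparse representation together with the pruning regime the abstract refers to would be needed.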

Recommendations

Graph clustering based on structural/attribute similarities

The goal of graph clustering is to partition vertices in a large graph into different clusters based on various criteria such as vertex connectivity or neighborhood similarity. Graph clustering techniques are very useful for detecting densely connected ...

On cluster tree for nested and multi-density data clustering

Clustering is one of the important data mining tasks. Nested clusters or clusters of multi-density are very prevalent in data sets. In this paper, we develop a hierarchical clustering approach-a cluster tree to determine such cluster structure and ...

Cluster graph modification problems

In a clustering problem one has to partition a set of elements into homogeneous and well-separated subsets. From a graph theoretic point of view, a cluster graph is a vertex-disjoint union of cliques. The clustering problem is the task of making the ...
