Performance criteria for graph clustering and Markov cluster experiments
May 2000
2000 Technical Report
Publisher:
  • CWI (Centre for Mathematics and Computer Science)
  • P. O. Box 94079 NL-1090 GB Amsterdam
  • Netherlands
Published: 31 May 2000
Abstract

In [1] a cluster algorithm for graphs was introduced called the Markov cluster algorithm or MCL algorithm. The algorithm is based on simulation of (stochastic) flow in graphs by means of alternation of two operators, expansion and inflation. The results in [2] establish an intrinsic relationship between the corresponding algebraic process (MCL process) and cluster structure in the iterands and the limits of the process. Several kinds of experiments conducted with the MCL algorithm are described here. Test cases with varying homogeneity characteristics are used to establish some of the particular strengths and weaknesses of the algorithm. In general the algorithm performs well, except for graphs which are very homogeneous (such as weakly connected grids) and for which the natural cluster diameter (i.e. the diameter of a subgraph induced by a natural cluster) is large. This can be understood in terms of the flow characteristics of the MCL algorithm and the heuristic on which the algorithm is grounded. A generic performance criterion for clusterings of weighted graphs is derived, by a stepwise refinement of a simple and appealing criterion for simple graphs. The most refined criterion uses a particular Schur convex function, several properties of which are established. A metric is defined on the space of partitions, which is useful for comparing different clusterings of the same graph. The metric is compared with the metric known as the equivalence mismatch coefficient. The performance criterion and the metric are used for the quantitative measurement of experiments conducted with the MCL algorithm on randomly generated test graphs with 10000 nodes. Scaling the MCL algorithm requires a regime of pruning the stochastic matrices which need to be computed. The effect of pruning on the quality of the retrieved clusterings is also investigated.

[1] A cluster algorithm for graphs. Technical report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000.
[2] A stochastic uncoupling process for graphs. Technical report INS-R0011, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000.
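
The alternation of expansion and inflation described in the abstract can be illustrated with a small sketch. It is not the report's implementation: the added self-loops, the inflation exponent `r`, the pruning cutoff `threshold`, the stopping rule, and the way clusters are read off the final matrix are all assumptions made for illustration, and the function names are hypothetical. The pruning step is a plain cutoff and only stands in for whatever pruning regime the report studies.

```python
def normalize_columns(m):
    """Rescale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: square the matrix, i.e. let flow take two steps."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise each entry to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def prune(m, threshold):
    """Assumed pruning rule: zero out small entries, then renormalize."""
    return normalize_columns(
        [[x if x >= threshold else 0.0 for x in row] for row in m])


def mcl_sketch(adjacency, r=2.0, threshold=1e-6, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation; return clusters as sorted node lists."""
    n = len(adjacency)
    # Assumption: add a self-loop of weight 1 to every node before normalizing.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        nxt = prune(inflate(expand(m), r), threshold)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:  # assumed stopping rule: matrix no longer changes
            break

    # Assumed interpretation: node j joins every node i that still receives
    # flow from it; merge these links with a small union-find.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for j in range(n):
        for i in range(n):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for j in range(n):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())
```

Calling `mcl_sketch` on a symmetric 0/1 adjacency matrix (a list of lists) returns a partition of the node indices. Larger values of `r` tend to produce finer clusterings, and the dense list-of-lists representation is only practical for very small graphs.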

Cited By

  1. Hao Y, Li L, Chang L and Gu T (2024). MLDA: a multi-level k-degree anonymity scheme on directed social network graphs, Frontiers of Computer Science: Selected Publications from Chinese Universities, 18:2, Online publication date: 1-Apr-2024.
  2. Lin H, Liu H, Wu J, Li H and Günnemann S (2023). Algorithm 1038: KCC: A MATLAB Package for k-Means-based Consensus Clustering, ACM Transactions on Mathematical Software, 49:4, (1-27), Online publication date: 31-Dec-2023.
  3. Zhang G, Attaluri N, Emer J and Sanchez D Gamma: leveraging Gustavson’s algorithm to accelerate sparse matrix multiplication Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, (687-701)
  4. Douven I (2017). Clustering colors, Cognitive Systems Research, 45:C, (70-81), Online publication date: 1-Oct-2017.
  5. O'Hagan A, Murphy T, Gormley I, McNicholas P and Karlis D (2016). Clustering with the multivariate normal inverse Gaussian distribution, Computational Statistics & Data Analysis, 93:C, (18-30), Online publication date: 1-Jan-2016.
  6. Pujari S, Hadgu A, Lex E and Jäschke R Social activity versus academic activity Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business, (1-8)
  7. Srubar S Speed Comparison of Segmentation Evaluation Methods Proceedings of the 16th International Workshop on Combinatorial Image Analysis - Volume 8466, (113-122)
  8. Ben-Gal I, Shavitt Y, Weinsberg E and Weinsberg U (2014). Peer-to-peer information retrieval using shared-content clustering, Knowledge and Information Systems, 39:2, (383-408), Online publication date: 1-May-2014.
  9. Kettleborough G and Rayward-Smith V (2013). Optimising sum-of-squares measures for clustering multisets defined over a metric space, Discrete Applied Mathematics, 161:16-17, (2499-2513), Online publication date: 1-Nov-2013.
  10. Zheng H and Wu J Spectral graph multisection through orthogonality Proceedings of the 4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering, (1-6)
  11. Celebi M, Kingravi H and Vela P (2013). A comparative study of efficient initialization methods for the k-means clustering algorithm, Expert Systems with Applications: An International Journal, 40:1, (200-210), Online publication date: 1-Jan-2013.
  12. Anchuri P and Magdon-Ismail M Communities and Balance in Signed Networks Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), (235-242)
  13. Zhang Y and Li T (2012). DClusterE, ACM Transactions on Intelligent Systems and Technology, 3:2, (1-24), Online publication date: 1-Feb-2012.
  14. Rocklin M and Pinar A Latent clustering on graphs with multiple edge types Proceedings of the 8th international conference on Algorithms and models for the web graph, (38-49)
  15. Klapaftis I and Manandhar S Word sense induction & disambiguation using hierarchical random graphs Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, (745-755)
  16. Jiang X and Abdala D Exploring the performance limit of cluster ensemble techniques Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition, (405-414)
  17. Reichart R, Abend O and Rappoport A Type level clustering evaluation Proceedings of the Fourteenth Conference on Computational Natural Language Learning, (77-87)
  18. Coen M, Ansari M and Fillmore N Comparing clusterings in space Proceedings of the 27th International Conference on Machine Learning, (231-238)
  19. Noh S and Kim J A high availability clustering and load balancing mechanism for information security infrastructure system Proceedings of the 2009 International Conference on Hybrid Information Technology, (502-507)
  20. Wu J, Xiong H and Chen J Adapting the right measures for K-means clustering Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, (877-886)
  21. Reichart R and Rappoport A The NVI clustering evaluation measure Proceedings of the Thirteenth Conference on Computational Natural Language Learning, (165-173)
  22. Wu J, Chen J, Xiong H and Xie M (2009). External validation measures for K-means clustering, Expert Systems with Applications: An International Journal, 36:3, (6050-6061), Online publication date: 1-Apr-2009.
  23. Klapaftis I and Manandhar S Word Sense Induction Using Graphs of Collocations Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence, (298-302)
  24. Meilă M (2007). Comparing clusterings: an information based distance, Journal of Multivariate Analysis, 98:5, (873-895), Online publication date: 1-May-2007.
  25. Li Y, Lao L and Cui J SDC Proceedings of the 5th international IFIP-TC6 conference on Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications Systems, (1234-1239)
  26. Jiang X, Marti C, Irniger C and Bunke H (2006). Distance measures for image segmentation evaluation, EURASIP Journal on Advances in Signal Processing, 2006, (209-209), Online publication date: 1-Jan-2006.
  27. Ramaswamy L, Gedik B and Liu L (2005). A Distributed Approach to Node Clustering in Decentralized Peer-to-Peer Networks, IEEE Transactions on Parallel and Distributed Systems, 16:9, (814-829), Online publication date: 1-Sep-2005.
  28. Zhou D, Li J and Zha H A new Mallows distance based metric for comparing clusterings Proceedings of the 22nd international conference on Machine learning, (1028-1035)
  29. Meilă M Comparing clusterings Proceedings of the 22nd international conference on Machine learning, (577-584)