Performance criteria for graph clustering and Markov cluster experiments
In~ a cluster algorithm for graphs was introduced called the Markov cluster algorithm or MCL~algorithm. The algorithm is based on simulation of (stochastic) flow in graphs by means of alternation of two operators, expansion and inflation. The results in~ establish an intrinsic relationship between the corresponding algebraic process (MCL~process) and cluster structure in the iterands and the limits of the process. Several kinds of experiments conducted with the MCL~algorithm are described here. Test cases with varying homogeneity characteristics are used to establish some of the particular strengths and weaknesses of the algorithm. In general the algorithm performs well, except for graphs which are very homogeneous (such as weakly connected grids) and for which the natural cluster diameter (i.e. the diameter of a subgraph induced by a natural cluster) is large. This can be understood in terms of the flow characteristics of the MCL~algorithm and the heuristic on which the algorithm is grounded. A generic performance criterion for clusterings of weighted graphs is derived, by a stepwise refinement of a simple and appealing criterion for simple graphs. The most refined criterion uses a particular Schur convex function, several properties of which are established. A metric is defined on the space of partitions, which is useful for comparing different clusterings of the same graph. The metric is compared with the metric known as the equivalence mismatch coefficient. The performance criterion and the metric are used for the quantitative measurement of experiments conducted with the MCL~algorithm on randomly generated test graphs with 10000 nodes. Scaling the MCL~algorithm requires a regime of pruning the stochastic matrices which need to be computed. The effect of pruning on the quality of the retrieved clusterings is also investigated.  A cluster algorithm for graphs. Technical report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000.  A stochastic uncoupling process for graphs. Technical report INS-R0011, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000.
|MSC||Partitions of sets (msc 05A18), Matrices (incidence, Hadamard, etc.) (msc 05B20), Positive matrices and their generalizations; cones of matrices (msc 15B48), Stochastic matrices (msc 15B51), Classification and discrimination; cluster analysis (msc 62H30), Graph theory (including graph drawing) (msc 68R10), Pattern recognition, speech recognition (msc 68T10), Programming involving graphs or networks (msc 90C35)|
|Series||Information Systems [INS]|
van Dongen, S. (2000). Performance criteria for graph clustering and Markov cluster experiments. Information Systems [INS]. CWI.