2000-01-01

# A cluster algorithm for graphs

## Publication

### Publication

A cluster algorithm for graphs called the emph{Markov Cluster algorithm (MCL~algorithm) is introduced. The algorithm provides basically an interface to an algebraic process defined on stochastic matrices, called the MCL~process. The graphs may be both weighted (with nonnegative weight) and directed. Let~$G$~be such a graph. The MCL~algorithm simulates flow in $G$ by first identifying $G$ in a canonical way with a Markov graph $G_1$. Flow is then alternatingly expanded and contracted, leading to a row of Markov Graphs $G_{(i)$. Flow expansion corresponds with taking the~$k^{th$~power of a stochastic matrix, where~$kinN$. Flow contraction corresponds with a parametrized operator~$Gamma_r$, $rgeq 0$, which maps the set of (column) stochastic matrices onto itself. The image~$Gamma_r M$ is obtained by raising each entry in~$M$ to the~$r^{th$~power and rescaling each column to have sum~$1$ again. The heuristic underlying this approach is the expectation that flow between dense regions which are sparsely connected will evaporate. The invariant limits of the process are easily derived and in practice the process converges very fast to such a limit, the structure of which has a generic interpretation as an overlapping clustering of the graph~$G$. Overlap is limited to cases where the input graph has a symmetric structure inducing it. The contraction and expansion parameters of the MCL~process influence the granularity of the output. The algorithm is space and time efficient and lends itself to drastic scaling. This report describes the MCL~algorithm and process, convergence towards equilibrium states, interpretation of the states as clusterings, and implementation and scalability. The algorithm is introduced by first considering several related proposals towards graph clustering, of both combinatorial and probabilistic nature. Revised version of the report~[1]. A more mathematically oriented account on the MCL~process is given in~[2], establishing that under certain weak conditions the iterands of the MCL~process posses structure admitting a cluster interpretation. Various experiments conducted on a wide range of test-graphs are described in~[3]. The latter report also describes a generic graph clustering performance measure and a distance defined on the space of partitions. The work was carried out under project INS-3.2, Concept Building from Key-Phrases in Scientific Documents and Bottom Up Classification Methods in Mathematics. [1] A new cluster algorithm for graphs. Technical report INS-R9814, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 1998. [2] A stochastic uncoupling process for graphs. Technical report INS-R0011, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000. [3] Performance criteria for graph clustering and Markov cluster experiments. Technical report INS-R0012, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000.

Additional Metadata | |
---|---|

MSC | Matrices (incidence, Hadamard, etc.) (msc 05B20), Positive matrices and their generalizations; cones of matrices (msc 15B48), Classification and discrimination; cluster analysis (msc 62H30), Graph theory (including graph drawing) (msc 68R10), Pattern recognition, speech recognition (msc 68T10), Programming involving graphs or networks (msc 90C35) |

Publisher | CWI |

Series | Information Systems [INS] |

Citation |
van Dongen, S. (2000).
A cluster algorithm for graphs. Information Systems [INS]. CWI. |