A new cluster algorithm for graphs called the emph{Markov Cluster algorithm ($MCL$ algorithm) is introduced. The graphs may be both weighted (with nonnegative weight) and directed. Let~$G$~be such a graph. The $MCL$ algorithm simulates flow in $G$ by first identifying $G$ in a canonical way with a Markov graph $G_1$. Flow is then alternatingly expanded and contracted, leading to a row of Markov Graphs $G_{(i)$. The expansion step is done by computing higher step transition probabilities ($TP$'s), the contraction step creates a new Markov graph by favouring high $TP$'s and demoting low $TP$'s in a specific way. The heuristic underlying this approach is the expectation that flow between dense regions which are sparsely connected will evaporate. The stable limits of the process are easily derived and in practice the algorithm converges very fast to such a limit, the structure of which has a generic interpretation as an overlapping clustering of the graph~$G$. Overlap is limited to cases where the input graph has a symmetric structure inducing it. The contraction and expansion parameters of the algorithm influence the granularity of the output. The algorithm is space and time efficient with a space$+$quality/time trade--off, works very well for a wide range of test cases, and lends itself to drastic scaling. Experiments with a scaled $C$--implementation have been conducted on graphs having several tens of thousands of nodes. This report describes the algorithm, its complexity, and experimental results. The algorithm is introduced by first considering a generalization of generic single link clustering for graphs called $k$--path clustering.

, , , , ,
Information Systems [INS]

van Dongen, S. (1998). A new cluster algorithm for graphs. Information Systems [INS]. CWI.