We present a novel method for quantifying dependencies in multivariate datasets, based on estimating the Rényi entropy by minimum spanning trees (MSTs). The length of the MSTs can be used to order pairs of variables from strongly to weakly dependent, which makes the method a useful tool for sensitivity analysis with dependent input variables. The method is well suited to cases where the input distribution is unknown and only a sample of the inputs is available. We introduce an estimator that quantifies dependency based on the MST length, and investigate its properties with several numerical examples. To reduce the computational cost of constructing the exact MST for large datasets, we explore methods for computing approximations to the exact MST, and find the multilevel approach introduced recently by Zhong et al. (2015) to be the most accurate. We apply the proposed method to an artificial test case based on the Ishigami function, as well as to a real-world test case involving sediment transport in the North Sea. The results are consistent with prior knowledge and heuristic understanding, and, where Sobol indices can be computed, with variance-based analysis using those indices.
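The idea of ordering pairs of variables by MST length can be illustrated with a minimal Python sketch. It is not the estimator introduced in the paper, which the abstract does not specify. It assumes that each variable is first rank-transformed to the unit interval, that the exact Euclidean MST is built with SciPy, and that a shorter MST on a pair of variables indicates stronger dependence. The helper names `rank_transform`, `mst_length`, and `rank_pairs` and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from scipy.stats import rankdata


def rank_transform(sample):
    """Map each column to (0, 1] by its ranks (assumed preprocessing step)."""
    n = sample.shape[0]
    return np.column_stack([rankdata(col) / n for col in sample.T])


def mst_length(points):
    """Total edge length of the exact Euclidean MST of the given points.

    Uses a dense O(n^2) distance matrix, so this is only practical for
    moderate sample sizes. SciPy treats zero entries as missing edges,
    so the points are assumed to be distinct.
    """
    dists = squareform(pdist(points))
    return float(minimum_spanning_tree(dists).sum())


def rank_pairs(sample):
    """Return variable pairs ordered from shortest to longest MST."""
    u = rank_transform(sample)
    lengths = {
        (i, j): mst_length(u[:, [i, j]])
        for i, j in combinations(range(u.shape[1]), 2)
    }
    return sorted(lengths, key=lengths.get), lengths


# Synthetic illustration: x1 depends strongly on x0, x2 is independent.
rng = np.random.default_rng(0)
n = 300
x0 = rng.uniform(size=n)
x1 = x0 + 0.05 * rng.normal(size=n)
x2 = rng.uniform(size=n)
ranking, lengths = rank_pairs(np.column_stack([x0, x1, x2]))
print(ranking)
print(lengths)
```

In this sketch the first pair in `ranking` is the one with the shortest MST, which under the stated assumption is read as the most strongly dependent pair. For large samples the dense distance matrix becomes the bottleneck, which is the setting where an approximate MST, as discussed in the abstract, would replace `mst_length`.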

Additional Metadata
Keywords Rényi entropy, Dependent data, Sensitivity analysis, Large datasets, Minimum spanning trees
Project Excellence in Uncertainty Reduction of Offshore Wind Systems (detailed programme proposal)
Grant This work was funded by the Netherlands Organisation for Scientific Research (NWO); grant id nwo/14186 - Excellence in Uncertainty Reduction of Offshore Wind Systems
Citation
Eggels, A.W., & Crommelin, D.T. (2018). Quantifying dependencies for sensitivity analysis with multivariate input sample data.