Community capacity is used to monitor socio-economic development. It comprises a number of dimensions, which can be measured to understand possible issues in the implementation of a policy or the outcome of a project targeting a community. Measuring community capacity dimensions is usually expensive and time-consuming, as it requires locally organised surveys. We therefore investigate a technique to estimate them by applying the Random Forests algorithm to secondary open government data. Our research focuses on predicting measures for two dimensions: sense of community and participation. We also determined the most important variables for this prediction. The variables included in the datasets used to train the predictive models complied with two criteria: nationwide availability and a sufficiently fine-grained geographic breakdown, i.e. neighbourhood level. The models explained 76.6% of the sense of community measures and 62.5% of the participation measures. Because the available outcome measures have low geographic detail, further research is required to apply the predictive models at the neighbourhood level. The most important variables agreed only partially with the factors that, according to the social science literature consulted, most influence sense of community and participation.
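As an illustration of the general approach described above, the following minimal sketch fits a Random Forest regressor to area-level predictors and ranks them by importance. It is not the implementation used in the thesis: the use of scikit-learn, the synthetic data, the feature names and the out-of-bag R² as the "share explained" metric are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for area-level open government data.
# Feature names and the outcome are hypothetical, not taken from the thesis.
rng = np.random.default_rng(0)
n_areas = 300
feature_names = [
    "median_income",
    "pct_owner_occupied",
    "population_density",
    "unrelated_noise",
]
X = rng.normal(size=(n_areas, len(feature_names)))

# Hypothetical survey-based outcome (e.g. a sense-of-community score):
# driven mainly by the first feature, weakly by the second, plus noise.
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_areas)

# Fit the forest; oob_score=True estimates R^2 on out-of-bag samples,
# i.e. on areas each tree did not see during its bootstrap draw.
model = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
model.fit(X, y)

oob_r2 = model.oob_score_

# Impurity-based importances, sorted from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"Out-of-bag R^2: {oob_r2:.3f}")
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

With real data, `X` would hold the nationwide open-data variables per area and `y` the survey-based outcome measure for the same areas. Impurity-based importances can favour high-cardinality variables, so a permutation-based ranking may be a useful cross-check.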
UvA
L. Hardman (Lynda)
Human-Centered Data Analytics

Piscopo, A. (2014, August). Predicting sense of community and participation by applying machine learning to open government data. UvA.