Depression is a mental disorder with a high lifetime prevalence and one of the leading causes of disability worldwide. Because many patients experience another depressive episode after treatment, predictive monitoring of relapse risk is essential for healthcare professionals to follow up on patients and intervene early. However, automatically monitoring such large patient groups requires considerations that go beyond predictive performance, such as data availability and interpretability. In the present paper, we study the suitability of readily available administrative data for this prediction task. We contrast a logistic regression model containing only a small number of predictors covering demographics, medication, and estimated depression severity with regularized regression and XGBoost models that incorporate a large number of predictors describing individual treatment and social information. Our results demonstrate that the inclusion of more detailed input does not result in a significant improvement in performance when compared to simpler regression models. For similar types of data, we therefore recommend focusing primarily on a small, interpretable model.

doi.org/10.1002/qre.70139
Quality and Reliability Engineering International
creativecommons.org/licenses/by-nc-nd/4.0/

von Stackelberg, P., Goedhart, R., Huberts, L. C. E., Lokkerbol, J., & Birbil, I. (2025). Prediction of depression relapse using machine learning with administrative data: Balancing complexity and simplicity. Quality and Reliability Engineering International, 2025. doi:10.1002/qre.70139