The recently introduced Gene-pool Optimal Mixing Evolutionary Algorithm for Genetic Programming (GP-GOMEA) has been shown to find much smaller solutions of equally high quality compared to other state-of-the-art GP approaches. This is of interest because small solutions are easier for humans to interpret. In this paper, we propose an adaptation of GP-GOMEA to real-world symbolic regression, aimed at finding small yet accurate mathematical expressions, and apply it to a problem of clinical interest. For radiotherapy dose reconstruction, a model is sought that captures anatomical similarity between patients. This problem is particularly interesting because, while the features are patient-specific, the variable to regress is a distance and is therefore defined over pairs of patients. We show that, on benchmark problems as well as on the application, GP-GOMEA outperforms variants of standard GP. To find even more accurate models, we further consider an evolutionary meta-learning approach in which GP-GOMEA is used to construct small yet effective features for a different machine learning algorithm. Experimental results show that this approach significantly improves the performance of linear regression, support vector machines, and random forest, while providing meaningful and interpretable features.
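
The feature-construction idea can be illustrated with a minimal sketch. It is not the paper's method or data: the GP-GOMEA search is not implemented, and the patient attributes (height, weight), the expression in `constructed_feature`, and the synthetic target distance are all hypothetical stand-ins chosen for illustration. The sketch only shows how a small expression over a patient pair can serve as the input to a simple learner, here a one-variable least-squares fit.

```python
import random


def constructed_feature(a, b):
    """Hypothetical small expression over a patient pair (a, b).

    Each patient is a (height, weight) tuple. This stands in for the kind
    of compact expression an evolutionary search might return.
    """
    return abs(a[0] - b[0]) * abs(a[1] - b[1])


def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + c with a single input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return w, mean_y - w * mean_x


def mse(xs, ys, w, c):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((w * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)

# Synthetic patients: features are defined per patient.
patients = [(rng.uniform(100, 180), rng.uniform(20, 80)) for _ in range(30)]

# The regression target is defined per unordered patient pair.
pairs = [(a, b) for i, a in enumerate(patients) for b in patients[i + 1:]]

# Assumed synthetic "distance" between two patients, plus noise.
target = [
    0.01 * abs(a[0] - b[0]) * abs(a[1] - b[1]) + rng.gauss(0, 0.1)
    for a, b in pairs
]

# Baseline input: a single raw pairwise difference.
raw = [abs(a[0] - b[0]) for a, b in pairs]
# Constructed input: the small expression above.
built = [constructed_feature(a, b) for a, b in pairs]

mse_raw = mse(raw, target, *fit_line(raw, target))
mse_built = mse(built, target, *fit_line(built, target))

print(f"pairs: {len(pairs)}")
print(f"MSE with raw difference:      {mse_raw:.4f}")
print(f"MSE with constructed feature: {mse_built:.4f}")
```

Because the synthetic target was generated from the same expression, the constructed feature fits it far better than the raw difference does. In a real setting the expression would have to be found by the search, and the gain would depend on the data.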

Additional Metadata
Keywords: Dose reconstruction, Feature construction, Genetic programming, GOMEA, Machine learning, Radiotherapy
Persistent URL: dx.doi.org/10.1145/3205455.3205604
Conference: Genetic and Evolutionary Computation Conference
Citation
Virgolin, M, Alderliesten, T, Bel, A, Witteveen, C, & Bosman, P.A.N. (2018). Symbolic regression and feature construction with GP-GOMEA applied to radiotherapy dose reconstruction of childhood cancer survivors. In GECCO 2018 - Proceedings of the 2018 Genetic and Evolutionary Computation Conference (pp. 1395–1402). doi:10.1145/3205455.3205604