In statistical settings such as regression and time series, we can condition on observed information when predicting the data of interest. For example, a regression model explains the dependent variables $y_1, \ldots, y_n$ in terms of the independent variables $x_1, \ldots, x_n$. When we ask such a model to predict the value of $y_{n+1}$ corresponding to a given value of $x_{n+1}$, the accuracy of that prediction varies with $x_{n+1}$. Existing methods for model selection do not take this variability into account, which often causes them to select inferior models. One widely used method for model selection is AIC (Akaike's Information Criterion \cite{Akaike}), which is based on estimates of the KL divergence from the true distribution to each model. We propose an adaptation of AIC that takes the observed information into account when estimating the KL divergence, thereby removing a bias in AIC's estimate.
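
To make the setting concrete, the following is a minimal sketch of standard (unconditional) AIC for Gaussian polynomial regression, not the adaptation proposed in the paper. It also computes the leverage $x_{n+1}^\top (X^\top X)^{-1} x_{n+1}$, which illustrates how the variance of a fitted prediction depends on $x_{n+1}$. The function names, the polynomial model family, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns coefficients and the ML noise variance."""
    X = np.vander(x, degree + 1, increasing=True)   # design matrix [1, x, x^2, ...]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / len(y)          # ML estimate (divides by n)
    return beta, sigma2

def aic(x, y, degree):
    """Standard AIC = -2 * (maximized log-likelihood) + 2 * (number of parameters)."""
    n = len(y)
    _, sigma2 = fit_polynomial(x, y, degree)
    max_loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = degree + 2                                  # coefficients plus noise variance
    return -2 * max_loglik + 2 * k

def leverage(x, x_new, degree):
    """x_new' (X'X)^{-1} x_new: relative variance of the fitted mean at x_new."""
    X = np.vander(x, degree + 1, increasing=True)
    v = np.vander(np.atleast_1d(x_new), degree + 1, increasing=True)[0]
    return float(v @ np.linalg.solve(X.T @ X, v))

# Simulated data (an assumption for illustration): a linear truth with Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)

# AIC score per candidate degree; lower is better.
scores = {d: aic(x, y, d) for d in range(4)}

# The same model is less certain far from the observed x values.
near, far = leverage(x, 0.0, 1), leverage(x, 3.0, 1)
```

Here `scores` does not depend on where the next prediction will be made, whereas `far` exceeds `near`. This is the kind of variability with $x_{n+1}$ that an unconditional criterion does not take into account.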

information theory, machine learning, minimum description length
Life Sciences (theme 5), Logistics (theme 3)
Safe Statistics
Workshop on Information Theoretic Methods in Science and Engineering
Algorithms and Complexity

van Ommen, M. (2012). Adapting AIC to conditional model selection. In Proceedings of the Fifth Workshop on Information Theoretic Methods in Science and Engineering. CWI.