We study the switch distribution, introduced by Van Erven et al. (2012), applied to model selection and subsequent estimation. While switching was known to be strongly consistent, here we show that it achieves minimax optimal parametric risk rates up to a log log n factor when comparing two nested exponential families, partially confirming a conjecture by Lauritzen (2012) and Cavanaugh (2012) that switching behaves asymptotically like the Hannan-Quinn criterion. Moreover, like Bayes factor model selection but unlike standard significance testing, when one of the models represents a simple hypothesis, the switch criterion defines a robust null hypothesis test, meaning that its Type-I error probability can be bounded irrespective of the stopping rule. Hence, switching is consistent, insensitive to optional stopping and almost minimax risk optimal, showing that, Yang’s (2005) impossibility result notwithstanding, it is possible to ‘almost’ combine the strengths of AIC and Bayes factor model selection.
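The robustness claim can be illustrated with a small simulation. The sketch below does not implement the switch criterion itself; it swaps in a plain Bayes factor test, which the abstract names as sharing the optional-stopping property. The Gaussian model, the standard normal prior on the mean, the threshold 1/alpha, and the names `bayes_factor` and `simulate` are all assumptions chosen for illustration, not taken from the paper. Data are drawn from the simple null N(0, 1), and both tests are allowed to stop at the first sample size at which they reject.

```python
import math
import random


def bayes_factor(s, n):
    """Bayes factor of N(theta, 1) with prior theta ~ N(0, 1) against the
    point null N(0, 1), given n observations whose sum is s.
    (Illustrative model, assumed here; not the paper's switch criterion.)"""
    return math.exp(s * s / (2.0 * (n + 1))) / math.sqrt(n + 1)


def simulate(trials=2000, n_max=200, alpha=0.05, seed=0):
    """Estimate Type-I error rates under optional stopping.

    Each trial draws up to n_max observations from the null N(0, 1) and
    records whether each test would have rejected at ANY sample size,
    which mimics stopping as soon as the result looks significant.
    """
    rng = random.Random(seed)
    bf_rejects = 0
    z_rejects = 0
    for _ in range(trials):
        s = 0.0
        bf_hit = False
        z_hit = False
        for n in range(1, n_max + 1):
            s += rng.gauss(0.0, 1.0)
            # Bayes factor test: reject once the evidence exceeds 1 / alpha.
            if not bf_hit and bayes_factor(s, n) >= 1.0 / alpha:
                bf_hit = True
            # Fixed-level two-sided z-test, re-applied after every observation.
            if not z_hit and abs(s) / math.sqrt(n) > 1.96:
                z_hit = True
        bf_rejects += bf_hit
        z_rejects += z_hit
    return bf_rejects / trials, z_rejects / trials


bf_rate, z_rate = simulate()
print(f"Bayes factor test, optional stopping: {bf_rate:.3f}")
print(f"z-test, optional stopping:            {z_rate:.3f}")
```

Under the null, the Bayes factor against a simple hypothesis is a nonnegative martingale with expectation 1, so the probability that it ever reaches 1/alpha is at most alpha whatever the stopping rule. The repeatedly applied z-test has no such guarantee, and its estimated rejection rate in this setup exceeds the nominal level.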

model selection, post model selection estimation, switch distribution, AIC-BIC dilemma, worst-case risk, optional stopping, consistency, exponential family
Machine Learning

van der Pas, S.L., & Grünwald, P.D. (2016). Almost the best of three worlds: Risk, consistency and optional stopping for the switch criterion in nested model selection.