Finite-Sum Smooth Optimization with SARAH
The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function F(w)=1n∑ni=1fi(w) has been proven to be at least Ω(n−−√/ϵ) for n≤O(ϵ−2) where ϵ denotes the attained accuracy E[∥∇F(w~)∥2]≤ϵ for the outputted approximation w~ (Fang et al., 2018). In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm (Nguyen et al., 2017a;b) and achieve total complexity that matches the lower-bound worst case complexity in (Fang et al., 2018) up to a constant factor when n≤O(ϵ−2) for nonconvex problems. For convex optimization, we propose SARAH++ with sublinear convergence for general convex and linear convergence for strongly convex problems; and we provide a practical version for which numerical experiments on various datasets show an improved performance.
|IBM Research, Thomas J. Watson Research Center, USA|
Nguyen, L. M, van Dijk, M.E, Phan, D.T, Nguyen, P.H, Weng, T.-W, & Kalagnanam, J. R. (2019). Finite-Sum Smooth Optimization with SARAH.