Su, Boyd, and Candès showed that Nesterov's accelerated gradient method can be obtained as the discretization of an Ordinary Differential Equation (ODE). By discretizing a modified ODE, we show that: (i) with a constant learning rate, we recover Nesterov's method; and (ii) with a decreasing learning rate, we obtain an accelerated stochastic gradient descent (SGD) algorithm. Using a Lyapunov function approach, we prove that in both the convex and strongly convex cases the algorithms converge at the optimal rate for the last iterate of SGD, with rate constants better than those previously available.
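The modified ODE and the accelerated SGD variant referred to above are not specified in this abstract. The sketch below, in Python, only illustrates the starting point the talk builds on: Nesterov's iteration viewed as a discretization of the Su–Boyd–Candès ODE X'' + (3/t) X' + ∇f(X) = 0; the step size, quadratic test problem, and function names are illustrative assumptions, not the talk's algorithm.

```python
import numpy as np

def nesterov(grad, x0, step, n_iters):
    """Nesterov's accelerated gradient method (constant step size),
    a discretization of X'' + (3/t) X' + grad f(X) = 0 with time
    step proportional to sqrt(step)."""
    x_prev = x = np.asarray(x0, dtype=float)
    for k in range(1, n_iters + 1):
        # momentum coefficient (k - 1)/(k + 2) arises from the 3/t friction term
        y = x + (k - 1) / (k + 2) * (x - x_prev)
        x_prev, x = x, y - step * grad(y)
    return x

# Illustrative use: minimize a strongly convex quadratic f(x) = 0.5 x^T A x - b^T x
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A @ x - b
x_opt = nesterov(grad_f, x0=np.zeros(2), step=0.2, n_iters=200)
```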