In this short course, we establish the connection between residual neural networks and differential equations. We will use this interpretation to relate learning problems in data science to optimal control and parameter estimation problems in physics, engineering, and image processing. The course consists of two lectures.
In the first lecture, we briefly introduce some learning problems and discuss linear models. We then extend our discussion to nonlinear models, in particular, multi-layer perceptrons and residual neural networks. We demonstrate that even the training of a single-layer neural network leads to a challenging non-convex optimization problem, and we review some heuristics, such as Variable Projection and stochastic approximation schemes, that can effectively train nonlinear models. Finally, we demonstrate challenges associated with deep networks, such as their stability and the computational cost of training.
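To make the training problem concrete, the following is a minimal sketch (not taken from the course material; all names, widths, and learning rates are illustrative choices) of fitting a single-layer network by stochastic gradient descent on a least-squares loss. Even this small problem is non-convex in the weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = sin(3x) sampled on [-1, 1].
X = rng.uniform(-1.0, 1.0, size=(256, 1))
y = np.sin(3.0 * X)

m = 16                                  # hidden width (illustrative choice)
W1 = rng.normal(scale=1.0, size=(m, 1))
b1 = np.zeros((m, 1))
W2 = rng.normal(scale=0.1, size=(1, m))

def forward(Xb):
    """Single-layer model f(x) = W2 tanh(W1 x + b1)."""
    Z = np.tanh(W1 @ Xb.T + b1)         # hidden activations, shape (m, batch)
    return (W2 @ Z).T, Z                # predictions, shape (batch, 1)

lr, batch = 0.1, 32
loss0 = np.mean((forward(X)[0] - y) ** 2)
for step in range(2000):
    idx = rng.integers(0, X.shape[0], size=batch)   # stochastic minibatch
    Xb, yb = X[idx], y[idx]
    pred, Z = forward(Xb)
    r = pred - yb                        # residual, shape (batch, 1)
    # Backpropagated gradients of the mean-squared error.
    gW2 = 2.0 * (r.T @ Z.T) / batch
    dZ = (W2.T @ r.T) * (1.0 - Z ** 2)   # tanh' = 1 - tanh^2
    gW1 = 2.0 * (dZ @ Xb) / batch
    gb1 = 2.0 * dZ.mean(axis=1, keepdims=True)
    W2 -= lr * gW2
    W1 -= lr * gW1
    b1 -= lr * gb1

loss1 = np.mean((forward(X)[0] - y) ** 2)
```

Note that the loss is convex in W2 alone (for fixed W1, b1); Variable Projection exploits exactly this structure by eliminating the linear weights analytically.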
In the second lecture, we show that residual neural networks can be interpreted as discretizations of a nonlinear time-dependent ordinary differential equation that depends on unknown parameters, i.e., the network weights. We show how this insight has been used, e.g., to study the stability of neural networks, design new architectures, or apply established methods from optimal control to the training of ResNets. We extend this viewpoint to convolutional ResNets, which are popular for speech, image, and video data, and connect them to partial differential equations. Finally, we discuss open questions and opportunities for mathematical advances in this area.
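The ODE interpretation can be sketched in a few lines (an illustrative toy example, not the course's code; the shared weights and final time T = 1 are assumptions): a residual block x_{k+1} = x_k + h f(x_k, theta_k) is one forward-Euler step of dx/dt = f(x, theta(t)), so refining the step size h while increasing the depth should leave the final state roughly unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W = rng.normal(scale=0.5, size=(d, d))  # shared weights (time-constant theta)
b = rng.normal(scale=0.1, size=d)

def f(x):
    """Layer function f(x, theta) with theta = (W, b) held fixed in time."""
    return np.tanh(W @ x + b)

def resnet_forward(x0, n_layers, T=1.0):
    """Depth-n_layers ResNet = forward Euler on [0, T] with step h = T/n."""
    h = T / n_layers
    x = x0.copy()
    for _ in range(n_layers):
        x = x + h * f(x)                # residual block = Euler step
    return x

x0 = rng.normal(size=d)
coarse = resnet_forward(x0, n_layers=8)     # shallow network, large step
fine = resnet_forward(x0, n_layers=1024)    # deep network, small step
# Forward Euler is first-order accurate, so the outputs agree up to O(h).
err = np.linalg.norm(coarse - fine)
```

This continuous limit is what lets ODE stability theory (e.g., bounds on the eigenvalues of the Jacobian of f) and optimal control machinery be brought to bear on network design and training.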