How does a neural network approximate a given function? Which functions can it approximate well, and which poorly? How does the optimization algorithm bias learning? Deep learning has revolutionized many fields, yet answers to fundamental questions like these remain elusive. This lack of understanding makes it difficult to develop applications in fields like physics, where substantial prior knowledge exists.
Here we present an emerging viewpoint -- the functional or spline perspective -- that has the power to answer these questions. We will review the latest results from this line of work and discuss our most recent advances. We find that NNs are most easily understood in function space rather than parameter space. This change of coordinates sheds light on many perplexing phenomena, including overparameterization, "unreasonable" generalization, the loss surface and the consequent difficulty of training, and the implicit regularization of gradient descent. We present several novel results concerning how (deep) ReLU nets approximate functions, the number of bad local minima in the loss surface, and the fundamental dynamical laws governing breakpoints and slopes during gradient descent.
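To make the spline perspective concrete, the following is a minimal sketch, assuming a one-hidden-layer scalar ReLU net in NumPy (the results above also cover deeper nets, which this toy example does not): the network computes a continuous piecewise-linear function that is fully described by its breakpoints and per-region slopes.

import numpy as np

# Illustrative sketch: a randomly initialized one-hidden-layer scalar ReLU net.
rng = np.random.default_rng(0)
width = 8
w = rng.normal(size=width)   # input weights
b = rng.normal(size=width)   # biases
v = rng.normal(size=width)   # output weights
c = rng.normal()             # output bias

def f(x):
    # f(x) = sum_i v_i * relu(w_i * x + b_i) + c  -- a continuous piecewise-linear spline
    return np.maximum(np.outer(x, w) + b, 0.0) @ v + c

# In the function-space (spline) view, f is determined by its knots and slopes:
# unit i places a breakpoint where its pre-activation w_i * x + b_i crosses zero.
breakpoints = np.sort(-b / w)

# Slope on each linear region = sum of v_i * w_i over the units active there.
edges = np.concatenate([[breakpoints[0] - 1.0], breakpoints, [breakpoints[-1] + 1.0]])
mids = 0.5 * (edges[:-1] + edges[1:])
slopes = [(v * w)[w * m + b > 0].sum() for m in mids]

print("breakpoints:", np.round(breakpoints, 3))
print("region slopes:", np.round(slopes, 3))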
We will demonstrate the impact on physical applications by showing how the breakpoint densities of randomly initialized NNs poorly model boundary conditions, for example when approximating the energy function of a protein molecule (joint work with Cecilia Clementi, Rice BioPhysics). A second application will show how to impose global group symmetries on the learned function (joint work with Fabio Anselmi, IIT).
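As a rough illustration of the symmetry idea, one standard construction is group averaging (the Reynolds operator), which makes a learned function exactly invariant under a finite group. The sketch below, using a toy Z_2 reflection group, is an assumption chosen for illustration and is not necessarily the construction used in the joint work.

import numpy as np

def symmetrize(f, group_actions):
    # Group averaging: g(x) = mean over T in G of f(T(x)) is exactly G-invariant.
    def g(x):
        return np.mean([f(T(x)) for T in group_actions], axis=0)
    return g

# Toy stand-in for a trained network's output (not symmetric on its own).
f = lambda x: np.maximum(x - 0.3, 0.0) + 0.1 * x
group = [lambda x: x, lambda x: -x]        # Z_2 reflection group {x, -x}

g = symmetrize(f, group)
x = np.linspace(-1.0, 1.0, 5)
print(np.allclose(g(x), g(-x)))            # True: the reflection symmetry holds exactly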