Granger causality quantifies the extent to which the past activity of one time series is predictive of another. Studying networks of such interactions has become increasingly popular. However, classical methods for estimating Granger causality assume linear time series dynamics, whereas many real-world time series evolve nonlinearly, and fitting linear models may therefore lead to inconsistent estimation of Granger causal interactions. We instead present a framework for interpretable nonlinear Granger causality discovery using regularized neural networks. We construct a set of disentangled architectures, both feed-forward and recurrent, combined with structured sparsity-inducing penalties placed on the weights of either the encoding or the decoding stage, which together allow us to extract the desired Granger causality statements. By deploying recurrent neural networks such as LSTMs and echo state networks, we can efficiently capture long-range dependencies between series and perform lag selection, both of which have posed serious challenges to traditional approaches.
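To make the core idea concrete, the following is a minimal NumPy sketch (not the paper's implementation) of the structured sparsity mechanism: the first-layer weights of a per-target network are partitioned into one group per candidate input series, and a group-lasso penalty drives entire groups to zero. A series whose weight group is zeroed out is inferred not to Granger-cause the target. The grouping layout and threshold below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p, lag, hidden = 3, 2, 4                  # number of series, lag order, hidden units

# First-layer weights of a network predicting one target series from
# all lagged inputs. Columns are assumed grouped series-major:
# [series 0 lags, series 1 lags, ..., series p-1 lags].
W1 = rng.normal(size=(hidden, p * lag))

def group_lasso_penalty(W1, p, lag, lam=0.1):
    """Sum of per-series Frobenius norms over first-layer weight groups."""
    groups = W1.reshape(W1.shape[0], p, lag)          # one group per input series
    norms = np.sqrt((groups ** 2).sum(axis=(0, 2)))    # group norm for each series
    return lam * norms.sum(), norms

penalty, norms = group_lasso_penalty(W1, p, lag)

# During training this penalty is added to the prediction loss; at
# convergence, series whose group norm is (numerically) zero are read
# off as non-causes of the target. Here the random weights are dense,
# so every series survives the (hypothetical) threshold.
granger_parents = [j for j in range(p) if norms[j] > 1e-8]
```

In the full method, a proximal optimizer sets whole groups exactly to zero, so the selection step is a true support recovery rather than thresholding; the threshold here only stands in for that behavior.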