The techniques currently used to train neural networks are often remarkably effective at avoiding overfitting and at finding solutions that generalize well, even when applied to very complex architectures in the overparametrized regime. This phenomenon remains poorly understood. Building on a framework that we have been developing in recent years, based on a large-deviation statistical physics analysis, we have studied analytically, numerically, and algorithmically the structural properties of simplified models in relation to the existence and accessibility of so-called "wide flat minima" of the loss function. We have investigated the role of the ReLU transfer function and of the cross-entropy loss function, contrasted these devices with alternatives that do not exhibit the same phenomena, and developed message-passing and greedy local-search algorithms that exploit the analytical findings.
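As a concrete illustration of the notion of flatness (a minimal sketch under our own simplifying assumptions, not the local-entropy measure used in the analytical framework), one can probe whether the cross-entropy loss of a trained ReLU network stays low when its weights are displaced by random perturbations of increasing norm; wide flat minima correspond to regions where such a profile remains low over a broad range of perturbation radii. The toy one-hidden-layer setup and the function names below are purely illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def cross_entropy_loss(W1, W2, X, y):
    """Binary cross-entropy of a toy one-hidden-layer ReLU network."""
    h = relu(X @ W1)                      # hidden activations
    logits = h @ W2                       # scalar output per example
    p = 1.0 / (1.0 + np.exp(-logits))     # predicted probabilities
    eps = 1e-12                           # numerical safeguard for log
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def flatness_profile(W1, W2, X, y, radii, n_samples=50, seed=None):
    """Average loss under random Gaussian weight perturbations of fixed norm,
    for each radius in `radii`. A 'wide flat minimum' keeps this average low
    over a broad range of radii."""
    rng = np.random.default_rng(seed)
    profile = []
    for r in radii:
        losses = []
        for _ in range(n_samples):
            d1 = rng.normal(size=W1.shape)
            d2 = rng.normal(size=W2.shape)
            # rescale the joint perturbation to have norm exactly r
            scale = r / np.sqrt(np.sum(d1**2) + np.sum(d2**2))
            losses.append(cross_entropy_loss(W1 + scale * d1,
                                             W2 + scale * d2, X, y))
        profile.append(np.mean(losses))
    return profile
```

For example, comparing the profiles of two minima with comparable training loss gives a rough empirical proxy for which of the two sits in a wider, flatter region of the loss landscape.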