Three important properties of classification machinery are: (i) the system preserves the core information of the input data; (ii) the training examples convey information about unseen data; and (iii) the system is able to treat differently points from different classes. In this talk, I will show that these fundamental properties are satisfied by the architecture of deep neural networks. I will highlight a formal proof that such networks with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data. Similar points at the input of the network are likely to have a similar output. The theoretical analysis of deep networks I will present exploits tools used in the compressed sensing and dictionary learning literature, thereby making a formal connection between these important topics. The derived results allow drawing conclusions about the metric learning properties of the network and their relation to its structure, as well as providing bounds on the required size of the training set such that the training examples would represent faithfully the unseen data. I will show how these conclusions are validated by state-of-the-art trained networks. I believe that these results provide an insight why deep learning works and potentially allow designing more efficient network architectures and learning strategies.