In this talk I will first describe an efficient way of learning models with "deep" architectures, composed of multiple levels of nonlinear operations. I will show that this framework can be successfully applied to a variety of machine learning problems, including nonlinear dimensionality reduction, information retrieval, building good generative models of highly structured input data, and learning good classification and regression models.
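To make "multiple levels of nonlinear operations" concrete, here is a minimal sketch of a forward pass through such a stack: each level applies a linear map followed by a nonlinearity. This illustrates only the architecture, not the talk's learning procedure; the function name, layer sizes, and random initialization are illustrative assumptions.

```python
import numpy as np

def forward(x, weights, biases):
    """Pass input through a deep stack: each level applies a linear map
    followed by a nonlinearity (here, the logistic sigmoid)."""
    h = x
    for W, b in zip(weights, biases):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return h

# Hypothetical shapes: three levels reducing 2000-d inputs to 32 units.
rng = np.random.default_rng(0)
sizes = [2000, 500, 250, 32]
weights = [rng.normal(0.0, 0.01, (m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
codes = forward(rng.random((4, 2000)), weights, biases)  # shape (4, 32)
```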
In the second half of the talk I will concentrate on the problem of document retrieval and show how to learn a deep model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g., 32), the model performs "Semantic Hashing": documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. I will demonstrate that this way of extending the efficiency of hash-coding to approximate matching achieves good retrieval performance and is considerably faster than most existing information retrieval algorithms.
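The retrieval step can be made concrete with a short sketch. It assumes the deep model has already produced a 32-bit binary code for each document (as in the abstract); the index structure and the function names hamming_ball, add_document, and retrieve are hypothetical.

```python
import itertools
from collections import defaultdict

N_BITS = 32  # number of binary variables in the deepest layer

def hamming_ball(code, n_bits, radius):
    """Yield every n_bits-bit code within the given Hamming radius of `code`."""
    for r in range(radius + 1):
        for bits in itertools.combinations(range(n_bits), r):
            flipped = code
            for b in bits:
                flipped ^= (1 << b)
            yield flipped

# Index: binary code (memory address) -> list of document ids.
index = defaultdict(list)

def add_document(doc_id, code):
    index[code].append(doc_id)

def retrieve(query_code, radius=2):
    """Return ids of documents whose codes differ from the query code by at
    most `radius` bits -- direct address lookups, no per-document comparison."""
    results = []
    for addr in hamming_ball(query_code, N_BITS, radius):
        results.extend(index[addr])
    return results
```

Note the cost of a lookup: enumerating a Hamming ball of radius 2 over 32 bits touches 1 + 32 + 496 = 529 addresses, independent of the number of documents in the collection.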
Joint work with Geoffrey Hinton.