This presentation addresses the problem of large-scale image
search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a single vector, and show that the Fisher kernel achieves better performance than the
reference bag-of-visual-words approach for any given vector
dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving
high accuracy. Searching a dataset of 100 million images takes about 250 ms on one processor core.
This is joint work with H. Jegou, F. Perronnin, M. Douze, J. Sanchez
and P. Perez.
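
As a rough illustration of the pipeline outlined above, the following is a minimal sketch assuming NumPy, a visual codebook that has already been learned (for example by k-means), and local descriptors given as rows of an array. It is not the method evaluated in the talk. The residual aggregation shown here is a simplified stand-in for the Fisher kernel, the PCA step is a plain SVD-based projection, and the one-byte-per-component scalar quantizer only stands in for the jointly optimized indexing scheme. All function names are hypothetical.

```python
import numpy as np

def assign(descriptors, codebook):
    """Return the index of the nearest codebook centroid for each descriptor."""
    # Squared Euclidean distances, shape (n_descriptors, k).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_vector(descriptors, codebook):
    """Bag-of-visual-words: L2-normalized histogram of assignments (dimension k)."""
    k = codebook.shape[0]
    hist = np.bincount(assign(descriptors, codebook), minlength=k).astype(np.float64)
    return hist / max(np.linalg.norm(hist), 1e-12)

def residual_vector(descriptors, codebook):
    """Simplified stand-in for a Fisher vector (an assumption of this sketch):
    per-centroid sum of residuals, flattened to dimension k * d and L2-normalized."""
    k, d = codebook.shape
    nearest = assign(descriptors, codebook)
    out = np.zeros((k, d))
    # Unbuffered accumulation: each residual is added to its centroid's row.
    np.add.at(out, nearest, descriptors - codebook[nearest])
    out = out.ravel()
    return out / max(np.linalg.norm(out), 1e-12)

def fit_pca(vectors, n_components):
    """Learn a PCA projection from training vectors (one per row)."""
    mean = vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(vectors - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode_bytes(vectors, mean, components):
    """Project with PCA, then quantize every component to one byte.
    The quantization range is taken from the batch itself; a real system
    would fix it on training data so that queries are encoded consistently."""
    projected = (vectors - mean) @ components.T
    lo, hi = projected.min(), projected.max()
    span = hi - lo
    scale = 255.0 / span if span > 0 else 0.0
    return np.round((projected - lo) * scale).astype(np.uint8)
```

With a codebook of k centroids over d-dimensional descriptors, the histogram has k components while the residual vector has k * d, which is why the two are compared at equal output dimension rather than at equal codebook size. Keeping 32 PCA components at one byte each would give a 32-byte code per image, in the range of the few dozen bytes mentioned above.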