Non-parametric approaches to object recognition have received limited attention in the Computer Vision community due to the high dimensionality of visual data. With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. We explore this world with the aid of a large dataset of 10^8 images collected from the Internet, using a variety of non-parametric methods.
Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32x32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the WordNet lexical database. Hence the image database gives comprehensive coverage of all object categories and scenes. The semantic information from WordNet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels in such a way that the effects of labeling noise are minimized. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate recognition performance comparable to class-specific Viola-Jones-style detectors. We also demonstrate a range of other applications, including automatic image colorization and orientation determination.
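One way to picture how nearest-neighbor search and a noun hierarchy might combine is the minimal sketch below. It is an illustration under stated assumptions, not the method from the talk: the tiny `PARENT` hierarchy, the sum-of-squared-differences distance, and the function names are all hypothetical, and short toy vectors stand in for 32x32 color images. Each of the k nearest neighbors votes for its own label and for every ancestor of that label, so noisy fine-grained labels can still agree at a coarser semantic level.

```python
from collections import Counter

# Hypothetical toy hierarchy (not taken from WordNet): label -> parent, None at the root.
PARENT = {
    "entity": None,
    "organism": "entity",
    "artifact": "entity",
    "person": "organism",
    "dog": "organism",
    "car": "artifact",
}


def ancestors(label):
    """Return the label followed by all of its ancestors up to the root."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]
    return chain


def ssd(a, b):
    """Sum of squared differences between two equal-length vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_hierarchy_votes(query, dataset, k):
    """Let each of the k nearest neighbors vote for its label and every ancestor.

    dataset is a list of (vector, label) pairs.
    """
    neighbors = sorted(dataset, key=lambda item: ssd(query, item[0]))[:k]
    votes = Counter()
    for _, label in neighbors:
        votes.update(ancestors(label))
    return votes


# Toy data: three nearby neighbors disagree on the fine label but agree on "organism".
dataset = [
    ([0.0, 0.0], "person"),
    ([0.1, 0.0], "dog"),
    ([0.0, 0.1], "person"),
    ([5.0, 5.0], "car"),
]
votes = knn_hierarchy_votes([0.0, 0.0], dataset, k=3)
print(votes.most_common())
```

In this toy case the fine-grained labels split between "person" and "dog", while the coarser node "organism" collects a vote from every neighbor, which is the sense in which classifying at a higher semantic level can dampen labeling noise.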
Joint work with Antonio Torralba and William T. Freeman, both at MIT.