We present an unsupervised approach to feature learning that uses a hierarchical model to decompose images via alternating layers of convolutional sparse coding and a trainable form of pooling. When applied to natural images, the layers of our model capture image information in a variety of forms: low-level edges, mid-level edge junctions, high-level object parts, and complete objects. To build our model, we rely on a novel inference scheme that ensures each layer reconstructs the input, rather than just the output of the layer directly beneath, as is common in existing hierarchical approaches. This makes it possible to learn multiple layers of representation, and we show models with 4 layers trained on images from the Caltech-101 and Caltech-256 datasets. Features extracted from these models, in combination with a standard classifier, outperform SIFT and representations from other feature learning approaches. Joint work with Matt Zeiler (NYU) and Graham Taylor (NYU, now at U. Guelph, Canada).
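The abstract does not spell out its inference algorithm, so the following is only a generic illustration of the building block it names: a single layer of convolutional sparse coding, where a signal is explained as a sum of filters convolved with sparse feature maps. It is a minimal sketch that assumes 1-D signals, fixed (not learned) filters, no pooling, and the standard ISTA solver (a swapped-in textbook technique, not the authors' multi-layer scheme that reconstructs the input from every layer). The filter values and the helper names `conv_full`, `corr_valid`, `reconstruct`, `objective`, and `ista` are hypothetical.

```python
"""Single-layer 1-D convolutional sparse coding via ISTA (illustrative sketch only)."""
import math
from typing import List

Vec = List[float]


def conv_full(z: Vec, d: Vec) -> Vec:
    """Full 1-D convolution of feature map z with filter d (length len(z) + len(d) - 1)."""
    out = [0.0] * (len(z) + len(d) - 1)
    for i, zi in enumerate(z):
        if zi == 0.0:
            continue  # sparse maps: skip zero entries
        for j, dj in enumerate(d):
            out[i + j] += zi * dj
    return out


def corr_valid(r: Vec, d: Vec) -> Vec:
    """Valid cross-correlation of r with d; the adjoint of conv_full with respect to z."""
    m = len(d)
    return [sum(r[i + j] * d[j] for j in range(m)) for i in range(len(r) - m + 1)]


def reconstruct(codes: List[Vec], filters: List[Vec]) -> Vec:
    """Reconstruction = sum over k of (feature map z_k convolved with filter d_k)."""
    x_hat = [0.0] * (len(codes[0]) + len(filters[0]) - 1)
    for z, d in zip(codes, filters):
        for t, v in enumerate(conv_full(z, d)):
            x_hat[t] += v
    return x_hat


def objective(x: Vec, codes: List[Vec], filters: List[Vec], lam: float) -> float:
    """0.5 * ||x - sum_k d_k * z_k||^2 + lam * sum_k ||z_k||_1."""
    r = [a - b for a, b in zip(x, reconstruct(codes, filters))]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for z in codes for v in z)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def ista(x: Vec, filters: List[Vec], lam: float, n_iter: int = 200) -> List[Vec]:
    """Infer sparse feature maps for x with the filters held fixed."""
    m = len(filters[0])
    codes = [[0.0] * (len(x) - m + 1) for _ in filters]
    # Step 1/L, where L = sum_k ||d_k||_1^2 upper-bounds the Lipschitz constant.
    step = 1.0 / sum(sum(abs(v) for v in d) ** 2 for d in filters)
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(x, reconstruct(codes, filters))]
        # All maps are updated from the same residual (a simultaneous gradient step).
        codes = [
            [soft_threshold(z + step * g, step * lam) for z, g in zip(zk, corr_valid(residual, d))]
            for zk, d in zip(codes, filters)
        ]
    return codes


if __name__ == "__main__":
    filters = [[1.0, -1.0], [0.5, 0.5]]  # hypothetical toy filters
    true_codes = [[0.0, 2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 0.0]]
    x = reconstruct(true_codes, filters)
    codes = ista(x, filters, lam=0.05)
    print("objective at zero codes:", objective(x, [[0.0] * 5, [0.0] * 5], filters, 0.05))
    print("objective after ISTA:   ", objective(x, codes, filters, 0.05))
```

In a hierarchical model of the kind described, the feature maps inferred at one layer (after pooling) would become the input to the next layer; this sketch stops at a single layer.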
Back to Graduate Summer School: Computer Vision