This task involves extracting visual models, e.g., clusters of regions, from a large collection of unannotated images. Typical approaches to this problem have three ingredients: generating candidate segments (e.g., regions) from the input images, estimating the similarity between regions to form (implicitly) a similarity graph of regions, and pruning the graph to extract a small number of subgraphs corresponding to individual models. We will review the current approaches to this task in the first part of the talk. The task remains extremely challenging: as the amount of data grows, which is necessary in order to discover "interesting" models, the complexity of the similarity graph increases rapidly because of the unsupervised and unconstrained nature of the task. We will discuss two broad classes of approaches to address this challenge. First, we discuss ways to incorporate additional constraints into the discovery process, and we show how they can dramatically reduce the computational complexity and increase the accuracy while retaining the unsupervised flavor of the task. Finally, we briefly discuss approaches to unsupervised discovery of mid-level representations as an intermediate step.
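The last two ingredients, building a similarity graph and pruning it, can be illustrated with a minimal sketch. It is not the method of any particular approach covered in the talk. It assumes each candidate region has already been summarized as a nonzero feature vector, and it swaps in cosine similarity with a fixed threshold and connected components as stand-ins for the similarity measure and the pruning step. The names `similarity_graph`, `extract_models`, `threshold`, and `min_size` are hypothetical.

```python
import numpy as np


def similarity_graph(features, threshold=0.9):
    """Boolean adjacency matrix over regions (assumed similarity measure).

    Regions i and j are linked when the cosine similarity of their
    feature vectors is at least `threshold`. Rows must be nonzero.
    """
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    adjacency = (unit @ unit.T) >= threshold
    np.fill_diagonal(adjacency, False)  # no self-loops
    return adjacency


def extract_models(adjacency, min_size=2):
    """Prune the graph to connected components with at least `min_size` regions."""
    n = adjacency.shape[0]
    seen = np.zeros(n, dtype=bool)
    models = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first traversal collecting one connected component.
        stack, component = [start], []
        seen[start] = True
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in np.flatnonzero(adjacency[node]):
                if not seen[neighbor]:
                    seen[neighbor] = True
                    stack.append(int(neighbor))
        if len(component) >= min_size:
            models.append(sorted(component))
    return models


# Toy region descriptors (made up for illustration): two tight groups
# and one outlier region that matches nothing.
features = np.array([
    [1.0, 0.0],
    [0.99, 0.1],
    [0.0, 1.0],
    [0.1, 0.99],
    [-1.0, -1.0],
])
models = extract_models(similarity_graph(features))
print(models)
```

Note that the dense adjacency matrix has one entry per pair of regions, so its size grows quadratically with the number of candidate regions. This is one concrete way to see why the graph becomes hard to handle as the image collection grows.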