In text analysis and other symbolic domains, algorithms for efficient retrieval are very mature. Inverted indices, hash tables, Bloom filters and other related data structures allow rapid matching of partial queries to documents in a large repository. But in domains such as face recognition, audio/video copyright monitoring or fingerprint identification, although the concept of matching is still clearly defined, the associated algorithms are much less advanced.
I'll describe an approach to solving such problems using geometric fingerprints. These fingerprints are continous vector-valued hash codes and approximate matches can be retrieved using KD-tree type data structures. As an example application I will describe a system our group has built which can take as input any image of the night sky and identifies its precise location (and correspondences with know stellar objects). This system effectively "searches the entire sky" in much less than a second on a single workstation using the continuous hash code trick. For fun, if you email me a (link to) image before the talk I will show a live demo of solving it.
(joint work with Dustin Lang, Keir Mierle and David Hogg)
Audio (MP3 File, Podcast Ready)