The principal goal of the Information Retrieval (IR) task is, given a user query, to present results that are as relevant as possible to that query. Secondary goals include ensuring that the presented results cover the possible user intents if the query is ambiguous, and ensuring that adversarial third parties don't contaminate the process with results that appear to be relevant but turn out not to be useful for most users ("spam"). In this talk I will concentrate on the former, principal goal: the problem of training systems, given a set of queries, URLs, and relevance labels, to return the most relevant documents possible. The metrics typically used for this task by the IR community depend only on the sorted order of the returned documents and on the relevance labels (which are usually human-assigned, either by judges or derived from click data). This presents difficulties for standard machine learning techniques, since the objective functions, viewed as functions of the scores output by the model, are either flat or discontinuous everywhere. In addition, in the context of Web search, the data sets used for training, and certainly those encountered at test time, tend to be very large, which places extra constraints on the kinds of algorithms we can use. I will discuss these problems and some solutions.
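As a minimal illustration of the point about flat or discontinuous objectives (NDCG is assumed here as the example metric; it is not singled out in the abstract itself), the sketch below computes NDCG from model scores. Because the metric depends on the scores only through the ranking they induce, small score perturbations that preserve the sort order leave it exactly unchanged (zero gradient), while a perturbation that swaps two documents changes it by a jump.

```python
import numpy as np

def dcg(relevance_in_ranked_order):
    """Discounted cumulative gain of relevance labels listed top-down."""
    gains = 2.0 ** np.asarray(relevance_in_ranked_order, dtype=float) - 1.0
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))
    return float(np.sum(gains * discounts))

def ndcg(scores, labels):
    """NDCG of the ranking induced by the model scores (higher score ranks first)."""
    order = np.argsort(-np.asarray(scores))          # sort documents by score
    ideal = dcg(sorted(labels, reverse=True))        # best achievable DCG
    return dcg(np.asarray(labels)[order]) / ideal if ideal > 0 else 0.0

labels = [3, 1, 0, 2]            # hypothetical human relevance judgments for 4 documents
scores = [2.0, 1.0, 0.5, 1.5]    # hypothetical model scores

print(ndcg(scores, labels))                        # 1.0: scores already induce the ideal order
print(ndcg([s + 0.01 for s in scores], labels))    # still 1.0: metric is flat under order-preserving changes
print(ndcg([2.0, 1.6, 0.5, 1.5], labels))          # two documents swap ranks: the metric jumps discontinuously
```

This is why such metrics cannot be optimized directly by gradient descent on the scores, which motivates the training approaches discussed in the talk.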