The problem of transfer and domain adaptation is ubiquitous in machine learning and concerns situations where predictive technologies, trained on a given source dataset, have to be transferred to a new, somewhat related target domain. For example, a voice-recognition system trained on American English accents may need to be transferred to Scottish or Kenyan accents. A first challenge is to understand how to properly model the ‘distance’ between source and target domains, viewed as probability distributions.
In this talk we will argue that various existing notions of distance between distributions turn out to be pessimistic, i.e., these distances might appear large in many situations where transfer is in fact possible, even at a fast rate. Instead, we show that some new notions of distance tightly capture a continuum from easy to hard transfer, and furthermore can be adapted to, i.e., need not be estimated in order to achieve near-optimal transfer. Finally, we will discuss rate-optimal approaches to practical questions such as how much target data (usually more expensive to obtain) is sufficient to reach a desired target error, given access to any fixed amount of source data.
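To make the pessimism claim concrete, here is one illustration in the pure covariate-shift setting (the particular metric and example are ours, not fixed by the abstract). A classical candidate distance is the total variation between the source and target marginals,

    d_TV(P_X, Q_X) = sup_A | P_X(A) − Q_X(A) |.

Take P_X uniform on [0,1], Q_X uniform on [0, eps], with the same regression function under both. Then d_TV(P_X, Q_X) = 1 − eps, i.e., nearly maximal; yet the source distribution places positive density on the entire target region, so standard nonparametric procedures transfer at essentially their usual rates. The distance is thus ‘pessimistic’ in the sense described above.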
This talk is based on joint work with G. Martinet and ongoing work with S. Hanneke.