Multimedia content has become a ubiquitous presence on all our computing devices, spanning the gamut from live content captured by device sensors such as smartphone cameras to immense databases of images, audio, video, 3D scans and 3D models stored in the cloud. As we try to maximize the utility and value of all these petabytes of content, we often do so by analyzing each piece of data individually, forgoing a deeper analysis of the relationships between the media. Yet as data accumulates, so do the connections and correlations among data sets -- because the captured data comes from the same or similar objects, or because of repetitions, symmetries, or other relations and self-relations that the data sources satisfy.
It is important to develop rigorous mathematical and computational tools for making such relationships or correspondences between data sets first-class citizens -- so that the relationships themselves become explicit, algebraic, storable and searchable objects. We discuss mathematical and algorithmic issues in representing and computing relationships or mappings between media data sets at multiple levels of detail. We also show how to analyze and leverage networks of maps and relationships, small and large, between inter-related data. Information transport and aggregation in such networks naturally lead to abstractions of objects and other visual entities, allowing data compression while capturing variability as well as shared structure. Furthermore, the network can act as a regularizer, allowing us to benefit from the "wisdom of the collection" in performing operations on individual data sets or in inferring maps between them, ultimately enabling a joint understanding of data that provides the powers of abstraction, analogy, compression, error correction, and summarization. Examples include entity extraction from images or videos, 3D segmentation, the propagation of annotations and labels among images/videos/3D models, variability analysis in a collection of shapes, etc.
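One concrete way the network of maps can act as a regularizer is through cycle consistency: composing maps along any closed cycle of data sets should return (approximately) the identity, and large deviations flag unreliable maps. The sketch below illustrates this idea under simplifying assumptions of our own (maps represented as small matrices acting on point coordinates, Frobenius norm as the deviation measure); it is not the specific formulation used in the work described.

```python
import numpy as np

def cycle_error(maps, cycle):
    """Compose maps along a closed cycle of data sets and measure
    deviation from the identity (a cycle-consistency score).

    maps[(i, j)] is a matrix sending points of set i to set j
    (x_j = M @ x_i) -- a simplifying assumption for illustration.
    cycle is a list of data-set names, e.g. ["a", "b", "c"].
    """
    dim = maps[(cycle[0], cycle[1])].shape[1]
    comp = np.eye(dim)
    # Walk the cycle a -> b -> c -> ... -> a, left-composing each map.
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        comp = maps[(a, b)] @ comp
    return np.linalg.norm(comp - np.eye(comp.shape[0]))

# A consistent triangle of maps: a cyclic shift, an identity,
# and the inverse of their composition closing the loop.
P_ab = np.array([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])
P_bc = np.eye(3)
P_ca = P_ab.T  # inverse of P_bc @ P_ab
maps = {("a", "b"): P_ab, ("b", "c"): P_bc, ("c", "a"): P_ca}

print(cycle_error(maps, ["a", "b", "c"]))  # near zero: cycle is consistent

# Corrupt one map: the cycle no longer closes, and the error exposes it.
bad = dict(maps)
bad[("c", "a")] = np.eye(3)
print(cycle_error(bad, ["a", "b", "c"]))   # clearly nonzero
```

In a larger collection, scores like this over many cycles can be aggregated to down-weight or correct individual noisy maps -- the "wisdom of the collection" at work.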
Finally, we describe our ShapeNet effort, an attempt to build a large-scale repository of 3D models richly annotated with geometric, physical, functional, and semantic information -- both individually and in relation to other models and media. More than a repository, ShapeNet is a true network that allows information transport not only between its nodes but also to/from new visual data coming from sensors. This effectively enables us to add missing information to signals (computational imagination), giving us, for example, the ability to infer what an occluded part of an object in an image may look like, or what other object arrangements may be possible, based on the world-knowledge encoded in ShapeNet.
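The simplest form of information transport from a repository node to new sensor data is transferring per-point annotations across a correspondence. The toy sketch below makes that concrete under assumptions of our own (a brute-force nearest-neighbor correspondence between already-aligned point sets, and integer part labels); real pipelines would use richer maps and learned correspondences.

```python
import numpy as np

def propagate_labels(src_pts, src_labels, tgt_pts):
    """Transport per-point part labels from an annotated source shape
    to a new target shape via nearest-neighbor correspondence.

    Assumes the two point sets are already in a common coordinate
    frame -- a toy stand-in for a real shape-matching pipeline.
    src_pts: (n, d) annotated points; src_labels: (n,) labels;
    tgt_pts: (m, d) unannotated points. Returns (m,) labels.
    """
    # Pairwise distances between every target and source point.
    d = np.linalg.norm(tgt_pts[:, None, :] - src_pts[None, :, :], axis=-1)
    # Each target point inherits the label of its closest source point.
    return src_labels[np.argmin(d, axis=1)]

# Two labeled "parts" on a source shape, transported to nearby scan points.
src = np.array([[0.0, 0.0], [1.0, 0.0]])
labels = np.array([0, 1])           # e.g. 0 = seat, 1 = backrest
scan = np.array([[0.1, 0.0], [0.9, 0.1]])
print(propagate_labels(src, labels, scan))  # -> [0 1]
```

Chaining such transfers through the network -- model to model to scan -- is what lets annotations made once on a ShapeNet node flow outward to new sensor data.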