Advances in memory and storage technologies are delivering improvements in capacity, bandwidth, and latency that continue to revolutionize data analytics. As the performance gap between memory and storage narrows, new opportunities arise to innovate in how applications interact with their data. In this talk I will describe ongoing research into techniques that support data views, i.e., grouping and selecting narrow windows into large, irregular data sets.
Leveraging the fast, high-capacity storage available today, the UMap software library mediates low-overhead, user-space access to virtual views of terabyte-scale scientific data sets. For example, a multi-terabyte collection of tiles assembled from telescope imagery can be viewed as a 3D image cube and searched for transients. The library uses a page-fault messaging protocol in the Linux kernel to read specific files at specific offsets whenever an access touches a not-yet-loaded page of the virtual 3D cube. Looking to future architectures, near-memory hardware logic can search and filter data to assemble a dense, regular virtual view of sparse, irregular data sets. Such an offload facility reduces memory bandwidth pressure, improves cache locality, and enables better utilization of vector/SIMD units.
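To make the "specific files at specific offsets" step concrete, the sketch below shows the kind of address translation a fault handler for such a view has to perform. It is a minimal illustration under stated assumptions, not UMap's API: the tile layout (one raw, row-major file per time step and tile, full-size tiles only), the file naming scheme, and the names `CubeLayout`, `locate`, and `page_reads` are all hypothetical, and the kernel fault delivery itself is not shown. Given the byte address of a faulting page in a dense, row-major (T, H, W) cube, it lists the (file, offset, length) reads that would fill that page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeLayout:
    """Hypothetical layout: a virtual (T, H, W) cube of fixed-size elements,
    backed by one raw, row-major file per (time step, tile)."""
    T: int
    H: int
    W: int
    tile_h: int
    tile_w: int
    elem_size: int = 4  # e.g. float32 pixels

    def __post_init__(self):
        # Assumption: only full tiles, so every tile file is tile_h x tile_w.
        if self.H % self.tile_h or self.W % self.tile_w:
            raise ValueError("H and W must be multiples of the tile size")

    def tile_file(self, t, ty, tx):
        # Hypothetical naming scheme for the backing tile files.
        return f"t{t:04d}_y{ty:03d}_x{tx:03d}.raw"

    def locate(self, t, y, x):
        """Map cube coordinates to (file name, byte offset in that file)."""
        ty, iy = divmod(y, self.tile_h)
        tx, ix = divmod(x, self.tile_w)
        offset = (iy * self.tile_w + ix) * self.elem_size
        return self.tile_file(t, ty, tx), offset

    def page_reads(self, vaddr, page_size):
        """List (file, offset, nbytes) reads that fill one page of the dense
        cube starting at byte address vaddr (assumed element-aligned)."""
        reads = []
        total = self.T * self.H * self.W
        i = vaddr // self.elem_size
        last = min((vaddr + page_size) // self.elem_size, total)
        while i < last:
            t, rem = divmod(i, self.H * self.W)
            y, x = divmod(rem, self.W)
            # Elements stay contiguous in a tile file until the tile row ends
            # (which also bounds the cube row) or the page ends.
            run = min(self.tile_w - x % self.tile_w, last - i)
            fname, off = self.locate(t, y, x)
            reads.append((fname, off, run * self.elem_size))
            i += run
        return reads


if __name__ == "__main__":
    cube = CubeLayout(T=2, H=4, W=4, tile_h=2, tile_w=2)
    # One 16-byte "page" covering the second row of the first time step.
    for read in cube.page_reads(16, 16):
        print(read)
```

Running the example prints two 8-byte reads, one from each of the two tiles the row crosses. A real handler would issue these reads (ideally in parallel) into the faulting page before resolving the fault; larger tiles along the fastest-varying axis mean fewer, longer reads per page.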