Extracting knowledge from vast datasets is a major challenge in data-driven applications. Where many classical, data-centered data-mining algorithms fail to scale to massive data, ideas from simulation can help: numerical methods that discretize the underlying feature space lead to algorithms whose cost grows only linearly in the size of the data set to learn from. However, such grid-based methods suffer fully from the curse of dimensionality and are therefore typically disregarded. Fortunately, hierarchical numerical schemes can mitigate the curse, at least to some extent.
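To illustrate how a hierarchical scheme mitigates the curse, the following sketch compares the point counts of a full grid and a regular sparse grid (without boundary points) at the same level; the formulas are the standard ones for the piecewise-linear hierarchical basis, and the function names are illustrative, not taken from any particular library.

```python
from itertools import product

def full_grid_points(level, dim):
    # Full isotropic grid without boundary: (2^level - 1) points per dimension.
    return (2 ** level - 1) ** dim

def sparse_grid_points(level, dim):
    # Regular sparse grid: all level vectors l (l_i >= 1) with
    # |l|_1 <= level + dim - 1 contribute prod_i 2^(l_i - 1) points.
    total = 0
    for l in product(range(1, level + 1), repeat=dim):
        if sum(l) <= level + dim - 1:
            total += 2 ** (sum(l) - dim)
    return total

for d in (2, 3, 5):
    print(d, full_grid_points(5, d), sparse_grid_points(5, d))
```

At level 5, the full grid already needs 31^d points (close to 29 million for d = 5), while the sparse grid grows only like O(2^n n^{d-1}), which is what makes moderate dimensionalities tractable.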
We consider data mining with adaptive sparse grids, which yields scalable hierarchical algorithms for the analysis of Big Data. The approach is well-suited for massive numerical data with a moderate number of features, as obtained from sensor measurements or simulations, for example. To keep the overall computational effort small, we have studied efficient implementations of regression, classification, and clustering on parallel and distributed systems. The co-design of data structures and algorithms yields a streaming algorithm that is well-suited for modern hardware: it exploits GPUs and other accelerator cards extremely well, and it even scales to massively parallel HPC systems.
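The streaming structure can be sketched as follows: a sparse grid function is a sum of tensor-product hat basis functions, and evaluating it at one data point is independent of all other data points, which is why the workload maps well onto GPUs and accelerators. This is a minimal illustrative sketch of the standard hierarchical hat basis, not the authors' actual implementation; the grid representation is an assumption made for the example.

```python
def hat(level, index, x):
    # 1D hierarchical hat function: phi_{l,i}(x) = max(0, 1 - |2^l * x - i|).
    return max(0.0, 1.0 - abs((2 ** level) * x - index))

def evaluate(grid, surpluses, point):
    # Sparse grid function value at one data point: sum over grid points
    # of hierarchical surplus times the tensor product of 1D hats.
    # Each data point can be processed independently (streaming).
    result = 0.0
    for (levels, indices), alpha in zip(grid, surpluses):
        phi = 1.0
        for l, i, x in zip(levels, indices, point):
            phi *= hat(l, i, x)
        result += alpha * phi
    return result

# A tiny 2D grid: one basis function centered at (0.5, 0.5).
grid = [((1, 1), (1, 1))]
surpluses = [2.0]
print(evaluate(grid, surpluses, (0.5, 0.5)))  # peak of the hat: 2.0
```

In a real implementation the loop over data points would be the parallel dimension, with the grid data laid out contiguously so that each GPU thread streams through it.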