Effective use of data management techniques for the analysis and visualization of massive scientific data is crucial to the success of any supercomputing center and of any cyberinfrastructure for data-intensive scientific investigation. On the path toward exascale computing, data movement challenges have fostered innovation, leading to complex streaming workflows that exploit every opportunity to process data while it is in motion.
In this talk, I will present a number of techniques developed at the Center for Extreme Data Management, Analysis, and Visualization (CEDMAV) that allow building a scalable data movement infrastructure for fast I/O while organizing the data so that it is immediately accessible for analytics and visualization. I will also present an advanced in-situ data analytics framework that allows processing data on parallel supercomputers without requiring users to have expert knowledge of parallel computing or advanced runtime systems.
Overall, this leads to a flexible data streaming workflow that supports working with massive simulation models or data from high-resolution experimental facilities without compromising the interactive nature of the exploratory process that characterizes the most effective data analytics and visualization environments.