Typical components of data-processing pipelines in machine learning have computational costs that grow quickly with increasing data complexity and size, while performance improves only moderately as model complexity increases. A key prerequisite is finding reliable discrete approximations of complex systems. However, common discretization approaches (for example, the very popular K-means clustering) are crucially limited in terms of quality and cost.
We introduce a low-cost, improved-quality Scalable Probabilistic Approximation (SPA) framework that allows for a simultaneous, jointly optimal solution of the data-driven feature selection, discretization, and Bayesian/Markovian model inference problems.
We briefly review the theoretical foundations of the new methodology and demonstrate its performance on a broad range of application problems from the natural sciences, including comparisons with common tools from statistics and deep learning.
Joint work with Gerber S. and Horenko I.
References:
Gerber S., Pospisil L., Navandar M., Horenko I.: Low-cost scalable discretization, prediction and feature selection for complex systems, under review, 2019, https://www.biorxiv.org/content/10.1101/720441v1