Atomistic simulations based on classical mechanics are now routinely used to investigate the behavior of chemical and biological systems. These simulations can produce large amounts of high-dimensional data (atomic positions, forces, etc.), yet often only a few macroscopic observables are recorded. Key questions to be discussed in this workshop are how to decide, in a principled fashion, which data to keep, and how best to use the large volume of generated data to produce useful results and to learn the important collective variables that determine the macroscopic behavior of proteins and chemical systems.
The simulations are often stochastic or approximated by stochastic systems, and important features of the dynamics include rare events. Designing better adaptive sampling algorithms in these situations, leveraging data from long or short simulations, is often tied to the problem of learning good collective variables. Finally, the dimensionality reduction problems underlying the questions above require a robust quantitative understanding of the geometry of the effective configuration space of a molecule, or of a family of molecules in chemical compound space. Such an understanding will clarify the role of collective variables and make it easier to navigate and explore molecular and chemical compound space. Since robust dimensionality reduction techniques and fast computational methods tend to be multiscale (in space, time, molecular resolution, etc.), a key aim of the workshop is to develop a better understanding of where “multiscale” reasoning can have the greatest impact on molecular simulations.
This workshop will bring together a mix of mathematicians, physicists, chemists, computer scientists, and biologists to address some of the following questions: Is it possible to generate a low-dimensional representation for (a subset of) the chemical compound space (CCS)? What are the appropriate descriptors for different molecular properties in CCS? How can we deal with the permutational space of many-component alloys? How does the choice of descriptors affect the efficiency and faithfulness of a model constructed with machine learning (ML) techniques? Are current coarse-graining approaches, which represent proteins as collections of functional groups or backbone degrees of freedom, optimal in any sense? What are the alternatives? Can ML techniques be trained on long or short molecular dynamics (MD) trajectories to condense these complicated trajectories into a reduced representation? How can macroscopic observables be determined accurately from such reduced (collective) representations of MD simulations?
This workshop will include a poster session; a call for posters will be sent to registered participants in advance of the workshop.
Cecilia Clementi
(Rice University)
Leslie Greengard
(New York University)
Mauro Maggioni
(Duke University)
Susan Sinnott
(Pennsylvania State University)