The cgDNA sequence-dependent coarse-grain model of dsDNA: Bridging the scales from Molecular Dynamics to Bioinformatics

John Maddocks
École Polytechnique Fédérale de Lausanne (EPFL)

The cgDNA+ coarse-grain model of DNA (lcvmwww.epfl.ch/research/cgDNA/) can now accurately predict the sequence-dependent statistical mechanics of double-stranded DNA fragments of arbitrary sequence, for example their shape and stiffness, or equivalently the first and second moments of their equilibrium distributions. At scales of tens of base pairs, these predictions can be compared with Molecular Dynamics simulations, and the two agree very well. However, the efficiency of the cgDNA+ model also allows sequences at genome length scales to be scanned in order to identify mechanically exceptional fragments, including in an epigenetically modified sequence alphabet. Large data sets and aspects of machine learning arise both in (a) fitting the model parameters (20K+) to Molecular Dynamics training data (some terabytes of time series), and in (b) identifying common patterns in the billions of Gaussian PDFs that are generated when a genome is scanned with a sliding sequence window.
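
The sliding-window scan described above can be illustrated with a minimal Python sketch. It is not the cgDNA+ implementation and uses none of its parameter sets: the function `toy_shape_and_stiffness` is a hypothetical stand-in that returns made-up scalar values driven by GC content, whereas the real model predicts a high-dimensional mean and stiffness matrix per fragment. The sketch only shows the structure of the computation: one Gaussian per window, with the mean playing the role of the shape and the inverse stiffness giving the variance (in units where kT = 1), followed by a search for an exceptional window under one arbitrary criterion.

```python
from typing import Iterator, List, Tuple


def sliding_windows(sequence: str, width: int) -> Iterator[Tuple[int, str]]:
    """Yield (start index, fragment) for every window of `width` bases."""
    for start in range(len(sequence) - width + 1):
        yield start, sequence[start:start + width]


def toy_shape_and_stiffness(fragment: str) -> Tuple[float, float]:
    """Hypothetical stand-in for a model prediction (NOT cgDNA+).

    Returns a scalar mean ("shape") and a scalar stiffness of a 1-D Gaussian.
    The numbers are invented for illustration and depend only on GC content.
    """
    gc = sum(base in "GC" for base in fragment) / len(fragment)
    mean = 1.0 + gc
    stiffness = 2.0 + 2.0 * gc
    return mean, stiffness


def scan(sequence: str, width: int) -> List[Tuple[int, str, float, float]]:
    """Return (start, fragment, mean, variance) for each window."""
    results = []
    for start, fragment in sliding_windows(sequence, width):
        mean, stiffness = toy_shape_and_stiffness(fragment)
        # For a density proportional to exp(-K (x - mean)^2 / 2),
        # the second central moment is 1 / K.
        variance = 1.0 / stiffness
        results.append((start, fragment, mean, variance))
    return results


def stiffest_window(results: List[Tuple[int, str, float, float]]):
    """Pick the window with the smallest variance (one arbitrary criterion)."""
    return min(results, key=lambda r: r[3])


if __name__ == "__main__":
    hits = scan("ATGCGCAT", width=4)
    print(stiffest_window(hits))
```

At genome scale the list of per-window Gaussians would be far too large to hold as Python tuples; the sketch keeps everything in memory only for readability.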

Back to Workshop III: Validation and Guarantees in Learning Physical Models: from Patterns to Governing Equations to Laws of Nature