Learning effective representations of molecules is critical to a variety of tasks in biomedical and materials science. In this seminar, I will talk about how to design and evaluate data-efficient and trustworthy deep learning models for more accurate and generalizable predictions as well as targeted generation. I will discuss challenges such as incorporating molecular grammar learning, augmenting molecular generative models with geometric information, and efficiently searching and exploiting the latent space. I will also present MolFormer, a recently published compute-efficient large chemical “language” model, which demonstrated the emergence of human taste perception and three-dimensional geometric features that are signatures of trust differentiators beyond the training data.