The availability of modern massively parallel computers empowers scientists to study larger and more complex problems: simulations of unprecedented size can now be performed, and new models with more realistic conditions can be tested and applied faster than just a few years ago. Computer architectures are constantly evolving and increasing in computational power, which poses new challenges to computer and computational scientists. The efficient use of these advanced computers requires sophisticated algorithms and new programming paradigms that allow researchers to accelerate discoveries and inventions. Indeed, a significant effort will be required to adapt existing codes to survive in the upcoming exascale era. Distributed computing could be used to process thousands of atomic-scale structures concurrently. Intensive computational campaigns that produce meaningful data sets require not only anticipating production paths but also protocols to maximize computer cycles and reduce bias. Workflows combined with data analytics could dynamically adapt and systematically improve the size and quality of the samples. In this talk, I will share some experiences with workflows for producing molecular datasets in the context of the new Argonne Data Science Program.