Thanks in part to the recent popularity of the buzzword "big data," it is now generally understood that many important scientific breakthroughs are made by interdisciplinary collaborations of scientists working in geographically distributed locations, producing and analyzing vast and complex data sets. The extraordinary advances in our ability to acquire and generate data in physical, biological, and social sciences are transforming the fundamental nature of science discovery across domains. Much of the research in this area, which has become known as data science, has focused on automated methods of analyzing data such as machine learning and new database techniques. Less attention has been directed to the human aspects of data science, including how to build interactive tools that maximize scientific creativity and human insight, and how to train, support, motivate, and retain the individuals with the necessary skills to produce the next generation of scientific discoveries.
In this talk, I will argue for the importance of a human centered approach to data science as necessary for the success of 21st century scientific discovery. Further, I attest that we need to go beyond well-designed user interfaces for data science software tools to consider the entire ecosystem of software development and use: we need to study scientific collaborations interacting with technology as socio-technical systems, where both computer science and social science approaches are interwoven. I will discuss promising research in this area, describe opportunities to participate in the recently announced $37.8M Moore/Sloan Data Science Environment, and speculate upon future directions for data science.