Humanists have few opportunities to use advanced technologies for analyzing large, messy sound archives. To address this gap, the HiPSTAS (High Performance Sound Technologies for Access and Scholarship) Project is developing a research environment that uses machine learning and visualization to automate the description of unprocessed spoken-word collections of keen interest to humanists. This paper describes ARLO (Adaptive Recognition with Layered Optimization), a machine learning system developed as part of HiPSTAS, and presents a use case: finding moments of applause in the PennSound collection, which includes approximately 36,000 files comprising 6,200 hours of poetry performances and related materials. We conclude with a brief discussion of our preliminary results and some observations on the efficacy of using machine learning to generate data about unprocessed spoken-word sound collections in the humanities.
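To give a concrete sense of the kind of supervised classification task this use case involves, the sketch below shows a minimal applause-versus-speech classifier. This is not ARLO's implementation; it is a hypothetical illustration that assumes the librosa and scikit-learn libraries, invented file names, and hand-labeled training clips.

```python
# Hypothetical sketch of a supervised applause detector.
# NOT ARLO's pipeline: file names and labels are illustrative only.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def clip_features(path, sr=22050):
    """Summarize a short audio clip as the mean and standard
    deviation of its MFCC coefficients over time."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hand-labeled example clips (hypothetical): 1 = applause, 0 = other.
train_paths = ["applause_01.wav", "applause_02.wav",
               "speech_01.wav", "speech_02.wav"]
labels = [1, 1, 0, 0]

# Train a simple classifier on the summarized features.
X = np.stack([clip_features(p) for p in train_paths])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)

# Predict whether an unseen clip contains applause.
print(clf.predict([clip_features("unknown_clip.wav")]))
```

In practice, a system working at the scale of PennSound would run such a classifier over sliding windows of long recordings rather than over pre-segmented clips, which is where the automation described above becomes valuable.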