Operations research has a long history of supporting decision-making in public school systems to improve the efficiency and equity of their transportation and logistics operations, going back to early papers on school desegregation in the United States in the 1960s. One particularly rich area of research is access to schools, where models and solution methods have been developed to address decisions related to school district design and student transportation. However, one critical issue remains in this stream of work. Data on where students live and which schools they attend are fundamental to district design and school transportation decisions. Such data are also highly protected under federal law because they contain sensitive student information, and hence they are not publicly available. As a result, the field lacks common data sets that reflect the relevant characteristics of school districts and that can be used to test new approaches and compare them with prior work. Additionally, without such widely available data, it is difficult for researchers new to the area to conduct research in school transportation and logistics without a partnership with a school district.
Data sets used for school operations problems, such as school bus routing or district design, include geocoded data on student residences and school locations, as well as demographic data about students. A common practice in the literature for addressing the protected nature of school data is to create synthetic data sets, often by randomly locating schools and students over a geographic region. As might be expected, there are many unknowns in creating synthetic data sets for school operations, including how to locate schools and how to distribute enrollments around the schools and across the district. Clearly, these are not random in reality. In the United States, for example, decades of systemic discrimination in access to housing and education have profoundly impacted the distribution of students and schools within communities. Randomly distributing students over space also ignores important geographic characteristics and special features of school districts, including bodies of water and industrial parks.
In this talk, we present an approach to create data sets for school transportation and logistics problems that capture realistic features of school districts, while relying only on publicly available data. Using data from two government sources, the U.S. Census Bureau and the U.S. Department of Education National Center for Education Statistics, we create simulated school district data sets to be used in the development and evaluation of operations research methods for school operations. These simulated school districts represent the range of public school districts throughout the United States.
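As a rough illustration of the contrast between random placement and placement informed by public data, the Python sketch below compares drawing student locations uniformly over a bounding box with drawing them in proportion to the school-age population of small geographic units. It is only a sketch under stated assumptions, not the method presented in the talk: the block records, field names, coordinates, and counts are hypothetical placeholders rather than actual Census Bureau or NCES data.

```python
import random

# Hypothetical geographic units (e.g. census blocks). All values are
# made-up placeholders for illustration, not real public data.
BLOCKS = [
    {"id": "A", "centroid": (0.2, 0.8), "school_age_pop": 120},
    {"id": "lake", "centroid": (0.5, 0.5), "school_age_pop": 0},
    {"id": "B", "centroid": (0.7, 0.3), "school_age_pop": 45},
    {"id": "C", "centroid": (0.9, 0.9), "school_age_pop": 300},
]


def place_students_uniform(n_students, bbox, rng):
    """Baseline: scatter students uniformly over a bounding box."""
    min_x, min_y, max_x, max_y = bbox
    return [
        (rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
        for _ in range(n_students)
    ]


def place_students_weighted(n_students, blocks, rng):
    """Assign each student to a block with probability proportional to
    the block's school-age population, so empty blocks get no students."""
    weights = [b["school_age_pop"] for b in blocks]
    return rng.choices(blocks, weights=weights, k=n_students)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
uniform_students = place_students_uniform(200, (0.0, 0.0, 1.0, 1.0), rng)
weighted_students = place_students_weighted(200, BLOCKS, rng)

# Count how many simulated students fall in each block.
counts = {b["id"]: 0 for b in BLOCKS}
for block in weighted_students:
    counts[block["id"]] += 1
```

Under the uniform baseline a student can land anywhere in the box, including on the hypothetical lake, whereas the weighted draw never assigns a student to a unit with zero school-age population.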