What is it about?

The article proposes case studies for introductory statistics courses. The case studies rely on real, regional open data to teach statistical concepts. For example, does the probability of a search during a police vehicle stop depend on the driver's race? We use data from over 100,000 traffic stops in San Diego to explore this question. R code is made available to instructors and motivated students.
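To make the traffic-stop question concrete, here is a minimal sketch of one way such a question could be approached: compare search proportions between two driver groups and compute a chi-square statistic for independence. It is written in Python rather than the R code the article provides, and it is not the article's own analysis. The counts, the group labels, and the function names `search_rates` and `chi_square_statistic` are all invented for illustration; they are not the San Diego data.

```python
# Hypothetical counts, invented for illustration only (NOT the San Diego data).
# Each entry maps a driver group to (stops with a search, stops without a search).
counts = {
    "group_a": (60, 940),
    "group_b": (30, 970),
}


def search_rates(table):
    """Return the observed proportion of stops that led to a search, per group."""
    return {group: s / (s + n) for group, (s, n) in table.items()}


def chi_square_statistic(table):
    """Pearson chi-square statistic for independence of group and search outcome."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count if search outcome were independent of group.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


rates = search_rates(counts)
stat = chi_square_statistic(counts)
print(rates)
print(round(stat, 2))
```

With a 2x2 table the statistic has one degree of freedom, so a value above roughly 3.84 would suggest, at the 5% level, that the search proportion differs between the groups. A full case study would also need to consider data wrangling and possible confounding factors before drawing conclusions.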

Why is it important?

One recommendation of the Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report is to teach introductory statistics courses using real data with context and purpose. Many educators have created databases of datasets for classroom use, sometimes making hundreds of datasets available. Yet the “context and purpose” component of the data may remain elusive if only a generic database is made available. These datasets are also often unrealistically small. Meanwhile, countries and cities continue to share data through open data portals, so educators can find regional data that engage their students more effectively. We present excerpts from case studies that apply statistical methods to data on crime, housing, rainfall, tourist travel, and other topics. Data wrangling and discussion of results are recognized as important case study components. Thus, case studies based on open data address most GAISE College Report recommendations.


Introductory managerial statistics is among the easiest college statistics courses. Yet business students are among the least interested in the topic, perhaps because of statistics anxiety and unfavorable attitudes toward statistics. These case studies have enabled me to capture students' attention, leading to better participation.

Roberto Rivera
Universidad de Puerto Rico

