Every summer, the Biocomplexity Institute’s Social and Decision Analytics Division’s Data Science for the Public Good (DSPG) Young Scholars program draws university students from around the country to work together on projects that use computational expertise to address critical social issues faced by local, regional, state or federal governments. The students conduct research at the intersection of statistics, computation, and the social sciences to determine how information generated within every community can be leveraged to improve quality of life and inform public policy. The program, held at the University of Virginia’s Arlington offices, runs for 10 weeks for undergraduate interns and 11 weeks for graduate fellows who work in teams collaborating with postdoctoral associates and research faculty from the division, and project stakeholders.
The 2019 cohort conducted nine research projects, and their methodologies and discoveries will be presented at MethodSpace over the next three weeks as part of our examinations of Methods In Action. The descriptions of the projects were penned by the students themselves, and their names, mentors and sponsors appear under the DSPG logo in the text.
“If you can’t measure it, you can’t improve it.” – Peter Drucker
Arlington County features unique nightlife destinations in the Washington, D.C. metro region. However, areas such as Clarendon, with abundant restaurants that serve alcohol, challenge police with alcohol-related crimes such as assault, theft, public intoxication, DUI, and disorderly conduct. The Arlington County Police Department is working to reduce crime so that people are safe and businesses can prosper.
The police department launched the Arlington Restaurant Initiative to encourage bars and restaurants to adopt techniques for improving safety inside their venues and beyond. For instance, Arlington nightlife officers may see someone stumbling down the sidewalk and send them home in a cab. They also help bar managers and staff reduce behaviors that contribute to pressing problems, such as fake IDs, underage drinking, and over-serving. To better understand these problems, officers began compiling information on civilian-police interactions during the busiest nights, regardless of whether they wrote a formal incident report.
After two years of the initiative, crime in the area has notably declined. We wanted to evaluate whether the reduction in crime can be attributed to the program and whether the resources devoted to the effort are justified. To that end, we used a snapshot of Clarendon’s weekend nightlife population as a case study. Our criteria included establishments open on Friday and Saturday nights. We also consulted the Virginia Alcoholic Beverage Control Authority licensing data to confirm that these establishments serve alcoholic beverages. Because participation in the program is voluntary, we also considered whether an establishment willingly takes part in the intervention by seeking certification or training for its staff. This approach is an example of using multiple administrative datasets, some of which are public and others that were provided by the partner.
From there, we needed a counterfactual: an understanding of crime levels if the intervention had not occurred. To build one, we identified similar establishments and compared their rates of crime. The establishments had an extensive list of attributes available on Yelp, ranging from kid-friendliness to ambiance. We enriched this comparison with police officers’ records of their experience with, and perception of, each establishment’s crime risk.
Our approach was to consider pairs of certified restaurants that are similar on the risk assessment scale and Yelp features. We used a Jaccard similarity matrix to create pairs of similar restaurants. For the statistical modeling, we compared the calls for service of relevant incident types for each pair during the period when the first restaurant was certified and the second had not yet been. The baseline model was a paired t-test, which tests whether the mean of the within-pair differences is zero, using the variability of those differences.
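The matching and the baseline test can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the venue names, attribute sets, and call counts are invented, the greedy pairing rule is an assumption (the similarity matrix could be turned into pairs in other ways), and the risk assessment scale is left out for brevity.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def jaccard(a, b):
    """Jaccard similarity of two attribute sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_pairs(features):
    """Pair venues in order of descending similarity; each venue is used once.

    Greedy matching is an assumed rule chosen for simplicity.
    """
    scored = sorted(
        ((jaccard(features[x], features[y]), x, y)
         for x, y in combinations(sorted(features), 2)),
        key=lambda item: (-item[0], item[1], item[2]),
    )
    used, pairs = set(), []
    for score, x, y in scored:
        if x not in used and y not in used:
            pairs.append((x, y, score))
            used.update((x, y))
    return pairs


def paired_t(first, second):
    """Return the t statistic and degrees of freedom for the mean
    within-pair difference. Fails if all differences are identical."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical venues and Yelp-style attributes, for illustration only.
features = {
    "A": {"full_bar", "live_music", "late_night"},
    "B": {"full_bar", "live_music", "outdoor_seating"},
    "C": {"kid_friendly", "outdoor_seating"},
    "D": {"kid_friendly", "outdoor_seating", "beer_wine_only"},
}
pairs = greedy_pairs(features)

# Hypothetical calls for service per matched pair, counted over the window
# in which one venue was certified and its match was not yet.
certified = [2, 0, 1, 3, 0]
not_yet = [3, 1, 1, 5, 1]
t_stat, dof = paired_t(certified, not_yet)
```

The t statistic still has to be compared against a t distribution with `dof` degrees of freedom to obtain a p-value; a statistics package such as SciPy (`scipy.stats.ttest_rel`) does both steps at once.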
The goal of program evaluation and cost-benefit analysis is an optimal allocation of resources. One way to estimate the societal costs of crime is to determine the reduction in crime attributed to the program and multiply it by the cost per crime. However, placing a price tag on an offense is not without challenge: there is no single correct number. Instead, we weight each cost with informed estimates. One approach employed by economists involves surveys to gauge individual willingness to pay for a reduction in crime. The technique captures perceived value, like the psychological comfort of safety, but people often overstate their willingness to pay.
For this reason, we are using tangible dollar values for both individual and societal costs. Take the example of cost assessment for drunk driving. There are fixed costs, such as legal or medical fees incurred from the incident. Jail sentence duration varies with the severity of the offense, so we multiplied daily operating costs by the length of the inmate's stay. There are also hidden costs: vehicle towing, license reinstatement, community service hours, and in-vehicle breathalyzer devices. For now, we do not have access to information on repeat offenses. Therefore, we assumed a first-time, stand-alone offense, and we report a conservative estimate as a low-to-high range.
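The arithmetic behind such an estimate can be sketched as follows. Every dollar figure, the sentence lengths, and the number of avoided offenses are placeholders chosen for illustration, not values from our analysis; the `CostRange` helper is likewise a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRange:
    """A low-to-high dollar range for one cost component."""
    low: float
    high: float

    def __add__(self, other):
        return CostRange(self.low + other.low, self.high + other.high)

    def scale(self, factor):
        return CostRange(self.low * factor, self.high * factor)


def jail_cost(daily_operating_cost, days_low, days_high):
    """Daily operating cost multiplied by the length of the inmate's stay."""
    return CostRange(daily_operating_cost * days_low,
                     daily_operating_cost * days_high)


def offense_cost(components):
    """Sum the component ranges into one per-offense range."""
    total = CostRange(0.0, 0.0)
    for component in components:
        total = total + component
    return total


# Placeholder figures for a first-time, stand-alone DUI (not real data).
components = [
    CostRange(1_500, 1_500),  # fixed legal and medical fees
    jail_cost(100, 0, 5),     # $100 per day, 0 to 5 days
    CostRange(300, 900),      # towing, reinstatement, and other hidden costs
]
per_offense = offense_cost(components)

# Cost avoided = offenses attributed to the program x cost per offense.
offenses_avoided = 12         # hypothetical count
avoided = per_offense.scale(offenses_avoided)
```

Carrying the low and high bounds through every step keeps the uncertainty visible in the final figure instead of collapsing it into a single number.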
Statistical models are not data agnostic; they depend on the assumptions and characteristics of the data in each application. Our preliminary results agree with the literature, which finds weak evidence for the effectiveness of similar interventions. However, low statistical power is a prevalent challenge across such studies. If the initiative does reduce crime, our sample was too small to detect the effect. Few restaurants had crime incidents, and most reported none. Lesson 1: if possible, profile your data before selecting the statistical approach.
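A quick profile of this kind might look like the sketch below. The incident counts are invented, and the sample-size formula is the standard normal approximation for a two-sided paired test, which slightly understates the number of pairs an exact t-based calculation would give. It also assumes the differences are roughly normal, which zero-heavy counts may violate.

```python
from math import ceil
from statistics import NormalDist


def zero_share(counts):
    """Fraction of venues that reported no incidents at all."""
    return sum(1 for c in counts if c == 0) / len(counts)


def pairs_needed(effect_size, alpha=0.05, power=0.8):
    """Approximate number of matched pairs for a two-sided paired test.

    effect_size is the mean within-pair difference divided by the
    standard deviation of the differences.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical incident counts for ten venues.
incidents = [0, 0, 0, 1, 0, 0, 2, 0, 0, 0]
share = zero_share(incidents)

n_small = pairs_needed(0.2)   # pairs needed for a small standardized effect
n_medium = pairs_needed(0.5)  # pairs needed for a medium standardized effect
```

If the required number of pairs is far larger than the number of venues available, that is a signal to reconsider the design or the model before investing in the analysis.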
Another lesson from this experience is the importance of gradual model refinement. Precise estimates and elaborate modeling techniques cost time and energy, so it is often a good idea to start with basic models that provide a general understanding of the data. For instance, consider a conservative cost-benefit analysis that already shows a high return on investment. Adding a marginal cost will not change the already high estimate by much. Our next approach will rely on records of police domain knowledge to account for the voluntary element of the program and to increase the sample size. With our recent acquisition of police salary records, we can also incorporate estimates of the Arlington police department’s costs. Measuring the impact of the program is vital to being able to improve it.