Every summer, the Data Science for the Public Good (DSPG) Young Scholars program, run by the Social and Decision Analytics Division of the Biocomplexity Institute, draws university students from around the country to work together on projects that use computational expertise to address critical social issues faced by local, regional, state, or federal governments. The students conduct research at the intersection of statistics, computation, and the social sciences to determine how information generated within every community can be leveraged to improve quality of life and inform public policy. The program, held at the University of Virginia’s Arlington offices, runs for 10 weeks for undergraduate interns and 11 weeks for graduate fellows, who work in teams alongside postdoctoral associates and research faculty from the division as well as project stakeholders.
The 2019 cohort conducted nine research projects, and their methodologies and discoveries will be presented at MethodSpace over the next three weeks as part of our examinations of Methods In Action. The descriptions of the projects were penned by the students themselves, and their names, mentors and sponsors appear under the DSPG logo in the text.
Over the past three decades, social scientists and public health scholars have documented how socio-economic factors and poor access to resources affect the health of millions of Americans. Lower levels of income, education, and wealth, often coupled with limited access to health insurance and quality healthcare facilities, place poorer communities at a higher risk of morbidity and mortality. For example, communities of color, especially those in segregated urban and remote rural areas, often lack preventive health care, resulting in higher rates of illness and disease even after accounting for residents’ age. These contexts also make it harder to acquire quality nutrition and engage in physical exercise, which can contribute to rising obesity rates over time.
The Social and Decision Analytics Division at the University of Virginia’s Biocomplexity Institute and Initiative has partnered with Fairfax County and Inova Translational Medicine Institute to better understand regional health and well-being through a data science framework. Located in Northern Virginia, Fairfax County has more than doubled in size, from approximately 455,000 to over 1,142,000 inhabitants, since 1974. Its demographic characteristics are changing along with steady population growth. The county has seen an increase in the share of Hispanic and Black residents, as well as a slight increase in the proportion of its residents living in poverty.
To establish a baseline for measuring change, monitoring progress, and addressing social inequities through future policies, Fairfax County first wanted to understand its socioeconomic landscape at the sub-county level. Prior research has made considerable strides in exploring how economic vulnerability shapes health outcomes at the national, state, or county level. However, there is a growing need to scale research down to smaller geographic levels that are actionable for stakeholders, as in the case of Fairfax County. Our research team used an interdisciplinary, data-driven approach to discover, acquire, and statistically integrate publicly available local data and create CommunityScapes: statistical indicators that tell us more about the social, economic, and environmental well-being of Fairfax at the sub-county level.
Data and Methodology
Using the literature on the social determinants of health to guide our analysis, we set out to understand which mechanisms drive health risks in Fairfax County, Virginia. Research on social determinants of health, and on health disparities more generally, is typically conducted using national, state, or at best county-level datasets. Our project is innovative in that it draws from multiple publicly available data sources to examine three distinct, small sub-county geographical units for patterns of economic vulnerability and obesogenic environment exposure. In doing so, we tailor our analyses to policymakers working at the level of supervisor districts and high school attendance areas, rather than limiting them to administrative groupings like census tracts. We are thus able to provide stakeholders with actionable insights at a more meaningful geographic level.
Our study drew from three primary data sources: the American Community Survey (ACS), the Fairfax County Housing Stock, and OpenStreetMap (OSM) data. Given that federal surveys are typically not available at sub-county levels with sufficient granularity, our first challenge was constructing an ACS-based synthetic population for Fairfax County. The synthetic population enabled us to infer the characteristics and location of individuals in Fairfax County that we could then reaggregate to new geographies. First, we combined the synthetic population data with publicly available Fairfax housing stock data, and reaggregated ACS census-tract level data to high school attendance areas and supervisor districts.
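The reaggregation step can be illustrated with a minimal sketch. It assumes each synthetic person has already been placed in a housing unit whose census tract and high school attendance area are known. The records, area names, and the `reaggregate_rate` helper are hypothetical and invented for illustration; they are not the team's code or data.

```python
from collections import defaultdict

# Hypothetical synthetic persons. Each record carries the census tract the
# person was drawn from, the attendance area of the assigned housing unit,
# and one attribute of interest. All values are invented.
synthetic_persons = [
    {"tract": "T1", "hs_area": "Area A", "in_poverty": False},
    {"tract": "T1", "hs_area": "Area A", "in_poverty": False},
    {"tract": "T1", "hs_area": "Area B", "in_poverty": True},
    {"tract": "T2", "hs_area": "Area B", "in_poverty": True},
    {"tract": "T2", "hs_area": "Area B", "in_poverty": True},
    {"tract": "T2", "hs_area": "Area B", "in_poverty": False},
]


def reaggregate_rate(persons, geography_key, flag_key):
    """Return the share of persons with flag_key set, grouped by geography_key."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for person in persons:
        area = person[geography_key]
        totals[area] += 1
        if person[flag_key]:
            flagged[area] += 1
    return {area: flagged[area] / totals[area] for area in totals}


# The same person-level records summarized by two different geographies.
poverty_by_tract = reaggregate_rate(synthetic_persons, "tract", "in_poverty")
poverty_by_hs_area = reaggregate_rate(synthetic_persons, "hs_area", "in_poverty")
```

Because the records are at the person level, the same function can summarize them by any boundary a stakeholder cares about, as long as each housing unit can be assigned to that boundary.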
Second, we explored the food and physical environment of Fairfax County, including locations of fast food stores, playgrounds, and supermarkets, using OSM. OSM is an open source mapping project that can be thought of as a combination of Wikipedia and Google Maps. We wrote functions to easily retrieve, reshape, and prepare OSM data for use, and then calculated 20-minute bus travel time isochrones from each facility or service of interest. The travel time isochrones describe the environment that contributes to obesity by determining whether a property is within a 20-minute bus trip to healthy food sources and physical exercise opportunities. Figure 1 shows an example of such a supermarket access polygon.
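Once an isochrone polygon exists, checking whether a property falls inside it is a point-in-polygon test. The sketch below uses a standard ray-casting test and assumes the polygon has already been computed by a routing tool and projected to planar coordinates. The polygon vertices, parcel names, and coordinates are hypothetical, and the original analysis may have used a geospatial library instead.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly joined back to the first.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical 20-minute isochrone around a supermarket (planar coordinates).
supermarket_isochrone = [(0.0, 0.0), (4.0, 0.0), (5.0, 3.0), (2.0, 5.0), (-1.0, 3.0)]

# Hypothetical property locations in the same coordinate system.
properties = {
    "parcel_1": (2.0, 2.0),
    "parcel_2": (6.0, 1.0),
    "parcel_3": (0.0, 4.5),
}

has_access = {
    parcel_id: point_in_polygon(xy, supermarket_isochrone)
    for parcel_id, xy in properties.items()
}
```

Repeating the test for each facility type would yield one access flag per property and facility, which could then be aggregated to any sub-county geography.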
Next, to build the CommunityScape, we created two composite indices using our variables to measure economic vulnerability and obesogenic environments. Rather than simply aggregating measurable variables to generate a relative index score, composite indices provide a robust strategy for identifying unobserved factors and adjusting features by their underlying dimensional importance. We used principal component and factor analyses to quantify both feature and factor importance, using factor loadings and percentage of variance explained respectively. We obtained final results using a weighted aggregation scheme, normalized to a scaled score from zero to one, where one describes the most vulnerable area and zero the least.
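A weighted aggregation of this kind can be sketched as follows. The exact weighting scheme is not given here, so this example makes an assumption: each factor score is a loading-weighted sum of standardized indicators, and factors are combined in proportion to their share of explained variance. The indicator values, loadings, and variance shares are hypothetical, not results from the study.

```python
# Hypothetical standardized indicators (z-scores) for three areas.
areas = {
    "Area A": {"unemployed": 1.2, "poverty": 0.8, "no_vehicle": 1.0},
    "Area B": {"unemployed": -0.2, "poverty": 0.1, "no_vehicle": -0.5},
    "Area C": {"unemployed": -1.0, "poverty": -0.9, "no_vehicle": -0.5},
}

# Hypothetical factor-analysis output: loadings per indicator and the share
# of variance each retained factor explains.
factors = [
    {"variance_explained": 0.6,
     "loadings": {"unemployed": 0.8, "poverty": 0.9, "no_vehicle": 0.3}},
    {"variance_explained": 0.2,
     "loadings": {"unemployed": 0.1, "poverty": 0.2, "no_vehicle": 0.9}},
]


def raw_score(indicators, factors):
    """Combine loading-weighted factor scores, weighted by variance share."""
    total_variance = sum(f["variance_explained"] for f in factors)
    score = 0.0
    for f in factors:
        factor_score = sum(
            f["loadings"][name] * value for name, value in indicators.items()
        )
        score += (f["variance_explained"] / total_variance) * factor_score
    return score


def min_max_scale(scores):
    """Rescale scores to [0, 1]; 1 marks the most vulnerable area."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {name: 0.0 for name in scores}
    return {name: (value - low) / (high - low) for name, value in scores.items()}


raw = {name: raw_score(indicators, factors) for name, indicators in areas.items()}
index = min_max_scale(raw)
```

Min-max scaling makes the index relative: a score of one marks the most vulnerable area among those compared, not an absolute level of vulnerability.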
Our economic vulnerability index highlights areas with high proportions of individuals who, for example, are unemployed, live in poverty, or do not own a vehicle. Our obesogenic environment exposure index (i.e., obesity risk environment) identifies areas where residents do not have convenient access to supermarkets and team sports facilities, or live in the vicinity of fast food restaurants. We generated maps to visually decompose these results at the census tract, high school attendance area, and supervisor district levels, showing the most and least vulnerable areas in Fairfax for each index. Figures 2 and 3 show examples of economic vulnerability mapped to Fairfax County high school attendance areas and of obesogenic environment exposure mapped to supervisor districts. The CommunityScape points to Arlington-adjacent areas and those south of Arlington as the areas with the highest proportion of at-risk populations. For example, residents of the Annandale, Justice, and Mount Vernon high school attendance areas are the most economically vulnerable; those living in the Mason, Providence, and Lee supervisor districts have the highest obesogenic environment exposure. We mapped both composite indices to both geographies. Although the results appeared similar, they had different implications. Maps at the high school attendance area level can guide school-level interventions, while supervisor district maps can inform broader policy programs.
Findings and Future Directions
Our CommunityScapes offer legislators a novel tool for developing policies that span multiple geographic levels. While past research typically explores disparities at the national or state level, our work highlights economic and social risk factors at actionable sub-county geographies. Similarly, policy is often created at the national and state levels, but local policymakers like boards of supervisors have the ability to influence resource distribution across sub-county populations. For example, both the economic vulnerability and obesogenic indices identify the Mason and Lee supervisor districts in Fairfax County as the areas with the highest risk of poor outcomes, suggesting that legislators should tailor their policies to address the mechanisms that reproduce health inequalities in those specific areas.
Moreover, our factor analyses suggest that multiple intersecting social and economic factors, such as living in poverty, not having health insurance, and not having a high school degree, increase the health burdens of individuals living in those districts. Our model also points to challenges for racial and ethnic minorities, especially those with limited English proficiency, as these groups have an increased risk of being both economically vulnerable and living in obesogenic environments.
These factors give lawmakers important insights into policies that directly address the needs of at-risk populations. For example, Fairfax County policymakers may consider implementing English as a Second Language programs, low- or zero-cost childcare initiatives, and GED programs for parents that are available in both English and Spanish. Our sub-county analyses illustrate the need for policymakers to look at populations from various viewpoints to gain a full understanding of how resources could be redistributed across the county. Analyzing health inequities using computational methods provides decision-makers with new, vital tools for evidence-based policymaking.
Moving forward, we plan to enhance the CommunityScape to better inform policymaking for Fairfax County residents. To verify the robustness of our composite indices, the team will conduct sensitivity analyses of our model findings. We will continue to collaborate with our sponsor, Fairfax County, to contextualize the priorities of residents and policymakers and further refine our interpretations of which areas are most in need of support. Additionally, through discussions with sponsors and deeper policy research, we can gain a better understanding of current policy measures that are working and past initiatives that succeeded or failed to address health and prosperity. Overlaying Inova healthcare facilities’ locations on the CommunityScape will create a more informed picture of individuals’ ability to access care.
The current CommunityScape provides an overview of the distribution of opportunity across Fairfax County, but further measures will provide even more insight into the resources and demographic factors that impact one’s ability to live, work, and play in a healthy environment in Fairfax County.
Finally, our project provides an example for other communities at county and lower geographic levels that are in need of granular data to better understand their constituents. They may similarly employ synthetic population techniques and place-based, open source data to characterize small sub-county areas by boundaries relevant to decision-makers. Our approach allows communities to monitor change and progress, and to design and implement programs with a high degree of precision—where their residents need them most.