By Amin Boukari, Aniruddha Murali, Bryan Lu, Ispeeta Deka, Jacob Nardini, and Srikripa Krishnan | Socioeconomic Factors Team
Overview
The pandemic has shed light on societal disparities and economic inequalities. We are conducting extensive analysis to draw data-driven insights into how the pandemic disproportionately affects marginalized and underserved communities. Using statistical testing and modelling, we are assessing the relationship between socioeconomic variables and the epidemiological profile of COVID-19, including case, death, and recovery rates, at the zip-code, county, state, and national levels. This has allowed us to understand how the spread and impact of the virus are distributed across geographies, from the local district level to the national level. Through methods such as feature extraction and cluster analysis, we can identify the inherent correlations among socioeconomic variables and model them appropriately.
Our team has also computed and visualized aggregate risk scores for specific counties based on the socioeconomic variables, population density, and testing rate, so that we can help inform policymakers and senior officials as they prioritize and allocate resources.
Methodology For Developing The Model
Pull and parse relevant data from credible, properly licensed data sources.
Clean the dataset.
Select 99 socioeconomic variables from the U.S. Census Bureau's 2015-2019 American Community Survey (ACS).
Pull population density data and merge it with the rest of the dataset.
Pull updated case and death counts for each county and calculate the corresponding case and fatality rates per 1,000 tests.
Normalize all variables using a min-max scaler.
Compute the correlation coefficient between each variable and the case rate (or fatality rate), then multiply each normalized variable (index) by its coefficient to build a weighted matrix.
Sum the weighted indexes across each row to obtain the total risk score for each county.
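The rate calculation and normalization steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the helper names `rate_per_1000_tests` and `min_max_scale`, the county figures, and the choice to return 0.0 for a county with no recorded tests are hypothetical and not taken from the team's code. The scaling uses the standard min-max formula (x - min) / (max - min), which maps each variable onto the range [0, 1].

```python
from __future__ import annotations


def rate_per_1000_tests(count: float, tests: float) -> float:
    """Return events per 1,000 tests, or 0.0 if no tests were recorded (assumed policy)."""
    if tests <= 0:
        return 0.0
    return 1000 * count / tests


def min_max_scale(values: list[float]) -> list[float]:
    """Linearly rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical inputs for three counties (illustrative numbers, not real data).
cases = [120, 300, 50]
deaths = [3, 9, 1]
tests = [2400, 5000, 2000]
median_income = [52000.0, 38000.0, 71000.0]

case_rate = [rate_per_1000_tests(c, t) for c, t in zip(cases, tests)]
fatality_rate = [rate_per_1000_tests(d, t) for d, t in zip(deaths, tests)]
income_index = min_max_scale(median_income)

print(case_rate)  # [50.0, 60.0, 25.0]
```

In a real pipeline the same scaling would be applied column by column to all 99 ACS variables plus population density, so that every index shares the same 0-to-1 range before weighting.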
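The weighting and summation steps above can be sketched as follows. This is a minimal sketch, assuming Pearson correlation (the write-up does not name the coefficient) and indexes that are already min-max normalized; the function names `pearson` and `risk_scores`, the variable names, and the numbers are hypothetical. Passing the fatality rate instead of the case rate as `target` would give the fatality-based score.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations; the 1/n factors cancel in the ratio.
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def risk_scores(
    indexes: dict[str, list[float]], target: list[float]
) -> tuple[dict[str, float], list[float]]:
    """Weight each normalized index by its correlation with target, then sum per county."""
    weights = {name: pearson(col, target) for name, col in indexes.items()}
    n_counties = len(target)
    # Weighted matrix: one row per county, one column per variable.
    weighted = [
        [weights[name] * col[i] for name, col in indexes.items()]
        for i in range(n_counties)
    ]
    # Total risk score for each county = row sum of the weighted matrix.
    return weights, [sum(row) for row in weighted]


# Hypothetical min-max-normalized indexes for three counties (not real data).
indexes = {
    "poverty_rate": [0.5, 1.0, 0.0],
    "median_income": [0.5, 0.0, 1.0],
}
case_rate = [50.0, 60.0, 25.0]  # cases per 1,000 tests

weights, scores = risk_scores(indexes, case_rate)
print(weights)
print(scores)
```

Because each index is multiplied by its signed coefficient, variables that are negatively correlated with the case rate (here `median_income`) lower a county's score; weighting by absolute values would be a different design. Pearson correlation is unchanged by min-max scaling, so the weights come out the same whether they are computed before or after normalization.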