Introducing SIDE
Tags: machine-learning, spatial-statistics, geospatial-data, ethnicity
with Carl Muller-Crepon. Published in the Journal of Peace Research, 2018.
Abstract: Research on ethnic politics and political violence has benefited substantially from the growing availability of cross-national, geo-coded data on ethnic settlement patterns. However, because existing datasets represent ethnic homelands using aggregate polygon features, they lack information on ethnic compositions at the local level. Addressing this gap, this article introduces the Spatially Interpolated Data on Ethnicity (SIDE) dataset, a collection of 253 near-continuous maps of local ethno-linguistic, religious and ethno-religious settlement patterns in 47 low- and middle-income countries. We create these data using spatial interpolation and machine learning methods to generalize the ethnicity-related information in the geo-coded Demographic and Health Surveys (DHS). For each DHS survey we provide the ethnic, religious and ethno-religious compositions of cells on a raster that covers the respective countries at a resolution of 30 arc-seconds. The resulting data are optimized for use with geographic information systems (GIS) software. Comparisons of SIDE with existing categorical datasets and district-level census data from Uganda and Senegal are used to assess the data’s accuracy. Finally, we use the new data to study the effects of local polarization between politically relevant ethnic groups, finding a positive effect on the risk of local violence such as riots and protests. However, local ethno-political polarization is not statistically associated with violent events pertaining to larger-scale processes such as civil wars.
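The two computational ideas in the abstract, generalizing point-level survey information to raster cells and measuring local polarization, can be illustrated with a small sketch. It is not the authors' method: inverse-distance weighting stands in for the paper's interpolation and machine-learning procedure, the Reynal-Querol index is assumed as the polarization measure (the abstract does not name one), and the coordinates, group shares, and function names `idw_shares` and `polarization` are invented for illustration.

```python
import numpy as np

def idw_shares(points, shares, grid, power=2.0, eps=1e-12):
    """Inverse-distance-weighted group shares for each grid cell.

    points: (n, 2) lon/lat of survey clusters (hypothetical data)
    shares: (n, k) group shares per cluster, each row summing to 1
    grid:   (m, 2) lon/lat of raster cell centres
    Distances are plain Euclidean distances in degrees, a rough
    approximation that is only reasonable over small areas.
    """
    d = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)  # (m, n)
    w = 1.0 / (d ** power + eps)          # closer clusters weigh more
    w /= w.sum(axis=1, keepdims=True)     # weights per cell sum to 1
    return w @ shares                     # (m, k), rows sum to 1

def polarization(shares):
    """Reynal-Querol polarization: 4 * sum_i p_i^2 * (1 - p_i), per row."""
    return 4.0 * np.sum(shares ** 2 * (1.0 - shares), axis=1)

# 30 arc-seconds expressed in decimal degrees (the resolution in the abstract)
RES = 1.0 / 120.0

# A tiny 3 x 2 raster of cell centres (made-up location)
lons = 32.0 + RES * np.arange(3)
lats = 1.0 + RES * np.arange(2)
grid = np.array([(x, y) for y in lats for x in lons])

# Two made-up survey clusters with shares for two groups
points = np.array([[32.00, 1.00], [32.02, 1.01]])
shares = np.array([[0.9, 0.1], [0.5, 0.5]])

cell_shares = idw_shares(points, shares, grid)
cell_polar = polarization(cell_shares)
```

Under the assumed index, a cell split evenly between two groups reaches the maximum of 1 and a homogeneous cell scores 0, so cells closer to the evenly mixed cluster receive higher polarization values.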