Gerrymandering in Alabama

Contributors

  • Lucas Nerbonne, Middlebury College Earth and Climate Sciences Department, Middlebury Geography Department

Abstract

This is a study of gerrymandering in Alabama. We will test different metrics of spatial compactness and diversity to assess their efficacy in predicting the representativeness of different voting districts. We will then extend the work of prior studies by calculating a representativeness metric that combines social and geographic measures of ‘fairness’.

Study Metadata

  • Key words: Political Representation, Gerrymandering, Alabama, Convex Hull, Elections
  • Subject: Social and Behavioral Sciences: Geography: Geographic Information Sciences
  • Date created: 2025-02-17
  • Date modified: 2020-02-17
  • Spatial Coverage: Alabama (State)
  • Spatial Resolution: Census block groups
  • Spatial Reference System: EPSG:4269 NAD 1983 Geographic Coordinate System
  • Temporal Coverage: 2020-2023
  • Temporal Resolution: Decennial Census

Study design

An original, exploratory study comparing the findings of metrics commonly used to quantify the degree of congressional district gerrymandering. We will also assess the usefulness of a new gerrymandering metric based on the convex hull of a congressional district, comparing representativeness inside the convex hull with that of the congressional district as a whole.

Materials and procedure

Computational environment

# record all the packages you are using here
# this includes any calls to library(), require(),
# and double colons such as here::i_am()
packages <- c("tidyverse", "here", "sf", "tmap", "tidycensus", "lwgeom")

# force all conflicts to become errors
# if you load dplyr and use filter(), R has to guess whether you mean dplyr::filter() or stats::filter()
# the conflicted package forces you to be explicit about this
# disable at your own peril
# https://conflicted.r-lib.org/
require(conflicted)
## Loading required package: conflicted
# load and install required packages
# https://groundhogr.com/
if (!require(groundhog)) {
  install.packages("groundhog")
  require(groundhog)
}
## Loading required package: groundhog
## groundhog says: No default repository found, setting to 'http://cran.r-project.org/'
## Attached: 'Groundhog' (Version: 3.2.2)
## Tips and troubleshooting: https://groundhogR.com
# this date will be used to determine the versions of R and your packages
# it is best practice to keep R and its packages up to date
groundhog.day <- "2025-02-19"

# this replaces any library() or require() calls
groundhog.library(packages, groundhog.day)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4
## here() starts at /Users/lucas/Documents/GitHub.nosync/gerrymanderAL
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
## Linking to liblwgeom 3.0.0beta1 r16016, GEOS 3.11.0, PROJ 9.1.0
## Warning in fun(libname, pkgname): GEOS versions differ: lwgeom has 3.11.0 sf
## has 3.13.0
## Warning in fun(libname, pkgname): PROJ versions differ: lwgeom has 9.1.0 sf has
## 9.5.1
## 
## Attaching package: 'lwgeom'
## The following object is masked from 'package:sf':
## 
##     st_perimeter
## Successfully attached 'tidyverse_2.0.0'
## Successfully attached 'here_1.0.1'
## Successfully attached 'sf_1.0-19'
## Successfully attached 'tmap_4.0'
## Successfully attached 'tidycensus_1.7.1'
## Successfully attached 'lwgeom_0.2-14'
# you may need to install a correct version of R
# you may need to respond OK in the console to permit groundhog to install packages
# you may need to restart R and rerun this code to load installed packages
# In RStudio, restart r with Session -> Restart Session

# record the R processing environment
# alternatively, use devtools::session_info() for better results
writeLines(
  capture.output(sessionInfo()),
  here("procedure", "environment", paste0("r-environment-", Sys.Date(), ".txt"))
)

# save package citations
knitr::write_bib(c(packages, "base"), file = here("software.bib"))

# set up default knitr parameters
# https://yihui.org/knitr/options/
knitr::opts_chunk$set(
  echo = FALSE, # Run code, show outputs (don't show code)
  fig.retina = 4,
  fig.width = 8,
  fig.path = paste0(here("results", "figures"), "/")
)

Data and variables

The main data sources for the study are census block groups, Alabama congressional districts, and presidential vote totals from the 2020 election.

Each of the next subsections describes one data source.

Alabama Census Block Groups

  • Abstract: Vector polygon geopackage layer of Census block groups and demographic data.
  • Spatial Coverage: Alabama (State). OSM link: [https://www.openstreetmap.org/relation/161950]
  • Spatial Resolution: Census block groups
  • Spatial Reference System: EPSG 4269 NAD 1983 geographic coordinate system
  • Temporal Coverage: 2020 census
  • Temporal Resolution: Single census survey period
  • Lineage: Downloaded from the U.S. Census API “pl” public law summary file using ‘tidycensus’ in R
  • Distribution: US Census API
  • Constraints: Public Domain data free for use and redistribution.

Acquiring data using tidycensus in R

blockgroups <- get_decennial(geography = "block group",
                               sumfile = "pl",
                               table = "P3",
                               year = 2020,
                               state = "Alabama",
                               output = "wide",
                               geometry = TRUE,
                               keep_geo_vars = TRUE)
| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GEOID | ID Code | Code that uniquely identifies census block groups | Numeric | N/A | | | |
| P4_001N | Total population over 18 | Total population over 18 years old in the 2020 census, divided by block group | Numeric | Generally Accurate | | | |
| P4_006N | Total black population over 18 | Total black population over 18 years old in the 2020 census, divided by block group | Numeric | The US Census tends to overcount white populations and undercount those of minorities (US Census) | | | |
| P5_003N | Institutionalized population | Total institutionalized population in correctional facilities for adults during the 2020 census, 18 years or older, divided by block group | Numeric | The US Census tends to overcount white populations and undercount those of minorities (US Census) | | | |

Voting Precincts from 2020 Presidential Election

  • Abstract: Voting data by precinct
  • Spatial Coverage: Alabama (State). OSM link: [https://www.openstreetmap.org/relation/161950]
  • Spatial Resolution: Voting Precincts
  • Spatial Reference System: EPSG 4269 NAD 1983 Geographic Coordinate System
  • Temporal Coverage: One Year
  • Temporal Resolution: 2020
  • Lineage: Downloaded as a sgpkg. Prior processing information is available in al_vest_20_validation_report.pdf and readme_al_vest_20.txt
  • Distribution: Publicly available at the Redistricting Data Hub website with a free login.
  • Constraints: Permitted for noncommercial and nonpartisan use only, as per original data access agreement. Copyright information found in redistrictingdatahub_legal.txt
  • Data Quality: Complete
| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VTDST20 | District ID | Voting District ID | Numeric | | | | |
| GEOID20 | Location | Unique Geographic ID | Coordinate | | | | |
| G20PRERTRU | Republican Voters | Total votes for Donald Trump in 2020 | Numeric | | | | |
| G20PREDBID | Democratic Voters | Total votes for Joe Biden in 2020 | Numeric | | | | |

districts23 Layer of districts.gpkg

  • Abstract: Spatial bounds and characteristics of U.S. Congressional districts in Alabama
  • Spatial Coverage: Alabama (State). OSM link: [https://www.openstreetmap.org/relation/161950]
  • Spatial Resolution: U.S. Congressional Districts
  • Spatial Reference System: EPSG 3857 WGS 1984 Web Mercator Projection
  • Temporal Coverage: Districts approved in 2023 for use in the 2024 elections.
  • Temporal Resolution: N/A
  • Lineage: Loaded into QGIS as ArcGIS feature service layer and saved in geopackage format. Extraneous data fields were removed and the FIX GEOMETRIES tool was used to correct geometry errors.
  • Distribution: Available from the Alabama State GIS via ESRI feature service
  • Constraints: Public Domain data free for use and redistribution.
| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DISTRICT | District Number | U.S. Congressional District Number | Numeric | N/A | N/A | N/A | N/A |
| POPULATION | Population | Number of people residing in each congressional district (2020 census) | Numeric | Generally accurate on a full-population scale | | | |
| WHITE | Number of white residents | Total number of white residents (2020 census) | Numeric | The US Census tends to overcount white populations and undercount those of minorities (US Census) | | | |
| BLACK | Number of black residents | Total number of black residents (US Census) | Numeric | The US Census tends to overcount white populations and undercount those of minorities (US Census) | | | |

Prior observations/bias

At the time of this study pre-registration, the authors had very little prior knowledge of the geography of the study region with regard to the potential gerrymandering of congressional districts. The study authors have some prior knowledge of the racial distribution of populations in the state as it pertains to historical (oftentimes involuntary) settlement patterns.

For each secondary source, declare the extent to which authors had already engaged with the data:

Alabama Census Block Groups

  • [ ] data is not available yet
  • [x] data is available, but only metadata has been observed
  • [ ] metadata and descriptive statistics have been observed
  • [ ] metadata and a pilot test subset or sample of the full dataset have been observed
  • [ ] the full dataset has been observed. Explain how authors have already manipulated / explored the data.

2020 Presidential Election Voting Precincts

  • [ ] data is not available yet
  • [x] data is available, but only metadata has been observed
  • [ ] metadata and descriptive statistics have been observed
  • [ ] metadata and a pilot test subset or sample of the full dataset have been observed
  • [ ] the full dataset has been observed. Explain how authors have already manipulated / explored the data.

Districts23 layer of districts.gpkg

  • [ ] data is not available yet
  • [x] data is available, but only metadata has been observed
  • [ ] metadata and descriptive statistics have been observed
  • [ ] metadata and a pilot test subset or sample of the full dataset have been observed
  • [ ] the full dataset has been observed. Explain how authors have already manipulated / explored the data.

Bias and threats to validity

Because no primary data are being collected for this study, potential sources of bias are limited. The data used in this study are generally considered reputable (census counts, vote totals), although at larger scales the 2020 census has been found to systematically undercount minorities. That trend may affect the racial distribution section of this study by giving an inaccurate measure of the relative diversity of different block groups. Because it’s difficult to know how this systematic undercounting might affect areas differently, I will not attempt to make any corrections for it.

Data transformations

Coordinate Transformation

Transform the Census coordinate system to match that of the districts and precincts layers

library(tidyverse)
blockgroups<-blockgroups%>%
  st_transform(crs = 3857)

Calculate the total Black/African American population in each block group

The Census makes it tricky to pull the ‘black’ population data because respondents can describe their racial identity with many different combinations of race designations. For example, someone who responds that they are both Hispanic and Black will have a different designation than someone who responds as only black. For this study, we’re going to count the Hispanic and Black individual as black, so that designation’s population total will need to be added to the overall black population total.

To gather this data, I’ll build a list of all the variables whose labels might describe a black individual. Code courtesy of Joseph Holler’s GitHub, because I had no idea how to do this.

pulled_metadata <- load_variables(2020, "pl")
black_vars <- pulled_metadata |> 
  dplyr::filter(str_detect(name, "P3"), # P3 variables are population counts that include race designations
                str_detect(label, "Black")) |> # keeps only the variables whose label column includes 'Black'
  select(-concept) # drops the concept descriptor column

Next, I’ll use this list to aggregate population data from the columns that are included in the ‘black_vars’ list.

blockgroups2<-blockgroups%>%
  mutate(BlackPopulation = rowSums(across(all_of(black_vars$name))))

final_population <- blockgroups2 %>%
  mutate(
    Total_POP = P3_001N,
    Black_POP = BlackPopulation,
    Black_Percentage = BlackPopulation / P3_001N
  ) %>%
  select(GEOID, Total_POP, Black_POP, Black_Percentage)

This code chunk will output a table named ‘final_population’ with four columns plus geometry. Their names and descriptions are below.

  • GEOID: Unique identifier for each census block group
  • Total_POP: Total population in each census block group
  • Black_POP: Total black population in each census block group
  • Black_Percentage: The share of each census block group (as a proportion from 0 to 1) that at minimum partially identifies as black
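As a minimal sketch of the label-matching and row-summing steps above, here is a base R toy example. The metadata table, the simplified labels, and the counts are all made up for illustration; they are assumptions, not values pulled from the PL file, and base R `grepl()` stands in for `str_detect()`.

```r
# Hypothetical, simplified variable metadata (illustrative labels only)
toy_vars <- data.frame(
  name  = c("P3_001N", "P3_003N", "P3_004N", "P3_011N"),
  label = c("Total", "White alone", "Black or African American alone",
            "White; Black or African American"),
  stringsAsFactors = FALSE
)

# Hypothetical counts for two block groups
toy_counts <- data.frame(
  GEOID   = c("A", "B"),
  P3_001N = c(100, 200),
  P3_003N = c(60, 150),
  P3_004N = c(30, 40),
  P3_011N = c(10, 10)
)

# Keep every variable whose label mentions "Black", alone or in combination
black_cols <- toy_vars$name[grepl("Black", toy_vars$label)]

# Sum the matching columns row by row, then divide by the total population
toy_counts$Black_POP <- rowSums(toy_counts[, black_cols, drop = FALSE])
toy_counts$Black_Percentage <- toy_counts$Black_POP / toy_counts$P3_001N
```

In this toy data, block group A ends up with 40 of 100 people counted as black and block group B with 50 of 200, because both the "alone" and the "in combination" columns are included.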

Calculate area for each district

I’ll calculate three separate compactness metrics: the ratio of area to squared perimeter (the Polsby-Popper metric), the ratio of district area to convex hull area, and the ratio of district area to minimum bounding circle area.

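#sf_use_s2 is set to FALSE so that areas of geographic (lon/lat) layers are ellipsoidal instead of spherical; layers in a projected CRS are measured with planar geometry either way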
sf_use_s2(FALSE)

#read in districts 
districts <- st_read(here("data", "raw", "public", "alabama", "districts.gpkg"), layer = "districts23")

#calculate area/perimeter metric (polsby-popper)
districts1 <- mutate(districts, 
                     districts_area = st_area(geom),
                     districts_perim = st_length(st_cast(st_cast(geom, "MULTIPOLYGON"), "MULTILINESTRING")),
                     polsby_popper = round(
                       as.numeric(
                         (4 * pi * districts_area) / districts_perim^2),
                       2))%>%
  select(DISTRICT, districts_area, districts_perim,polsby_popper)

#create a separate layer, districts_convex, that stores a convex hull for each district
districts_convex <- districts1%>%
  st_convex_hull()
districts_convex <- districts_convex %>% 
  mutate(hullarea = st_area(geom),
         compact_hull = round(as.numeric(districts_area / hullarea), 2))

#create a third layer, bound_circle, that will store the minimum bounding circle metric
bound_circle<- districts1%>%
  st_minimum_bounding_circle()
bound_circle <- bound_circle %>%
  mutate(mbcarea = st_area(geom),
         compact_circ = round(as.numeric(districts_area / mbcarea), 2))

#combine all into a table
compactness_summary<- tibble(districts$DISTRICT, districts1$polsby_popper, districts_convex$compact_hull, bound_circle$compact_circ, districts$geom)
#change the column names
colnames(compactness_summary)<- c("District", "Polsby_Popper", "Convex_Hull", "Minimum_Bounding_Circle", "geom")

This gives us the compactness scores from each of our three metrics in the same data frame, ready to be mapped. We also still retain the three separate geometries used to calculate area for the compactness metrics, which may be useful down the road.
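To make the three scores easier to read, here is a small worked example on a unit square using base R only. The helper names `polsby_popper()` and `area_ratio()` are hypothetical and exist only for this illustration; the formulas mirror the ones in the chunk above.

```r
# Polsby-Popper: 4 * pi * area / perimeter^2
# (equals 1 for a circle and shrinks as the shape gets less compact)
polsby_popper <- function(area, perimeter) {
  4 * pi * area / perimeter^2
}

# Area ratio used for both the convex hull and minimum bounding circle scores
area_ratio <- function(shape_area, reference_area) {
  shape_area / reference_area
}

# A unit square has area 1 and perimeter 4
square_pp <- polsby_popper(area = 1, perimeter = 4)

# A square is its own convex hull, so the hull ratio is exactly 1
square_hull <- area_ratio(1, 1)

# Its minimum bounding circle has radius sqrt(2) / 2 (half the diagonal)
square_circ <- area_ratio(1, pi * (sqrt(2) / 2)^2)

round(c(polsby_popper = square_pp,
        convex_hull = square_hull,
        bounding_circle = square_circ), 2)
```

The square scores pi / 4 (about 0.79) on Polsby-Popper, 1 on the convex hull ratio, and 2 / pi (about 0.64) on the bounding circle ratio, so even a very regular shape does not score 1 on every metric.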

Race

To gather the population race breakdown for each district, we’ll employ the block group data that we cleaned earlier. The block group data needs to be split by district, convex hull, and minimum bounding circle so that we can get an accurate measure of population. To do this, it also needs to undergo area weighted reaggregation to properly apportion population on either side of a split. This introduces more error into our calculations, as people don’t distribute evenly across space, but it’s an acceptable amount of error given the small scale we’re working at.

#record the full area of each block group before it is split
final_population$area<-final_population%>%
  st_area()

st_crs(final_population)
st_crs(districts1)
#segmenting block groups by district, convex hull boundaries, and boundary circles. 
district_fragments <- st_intersection(final_population, districts1)
chull_fragments <- st_intersection(final_population, districts_convex)
boundcirc_fragments <- st_intersection(final_population, bound_circle)

#calculating area weighted aggregation and re-grouping by district
district_fragments <- district_fragments%>%
                    mutate(
                    new_area = st_area(geometry),
                    aw = as.numeric(new_area / area),
                    aw_pop = aw * Total_POP,
                    aw_black = aw * Black_POP,
                    aw_blackpct = aw_black / aw_pop)

district_pop <- district_fragments %>%
  group_by(DISTRICT)%>%
  summarize(
    sumpop = sum(aw_pop),
    sumblack = sum(aw_black),
    black_pct = sumblack/sumpop,
    geom = st_union(geometry)
  )

#calculating area weighted aggregation and re-grouping by convex hull
chull_fragments <- chull_fragments%>%
                    mutate(
                    new_area = st_area(geometry),
                    aw = as.numeric(new_area / area),
                    aw_pop = aw * Total_POP,
                    aw_black = aw * Black_POP,
                    aw_blackpct = aw_black / aw_pop)

chull_pop <- chull_fragments %>%
  group_by(DISTRICT) %>%
  summarize(
    sumpop = sum(aw_pop),
    sumblack = sum(aw_black),
    black_pct = sumblack/sumpop,
    geom = st_union(geometry)
  )

#calculating area weighted aggregation and re-grouping by minimum bounding circle
boundcirc_fragments <- boundcirc_fragments%>%
                    mutate(
                    new_area = st_area(geometry),
                    aw = as.numeric(new_area / area),
                    aw_pop = aw * Total_POP,
                    aw_black = aw * Black_POP,
                    aw_blackpct = aw_black / aw_pop)

bound_pop <- boundcirc_fragments %>%
  group_by(DISTRICT)%>% 
  summarize(
    sumpop = sum(aw_pop),
    sumblack = sum(aw_black),
    black_pct = sumblack/sumpop,
    geom = st_union(geometry)
  )
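The area weighting used above can be checked by hand on a toy table. This is a sketch in base R with invented fragment areas and populations (all numbers are assumptions for illustration), using `aggregate()` in place of `group_by()` and `summarize()`.

```r
# Hypothetical fragments: block group A is split between districts 1 and 2,
# block group B falls entirely inside district 2
fragments <- data.frame(
  GEOID     = c("A", "A", "B"),
  DISTRICT  = c(1, 2, 2),
  area      = c(10, 10, 4),    # full block group area
  new_area  = c(7.5, 2.5, 4),  # fragment area after intersection
  Total_POP = c(200, 200, 100),
  Black_POP = c(80, 80, 10)
)

# Area weight: the share of the original block group that each fragment keeps
fragments$aw       <- fragments$new_area / fragments$area
fragments$aw_pop   <- fragments$aw * fragments$Total_POP
fragments$aw_black <- fragments$aw * fragments$Black_POP

# Re-aggregate by district, then compute the share from the summed counts
district_totals <- aggregate(cbind(aw_pop, aw_black) ~ DISTRICT,
                             data = fragments, FUN = sum)
district_totals$black_pct <- district_totals$aw_black / district_totals$aw_pop
```

Block group A contributes 150 people to district 1 and 50 to district 2, so the total population of 300 is preserved across the split. The weights only apportion people correctly if population is spread evenly within each block group, which is the source of error noted above.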

Next, I’ll conduct the same intersection -> area weighting -> reaggregation workflow for the precinct data, saving the Democratic vote share as a percentage.

precincts <- st_read(here("data", "raw", "public", "alabama", "districts.gpkg"), layer = "precincts20")
# 15 precincts have geometry issues, so repair them
precincts <- st_make_valid(precincts)%>%
  mutate(area= st_area(geom))

#segmenting precincts by district, convex hull boundaries, and boundary circles.
district_fragments <- st_intersection(precincts, districts1)
chull_precincts <- st_intersection(precincts, districts_convex)
boundcirc_precincts <- st_intersection(precincts, bound_circle)

#calculating area weighted aggregation and re-grouping by district
district_fragments <- district_fragments%>%
                    mutate(
                    new_area = st_area(geom),
                    aw = as.numeric(new_area / area),
                    aw_total = aw * (G20PREDBID+G20PRERTRU+ G20PRELJOR + G20PREOWRI),
                    aw_total_democrat = aw * G20PREDBID,
                    aw_percent_dem = aw_total_democrat/aw_total)
                    

district_votes <- district_fragments %>%
  group_by(DISTRICT)%>%
  summarize(
    total_votes = sum(aw_total),
    total_dem = sum(aw_total_democrat),
    percent_dem = total_dem/total_votes)

#calculating area weighted aggregation and re-grouping by convex hull geometry
chull_precincts <- chull_precincts%>%
                    mutate(
                    new_area = st_area(geom),
                    aw = as.numeric(new_area / area),
                    aw_total = aw * (G20PREDBID+G20PRERTRU+ G20PRELJOR + G20PREOWRI),
                    aw_total_democrat = aw * G20PREDBID,
                    aw_percent_dem = aw_total_democrat/aw_total)
                    

chull_votes <- chull_precincts %>%
  group_by(DISTRICT)%>%
  summarize(
    total_votes = sum(aw_total),
    total_dem = sum(aw_total_democrat),
    percent_dem = total_dem/total_votes)

#calculating area weighted aggregation and re-grouping by minimum bounding circle geometry
boundcirc_precincts <- boundcirc_precincts%>%
                    mutate(
                    new_area = st_area(geom),
                    aw = as.numeric(new_area / area),
                    aw_total = aw * (G20PREDBID+G20PRERTRU+ G20PRELJOR + G20PREOWRI),
                    aw_total_democrat = aw * G20PREDBID,
                    aw_percent_dem = aw_total_democrat/aw_total)
                    

boundcirc_votes <- boundcirc_precincts %>%
  group_by(DISTRICT)%>%
  summarize(
    total_votes = sum(aw_total),
    total_dem = sum(aw_total_democrat),
    percent_dem = total_dem/total_votes)
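The summaries above compute the vote share from summed votes rather than averaging the per-fragment shares. A short base R sketch with invented vote counts (assumed numbers, not election data) shows why that order matters.

```r
# Hypothetical area-weighted vote counts for two precinct pieces in one district
precinct_pieces <- data.frame(
  DISTRICT = c(1, 1),
  dem      = c(900, 10),   # Democratic votes in each piece
  total    = c(1000, 100)  # all votes in each piece
)

# Ratio of sums: each piece counts in proportion to the votes it holds
ratio_of_sums <- sum(precinct_pieces$dem) / sum(precinct_pieces$total)

# Mean of ratios: treats the 100-vote piece the same as the 1000-vote piece
mean_of_ratios <- mean(precinct_pieces$dem / precinct_pieces$total)
```

Here the ratio of sums is 910 / 1100 (about 0.83), while the mean of the two per-piece shares is 0.5. The first matches the district's actual vote share, which is why `percent_dem` is calculated inside `summarize()` after the totals are summed.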

Having created everything, I’m going to append data to a final table with attached geometry. I’ll conduct the final calculations in this table.

#creating the full table from derived data products
#columns are combined by position, so every input must list districts in the same order
gerrymander_data<- tibble(districts1$DISTRICT,
                          compactness_summary$`Polsby_Popper`,
                          compactness_summary$`Convex_Hull`,
                          compactness_summary$`Minimum_Bounding_Circle`,
                          district_pop$black_pct,
                          chull_pop$black_pct,
                          bound_pop$black_pct,
                          district_votes$percent_dem,
                          chull_votes$percent_dem,
                          boundcirc_votes$percent_dem)

Final Calculations

To craft an index that combines the geographic compactness measurements with demographic information, I’m going to take the percentage black population that lives within each district and subtract the percentage black population that lives within each district’s convex hull and, separately, its minimum bounding circle. If the resulting number is negative, it means that the district is whiter than the surrounding area; if it’s positive, it means that the district has a higher percentage of black residents than the surrounding area. A district that had the same percentage of black residents as the surrounding area (or had a geometry that meant that either its corresponding convex hull or minimum bounding circle was the exact size of the district itself) would have a score of 0.

#calculate the nerbonne metric
gerrymander_data<- gerrymander_data%>%
  mutate(Nerbonne_convex = `district_pop$black_pct` - `chull_pop$black_pct`,
         Nerbonne_minimum_bounding = `district_pop$black_pct` - `bound_pop$black_pct`)
#append geometry to districts 
gerrymander_data$Geometry<- districts$geom

#rename columns for readability
colnames(gerrymander_data)<- c("District", "Polsby-Popper", "Convex Hull", "Minimum Bounding Circle", "Percent Black Within District", "Percent Black Within CHull", "Percent Black Within MinBounds", "Percent Dem Votes in District", "Percent Dem Votes in CHull", "Percent Dem Votes in MinBounds", "Nerbonne CHull", "Nerbonne MinBounds", "Geometry")
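To show how the sign of the index reads, here is a minimal base R sketch. The helper name `nerbonne_index()` and the three districts with their percentages are hypothetical, chosen only to produce one positive, one negative, and one zero score.

```r
# Hypothetical helper: district share minus the share in the reference geometry
nerbonne_index <- function(pct_district, pct_reference) {
  pct_district - pct_reference
}

# Invented shares of black residents for three made-up districts
toy <- data.frame(
  District     = c("X", "Y", "Z"),
  pct_district = c(0.55, 0.20, 0.30),
  pct_hull     = c(0.40, 0.30, 0.30)
)

toy$index_hull <- nerbonne_index(toy$pct_district, toy$pct_hull)
```

District X scores about 0.15 (more black residents than its convex hull), district Y about -0.10 (fewer), and district Z scores 0 (no difference between the district and its hull).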

Analysis

Given the relatively small number of districts (7), much of the analysis will be purely observational. This will include comparing metrics across districts in a table as well as mapping those metrics so they can be put in the context of the actual district shapes.

Results

Results are to be presented in a full table – gerrymander_data – that combines compactness metrics, voting information, percentage black residents, and the Nerbonne metric for each district. This data will still be associated with a geometry, so it’s able to be mapped.

Discussion

District geometry is to be compared to its compactness scores, which can then be interpreted through the lens of percentage black population. The effectiveness of the gerrymander should be visible in the relative difference in Democratic votes between districts with low compactness (high Democratic percentage) and high compactness (less Democratic). The novel Nerbonne metric should reflect the relative difference in black voter percentage within and outside a district, which should be high in districts that have been specifically designed to concentrate traditionally Democratic voters in single districts.

Integrity Statement

This is the only preregistration for this research project.

Acknowledgements

This project is part of class-based undergraduate research, and as such does not have any funding sources.

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)

References

Discrete Geometry for Electoral Geography; Duchin and Tenner 2024.

Gerrymandering and Compactness: Implementation Flexibility and Abuse; Barnes and Solomon 2020.

Practical Application of District Compactness; Horn, Hampton and Vandenberg 1993.