Contributors

Abstract

This is a study of gerrymandering in Alabama. We will test different metrics of spatial compactness and diversity to assess their efficacy in predicting the representiveness of different voting districts. We will then extend the work of prior studies by calculating a representivness metric to combines social and geographic metrics of ‘fairness’.

Study Metadata

Study design

An original, exploratory study assessing the comparative findings of commonly used to quantify degreess of congressional district gerrymandering. We will also assess the usefulness of a new gerrymandering metric based on the convex hull of a congressional district and the representativeness inside the convex hull compared to the congressional district writ large.

Enumerate specific hypotheses to be tested or research questions to be investigated here, and specify the type of method, statistical test or model to be used on the hypothesis or question.

Materials and procedure

Computational environment

# record all the packages you are using here
# this includes any calls to library(), require(),
# and double colons such as here::i_am()
packages <- c("tidyverse", "here", "sf", "tmap", "tidycensus", "lwgeom", "kableExtra")

Data and variables

Describe the data sources and variables to be used. Data sources may include plans for observing and recording primary data or descriptions of secondary data. For secondary data sources with numerous variables, the analysis plan authors may focus on documenting only the variables intended for use in the study.

Primary data sources for the study are to include census block groups, alabama congressional districts, and presidential voting totals from the 2020 election.

Each of the next subsections describes one data source.

Alabama Census Block Groups

  • Abstract: Vector polygon geopackage layer of Census tracts and demographic data.

  • Spatial Coverage: Alabama (State). OSM link: [https://www.openstreetmap.org/relation/161950]

  • Spatial Resolution: Census block groups

  • Spatial Reference System: EPSG 4269 NAD 1983 geographic coordinate system

  • Temporal Coverage: 2020 census

  • Temporal Resolution: Single census survey period

  • Lineage: Downloaded from the U.S. Census APL “pl” public law summary file using ‘tidycensus’ in R

  • Distribution: US Census API

  • Constraints: Public Domain data free for use and redistribution.

Aquiring data using tidycensus in R

blockgroup_file <- here("data", "raw", "public", "block_groups.gpkg")

# if the data is already downloaded, just load it
# otherwise, query from the census and save
if(file.exists(blockgroup_file)){
  blockgroups <- st_read(blockgroup_file, quiet = TRUE)
} else {
  blockgroups <- get_decennial(geography = "block group",
                               sumfile = "pl",
                               table = "P3",
                               year = 2020,
                               state = "Alabama",
                               output = "wide",
                               geometry = TRUE,
                               keep_geo_vars = TRUE)
  st_write(blockgroups, blockgroup_file)
}
Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
GEOID ID Code Code that uniquely identifies census tracts Numeric N/A
P4_001N Total population over 18 Total population over 18 years old in the 2020 census, divided by block group Numeric Generally Accurate
P4_006N Total black population over 18 Total black population over 18 years old in the 2020 census, divided by block group Numeric The US Census tends to overcount white populations and undercount those of minorities (US Census)
P5_003N Institutionalized population Total institutionalized population in correctional facilities for adults during the 2020 census, 18 years or older divided by block group Numeric The US Census tends to overcount white populations and undercount those of minorities (US Census)

Voting Precincts from 2020 Presidential Election

  • Abstract: Voting data by precinct
  • Spatial Coverage: Alabama (State). OSM link: [https://www.openstreetmap.org/relation/161950]
  • Spatial Resolution: Voting Precincts
  • Spatial Reference System: EPSG 4269 NAD 1983 Geographic Coordinate System
  • Temporal Coverage: One Year
  • Temporal Resolution: 2020
  • Lineage: Downloaded as a sgpkg. Prior processing information is avalible in al_vest_20_validation_report.pdf and readme_al_vest_20.txt
  • Distribution: Publically avalible at the Redistricting Hub website with free login.
  • Constraints: Permitted for noncommercial and nonpartisan use only, as per original data access agreement. Copyright information found in redistrictingdatahub_legal.txt
  • Data Quality: Complete
Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
VTDST20 District ID Voting District ID Numeric
GEOID20 Location Unique Geographic ID Coordinate
G20PRETRU Republican Voters Total votes for Donald Trump in 2020 Numeric
G20PREBID Democratic Voters Total votes for Joe Biden in 2020 Numeric
precincts <- st_read(here("data", "raw", "public", "alabama", "districts.gpkg"), layer = "precincts20", quiet = TRUE)
# 15 precincts have geometry issues- thus, repair. 
precincts <- st_make_valid(precincts)%>%
  mutate(area= st_area(geom))
precincts<- precincts%>%
  mutate(vote_swing= G20PREDBID-G20PRERTRU)

Here’s the precinct data colored by vote swing- positive values are more Democratic while negative values are more Republican.

districts23 Layer of districts.gpkg

  • Abstract: Spatial bounds and characteristics of U.S. Congressional districts in Alabama
  • Spatial Coverage: Alabama (State). OSM link: [https://www.openstreetmap.org/relation/161950]
  • Spatial Resolution: U.S. Congressional Districts
  • Spatial Reference System: EPSG 3857 WGS 1984 Web Mercator Projection
  • Temporal Coverage: Districts approved in 2023 for use in the 2024 elections.
  • Temporal Resolution: N/A
  • Lineage: Loaded into QGIS as ArcGIS feature service layer and saved in geopackage format. Etraneous data fields were removed and the FIX GEOMETRIES tool was used to correect geometry errors.
  • Distribution: Avalible from the Alabama State GIS via ESRI feature service
  • Constraints: Public Domain data free for use and redistribution.
Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
DISTRICT District Number U.S. Congressional District Number Numeric N/A N/A N/A N/A
POPULATION Population Number of people residing in each congressional district (2020 census) Numeric Generally accurate on a full-population scale
WHITE Number of white residents Total number of white residents (2020 census) Numeric The US Census tends to overcount white populations and undercount those of minorities (US Census)
BLACK Number of black residents Total number of black residents (US Census) Numeric The US Census tends to overcount white populations and undercount those of minorities (US Census)
#mapping the outputted districts
districts <- st_read(here("data", "raw", "public", "alabama", "districts.gpkg"), layer = "districts23", quiet = TRUE)

Mapped 2023 Districts:

Prior observations/bias

At the time of this study pre-registration, the authors had very little prior knowledge of the geography of the study region with regards to the potential gerrymandering congressional districts. The study authors have some prior knowledge of the racial distribution of populations in the state as they pertain to historical settlement (oftentimes involuntary) patterns.

For each secondary source, declare the extent to which authors had already engaged with the data:

Alabama Census Block Groups

2020 Presidential Election Voting Precincts

Districts23 layer of districts.gpkg

Bias and threats to validity

Because primary data is not being incorporated in this study, potential sources of bias are limited. The data utilized in this study is generally considered reputable (census, voting totals), although at larger scales the 2020 census has been seen to systematically undercount minorities, a trend that may impact the racial distribution section of this study by not accuratly giving a measure of the relative diversity of different block groups. Because it’s difficult to know how this systemic undercounting might effect areas differently, I will not attempt to make any corrections for it.

Data transformations

Coordinate Transformation

Transform the Census coordinate systen to match that of the districts and precincts layer

blockgroups<-blockgroups%>%
  st_transform(crs = 3857)

Calculate the total Black/African American population in each block group

The Census makes it tricky to pull the ‘black’ population data because of the plethora of different combinations of race designations that respondents can use to describe their racial identity. For example, someone who responds that they are both Hispanic AND Black, they will have a different designation than someone who responds as only black. For this study, we’re going to consider the hispanic and black individual black, so that designation’s population total will need to be added to the overall black population total.

To gather this data, I’ll gather a list of all the race designations that have the work “Black” listed somewhere in the name.

pulled_metadata <- load_variables(2020, "pl")
black_vars <- pulled_metadata |> 
  dplyr::filter(str_detect(name, "P3"), #P3 are population columns that include race designations
                str_detect(label, "Black")) |> #pulls only the data where there label column includes 'Black'
  select(-concept) #excludes the descriptor label column

Next, I’ll use this list to aggregate population data from the columns that are included in the ‘black_vars’ list.

blockgroups2<-blockgroups%>%
  mutate(BlackPopulation = rowSums(across(all_of(black_vars$name))))

final_population <- blockgroups2 %>%
  mutate(
    Total_POP = P3_001N,
    Black_POP = BlackPopulation,
    Black_Percentage = BlackPopulation / P3_001N
  ) %>%
  select(GEOID, Total_POP, Black_POP, Black_Percentage)

This code chunk will output a table named ‘final_population’ with four columns- their names and descriptors are below. Total_POP: Total population in each census block Black_POP: Total black population in each census block Black_Percentage: The percentage of each census block that at minimum partially identifies as black

Below is the mapped Black_Percentage by block group, with district borders overlain. A pattern is already emerging, with majority-black districts lumped into several districts. Feel free to explore it for yourself:

Analysis

With our census data cleaned and black population numbers gathered, I’ll now seek to quantify spatial gerrymandering in the district organization by calculating different metrics of spatial compactness before diving into how to quantify the difference in racial makeup within vs. outside each district boundary.

Calculating Compactness Metrics

I’ll calculate three separate compactness metrics that all are in the form of area vs perimeter (Polsby-Popper metric), convex hull area, and minimum bounding circle area. To do this, I need the initial area of the districts, along with the areas of their associated products (convex hull, min bounding circle)

#sf_use_s2 is set to FALSE to calculate ellipsoidal area instead of spherical for more accurate area calculation. 
sf_use_s2(FALSE)

#calculate area/perimeter metric (polsby-popper)
districts1 <- mutate(districts, 
                     districts_area = st_area(geom),
                     districts_perim = st_length(st_cast(st_cast(geom, "MULTIPOLYGON"), "MULTILINESTRING")),
                     polsby_popper = round(
                       as.numeric(
                         (4 * pi * districts_area) / districts_perim^2),
                       2))%>%
  select(DISTRICT, districts_area, districts_perim,polsby_popper)

#create a separate layer, districts_convex, that saves creates a convex hull for each district
districts_convex <- districts1%>%
  st_convex_hull()
districts_convex <- districts_convex %>% 
  mutate(hullarea = st_area(geom),
         compact_hull = round(as.numeric(districts_area / hullarea), 2))

#create a third layer, bound_circle, that will store the minimum bounding circle metric
bound_circle<- districts1%>%
  st_minimum_bounding_circle()
bound_circle <- bound_circle %>%
  mutate(bound_circle,
         mbcarea = st_area(geom),
         compact_circ = round(as.numeric(districts_area / mbcarea), 2))

#combine all into a table
compactness_summary<- tibble(districts$DISTRICT, districts1$polsby_popper, districts_convex$compact_hull, bound_circle$compact_circ, districts$geom)
#change the column names
colnames(compactness_summary)<- c("District", "Polsby_Popper", "Convex_Hull", "Minimum_Bounding_Circle", "geom")

This gives us the compactness scores from each of our three metrics in the same df, ready to be mapped. We also still retain the three separate geometries used to calculate area for the compactness metrics, which will be potentially useful down the road. Here’s the resulting table:

District Polsby-Popper CM Convex Hull CM Minimum Bounding Circle CM
1 0.15 0.59 0.19
2 0.14 0.60 0.20
3 0.35 0.81 0.47
4 0.20 0.61 0.32
5 0.40 0.91 0.32
6 0.20 0.71 0.46
7 0.21 0.71 0.47

Race

To gather the population race breakdown for each district, we’ll employ our block group data that we cleaned earlier. The block group data needs to be split by district, convex hull, and minimum bounding circle so that we can get an accurate measure of population- to do this, it also needs to undergo area weighted reaggregation to properly apportion population on either side of a split. This introduces more error into our calculations, as people don’t evenly distribute across space, but it’s a acceptable amount of error given the small scale we’re working at.

#generate 
final_population$area<-final_population%>%
  st_area()
final_population<- st_transform(final_population, 4269)

#segmenting block groups by district, convex hull boundaries, and boundary circles. 
district_fragments <- st_intersection(final_population, districts1)
chull_fragments <- st_intersection(final_population, districts_convex)
boundcirc_fragments <- st_intersection(final_population, bound_circle)

#calculating area weighted aggregation and re-grouping by district
district_fragments <- district_fragments%>%
                    mutate(
                    new_area = st_area(geometry),
                    aw = as.numeric(new_area / area),
                    aw_pop = aw * Total_POP,
                    aw_black = aw * Black_POP,
                    aw_blackpct = aw_pop * aw_black)

district_pop <- district_fragments %>%
  group_by(DISTRICT)%>%
  summarize(
    sumpop = sum(aw_pop),
    sumblack = sum(aw_black),
    black_pct = sumblack/sumpop,
    geom = st_union(geometry)
  )

#calculating area weighted aggregation and re-grouping by convex hull
chull_fragments <- chull_fragments%>%
                    mutate(
                    new_area = st_area(geometry),
                    aw = as.numeric(new_area / area),
                    aw_pop = aw * Total_POP,
                    aw_black = aw * Black_POP,
                    aw_blackpct = aw_pop * aw_black)

chull_pop <- chull_fragments %>%
  group_by(DISTRICT) %>%
  summarize(
    sumpop = sum(aw_pop),
    sumblack = sum(aw_black),
    black_pct = sumblack/sumpop,
    geom = st_union(geometry)
  )

##calculating area weighted aggregation and re-grouping by minimum bounding circle
boundcirc_fragments <- boundcirc_fragments%>%
                    mutate(
                    new_area = st_area(geometry),
                    aw = as.numeric(new_area / area),
                    aw_pop = aw * Total_POP,
                    aw_black = aw * Black_POP,
                    aw_blackpct = aw_pop * aw_black)

bound_pop <- boundcirc_fragments %>%
  group_by(DISTRICT)%>% 
  summarize(
    sumpop = sum(aw_pop),
    sumblack = sum(aw_black),
    black_pct = sumblack/sumpop,
    geom = st_union(geometry)
  )

Now, conducting the same style intersection -> area weighting -> re-aggregation for precinct data, and saving back the democratic vote share as a percentage.

#segmenting block groups by district, convex hull boundaries, and boundary circles. 
district_fragments <- st_intersection(precincts, districts1)
chull_precincts <- st_intersection(precincts, districts_convex)
boundcirc_precincts <- st_intersection(precincts, bound_circle)

#calculating area weighted aggregation and re-grouping by district
district_fragments <- district_fragments%>%
                    mutate(
                    new_area = st_area(geom),
                    aw = as.numeric(new_area / area),
                    aw_total = aw * (G20PREDBID+G20PRERTRU+ G20PRELJOR + G20PREOWRI),
                    aw_total_democrat = aw * G20PREDBID,
                    aw_percent_dem = aw_total_democrat/aw_total)
                    

district_votes <- district_fragments %>%
  group_by(DISTRICT)%>%
  summarize(
    total_votes = sum(aw_total),
    total_dem = sum(aw_total_democrat),
    percent_dem = total_dem/total_votes)

#calculating area weighted aggregation and re-grouping by convex hull geometry
chull_precincts <- chull_precincts%>%
                    mutate(
                    new_area = st_area(geom),
                    aw = as.numeric(new_area / area),
                    aw_total = aw * (G20PREDBID+G20PRERTRU+ G20PRELJOR + G20PREOWRI),
                    aw_total_democrat = aw * G20PREDBID,
                    aw_percent_dem = aw_total_democrat/aw_total)
                    

chull_votes <- chull_precincts %>%
  group_by(DISTRICT)%>%
  summarize(
    total_votes = sum(aw_total),
    total_dem = sum(aw_total_democrat),
    percent_dem = total_dem/total_votes)

#calculating area weighted aggregation and re-grouping by minimum bounding circle geometry
boundcirc_precincts <- boundcirc_precincts%>%
                    mutate(
                    new_area = st_area(geom),
                    aw = as.numeric(new_area / area),
                    aw_total = aw * (G20PREDBID+G20PRERTRU+ G20PRELJOR + G20PREOWRI),
                    aw_total_democrat = aw * G20PREDBID,
                    aw_percent_dem = aw_total_democrat/aw_total)
                    

boundcirc_votes <- boundcirc_precincts %>%
  group_by(DISTRICT)%>%
  summarize(
    total_votes = sum(aw_total),
    total_dem = sum(aw_total_democrat),
    percent_dem = total_dem/total_votes)

Having created everything, I’m going to append data to a final table with attached geometry. I’ll conduct the final calculations in this table.

#creating the full table from derived data products
gerrymander_data<- tibble(districts1$DISTRICT,
                          compactness_summary$`Polsby_Popper`,
                          compactness_summary$`Convex_Hull`,
                          compactness_summary$`Minimum_Bounding_Circle`,
                          district_pop$black_pct,
                          chull_pop$black_pct,
                          bound_pop$black_pct,
                          district_votes$percent_dem,
                          chull_votes$percent_dem,
                          boundcirc_votes$percent_dem)

Final Calcualtions

To craft an index that combines the geographic compactness measurements with demographic information, I’m going to take the absolute value of the difference in the black population that lives within each district and subtract the percentage black population that lives within each district’s convex hull and minimum bounding circle. The higher the number is, the greater difference there will be between the population within the district and within either the convex hull or min bounding circle.

#calculate the combined population and spatial distribution metric
gerrymander_data<- gerrymander_data%>%
  mutate(difference_convex = abs(district_pop$black_pct-chull_pop$black_pct),
         difference_minimum_bounding = abs(`district_pop$black_pct`- bound_pop$black_pct))
#append geometry to districts 
gerrymander_data$Geometry<- districts$geom

#rename columns for readability
colnames(gerrymander_data)<- c("District", "Polsby-Popper", "Convex Hull", "Minimum Bounding Circle", "Percent Black Within District", "Percent Black Within CHull", "Percent Black Within MinBounds", "Percent Dem Votes in District", "Percent Dem Votes in CHull", "Percent Dem Votes in MinBounds", "Difference between CHull and District", "Difference between MinBounds and District", "Geometry")

Results

Here’s the final table with all data present; it’s an eyeful.
District Polsby-Popper Convex Hull Minimum Bounding Circle Percent Black Within District Percent Black Within CHull Percent Black Within MinBounds Percent Dem Votes in District Percent Dem Votes in CHull Percent Dem Votes in MinBounds Difference between CHull and District Difference between MinBounds and District
1 0.15 0.59 0.19 0.1618084 0.2401238 0.3220640 0.2483075 0.3201917 0.3940843 0.0783154 0.1602556
2 0.14 0.60 0.20 0.4854015 0.4197871 0.3159340 0.5535809 0.4821736 0.4128305 0.0656144 0.1694675
3 0.35 0.81 0.47 0.2075454 0.2018218 0.1920300 0.2891610 0.2862030 0.2763298 0.0057236 0.0155154
4 0.20 0.61 0.32 0.0764468 0.1775206 0.2146003 0.1890672 0.3090298 0.3442199 0.1010738 0.1381535
5 0.40 0.91 0.32 0.1831743 0.1765976 0.1423208 0.3547989 0.3461872 0.2793155 0.0065767 0.0408535
6 0.20 0.71 0.46 0.1753480 0.3135707 0.3359383 0.3021368 0.4361503 0.4397915 0.1382227 0.1605903
7 0.21 0.71 0.47 0.5187260 0.3695764 0.3564540 0.6394298 0.4963417 0.4686894 0.1491496 0.1622720

To test the correlation between the commonly used metrics of gerrymandering and our new metric, I’ll construct a correlation matrix. For simplicity, I’m only including the convex hull metric.

Difference between CHull and District Percent Black Within District Percent Black Within MinBounds Polsby-Popper Convex Hull Minimum Bounding Circle Percent Dem Votes in District
Difference between CHull and District 1.0000000 0.2630634 0.8003954 -0.6889201 -0.5454830 0.2304469 0.2720990
Percent Black Within District 0.2630634 1.0000000 0.5341085 -0.2805643 -0.1239143 0.0404106 0.9792100
Percent Black Within MinBounds 0.8003954 0.5341085 1.0000000 -0.8244382 -0.6492356 -0.0240672 0.4697419
Polsby-Popper -0.6889201 -0.2805643 -0.8244382 1.0000000 0.9557498 0.4197650 -0.1702736
Convex Hull -0.5454830 -0.1239143 -0.6492356 0.9557498 1.0000000 0.5039445 0.0110548
Minimum Bounding Circle 0.2304469 0.0404106 -0.0240672 0.4197650 0.5039445 1.0000000 0.1057330
Percent Dem Votes in District 0.2720990 0.9792100 0.4697419 -0.1702736 0.0110548 0.1057330 1.0000000

As you can see in the three panels above, our new metric has an interesting relationship to the commonly used gerrymandering metrics. While it has a slight positive relationship to the minimum bounding circle score, it has a strongly negative relationship to both the Convex Hull and Polsby-Popper metrics. This potentially points out flaws in the two metrics- despite having low gerrymandering scores by 2/3 of the metrics, we’re still seeing packing of either black or white voters as evidenced by the demographic difference inside vs. outside the district.

Discussion

The new metric comparing the interior with the exterior of a district does a good job at discovering packing of black/white populations to create strongly majority districts, even when several of the commonly used metrics give the redrawn districts a pass with low scores. This underscores the importance of including comparative demographics in any determination of the degree of gerrymandering in a statem, especially when racially concentrated areas are concerned.

One potential issue with this metric is it’s reliance on there being sufficient area outside of the district that’s still within either the district’s minimum bounding circle or convex hulls. This reliance could lead to a correlation between low scores in multiple metrics. We can see this in our analysis districts like AL 3 or 5 that score relatively well in compact hull metrics also score very neutrally in our difference metric because there’s very little area the falls within the district’s convex hull but isn’t in the district. What area there is is likely quite similar demographically the the district itself because it’s quite close to the district border. This is in contrast to districts like AL 7, which has a large area of no overlap that has a very different racial distribution, giving it a high difference score.

Integrity Statement

This is the only preregistration for this research project.

Acknowledgements

This project is part of class-based undergraduate research, and as such does not have any funding sources.

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)

References

Discrete Geometry for Electoral Geography; Duchin and Tennor 2024.

Gerrymandering and Compactness; Implementation Flexibility and Abuse; Barnes and Solomon 2020.

Practical Application of District Compactness; Horn, Hampton and Vandenburg 1993.