This is a study of gerrymandering in Alabama. We will test different metrics of spatial compactness and diversity to assess their efficacy in predicting the representativeness of different voting districts. We will then extend the work of prior studies by calculating a representativeness metric that combines social and geographic metrics of ‘fairness’.
Key words
: Political Representation, Gerrymandering, Alabama, Convex Hull, Elections
Subject
: Social and Behavioral Sciences: Geography: Geographic Information Sciences
Date created
: 2025-02-17
Date modified
: 2020-02-17
Spatial Coverage
: Alabama (State)
Spatial Resolution
: Census block groups
Spatial Reference System
: EPSG:4269 NAD 1983 Geographic Coordinate System
Temporal Coverage
: 2020-2023
Temporal Resolution
: Decennial Census

An original, exploratory study comparing the findings of metrics commonly used to quantify the degree of congressional district gerrymandering. We will also assess the usefulness of a new gerrymandering metric based on the convex hull of a congressional district and the representativeness inside the convex hull compared to the congressional district writ large.
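The convex hull idea can be illustrated with a small base-R sketch. It computes the ratio of a polygon's area to the area of its convex hull, a common compactness score. The function names `polygon_area` and `hull_ratio` and the toy L-shaped polygon are assumptions made for illustration; this is not the study's implementation, which works with `sf` geometries (where `st_convex_hull()` and `st_area()` would play the same roles).

```r
# Shoelace formula: area of a simple polygon given vertex coordinates in order.
polygon_area <- function(x, y) {
  n <- length(x)
  j <- c(2:n, 1)  # index of the next vertex, wrapping around to the first
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# Ratio of polygon area to convex hull area (1 means the shape is convex).
hull_ratio <- function(x, y) {
  hull <- chull(x, y)  # indices of the hull vertices, in order around the hull
  polygon_area(x, y) / polygon_area(x[hull], y[hull])
}

# Hypothetical L-shaped "district": a 2 x 2 square missing its top-right quarter.
x <- c(0, 2, 2, 1, 1, 0)
y <- c(0, 0, 1, 1, 2, 2)
hull_ratio(x, y)
```

A ratio well below 1 indicates a shape with deep indentations relative to its hull, which is the geometric signal this kind of metric is meant to capture.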
Enumerate specific hypotheses to be tested or research questions to be investigated here, and specify the type of method, statistical test or model to be used on the hypothesis or question.
# record all the packages you are using here
# this includes any calls to library(), require(),
# and double colons such as here::i_am()
packages <- c("tidyverse", "here", "sf", "tmap", "tidycensus", "lwgeom", "kableExtra")
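One possible way to use this vector is sketched below: check which packages are missing before attaching any of them. The helper name `missing_packages` is hypothetical, and the template this document follows may already handle package loading differently.

```r
# Assumption: `packages` mirrors the vector recorded above.
packages <- c("tidyverse", "here", "sf", "tmap", "tidycensus", "lwgeom", "kableExtra")

# Return the subset of package names that are not installed locally.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(packages)
if (length(to_install) > 0) {
  # Report the gap rather than installing silently.
  message("Not installed: ", paste(to_install, collapse = ", "))
} else {
  # Attach everything only once all packages are known to be available.
  invisible(lapply(packages, library, character.only = TRUE))
}
```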
Describe the data sources and variables to be used. Data sources may include plans for observing and recording primary data or descriptions of secondary data. For secondary data sources with numerous variables, the analysis plan authors may focus on documenting only the variables intended for use in the study.
Primary data sources for the study will include census block groups, Alabama congressional districts, and presidential voting totals from the 2020 election.
Each of the next subsections describes one data source.
Abstract
: Vector polygon geopackage layer of Census
block groups and demographic data.
Spatial Coverage
: Alabama (State). OSM link: [https://www.openstreetmap.org/relation/161950]
Spatial Resolution
: Census block groups
Spatial Reference System
: EPSG 4269 NAD 1983
geographic coordinate system
Temporal Coverage
: 2020 census
Temporal Resolution
: Single census survey
period
Lineage
: Downloaded from the U.S. Census API “pl”
public law summary file using ‘tidycensus’ in R
Distribution
: US Census API
Constraints
: Public Domain data free for use and
redistribution.
Acquiring data using tidycensus in R
blockgroup_file <- here("data", "raw", "public", "block_groups.gpkg")
# if the data is already downloaded, just load it
# otherwise, query from the census and save
if (file.exists(blockgroup_file)) {
  blockgroups <- st_read(blockgroup_file, quiet = TRUE)
} else {
  blockgroups <- get_decennial(geography = "block group",
                               sumfile = "pl",
                               table = "P3",
                               year = 2020,
                               state = "Alabama",
                               output = "wide",
                               geometry = TRUE,
                               keep_geo_vars = TRUE)
  st_write(blockgroups, blockgroup_file)
}
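The load-or-download pattern in the chunk above can be generalized. The following is a minimal sketch, assuming a hypothetical helper `read_or_build` and using RDS files in place of the geopackage so that it runs without `sf` or a Census API key; the toy data frame is made up.

```r
# Hypothetical helper: load a cached object if present, otherwise build and save it.
read_or_build <- function(path, build) {
  if (file.exists(path)) {
    readRDS(path)
  } else {
    obj <- build()
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    saveRDS(obj, path)
    obj
  }
}

# Toy usage with a temporary file standing in for the block group layer.
cache_file <- file.path(tempdir(), "demo_cache", "block_groups.rds")
unlink(cache_file)  # start from a clean state for the demonstration
calls <- 0
build_demo <- function() {
  calls <<- calls + 1  # count how often the expensive step actually runs
  data.frame(GEOID = c("a", "b"), P3_001N = c(10, 20))
}
first  <- read_or_build(cache_file, build_demo)  # builds and saves
second <- read_or_build(cache_file, build_demo)  # reads the cached copy
```

Creating the parent directory before writing avoids a failure on a fresh clone where the data folder does not exist yet.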
Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
GEOID | ID Code | Code that uniquely identifies census block groups | Numeric | N/A | … | … | … |
P4_001N | Total population over 18 | Total population over 18 years old in the 2020 census, divided by block group | Numeric | Generally Accurate | … | … | … |
P4_006N | Total black population over 18 | Total black population over 18 years old in the 2020 census, divided by block group | Numeric | The US Census tends to overcount white populations and undercount those of minorities (US Census) | … | … | … |
P5_003N | Institutionalized population | Total institutionalized population in correctional facilities for adults during the 2020 census, 18 years or older divided by block group | Numeric | The US Census tends to overcount white populations and undercount those of minorities (US Census) | … | … | … |
Abstract
: Voting data by precinct
Spatial Coverage
: Alabama (State). OSM link: [https://www.openstreetmap.org/relation/161950]
Spatial Resolution
: Voting Precincts
Spatial Reference System
: EPSG 4269 NAD 1983 Geographic Coordinate System
Temporal Coverage
: One Year
Temporal Resolution
: 2020
Lineage
: Downloaded as a sgpkg. Prior processing information is available in al_vest_20_validation_report.pdf and readme_al_vest_20.txt
Distribution
: Publicly available at the Redistricting Hub website with free login.
Constraints
: Permitted for noncommercial and nonpartisan use only, as per original data access agreement. Copyright information found in redistrictingdatahub_legal.txt
Data Quality
: Complete

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
VTDST20 | District ID | Voting District ID | Numeric | … | … | … | … |
GEOID20 | Location | Unique Geographic ID | Coordinate | … | … | … | … |
G20PRERTRU | Republican Voters | Total votes for Donald Trump in 2020 | Numeric | … | … | … | … |
G20PREDBID | Democratic Voters | Total votes for Joe Biden in 2020 | Numeric | … | … | … | … |
precincts <- st_read(here("data", "raw", "public", "alabama", "districts.gpkg"), layer = "precincts20", quiet = TRUE)
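# 15 precincts have geometry issues, so repair them before computing areas
precincts <- st_make_valid(precincts) %>%
  mutate(area = st_area(geom))
# vote swing: Democratic (Biden) votes minus Republican (Trump) votes
precincts <- precincts %>%
  mutate(vote_swing = G20PREDBID - G20PRERTRU)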
Here’s the precinct data colored by vote swing: positive values are more Democratic, while negative values are more Republican.
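The sign convention can be checked on a toy table. The sketch below uses made-up vote counts and also shows a normalized margin, which is an assumption added here for illustration (the study itself computes only the raw difference); it may be useful because raw differences scale with precinct size.

```r
# Toy precinct table; the vote counts are made up for illustration.
toy <- data.frame(
  precinct   = c("A", "B", "C"),
  G20PREDBID = c(600, 200, 0),
  G20PRERTRU = c(400, 800, 0)
)

# Raw swing, as in the chunk above: positive means more Democratic votes.
toy$vote_swing <- toy$G20PREDBID - toy$G20PRERTRU

# Hypothetical normalized margin in [-1, 1]; NA where no two-party votes were cast.
two_party <- toy$G20PREDBID + toy$G20PRERTRU
toy$margin <- ifelse(two_party > 0, toy$vote_swing / two_party, NA_real_)
toy
```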
Abstract
: Spatial bounds and characteristics of U.S. Congressional districts in Alabama
Spatial Coverage
: Alabama (State). OSM link: [https://www.openstreetmap.org/relation/161950]
Spatial Resolution
: U.S. Congressional Districts
Spatial Reference System
: EPSG 3857 WGS 1984 Web Mercator Projection
Temporal Coverage
: Districts approved in 2023 for use in the 2024 elections.
Temporal Resolution
: N/A
Lineage
: Loaded into QGIS as ArcGIS feature service layer and saved in geopackage format. Extraneous data fields were removed and the FIX GEOMETRIES tool was used to correct geometry errors.
Distribution
: Available from the Alabama State GIS via ESRI feature service
Constraints
: Public Domain data free for use and redistribution.

Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
DISTRICT | District Number | U.S. Congressional District Number | Numeric | N/A | N/A | N/A | N/A |
POPULATION | Population | Number of people residing in each congressional district (2020 census) | Numeric | Generally accurate on a full-population scale | … | … | … |
WHITE | Number of white residents | Total number of white residents (2020 census) | Numeric | The US Census tends to overcount white populations and undercount those of minorities (US Census) | … | … | … |
BLACK | Number of black residents | Total number of black residents (US Census) | Numeric | The US Census tends to overcount white populations and undercount those of minorities (US Census) | … | … | … |
# load the 2023 congressional districts for mapping
districts <- st_read(here("data", "raw", "public", "alabama", "districts.gpkg"), layer = "districts23", quiet = TRUE)
Mapped 2023 Districts:
At the time of this study pre-registration, the authors had very little prior knowledge of the geography of the study region with regard to the potential gerrymandering of congressional districts. The study authors have some prior knowledge of the racial distribution of populations in the state as it pertains to historical (oftentimes involuntary) settlement patterns.
For each secondary source, declare the extent to which authors had already engaged with the data:
Because primary data is not being incorporated in this study, potential sources of bias are limited. The data used in this study (census counts, voting totals) are generally considered reputable, although at larger scales the 2020 census has been seen to systematically undercount minorities. This trend may affect the racial distribution section of this study by giving an inaccurate measure of the relative diversity of different block groups. Because it is difficult to know how this systematic undercounting might affect areas differently, I will not attempt to make any corrections for it.
Transform the Census coordinate system to match that of the districts and precincts layers:
blockgroups <- blockgroups %>%
  st_transform(crs = 3857)
The Census makes it tricky to pull the ‘Black’ population data because of the many different combinations of race designations that respondents can use to describe their racial identity. For example, someone who responds that they are both Hispanic and Black will have a different designation than someone who responds as only Black. For this study, we will count an individual who is both Hispanic and Black as Black, so that designation’s population total will need to be added to the overall Black population total.
To gather this data, I’ll compile a list of all the race designations that have the word “Black” somewhere in their label.
pulled_metadata <- load_variables(2020, "pl")
black_vars <- pulled_metadata |>
  dplyr::filter(str_detect(name, "P3"),        # P3 variables are population counts by race designation
                str_detect(label, "Black")) |> # keep only rows whose label includes "Black"
  select(-concept)                             # drop the table-level concept column
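The filtering logic can be tried without a Census API call. The sketch below is a base-R illustration on a toy stand-in for the metadata table; the variable codes and labels are illustrative assumptions modeled on the label format, not values pulled from the API.

```r
# Toy stand-in for the table returned by load_variables(); rows are illustrative.
toy_vars <- data.frame(
  name  = c("P1_001N", "P3_001N", "P3_003N", "P3_004N", "P3_011N"),
  label = c(
    " !!Total:",
    " !!Total:",
    " !!Total:!!Population of one race:!!White alone",
    " !!Total:!!Population of one race:!!Black or African American alone",
    " !!Total:!!Population of two or more races:!!Population of two races:!!White; Black or African American"
  ),
  stringsAsFactors = FALSE
)

# Keep P3 rows whose label mentions "Black".
# "^P3_" anchors the match to the start of the variable name.
keep <- grepl("^P3_", toy_vars$name) & grepl("Black", toy_vars$label, fixed = TRUE)
toy_black_vars <- toy_vars$name[keep]
toy_black_vars
```

Subtotal rows such as "Population of two races:" do not contain the word "Black" in this toy table, so only the individual designations are selected and nothing is counted twice when the columns are later summed.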
Next, I’ll use this list to aggregate population data from the columns that are included in the ‘black_vars’ list.
blockgroups2 <- blockgroups %>%
  mutate(BlackPopulation = rowSums(across(all_of(black_vars$name))))
final_population <- blockgroups2 %>%
  mutate(
    Total_POP = P3_001N,
    Black_POP = BlackPopulation,
    Black_Percentage = BlackPopulation / P3_001N
  ) %>%
  select(GEOID, Total_POP, Black_POP, Black_Percentage)
This code chunk outputs a table named ‘final_population’ with four attribute columns. Their names and descriptions are below.
GEOID
: Unique identifier of each block group
Total_POP
: Total population in each block group
Black_POP
: Total Black population in each block group
Black_Percentage
: Share of each block group’s population that identifies at least partially as Black, stored as a proportion between 0 and 1
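The aggregation step can be checked on a toy table. This is a minimal base-R sketch with made-up counts and hypothetical column choices; it also flags block groups with no residents, where dividing by a zero total would otherwise produce `NaN`. That guard is an addition for illustration and is not part of the chunk above.

```r
# Toy block-group counts (made up); the last row has no residents.
toy_bg <- data.frame(
  GEOID   = c("bg1", "bg2", "bg3"),
  P3_001N = c(100, 50, 0),  # total population
  P3_004N = c(20, 10, 0),   # hypothetical "Black alone" column
  P3_011N = c(5, 0, 0)      # hypothetical "White; Black" column
)
toy_black_cols <- c("P3_004N", "P3_011N")

# Sum the selected columns row by row, then divide by the total.
toy_bg$Black_POP <- rowSums(toy_bg[, toy_black_cols])
toy_bg$Black_Percentage <- toy_bg$Black_POP / toy_bg$P3_001N

# 0 / 0 gives NaN in R, so unpopulated block groups are set to NA here.
toy_bg$Black_Percentage[toy_bg$P3_001N == 0] <- NA
toy_bg[, c("GEOID", "Black_POP", "Black_Percentage")]
```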
Below is the mapped Black_Percentage by block group, with district borders overlaid. A pattern is already emerging, with majority-Black block groups lumped into several districts. Feel free to explore it for yourself: