Is GIScience Reproducible?
As the scientific community grows and publication rates increase, it's paramount that the trustworthiness and applicability of new findings scale with the sheer volume of information presented each year, irrespective of discipline. Together with the wider accessibility of the sciences worldwide, this growth has seen the number of scientific papers published annually more than triple since 2000. The trend is especially pronounced in geography, a discipline that could be said to be having something of a 'kid in a candy store' boom of research potential, with recorded image volume increasing exponentially every year. Just two satellites, ESA's Sentinel-1 and Sentinel-2, downlink over 3 petabytes of data annually. If printed, this data would be enough to fill roughly 1.5 trillion sheets of 8.5 x 11 paper. As the sheer amount of data and its accessibility increase year over year, so too do the opportunities for scientific breakthroughs that warrant dissemination through publication.
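For the curious: the paper-sheet comparison implies a certain amount of data per printed page, and the arithmetic is easy to sanity-check. A quick sketch (the ~2 KB-per-page figure is my own assumption about how much plain text fits on a sheet, not something from the satellite specs):

```python
# Back-of-envelope check of the "1.5 trillion sheets" comparison.
# Assumption (mine): one 8.5 x 11 sheet holds roughly 2 KB of plain text.

PETABYTE = 10**15                     # bytes, decimal (SI) petabytes
annual_volume_bytes = 3 * PETABYTE    # ~3 PB downlinked per year
bytes_per_sheet = 2_000               # ~2 KB of text per printed page

sheets = annual_volume_bytes / bytes_per_sheet
print(f"{sheets:.2e} sheets")         # 1.50e+12, i.e. ~1.5 trillion
```

So the two numbers are consistent with each other under a reasonable page-capacity assumption.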
As the number of papers published annually has increased, so have concerns that many of the findings they present are not sufficiently reproducible. This lack of reproducibility goes against one of the central tenets of the scientific method: the communality of science allows researchers to build on the findings of others, trusting in the validity of that work to draw further conclusions. This field-wide lack of reproducibility has been frequently publicized in psychology and medicine as a replication crisis that threatens to call into question much of what was previously believed to be scientific fact. Irreproducibility can stem from various factors, including failure to share research data, withholding statistical analysis code, or inconsistencies in methodology that prevent others from replicating the work. These lapses represent failures of the scientific process, because they mean other researchers can't necessarily rely on the validity of the results.
GIScience, a subdiscipline of geography focused mainly on solving problems through spatial data analysis, is especially well suited both to take advantage of the scientific community's newfound glut of spatial information and to pave the way for new standards of reproducibility. In practice, this could look like several things: sharing specifics of how data was acquired (so others can obtain the same data), sharing data-manipulation code (so others can check for mistakes), and documenting the process of analysis (so others can question what priors may have been introduced into the work). GIScience is well positioned to do this for a couple of reasons. Geographic data sources are sometimes publicly available, and unlike many other 'hard' sciences, GIScience often doesn't involve field data collection, something that can't necessarily be repeated the same way twice. Additionally, computational geographic techniques allow researchers to make their entire workflow public, enabling exact duplication of the work by others to verify results and learn from the techniques used.
If all this is so possible, are we actually doing it? Broadly speaking, no. When surveyed, more than half of geographers said they use open-source software to communicate their research findings only sometimes or rarely, if ever. Similarly, more than three-quarters of respondents said they only sometimes, or rarely if ever, attempt to share the code used in their research. While these rates appear poor, the motivation to share work is there: approximately 75% of respondents said that reproducibility was either very or somewhat important within their subfield. This raises the question: where is the disconnect? If researchers know that reproducibility is important but still aren't implementing these practices in their own work, how can we get them to yes and 'bake in' the expectation of open-source research across the GIScience field? The answers likely lie in incentivizing the more diligent and time-consuming work it takes to make a project truly open source, convincing researchers that it's in their best interest to work together, and giving researchers the resources to learn the skills necessary to share their work in a reproducible way.
I’ll be thinking more about this in the weeks ahead, so stay tuned for my definitive and all-encompassing findings (ha ha). I might even share my thought process (so open-source of me)!