These scholars asked Claudine Gay for her data. She said no.
''We were, however, unable to scrutinize Gay’s results because she would not release her dataset to us (personal communication with Claudine Gay, 2002).''
This is a The Dossier contributor post from a contributing editor at The American Conservative who also writes an online newsletter at karlstack.substack.com.

Yesterday on X, new questions arose about Claudine Gay’s data, from Jonatan Pallesen:
If we look at Claudine Gay's 2001 paper, there are some numbers that raise questions. This has been discussed on Econjobrumors. The thesis is that Black representatives make White people vote less. Looking at the White turnout in Clay's district, it seems to be about middle of the pack:
But if we look at the regression results in Table 3, Bill Clay is listed as having a highly significant effect of a 16.8% reduction in voter turnout:
I took a look at her PhD thesis which looked at the same data. And here the coefficient is as expected for a data point in the middle of the pack:
I checked that her thesis and the paper have the same number of data points, 2827, so it is looking at the same thing. So why did the result change so much between the thesis and the paper? A possibility is that it is caused by a change in control variables. But if the control variables are this crucial, then it is important whether they really should be included, or whether we risk a garden of forking paths. An impactful variable is the winning vote margin. But it has a totally different impact in the different states. It seems kind of weird if people in Missouri strongly prefer to vote if the election is close, while the people in Tennessee strongly prefer to vote if the election is not close. I couldn't find any discussion about why this control variable was introduced, when it wasn't included originally. In any case Claudine Gay should release her data. Especially since there is such a discrepancy between her two publicized results.
Making the data public should be standard scientific practice to begin with.
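To see why the choice of controls matters so much, here is a minimal Python sketch on made-up numbers. Nothing in it comes from Gay's dataset, which has not been released; the ten precincts, the `margin` values, and the `ols` helper are all hypothetical. It shows how a district whose raw turnout sits exactly in the middle of the pack can still pick up a large negative coefficient once a correlated control, such as the winning vote margin, enters the regression.

```python
def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data: 5 precincts outside the district (d = 0), 5 inside (d = 1).
# Winning margins are assumed to be much larger inside the district.
district = [0.0] * 5 + [1.0] * 5
margin = [0.05, 0.10, 0.15, 0.20, 0.25, 0.39, 0.44, 0.49, 0.54, 0.59]

# Turnout is constructed so both groups average the same raw turnout (0.475),
# even though the formula contains a -0.17 district term.
turnout = [0.40 - 0.17 * d + 0.5 * m for d, m in zip(district, margin)]

# Specification 1: intercept + district dummy only
b_without = ols([[1.0, d] for d in district], turnout)

# Specification 2: intercept + district dummy + margin control
b_with = ols([[1.0, d, m] for d, m in zip(district, margin)], turnout)

print("district coefficient, no control:  ", round(b_without[1], 4))
print("district coefficient, with control:", round(b_with[1], 4))
```

In this toy setup the district coefficient is about 0 without the control and about -0.17 with it, purely because of how the numbers were constructed. It says nothing about what actually happened in the paper; it only shows that a swing of this size can come from the specification alone, which is why access to the underlying data matters.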
Certainly worth looking into, thanks Jonatan — go follow him on X!
So, where is her data?
Well, it turns out that I am not the first one who has searched for it.
She was granted tenure at Stanford in 2005 with just 4 peer-reviewed political science articles to her name: 1998 PolPsych, 2001 APSR, 2002 AJPS, 2004 APSR.
Her 2001 APSR paper, which was central to her already-meager tenure case, was debunked by Michael C. Herron, the Remsen 1943 Professor of Quantitative Social Science at Dartmouth, and Kenneth W. Shotts, the David S. and Ann M. Barlow Professor of Political Economy at Stanford Graduate School of Business. They presented their critique at the 2002 conference of The Society for Political Methodology (PolMeth), after Gay would not share her data or code with them.
We were, however, unable to scrutinize Gay’s results because she would not release her dataset to us (personal communication with Claudine Gay, 2002).
Consider Gay’s (2001) EI–R analysis of the precinct-level socioeconomic covariates that affect black and white turnout. […] For Gay’s Michigan and Pennsylvania EI–R analyses to be logically consistent, it must be true that knowledge of a precinct’s percent black (Xi) tells us nothing about the precinct’s per capita income (an element of Gay’s Zi). This is untenable in light of contemporary American social realities: if a precinct has a large African-American population, then all things equal this precinct will have a relatively low per capita income. Nonetheless, without assuming that a precinct’s per capita income is not a function of its racial composition, and without making a host of similarly implausible assumptions for the other right hand side variables in her second stage regressions, Gay’s use of EI–R is logically inconsistent.
— the footnotes of a 2002 conference paper titled 'Logical Inconsistency in King-based Ecological Regressions.'
This takedown of Gay’s work was removed from the final version of Herron and Shotts’ paper (it was replaced with a ''hypothetical example'' of flawed work), and any mention of the conference paper was scrubbed from the PolMeth website, where the paper had been presented.
This working paper with the damning footnote is not available anywhere on the internet, but I managed to get my hands on an exclusive copy:
The program for every PolMeth conference from 1984 to 2021 is available for download on the PolMeth website, except for 2002, which suggests that 2002 was deliberately removed from this span, seemingly to protect Gay. The earliest programs predate the web, so there is clearly a repository of old programs somewhere that was uploaded to the website at some point, and 2002 is conspicuously missing from it. It is also possible that 2002 was simply the one year lost in the shuffle.
I reached out to The Society for Political Methodology for comment, and they told me: ''Our website was developed several years ago.''
It’s worth noting that PolMeth leadership rotates every year, so if there was a coverup a decade or two ago, the current PolMeth leadership might not even be aware of it.
As Stephen McIntyre points out, ‘‘Ironically, Gary King, [Claudine Gay’s advisor and mentor], was a zealous advocate of authors providing data in replicable form. The "replication standard" advocated by Gary King began with the replication data set, which needed to be publicly available and cited in the original publication. King urged that adherence to replication standard be considered by tenure and promotion committees.’’:
Claudine Gay’s mentor should be ashamed of her refusal to share her data.
What is she hiding?
It is now incumbent on institutions like Harvard, Congress (which has already launched a probe into her plagiarism), and the academic community to encourage the sharing of Gay's data, in order to uphold the principles of openness and accountability in research. If the data is unproblematic, sharing it should not pose any issues.