The famous cases of research misconduct in psychology (Diederik Stapel, Marc Hauser, Karen Ruggiero and Brian Wansink) range from outright data fabrication to borderline questionable research practices (QRPs). Together they exposed a discipline whose statistical culture made both fraud and self-deception unusually easy to commit and unusually slow to catch. Research misconduct is formally defined as fabrication, falsification or plagiarism (FFP) in proposing, performing or reviewing research, or in reporting results. Psychology’s particular exposure came from a combination of small samples, flexible statistics and a “publish or perish” incentive structure, which the field’s replication crisis in the 2010s later laid bare.
Unlike biomedical fraud, which often centres on a single fabricated dataset, psychology’s misconduct scandals repeatedly intersected with a discipline-wide reproducibility problem. As a result, some of its “famous cases” are proven fraud, while others involve QRPs that fall short of formal misconduct and became clearly distinguishable from fraud only after the field built better detection tools.
What is research misconduct in psychology?
Research misconduct is fabrication, falsification, or plagiarism in proposing, conducting, reviewing, or reporting research. This is the “FFP” definition used by the US Office of Research Integrity and mirrored in the UK’s Concordat to Support Research Integrity. Psychology cases fit this definition unevenly: some, like data fabrication, are unambiguous; others involve practices that were once normalised and only later reclassified as unacceptable.
This matters because psychology’s most cited scandals are not a single category. Some researchers invented entire datasets from nothing. Others manipulated real data through selective analysis choices that fell short of formal fabrication but still produced unreliable findings. Distinguishing the two is essential to understanding why the field’s scrutiny has been so intense and so prolonged.
Which cases define psychology’s misconduct history?
Four cases anchor most discussions of research misconduct in psychology. Each was detected differently, and each shaped a different part of the field’s subsequent reform.
- Diederik Stapel (Tilburg University, social psychology): admitted in 2011 to fabricating or manipulating data across dozens of studies, ultimately resulting in 58 retractions — the largest fabrication case in the field’s history, uncovered after junior colleagues reported inconsistencies in datasets that appeared too clean to be real.
- Marc Hauser (Harvard University, cognitive science): found responsible for eight counts of scientific misconduct by a Harvard investigation in 2010, involving fabricated and falsified data on primate cognition; a paper in Cognition was retracted, and he resigned from Harvard in 2011.
- Karen Ruggiero (Harvard University, social psychology): admitted to fabricating data in five discrimination-related experiments; failed replication attempts triggered the discovery, and in 2001 the US Public Health Service imposed a five-year federal funding ban.
- Brian Wansink (Cornell Food and Brand Lab): a 2018 Cornell investigation found no evidence of data fabrication but confirmed misreported data, flawed statistics, and inappropriate authorship practices; six of his papers were retracted by JAMA network journals on a single day in September 2018, part of an eventual total exceeding a dozen retractions.
The Wansink case is the pivot point for understanding why psychology’s scrutiny differs from other fields: it was not treated as fraud in the FFP sense, yet it arguably did more to popularise the term “questionable research practices” than any fabrication case before it.
Why did the replication crisis intersect so heavily with misconduct?
Psychology’s misconduct scandals broke at almost the same moment as its reproducibility crisis, and the two fed each other. The Open Science Collaboration’s 2015 Reproducibility Project, published in Science, attempted to replicate 100 published psychology studies and found that only around 36% of the replications produced statistically significant results in the same direction as the original. That figure made the entire discipline’s evidentiary base look fragile, not just the work of a few fraudsters.
That fragility had identifiable causes that predate any individual scandal:
- Small sample sizes lowered statistical power, so a larger share of the significant results that reached publication were false positives or inflated effect estimates.
- P-hacking — running multiple analyses until one crosses the p<0.05 threshold — was shown by Simmons, Nelson and Simonsohn’s influential 2011 “false-positive psychology” paper to make almost any hypothesis appear supported.
- HARKing (hypothesising after results are known) let researchers present exploratory findings as if they had been predicted in advance.
- Publication bias rewarded novel, positive results and left null findings in the file drawer, distorting the published record even without any individual acting in bad faith.
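The p-hacking point in the list above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the procedure from the Simmons, Nelson and Simonsohn paper: it assumes no true effect exists, that each study measures several statistically independent outcomes, and that a z-test with a known standard deviation of 1 is appropriate. The function names and the default parameters (20 participants per group, 2,000 simulated studies) are illustrative choices.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def two_sample_p(a, b):
    """Two-sided z-test p-value for equal-size groups with known SD of 1."""
    n = len(a)
    z = (fmean(a) - fmean(b)) / (2 / n) ** 0.5
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))


def study_is_significant(rng, n_per_group, n_outcomes):
    """Simulate one study with NO true effect.

    Returns True if ANY of the outcomes reaches p < ALPHA, which is
    what a researcher reporting only the "best" outcome would publish.
    """
    for _ in range(n_outcomes):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sample_p(a, b) < ALPHA:
            return True
    return False


def false_positive_rate(n_outcomes, n_studies=2000, n_per_group=20, seed=1):
    """Share of simulated null studies that yield a 'significant' finding."""
    rng = random.Random(seed)
    hits = sum(
        study_is_significant(rng, n_per_group, n_outcomes)
        for _ in range(n_studies)
    )
    return hits / n_studies


honest = false_positive_rate(n_outcomes=1)  # one pre-specified outcome
hacked = false_positive_rate(n_outcomes=5)  # report whichever outcome "works"
print(f"one pre-specified outcome: {honest:.3f}")
print(f"best of five outcomes:     {hacked:.3f}")
```

With independent outcomes, the expected rate for k outcomes is 1 - (1 - 0.05)^k, so about 5% for one outcome and about 23% for five. Real studies usually have correlated outcomes, which makes the inflation smaller than this sketch suggests, but the direction of the effect is the same.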
Daryl Bem’s 2011 “Feeling the Future” precognition study, published in the Journal of Personality and Social Psychology using entirely conventional statistical methods, is often cited as the moment the field realised its standard toolkit could produce an implausible result. It appeared in the same year that Stapel’s fraud was exposed. The coincidence of timing meant fabrication and questionable-but-permitted statistical practices were investigated side by side, and the public struggled to separate the two.
Fabrication vs questionable research practices: where is the line?
The distinction between outright fabrication and QRPs is the single most misunderstood part of psychology’s misconduct history, and it explains why some “famous cases” ended careers while others prompted only policy reform.
| Case | Confirmed misconduct type | Detection method | Institutional outcome |
|---|---|---|---|
| Diederik Stapel | Fabrication (FFP) | Colleague-reported data inconsistencies | 58 retractions; resigned 2011 |
| Marc Hauser | Fabrication/falsification (FFP) | Lab members’ concerns, followed by internal Harvard investigation | 8 misconduct counts; resigned 2011 |
| Karen Ruggiero | Fabrication (FFP) | Failed independent replication | 5-year federal funding ban (2001) |
| Brian Wansink | Questionable research practices, not FFP | Journalist and blogger scrutiny of published p-values | 13+ retractions; resigned 2018 |
The Stapel, Hauser and Ruggiero cases were confirmed FFP violations following formal investigations. Wansink’s case is different in kind: Cornell’s inquiry explicitly did not find fabricated data, yet the scale of statistical and reporting problems was severe enough to end his career and trigger a wave of scrutiny of “p-hacked” nutrition and consumer-behaviour research across the field.
Common questions about psychology’s misconduct cases
What are some examples of research misconduct?
Research misconduct includes fabrication (inventing data), falsification (altering real data or results), and plagiarism. In psychology, documented examples include Diederik Stapel’s fabricated datasets across 58 retracted papers and Karen Ruggiero’s invented discrimination-study data, both confirmed by formal institutional investigations.
What are the five unethical practices most associated with research misconduct?
Commonly cited unethical practices are fabrication of data, failure to credit others, plagiarism, undisclosed conflicts of interest, and biased design or interpretation. Psychology’s scandals add a sixth practical concern: undisclosed post hoc statistical manipulation, which sits just outside formal misconduct definitions but produces comparably unreliable findings.
Is the “most famous case study in psychology” the same as a misconduct case?
No — famous case studies (Little Albert, the Stanford Prison Experiment) are ethically debated research designs, not confirmed fraud. Misconduct cases like Stapel’s involve proven fabricated data, whereas case-study controversies typically involve consent, coercion, or methodological criticism rather than invented results.
What changed, and what it means for research integrity now
Psychology’s response to its misconduct-and-replication double crisis has been more structural than punitive. The Center for Open Science’s Transparency and Openness Promotion (TOP) Guidelines, introduced in 2015, have been adopted by more than 1,000 journals and push researchers toward preregistration, open data, and open materials as standard practice rather than optional virtue.
For research administrators and institutions, the practical lesson is attribution, not just detection. Multi-author fabrication cases are hard to unwind precisely because it is often unclear who ran the analysis, who collected the data, and who wrote the manuscript. Structured contributor taxonomies such as CRediT (the Contributor Roles Taxonomy), originated by CASRAI in 2014 and now stewarded by NISO as ANSI/NISO Z39.104-2022, give institutions a documented record of who performed formal analysis, data curation, and investigation roles on a paper. That is the kind of information gap that can slow investigations such as those into Stapel and Hauser.
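As a rough illustration of what such a documented record could look like, the sketch below stores CRediT roles per contributor and flags papers where nobody is recorded against the roles most relevant to a data-integrity inquiry. The role names are the 14 CRediT roles; the `Paper` class, its methods, and the author names are hypothetical and do not correspond to any institution’s or publisher’s actual system.

```python
from dataclasses import dataclass, field

# The 14 contributor roles defined by CRediT (ANSI/NISO Z39.104-2022).
CREDIT_ROLES = frozenset({
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
})


@dataclass
class Paper:
    """Hypothetical per-paper record of who performed which CRediT role."""
    title: str
    contributions: dict[str, set[str]] = field(default_factory=dict)

    def add(self, person, *roles):
        """Record roles for a contributor, rejecting non-CRediT role names."""
        unknown = set(roles) - CREDIT_ROLES
        if unknown:
            raise ValueError(f"not CRediT roles: {sorted(unknown)}")
        self.contributions.setdefault(person, set()).update(roles)

    def who_did(self, role):
        """Names of everyone recorded against a given role, sorted."""
        return sorted(p for p, r in self.contributions.items() if role in r)

    def unassigned(self, required=("Data curation", "Formal analysis", "Investigation")):
        """Roles from `required` that nobody on the paper is recorded as holding."""
        return [role for role in required if not self.who_did(role)]


paper = Paper("Hypothetical study")
paper.add("Author A", "Conceptualization", "Writing – original draft")
paper.add("Author B", "Formal analysis", "Investigation")

print(paper.who_did("Formal analysis"))  # ['Author B']
print(paper.unassigned())                # ['Data curation']
```

The design choice worth noting is the `unassigned` check: a record like this is only useful to an investigation if gaps are caught at submission time, when the authors can still say who handled the data.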
Psychology’s misconduct history is not over, but it is better instrumented than it was in 2011. Preregistration, statistical detection tools, and clearer contributor accountability mean the next fabrication case is more likely to be caught earlier, and more likely to be correctly distinguished from questionable research practices that fall short of fabrication.