Tag: publication bias

  • The Replication Crisis in Psychology and the Open Science Response

    The replication crisis is the recognition that a substantial share of published findings, notably in psychology, fail to reproduce when independent teams repeat the studies. It prompted a wide-ranging reform movement built around transparency and pre-specified methods. Rather than discrediting the discipline, the crisis has driven psychology to strengthen the reliability of its evidence.

    The 2015 reproducibility project

    A landmark moment was the Open Science Collaboration’s Reproducibility Project: Psychology, published in 2015. A large network of researchers attempted to replicate 100 studies from leading psychology journals using high-powered designs. A considerable proportion of the original effects did not replicate, and where effects did appear they were on average markedly smaller than in the original reports. The result was a wake-up call: publication did not guarantee a finding was robust. Crucially, the project was itself a model of open practice—its protocols were shared, its analyses were transparent, and its data were made public—so its own conclusions could be scrutinised and re-examined by others. It demonstrated that large-scale, coordinated replication was feasible, and it gave the reform movement a concrete, quantified anchor rather than anecdote. Subsequent multi-lab projects in psychology and adjacent fields extended the approach, confirming that the pattern was systemic rather than confined to a handful of studies.

    What drives non-replication

    Several interacting causes are now well understood:

    Cause                          How it inflates false findings
    P-hacking                      Flexible analysis choices made until results cross significance, producing false positives
    Publication bias               Journals favour positive, novel results, so null findings stay unpublished (the “file drawer”)
    Low statistical power          Small samples yield unstable estimates and exaggerated effect sizes
    Researcher degrees of freedom  Undisclosed choices in design and analysis enable selective reporting

    These pressures interact with weak measurement: instruments with poor reliability and validity add noise that low-powered studies are ill-equipped to handle.
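    The p-hacking row can be made concrete with a small simulation. The Python sketch below assumes the simplest form of analytic flexibility: when no true effect exists, a researcher measures k independent outcomes and counts the study as a success if any one of them reaches p < 0.05. The function names and parameter values are illustrative. Real flexibility (optional stopping, covariate choices, outlier rules) produces correlated tests, so the closed-form rate 1 - (1 - alpha)^k is only a rough guide.

```python
import math
import random

ALPHA = 0.05


def analytic_false_positive_rate(k: int, alpha: float = ALPHA) -> float:
    """Chance that at least one of k independent tests is 'significant' when all nulls are true."""
    return 1 - (1 - alpha) ** k


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal statistic: P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_false_positive_rate(k: int, n_studies: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo version: under the null, each outcome's z-statistic is drawn from N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = [two_sided_p_from_z(rng.gauss(0.0, 1.0)) for _ in range(k)]
        if min(p_values) < ALPHA:  # report whichever outcome "worked"
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(f"k={k:2d}  analytic={analytic_false_positive_rate(k):.3f}  "
              f"simulated={simulated_false_positive_rate(k):.3f}")
```

    Under these assumptions, five outcomes already raise the nominal 5% false-positive rate to about 23% (1 - 0.95^5 ≈ 0.226), and the simulated rate should land close to the analytic one.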

    The Open Science response

    The reform agenda answers each cause directly. Preregistration records hypotheses and analysis plans before data are seen, separating confirmatory tests from exploratory ones and curbing p-hacking. Registered Reports go further: a journal peer-reviews the introduction and methods and grants in-principle acceptance before results exist, so publication no longer hinges on whether the result is positive—directly tackling publication bias. Data and materials sharing lets others reanalyse and reuse work, and adequately powered designs make a significant result more likely to reflect a real effect and less likely to exaggerate its size.

    The role of the Center for Open Science

    Much of this infrastructure is coordinated by the Center for Open Science, the non-profit behind the Open Science Framework, a platform for preregistration, data sharing and project management. By making transparent practice easy and rewarded—through badges, registries and tooling—it has helped shift norms across psychology and beyond. The movement aligns closely with CASRAI’s interest in reproducibility and clear research metadata.

    The difference between direct and conceptual replication

    Not all replications are the same, and the distinction matters for interpreting the crisis. A direct replication repeats the original method as closely as possible to test whether the same procedure yields the same result. A conceptual replication tests the same underlying idea using a different method or measure. Conceptual replications are valuable for generalisation, but they cannot substitute for direct ones: if a different method fails, it is ambiguous whether the original finding was false or the new method simply tapped a different construct. Part of what the reform movement restored was respect for direct replication, which had been undervalued by journals that prized novelty over verification.

    Beyond p-values: estimation and transparency

    A recurring theme is over-reliance on the binary question “is p below 0.05?”. A single significant p-value says little about how large or reliable an effect is, and the threshold is easy to cross by chance or by flexible analysis. Reformers therefore emphasise reporting effect sizes with confidence intervals, planning sample sizes in advance through power analysis, and distinguishing pre-specified confirmatory tests from exploratory ones. None of this forbids exploration; it simply asks researchers to label it honestly so readers can weight the evidence appropriately. These habits depend on sound measurement, since unreliable instruments undermine even a well-powered, preregistered design—linking the crisis back to reliability and validity.
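    Planning a sample size in advance can be sketched with the standard normal approximation for a two-sided, two-sample comparison of means: n per group ≈ 2 × (z(1 - α/2) + z(1 - β))² / d², where d is the standardised effect size the study aims to detect and 1 - β is the desired power. The Python below uses only the standard library, and the function names are illustrative. Because it approximates the t-test with the normal distribution, it slightly understates the exact requirement (63 rather than the commonly cited 64 per group for d = 0.5, 80% power, α = 0.05).

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per group to detect effect size d with a two-sided test."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def approx_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power with n per group (ignores the tiny chance of significance in the wrong direction)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(d * math.sqrt(n / 2) - z_alpha)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        n = n_per_group(d)
        print(f"d={d}: {n} per group, approximate power {approx_power(d, n):.2f}")
    # A small study chasing a medium effect
    print(f"power with 20 per group at d=0.5: {approx_power(0.5, 20):.2f}")
```

    The same helper shows why small studies are fragile: with 20 participants per group, the approximate power to detect d = 0.5 is only about 0.35.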

    A cultural shift, not just a checklist

    The most durable change has been cultural. Open practices—sharing data, code and materials, posting preprints, and crediting replication work—are increasingly expected rather than exceptional, and funders and journals now reward them. Many psychology journals offer Registered Reports, and badges for open data and open materials have become common. The shift reframes transparency as a normal part of doing science well rather than an optional extra, and it has begun to spread to neighbouring fields facing similar pressures.

    What it means for everyday research practice

    The crisis has practical consequences for how studies are designed and read. Single, striking results deserve caution until replicated; effect sizes and confidence intervals matter more than a lone p-value; and vivid claims—the kind that circulate as popular psychology, such as strong readings of the Dunning-Kruger effect—warrant scrutiny against replication evidence. These habits sit alongside responsible assessment of the instruments a study relies upon.
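    Reporting an effect size with its confidence interval makes the precision of a result visible in a way a lone p-value does not. A minimal Python sketch, assuming two independent groups, computes Cohen's d from the pooled standard deviation and an approximate 95% interval from the usual large-sample standard error, sqrt((n1 + n2)/(n1·n2) + d²/(2(n1 + n2))). The helper names are illustrative, and small samples are better served by exact noncentral-t intervals.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardised mean difference of group a minus group b, using the pooled SD."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def d_confidence_interval(d: float, n1: int, n2: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for d from its large-sample standard error."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


if __name__ == "__main__":
    # Same point estimate, very different precision
    for n in (20, 200):
        low, high = d_confidence_interval(0.5, n, n)
        print(f"d = 0.5 with {n} per group: 95% CI {low:.2f} to {high:.2f}")
```

    With the same estimate d = 0.5, 20 participants per group give an interval of roughly -0.13 to 1.13, compatible both with no effect and with a very large one, whereas 200 per group narrow it to roughly 0.30 to 0.70.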

    What the crisis does and does not imply

    It is important to state the limits of the lesson. A failed replication does not automatically prove the original effect is false; replications themselves can be underpowered, can differ subtly in method, or can be run on different populations. Equally, the crisis is not unique to psychology—medicine, economics and other empirical fields have confronted comparable problems—nor does it mean that nothing in psychology is true. Many core findings replicate robustly. The accurate reading is that the proportion of fragile results in the literature was higher than assumed, that publishing incentives rewarded surprising single studies over careful verification, and that the remedy is structural rather than a matter of individual blame. Framed this way, the crisis is a sign of a discipline maturing, not collapsing.

    Standards, terminology and authors

    Reproducibility also depends on mundane infrastructure: consistent terms, well-described methods and shareable metadata. Defining concepts in a controlled research dictionary reduces ambiguity across studies, and clear expectations for authors—preregister where possible, report all measures, share data—turn the lessons of the crisis into routine. The goal is not to publish less but to publish findings that hold up.

    Frequently asked questions

    What is the replication crisis?

    It is the finding that many published results, especially in psychology, do not reproduce when independent teams repeat the studies. It exposed weaknesses in research and publishing practices and sparked reform.

    What did the 2015 Open Science Collaboration project find?

    The Reproducibility Project: Psychology attempted to replicate 100 studies and found that a large proportion did not reproduce, with replication effect sizes typically smaller than the originals.

    What causes findings to fail replication?

    Key causes include p-hacking, publication bias against null results, low statistical power and undisclosed analytic flexibility, often compounded by measures with weak reliability and validity.

    What are preregistration and Registered Reports?

    Preregistration logs hypotheses and analysis plans before data collection. Registered Reports take this further, with journals accepting a study based on its methods before results are known, reducing publication bias.

  • Preregistration and Registered Reports Explained

    Preregistration is the practice of publicly specifying a study’s hypotheses, methods and analysis plan before any data are collected or examined. By fixing these decisions in advance and time-stamping them, preregistration draws a clear line between confirmatory tests planned ahead of time and exploratory analyses discovered along the way — a distinction that curbs questionable research practices and strengthens reproducibility. The plan is registered publicly so that it cannot be quietly revised once results are known, which is what gives the time stamp its force.

    The problem it addresses is well documented. When analysis choices are made after seeing the data, researchers can — often unconsciously — select the specification that yields a significant result, a practice known as p-hacking. Separately, studies with positive findings are more likely to be published than null results, producing publication bias that distorts the literature. Preregistration tackles the first; Registered Reports tackle both. The two practices gained momentum from the wider reproducibility movement, which found that a worrying share of published findings did not hold up when independent teams tried to repeat them — a problem driven in part by exactly these analytic and publication pressures. By making the research plan public and time-stamped before results exist, both practices restore a clear distinction between what was predicted and what was merely found.

    What preregistration involves

    A preregistration typically states the research question, the hypotheses, the sample size and stopping rule, the variables, and the precise analysis plan, lodged in a public registry with a time stamp. Templates and registries hosted on the Open Science Framework (OSF), maintained by the Center for Open Science, make this routine. Clinical trials have long used dedicated public registries for the same reason, and the practice has since spread across the social and life sciences. Because the plan is fixed, readers can verify that the reported confirmatory analysis is the one that was promised, and exploratory work is labelled as such rather than dressed up as a prediction. A good preregistration is specific enough that a third party could, in principle, run the planned analysis without further instruction.
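    The registry supplies the time stamp; what makes the record useful is that the plan cannot change without detection. A minimal Python sketch of that property, assuming a hypothetical `Preregistration` record whose fields mirror the elements listed above, hashes a canonical JSON encoding of the plan with SHA-256 and pairs the digest with a UTC time stamp. This is not how OSF or any clinical-trial registry is implemented; it only illustrates why a fixed, time-stamped record lets a reader confirm that nothing was revised after the fact.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class Preregistration:
    """Hypothetical plan record; field names mirror the elements of a preregistration."""
    research_question: str
    hypotheses: tuple[str, ...]
    sample_size: int
    stopping_rule: str
    variables: tuple[str, ...]
    analysis_plan: str


def fingerprint(plan: Preregistration) -> str:
    """SHA-256 of a canonical JSON encoding; changing any field changes the digest."""
    canonical = json.dumps(asdict(plan), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register(plan: Preregistration) -> dict[str, str]:
    """Stand-in for a registry: store the digest alongside a UTC time stamp."""
    return {
        "sha256": fingerprint(plan),
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }


def matches_registration(plan: Preregistration, record: dict[str, str]) -> bool:
    """True only if the plan is identical, field for field, to the registered one."""
    return fingerprint(plan) == record["sha256"]


if __name__ == "__main__":
    plan = Preregistration(
        research_question="Does intervention X change outcome Y?",  # hypothetical study
        hypotheses=("H1: X increases Y relative to control",),
        sample_size=128,
        stopping_rule="Stop at 64 per group; no interim analyses",
        variables=("condition", "Y"),
        analysis_plan="Two-sided Welch t-test on Y, alpha = 0.05",
    )
    record = register(plan)
    print(matches_registration(plan, record))                            # → True
    print(matches_registration(replace(plan, sample_size=200), record))  # → False
```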

    Confirmatory versus exploratory research

    The conceptual heart of preregistration is the distinction between confirmatory and exploratory research. Confirmatory research tests a specific, pre-stated hypothesis with a pre-specified analysis; its statistical guarantees — including the meaning of a p-value — depend on the analysis having been fixed in advance. Exploratory research, by contrast, searches the data for patterns and generates new hypotheses; it is valuable and necessary, but its findings are provisional and must be confirmed in fresh data. Problems arise when exploratory results are dressed up as confirmatory ones, lending them a false air of statistical rigour. Preregistration keeps the two honest by recording, with a time stamp, exactly which analyses were planned. Anything beyond that plan is legitimate exploration, simply labelled as such rather than presented as a prediction that came true.
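    The labelling rule that preregistration enforces is simple enough to state as code. In the Python sketch below, an analysis counts as confirmatory only if its hypothetical identifier appears in the registered plan; the p-value plays no part in the label, and planned analyses missing from the report are surfaced rather than silently dropped. Matching by exact name is an assumption made for illustration; in practice the comparison is usually a careful reading of the registration against the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedAnalysis:
    name: str       # hypothetical identifier, e.g. "H1 primary t-test"
    p_value: float  # recorded, but deliberately ignored when labelling


def label_analyses(planned: set[str], reported: list[ReportedAnalysis]) -> dict[str, str]:
    """Confirmatory if pre-specified in the plan, exploratory otherwise, whatever the result."""
    return {
        a.name: "confirmatory" if a.name in planned else "exploratory"
        for a in reported
    }


def missing_planned(planned: set[str], reported: list[ReportedAnalysis]) -> set[str]:
    """Planned analyses absent from the report, a possible sign of selective reporting."""
    return planned - {a.name for a in reported}


if __name__ == "__main__":
    planned = {"H1 primary t-test", "H2 moderation test"}
    reported = [
        ReportedAnalysis("H1 primary t-test", 0.21),
        ReportedAnalysis("post hoc subgroup split", 0.01),
    ]
    print(label_analyses(planned, reported))
    print(missing_planned(planned, reported))
```

    Here the subgroup result is labelled exploratory despite its small p-value, because it was not in the plan, and the unreported H2 test is flagged.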

    Registered Reports go further

    A Registered Report is a publication format in which the introduction, methods and analysis plan are peer-reviewed before data collection. If the question and design are judged sound, the journal grants in-principle acceptance — a commitment to publish the completed study regardless of whether the results are positive, negative or null, provided the authors follow the approved protocol. This decouples the publication decision from the outcome, directly attacking publication bias. A useful side effect is that reviewers can improve a study before it is run, when flaws can still be fixed, rather than critiquing an unchangeable design after the fact. This shifts peer review from gatekeeping to genuine quality improvement, and reduces the waste of running studies whose weaknesses only surface at submission.

    How each curbs bias

    Practice           Reviewed before data?          Mainly curbs
    Preregistration    No (registry only)             P-hacking, hidden analytic flexibility
    Registered Report  Yes (stage-one peer review)    P-hacking and publication bias

    The shared mechanism is timing: committing to decisions before outcomes are known removes the temptation, and the opportunity, to reshape a study around a desired result. This complements the rigour built into experimental designs such as the randomised controlled trial, where preregistered protocols make intention-to-treat (ITT) analysis and primary-outcome commitments verifiable.

    The two-stage Registered Report workflow

    What makes Registered Reports distinctive is their two-stage review. At stage one, reviewers evaluate the question’s importance and the soundness of the proposed methods and analysis before any data exist; sound proposals earn in-principle acceptance. At stage two, after the study is run, reviewers check that the authors followed the approved protocol and that conclusions match the registered plan — but they do not get to reject the paper simply because the results were null or unexciting. This sequencing is what severs the link between a study’s outcome and its publishability.

    Stage            What is reviewed                                Decision
    Stage one        Question, methods, analysis plan                In-principle acceptance
    Data collection  Not reviewed; study follows approved protocol   None
    Stage two        Adherence to plan, valid conclusions            Publication regardless of result
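    The two-stage sequence can be modelled as a small state machine. The Python sketch below is a simplification under stated assumptions: the state names are hypothetical, each stage is reduced to a single decision, and a stage-two problem sends the manuscript back for revision rather than modelling what any particular journal would do. Its one essential property is structural: the stage-two decision receives the study's result as an argument and never consults it.

```python
from enum import Enum, auto


class Status(Enum):
    STAGE_ONE_REVIEW = auto()
    REJECTED = auto()
    IN_PRINCIPLE_ACCEPTANCE = auto()
    STAGE_TWO_REVIEW = auto()
    REVISIONS_REQUESTED = auto()
    PUBLISHED = auto()


class RegisteredReport:
    """Simplified two-stage workflow; real journals add revision rounds at both stages."""

    def __init__(self) -> None:
        self.status = Status.STAGE_ONE_REVIEW

    def _require(self, *allowed: Status) -> None:
        if self.status not in allowed:
            raise RuntimeError(f"transition not allowed from {self.status.name}")

    def stage_one_decision(self, question_important: bool, methods_sound: bool) -> Status:
        """Made before any data exist: only the question and the design are judged."""
        self._require(Status.STAGE_ONE_REVIEW)
        accepted = question_important and methods_sound
        self.status = Status.IN_PRINCIPLE_ACCEPTANCE if accepted else Status.REJECTED
        return self.status

    def submit_stage_two(self) -> None:
        """Authors return with the completed study (or a revised stage-two manuscript)."""
        self._require(Status.IN_PRINCIPLE_ACCEPTANCE, Status.REVISIONS_REQUESTED)
        self.status = Status.STAGE_TWO_REVIEW

    def stage_two_decision(self, followed_protocol: bool, conclusions_supported: bool,
                           result_significant: bool) -> Status:
        """result_significant is accepted but never consulted: the outcome cannot decide publication."""
        self._require(Status.STAGE_TWO_REVIEW)
        if followed_protocol and conclusions_supported:
            self.status = Status.PUBLISHED
        else:
            self.status = Status.REVISIONS_REQUESTED
        return self.status
```

    In this model a null result that followed the approved protocol moves from in-principle acceptance to publication, and the only route to rejection is at stage one, before data exist.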

    How they curb publication bias and p-hacking

    Publication bias arises when the literature over-represents positive findings because null results are harder to publish. By guaranteeing publication at stage one, Registered Reports ensure null and negative results enter the record, giving a more honest picture of the evidence. P-hacking — selecting the analysis that happens to reach significance — is curbed by both formats, because the analytic decisions are fixed and public before the data are seen. Together these mechanisms protect the integrity of confirmatory claims, much as the pre-specified primary outcomes of a randomised controlled trial protect its causal conclusions.
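    The effect of guaranteed publication on the literature can be illustrated with a simulation. The Python sketch below assumes many small two-group studies (20 per group) of a modest true effect, d = 0.2, and treats each study's estimate as normally distributed around the truth with standard error sqrt(2/n). It compares a "publish everything" literature, as Registered Reports aim for, with one that publishes only significant results in the predicted direction. All parameter values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, mean

TRUE_D = 0.2       # assumed true standardised effect
N_PER_GROUP = 20   # assumed per-group sample size of each small study
ALPHA = 0.05


def simulate(n_studies: int = 5000, seed: int = 7) -> tuple[float, float, float]:
    """Return (mean of all estimates, mean of significance-filtered estimates, share published)."""
    rng = random.Random(seed)
    se = math.sqrt(2 / N_PER_GROUP)             # approximate standard error of d
    z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
    every_study, significant_only = [], []
    for _ in range(n_studies):
        d_hat = rng.gauss(TRUE_D, se)           # one study's noisy estimate
        every_study.append(d_hat)               # publish-everything literature
        if d_hat / se > z_crit:                 # biased literature: significant and positive only
            significant_only.append(d_hat)
    return mean(every_study), mean(significant_only), len(significant_only) / n_studies


if __name__ == "__main__":
    all_mean, biased_mean, share = simulate()
    print(f"true d = {TRUE_D}")
    print(f"mean of all studies (publish everything): {all_mean:.2f}")
    print(f"mean of published studies (significance filter): {biased_mean:.2f}")
    print(f"share of studies that clear the filter: {share:.0%}")
```

    Under these assumptions only estimates above about 0.62 can clear the significance threshold, so the filtered literature reports an average effect near 0.77, several times the true 0.2, while publishing every study keeps the average close to the truth.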

    Benefits and honest limits

    Preregistration improves transparency, makes exploratory work explicit and supports the reproducibility goals at the heart of the research lifecycle. It does not forbid exploration; it simply requires that exploratory findings be reported as such. Deviations from a plan are permitted when justified and disclosed, and preregistration cannot by itself guarantee a study is well designed — a poor plan, preregistered, is still a poor plan. Used alongside the standardised documentation described in the CASRAI dictionary and our guidance for authors, it makes the chain from hypothesis to result auditable.

    Frequently asked questions

    What is the difference between preregistration and a Registered Report?

    Preregistration time-stamps a plan in a public registry but is not peer-reviewed in advance. A Registered Report adds stage-one peer review and in-principle acceptance before data are collected, committing the journal to publish the results.

    How does preregistration reduce p-hacking?

    By fixing the hypotheses and analysis plan before the data are seen, it removes the ability to choose, after the fact, the specification that happens to produce a significant result.

    Does preregistration ban exploratory analysis?

    No. Exploration is encouraged, but it must be reported as exploratory rather than presented as a pre-planned confirmatory test. Justified deviations from the plan are allowed when disclosed.

    What is the Center for Open Science’s role?

    The Center for Open Science maintains the Open Science Framework, which hosts preregistration templates and registries and supports the Registered Reports format adopted by many journals.

  • Registered reports and pre-registration: planning research in the open

    Most research is reviewed and published after the results are known. That ordering, so obvious it usually goes unexamined, quietly distorts the literature: studies with striking positive results get published and studies with null results get filed away, and analyses can drift, after the fact, toward whatever story the data happen to tell. Pre-registration and its more rigorous cousin, the registered report, flip the order — committing to the plan before the data exist — and in doing so address some of the deepest threats to reproducibility. They are a central concern of the reproducibility domain and connect directly to the research-integrity domain.

    The problems they are designed to solve

    Two well-documented distortions motivate planning research in the open.

    The first is publication bias: the tendency for positive, “significant” results to be published while null or negative results disappear. The literature that results is not a fair sample of the research that was done — it over-represents flukes and under-represents the disconfirmations that science depends on. A field can end up confidently believing an effect that the full body of evidence, published and unpublished, would not support.

    The second is the family of analytic flexibility problems, of which HARKing — Hypothesising After the Results are Known — is the clearest example. When the hypothesis is written after seeing the data, and when there is freedom to choose among many possible analyses, it becomes easy, often unintentionally, to present an exploratory finding as if it had been predicted, and to select the analysis that produces the most publishable result. None of this need involve any intent to deceive; it is the natural consequence of making decisions while looking at the outcome.

    Pre-registration: committing to the plan

    Pre-registration is the practice of specifying, in a public, time-stamped record before data collection or analysis, what the study will do: its hypotheses, its design, its sampling and stopping rules, its outcome measures, and its planned analysis. The record is created in advance and cannot be quietly altered afterwards, which draws a clean line between what was confirmatory (predicted in advance) and what was exploratory (discovered in the data). Exploratory analysis remains entirely legitimate and valuable — pre-registration does not forbid it; it simply makes it honest by preventing exploratory findings from being dressed up as confirmatory ones.

    The Open Science Framework (OSF), maintained by the non-profit Center for Open Science, is the most widely used infrastructure for this. OSF lets researchers create a registration — a frozen, time-stamped, citable snapshot of the study plan — and control when it becomes public. The plan is fixed; the credibility of any later claim to have predicted a result can be checked against it.

    Registered reports: review before the results

    A registered report takes the logic further and builds it into the publishing process itself, through a two-stage peer review, championed by the Center for Open Science, that a large and growing number of journals now offer.

    • Stage 1 is the protocol. Before any data are collected, the authors submit the introduction, the hypotheses, and a detailed methods and analysis plan. Reviewers assess the importance of the question and the soundness of the method — not the results, because there are none yet. If the protocol passes, the journal grants in-principle acceptance: a commitment to publish the completed study regardless of how the results turn out, provided the authors carry out the registered plan and the work is sound.
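    • Stage 2 is the completed study. The authors execute the plan, report what they found (positive, null, or mixed), and clearly distinguish any exploratory analyses from the pre-registered confirmatory ones. Reviewers check that the registered plan was followed and that the conclusions are sound, not whether the results are striking, and the paper is then published.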

    The consequences are precise. Because the decision to publish is made before the results are known, publication bias is removed at its source — a null result is just as publishable as a positive one. Because the analysis plan is fixed and reviewed up front, HARKing and selective analysis are structurally prevented. And because reviewers shape the design while it can still be improved, peer review does its most useful work before the study is run rather than after, when nothing can be changed.

    What this strengthens, and what it does not

    Registered reports and pre-registration are powerful but not universal. They suit hypothesis-testing, confirmatory research best; they fit awkwardly onto genuinely exploratory, discovery-driven, or qualitative work, where the questions emerge from the material and a rigid pre-specified plan would be a forced fit. The honest position is that they are an excellent tool for a particular and very common kind of research, not a mandate for all of it. Used where they fit, they directly serve reproducibility: a study whose plan was fixed and public in advance is far easier for others to evaluate, replicate, and build on.

    Crediting the planning work

    Planning a study rigorously is itself a substantial contribution, and contributor-role metadata can record it. The CRediT taxonomy’s Conceptualization and Methodology roles capture the intellectual work of formulating the research goals and designing the methods — precisely the work that a registered report front-loads and makes visible. Recording these roles ensures that the design effort, which a registered report elevates from invisible preparation to peer-reviewed output, is credited to the people who did it.

    Where shared vocabulary fits

    “Pre-registration”, “registered report”, “in-principle acceptance”, “Stage 1 protocol”, and “confirmatory analysis” are used loosely and sometimes interchangeably, which muddies what a given journal or record actually guarantees. A shared, federated vocabulary that defines these terms precisely — and points back to the Center for Open Science and the OSF registration infrastructure — is what lets a registered report in one venue be understood the same way in another. Supplying that definitional layer is the role the CASRAI dictionary is designed to play; the relevant terms sit in the reproducibility domain, with adjacent entries in the research-integrity domain.
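    The normalisation that such a vocabulary enables can be sketched in Python. The mapping below is a hypothetical mini-vocabulary, not CASRAI's actual entries, identifiers, or matching rules: it resolves variant labels from different venues to one canonical term while keeping genuinely distinct concepts, such as a preregistration and a registered report, apart.

```python
from __future__ import annotations

import re

# Hypothetical mini-vocabulary (not CASRAI's actual entries): canonical term -> known variants.
VARIANTS = {
    "preregistration": {"preregistration", "pre-registration", "pre registration"},
    "registered report": {"registered report", "registered reports", "registered-report"},
    "in-principle acceptance": {"in-principle acceptance", "in principle acceptance", "ipa"},
    "stage 1 protocol": {"stage 1 protocol", "stage one protocol", "stage-one protocol"},
    "confirmatory analysis": {"confirmatory analysis", "confirmatory analyses"},
}

# Invert once so each lookup is a single dictionary access.
_LOOKUP = {variant: term for term, variants in VARIANTS.items() for variant in variants}


def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences vanish."""
    return re.sub(r"\s+", " ", label.strip().lower())


def canonical_term(label: str) -> str | None:
    """Map a free-text label to its canonical term, or None if it is not in the vocabulary."""
    return _LOOKUP.get(normalise(label))


if __name__ == "__main__":
    for label in ("Pre-Registration", "  Registered   Reports ", "IPA", "p-hacking"):
        print(repr(label), "->", canonical_term(label))
```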
