Tag: preregistration

  • The Replication Crisis in Psychology and the Open Science Response

    The replication crisis is the recognition that a substantial share of published findings, notably in psychology, fail to reproduce when independent teams repeat the studies. It prompted a wide-ranging reform movement built around transparency and pre-specified methods. Rather than discrediting the discipline, the crisis has driven psychology to strengthen the reliability of its evidence.

    The 2015 reproducibility project

    A landmark moment was the Open Science Collaboration’s Reproducibility Project: Psychology, published in 2015. A large network of researchers attempted to replicate 100 studies from leading psychology journals using high-powered designs. Only about a third of the replications (36%) produced a statistically significant result, compared with 97% of the original studies, and replication effect sizes were on average about half as large as those originally reported. The result was a wake-up call: publication did not guarantee a finding was robust. Crucially, the project was itself a model of open practice: its protocols were shared, its analyses were transparent, and its data were made public, so its own conclusions could be scrutinised and re-examined by others. It demonstrated that large-scale, coordinated replication was feasible, and it gave the reform movement a concrete, quantified anchor rather than anecdote. Subsequent multi-lab projects in psychology and adjacent fields extended the approach, confirming that the pattern was systemic rather than confined to a handful of studies.

    What drives non-replication

    Several interacting causes are now well understood:

    Cause                         | How it inflates false findings
    P-hacking                     | Flexible analysis choices made until results cross significance, producing false positives
    Publication bias              | Journals favour positive, novel results, so null findings stay unpublished (the “file drawer”)
    Low statistical power         | Small samples yield unstable estimates and exaggerated effect sizes
    Researcher degrees of freedom | Undisclosed choices in design and analysis enable selective reporting

    These pressures interact with weak measurement: instruments with poor reliability and validity add noise that low-powered studies are ill-equipped to handle.
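
    How low power and publication bias combine to exaggerate effects can be illustrated with a small simulation. The sketch below is not drawn from any particular study: the true effect (0.2 standard deviations), the sample size (20 per group), the simple known-variance z-test and the rule that only significant results in the predicted direction get published are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean


def simulate_literature(true_d=0.2, n_per_group=20, n_studies=5000,
                        alpha=0.05, seed=1):
    """Simulate many small two-group studies of one true effect.

    Returns (mean effect across all studies,
             mean effect across 'published' studies,
             share of studies published).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Outcomes are drawn with SD = 1, so a mean difference is already in
    # standard-deviation units and its standard error is known.
    se = (2.0 / n_per_group) ** 0.5

    all_effects, published = [], []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        effect = fmean(treated) - fmean(control)
        all_effects.append(effect)
        # Hypothetical publication filter: only significant results in the
        # predicted direction leave the file drawer.
        if effect / se > z_crit:
            published.append(effect)

    return fmean(all_effects), fmean(published), len(published) / n_studies


mean_all, mean_published, share = simulate_literature()
print(f"mean effect, all studies:       {mean_all:.2f}")
print(f"mean effect, published studies: {mean_published:.2f}")
print(f"share of studies published:     {share:.1%}")
```

    Under these assumptions a study is only published if its estimate exceeds roughly 0.62, the significance cut-off for 20 participants per group, so the published record necessarily overstates a true effect of 0.2. The average across all studies, file drawer included, stays close to the truth.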

    The Open Science response

    The reform agenda answers each cause directly. Preregistration records hypotheses and analysis plans before data are seen, separating confirmatory tests from exploratory ones and curbing p-hacking. Registered Reports go further: a journal peer-reviews the introduction and methods and grants in-principle acceptance before results exist, so publication no longer hinges on whether the result is positive—directly tackling publication bias. Data and materials sharing lets others reanalyse and reuse work, and adequately powered designs reduce false positives at source.

    The role of the Center for Open Science

    Much of this infrastructure is coordinated by the Center for Open Science, the non-profit behind the Open Science Framework, a platform for preregistration, data sharing and project management. By making transparent practice easy and rewarded—through badges, registries and tooling—it has helped shift norms across psychology and beyond. The movement aligns closely with CASRAI’s interest in reproducibility and clear research metadata.

    The difference between direct and conceptual replication

    Not all replications are the same, and the distinction matters for interpreting the crisis. A direct replication repeats the original method as closely as possible to test whether the same procedure yields the same result. A conceptual replication tests the same underlying idea using a different method or measure. Conceptual replications are valuable for generalisation, but they cannot substitute for direct ones: if a different method fails, it is ambiguous whether the original finding was false or the new method simply tapped a different construct. Part of what the reform movement restored was respect for direct replication, which had been undervalued by journals that prized novelty over verification.

    Beyond p-values: estimation and transparency

    A recurring theme is over-reliance on the binary question “is p below 0.05?”. A single significant p-value says little about how large or reliable an effect is, and the threshold is easy to cross by chance or by flexible analysis. Reformers therefore emphasise reporting effect sizes with confidence intervals, planning sample sizes in advance through power analysis, and distinguishing pre-specified confirmatory tests from exploratory ones. None of this forbids exploration; it simply asks researchers to label it honestly so readers can weight the evidence appropriately. These habits depend on sound measurement, since unreliable instruments undermine even a well-powered, preregistered design—linking the crisis back to reliability and validity.
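
    Planning a sample size and reporting an interval rather than a lone p-value can both be sketched in a few lines. The example below uses the standard normal approximation for a two-sided comparison of two equal groups and a common large-sample standard error for Cohen's d. The function names and default values are illustrative assumptions, and dedicated power-analysis software based on the t distribution will give slightly larger sample sizes.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n to detect a standardised effect d
    in a two-sided, two-group comparison (normal approximation)."""
    if d == 0:
        raise ValueError("effect size must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate confidence interval for Cohen's d using the usual
    large-sample standard error."""
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


print(n_per_group(0.5))                    # medium effect
print(n_per_group(0.2))                    # small effect
print(d_confidence_interval(0.5, 20, 20))  # small study: wide interval
print(d_confidence_interval(0.5, 200, 200))
```

    With 20 participants per group, the interval around an observed d of 0.5 spans zero. With 200 per group it does not, which is the kind of information a bare "p below 0.05" hides.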

    A cultural shift, not just a checklist

    The most durable change has been cultural. Open practices—sharing data, code and materials, posting preprints, and crediting replication work—are increasingly expected rather than exceptional, and funders and journals now reward them. Many psychology journals offer Registered Reports, and badges for open data and open materials have become common. The shift reframes transparency as a normal part of doing science well rather than an optional extra, and it has begun to spread to neighbouring fields facing similar pressures.

    What it means for everyday research practice

    The crisis has practical consequences for how studies are designed and read. Single, striking results deserve caution until replicated; effect sizes and confidence intervals matter more than a lone p-value; and vivid claims—the kind that circulate as popular psychology, such as strong readings of the Dunning-Kruger effect—warrant scrutiny against replication evidence. These habits sit alongside responsible assessment of the instruments a study relies upon.

    What the crisis does and does not imply

    It is important to state the limits of the lesson. A failed replication does not automatically prove the original effect is false; replications themselves can be underpowered, can differ subtly in method, or can be run on different populations. Equally, the crisis is not unique to psychology—medicine, economics and other empirical fields have confronted comparable problems—nor does it mean that nothing in psychology is true. Many core findings replicate robustly. The accurate reading is that the proportion of fragile results in the literature was higher than assumed, that publishing incentives rewarded surprising single studies over careful verification, and that the remedy is structural rather than a matter of individual blame. Framed this way, the crisis is a sign of a discipline maturing, not collapsing.

    Standards, terminology and authors

    Reproducibility also depends on mundane infrastructure: consistent terms, well-described methods and shareable metadata. Defining concepts in a controlled research dictionary reduces ambiguity across studies, and clear expectations for authors—preregister where possible, report all measures, share data—turn the lessons of the crisis into routine. The goal is not to publish less but to publish findings that hold up.

    Frequently asked questions

    What is the replication crisis?

    It is the finding that many published results, especially in psychology, do not reproduce when independent teams repeat the studies. It exposed weaknesses in research and publishing practices and sparked reform.

    What did the 2015 Open Science Collaboration project find?

    The Reproducibility Project: Psychology replicated 100 studies and found that a large proportion did not reproduce, with replicated effects typically smaller than the originals.

    What causes findings to fail replication?

    Key causes include p-hacking, publication bias against null results, low statistical power and undisclosed analytic flexibility, often compounded by measures with weak reliability and validity.

    What are preregistration and Registered Reports?

    Preregistration logs hypotheses and analysis plans before data collection. Registered Reports take this further, with journals accepting a study based on its methods before results are known, reducing publication bias.

  • Preregistration and Registered Reports Explained

    Preregistration is the practice of publicly specifying a study’s hypotheses, methods and analysis plan before any data are collected or examined. By fixing these decisions in advance and time-stamping them, preregistration draws a clear line between confirmatory tests planned ahead of time and exploratory analyses discovered along the way — a distinction that curbs questionable research practices and strengthens reproducibility. The plan is registered publicly so that it cannot be quietly revised once results are known, which is what gives the time stamp its force.

    The problem it addresses is well documented. When analysis choices are made after seeing the data, researchers can — often unconsciously — select the specification that yields a significant result, a practice known as p-hacking. Separately, studies with positive findings are more likely to be published than null results, producing publication bias that distorts the literature. Preregistration tackles the first; Registered Reports tackle both. The two practices grew out of the wider reproducibility movement, which found that a worrying share of published findings did not hold up when independent teams tried to repeat them — a problem driven in part by exactly these analytic and publication pressures. By making the research plan public and time-stamped before results exist, both practices restore a clear distinction between what was predicted and what was merely found.
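
    A simple calculation shows why unrestricted analytic choice matters. If a researcher can try several analyses and report whichever one reaches significance, the chance of at least one false positive grows quickly. The sketch below assumes the tests are independent and that no true effect exists. Real analysis variants are usually correlated, so the figures illustrate the direction of the problem rather than estimate any real study.

```python
def familywise_false_positive_rate(n_tests, alpha=0.05):
    """Probability of at least one significant result among n_tests
    independent tests when every null hypothesis is true."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 20):
    rate = familywise_false_positive_rate(k)
    print(f"{k:>2} analyses tried -> {rate:.1%} chance of a false positive")
```

    One pre-specified test keeps the rate at the nominal 5%. Five candidate analyses raise it to about 23%, and twenty to about 64%. Fixing the analysis in advance is what keeps the reported test the only one that counts.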

    What preregistration involves

    A preregistration typically states the research question, the hypotheses, the sample size and stopping rule, the variables, and the precise analysis plan, lodged in a public registry with a time stamp. Templates and registries hosted on the Open Science Framework (OSF), maintained by the Center for Open Science, make this routine. Clinical trials have long used dedicated public registries for the same reason, and the practice has since spread across the social and life sciences. Because the plan is fixed, readers can verify that the reported confirmatory analysis is the one that was promised, and exploratory work is labelled as such rather than dressed up as a prediction. A good preregistration is specific enough that a third party could, in principle, run the planned analysis without further instruction.
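
    The idea of a fixed, time-stamped plan can be made concrete with a small sketch. This is not how the Open Science Framework or any particular registry is implemented. It is a hypothetical illustration, using only the Python standard library, of how a plan containing the elements listed above could be fingerprinted so that any later change is detectable. The field names and example values are invented.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint_plan(plan):
    """Return a SHA-256 digest of the plan in a canonical JSON form."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register(plan, now=None):
    """Create a hypothetical registry record: a time stamp plus the digest."""
    now = now or datetime.now(timezone.utc)
    return {"registered_at": now.isoformat(), "sha256": fingerprint_plan(plan)}


def matches_registration(plan, record):
    """Check whether a plan is identical to the one that was registered."""
    return fingerprint_plan(plan) == record["sha256"]


# Invented example plan, for illustration only.
plan = {
    "research_question": "Does intervention X improve recall?",
    "hypotheses": ["Recall is higher in the X group than in controls"],
    "sample_size": 128,
    "stopping_rule": "Stop at 128 participants; no interim analyses",
    "variables": {"outcome": "recall_score", "predictor": "group"},
    "analysis_plan": "Two-sided independent-samples t-test, alpha = 0.05",
}
record = register(plan)

# A quietly revised analysis no longer matches the registered digest.
revised = dict(plan, analysis_plan="One-sided test, outliers excluded")
print(matches_registration(plan, record))     # True
print(matches_registration(revised, record))  # False
```

    The design choice worth noting is the canonical serialisation: sorting keys and fixing separators means two copies of the same plan always produce the same digest, so a mismatch signals a change of content, not of formatting.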

    Confirmatory versus exploratory research

    The conceptual heart of preregistration is the distinction between confirmatory and exploratory research. Confirmatory research tests a specific, pre-stated hypothesis with a pre-specified analysis; its statistical guarantees — including the meaning of a p-value — depend on the analysis having been fixed in advance. Exploratory research, by contrast, searches the data for patterns and generates new hypotheses; it is valuable and necessary, but its findings are provisional and must be confirmed in fresh data. Problems arise when exploratory results are dressed up as confirmatory ones, lending them a false air of statistical rigour. Preregistration keeps the two honest by recording, with a time stamp, exactly which analyses were planned. Anything beyond that plan is legitimate exploration, simply labelled as such rather than presented as a prediction that came true.

    Registered Reports go further

    A Registered Report is a publication format in which the introduction, methods and analysis plan are peer-reviewed before data collection. If the question and design are judged sound, the journal grants in-principle acceptance — a commitment to publish the completed study regardless of whether the results are positive, negative or null, provided the authors follow the approved protocol. This decouples the publication decision from the outcome, directly attacking publication bias. A useful side effect is that reviewers can improve a study before it is run, when flaws can still be fixed, rather than critiquing an unchangeable design after the fact. This shifts peer review from gatekeeping to genuine quality improvement, and reduces the waste of running studies whose weaknesses only surface at submission.

    How each curbs bias

    Practice          | Reviewed before data?       | Mainly curbs
    Preregistration   | No (registry only)          | p-hacking, hidden analytic flexibility
    Registered Report | Yes (stage-one peer review) | p-hacking and publication bias

    The shared mechanism is timing: committing to decisions before outcomes are known removes the temptation, and the opportunity, to reshape a study around a desired result. This complements the rigour built into experimental designs such as the randomised controlled trial, where preregistered protocols make intention-to-treat (ITT) analysis and primary-outcome commitments verifiable.

    The two-stage Registered Report workflow

    What makes Registered Reports distinctive is their two-stage review. At stage one, reviewers evaluate the question’s importance and the soundness of the proposed methods and analysis before any data exist; sound proposals earn in-principle acceptance. At stage two, after the study is run, reviewers check that the authors followed the approved protocol and that conclusions match the registered plan — but they do not get to reject the paper simply because the results were null or unexciting. This sequencing is what severs the link between a study’s outcome and its publishability.

    Stage           | What is reviewed                    | Decision
    Stage one       | Question, methods, analysis plan    | In-principle acceptance
    Data collection | Conducted per approved protocol     | n/a
    Stage two       | Adherence to plan, valid conclusions | Publication regardless of result
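
    The two-stage logic can be sketched as a pair of decision functions. This is a loose illustration, not any journal's actual procedure: the class names, fields and the NOT_YET_ACCEPTED outcome are hypothetical. The point it tries to show is that the stage-two decision never reads whether the result was significant.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    IN_PRINCIPLE_ACCEPTANCE = "in-principle acceptance"
    PUBLISH = "publish regardless of result"
    NOT_YET_ACCEPTED = "not yet accepted"  # hypothetical catch-all outcome


@dataclass(frozen=True)
class StageOneReview:
    question_is_important: bool
    methods_are_sound: bool
    analysis_plan_is_sound: bool


@dataclass(frozen=True)
class StageTwoReview:
    followed_approved_protocol: bool
    conclusions_match_plan: bool
    result_was_significant: bool  # recorded, but deliberately never used


def stage_one(review):
    """Judge the proposal before any data exist."""
    if (review.question_is_important and review.methods_are_sound
            and review.analysis_plan_is_sound):
        return Decision.IN_PRINCIPLE_ACCEPTANCE
    return Decision.NOT_YET_ACCEPTED


def stage_two(review):
    """Judge adherence to the plan; the outcome plays no part."""
    if review.followed_approved_protocol and review.conclusions_match_plan:
        return Decision.PUBLISH
    return Decision.NOT_YET_ACCEPTED


null_result = StageTwoReview(True, True, result_was_significant=False)
positive_result = StageTwoReview(True, True, result_was_significant=True)
print(stage_two(null_result) is stage_two(positive_result))  # True
```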

    How they curb publication bias and p-hacking

    Publication bias arises when the literature over-represents positive findings because null results are harder to publish. By guaranteeing publication at stage one, Registered Reports ensure null and negative results enter the record, giving a more honest picture of the evidence. P-hacking — selecting the analysis that happens to reach significance — is curbed by both formats, because the analytic decisions are fixed and public before the data are seen. Together these mechanisms protect the integrity of confirmatory claims, much as the pre-specified primary outcomes of a randomised controlled trial protect its causal conclusions.

    Benefits and honest limits

    Preregistration improves transparency, makes exploratory work explicit and supports the reproducibility goals at the heart of the research lifecycle. It does not forbid exploration; it simply requires that exploratory findings be reported as such. Deviations from a plan are permitted when justified and disclosed, and preregistration cannot by itself guarantee a study is well designed — a poor plan, preregistered, is still a poor plan. Used alongside the standardised documentation described in the CASRAI dictionary and our guidance for authors, it makes the chain from hypothesis to result auditable.

    Frequently asked questions

    What is the difference between preregistration and a Registered Report?

    Preregistration time-stamps a plan in a public registry but is not peer-reviewed in advance. A Registered Report adds stage-one peer review and in-principle acceptance before data are collected, committing the journal to publish the results.

    How does preregistration reduce p-hacking?

    By fixing the hypotheses and analysis plan before the data are seen, it removes the ability to choose, after the fact, the specification that happens to produce a significant result.

    Does preregistration ban exploratory analysis?

    No. Exploration is encouraged, but it must be reported as exploratory rather than presented as a pre-planned confirmatory test. Justified deviations from the plan are allowed when disclosed.

    What is the Center for Open Science’s role?

    The Center for Open Science maintains the Open Science Framework, which hosts preregistration templates and registries and supports the Registered Reports format adopted by many journals.

  • Open science across the research lifecycle: from preregistration to preservation

    Open science is often encountered as a set of separate practices: a journal’s open-access policy, a funder’s data-sharing requirement, a colleague’s preregistered study. Treated piecemeal, each can feel like an isolated obligation. But open science is most powerful, and most coherent, when its practices are understood as connected stages in the arc of a single project — when openness runs through the whole research lifecycle rather than appearing only at the end. Seen this way, preregistration, open data, open access and preservation are not unrelated requirements but successive expressions of one principle: that research is more trustworthy, more useful and more cumulative when it is conducted in the open. This article traces openness across the lifecycle through the research lifecycle domain of the CASRAI Dictionary.

    A global framework: the UNESCO Recommendation

    That open science is a connected whole rather than a collection of separate practices is reflected in the most significant international statement on the subject: the UNESCO Recommendation on Open Science, adopted by member states as a shared global framework. It treats open science not as a single act of sharing but as an integrated set of practices and values — open access to publications, open research data, open-source software, open infrastructures, open engagement with society — underpinned by transparency, equity and inclusion. Its scope is the point: it frames openness as a culture spanning the entire research process, not a box ticked at publication, and provides a common reference for understanding open science as a coherent lifecycle.

    The beginning: preregistration

    Openness can begin before any data are collected. Preregistration is the practice of specifying a study’s hypotheses, methods and analysis plan in advance, and recording that plan in a way that cannot be quietly changed later. Its purpose is to strengthen the integrity of research by making clear what was planned before the results were known, which guards against practices such as reshaping hypotheses to fit the data or selectively reporting only what worked. A particularly developed form is the registered report, in which a study’s plan is peer-reviewed and accepted in principle before the results exist, so that publication depends on the quality of the question and method rather than on whether the findings turn out to be striking. Preregistration makes the research process transparent from the outset and sets the foundation for everything that follows.

    The middle: open and FAIR data

    As a project generates data, openness shifts to how that data is managed and shared. The widely adopted FAIR principles hold that data should be Findable, Accessible, Interoperable and Reusable — properties that let data be discovered, understood and built upon by others rather than locked away or lost. Making data FAIR, and as open as is responsible, transforms it from a private by-product of one study into a lasting resource for the community. This stage connects backwards and forwards: data shared openly allows the results derived from it to be checked, and it allows the data itself to feed new research it was never collected for. Openness in the middle of the lifecycle is what gives a project value beyond its own conclusions.

    The output: open access

    When findings are written up, openness turns to open access — making the resulting publications freely available to read rather than locked behind paywalls. It can be achieved through different routes, including publishing in open-access venues and depositing accepted manuscripts in repositories, but the principle is constant: research that anyone can read can be verified, used and built upon by the widest possible audience. Open access is the most visible face of open science, but within the lifecycle it is one stage among several. A paper that is open but rests on hidden data and an undisclosed plan is less open than it appears; open access is most meaningful when it sits atop preregistration and open data.

    The long term: preservation

    The lifecycle does not end at publication, because outputs that are open today are worthless tomorrow if they vanish. Digital preservation is the work of ensuring that data, publications, software and other outputs remain accessible, intact and usable over the long term, against the threats of format obsolescence, link rot, storage failure and institutional change. There is little point making research open if it cannot be found or opened a decade later. Trusted repositories, persistent identifiers and active preservation practices are what keep the open record open over time, closing the loop so that the openness built earlier actually endures.

    The lifecycle as a connected whole

    The deeper point is that these stages reinforce one another. Preregistration makes the eventual open data and open publication more meaningful, because the plan they can be checked against is on record. Open data makes the open publication verifiable. Preservation makes all of it durable. Openness at one stage is weakened when another stage is missing: open access over secret data, or open data with no preservation, falls short of the whole. This is why open science is best understood as a lifecycle rather than a checklist: its value is cumulative and connected, exactly the vision the UNESCO Recommendation articulates. Our learning resources explore each practice in more depth.

    A consistent vocabulary across the lifecycle

    For openness to connect across stages and systems, the information describing each stage must mean the same thing everywhere — the status of a preregistration, the access conditions of data, the licence on a publication, the preservation state of an output. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the open-science attributes of a project are understood identically across the systems that record them. And because contribution runs through every stage, the work done at each can be described in the same shared framework — the CRediT taxonomy and its full set of contribution roles. Open science is not a single act but a way of working across the whole life of a project; its power lies in the connection of its parts.