Tag: open science

  • Assessing Researchers Differently: The Hong Kong Principles

    How institutions assess researchers shapes how researchers behave. If hiring, promotion, and funding decisions reward sheer publication volume and the prestige of journals, then researchers will rationally pursue volume and prestige, sometimes at the expense of the careful, transparent, reproducible work that science depends on. The Hong Kong Principles for assessing researchers were developed to confront this misalignment head-on, by describing what responsible assessment should reward.

    Where the principles came from

    The Hong Kong Principles emerged from the 6th World Conference on Research Integrity, held in Hong Kong, and were set out in a paper led by David Moher and colleagues. They sit within a wider movement for reform in research assessment that includes the San Francisco Declaration on Research Assessment (DORA) and the Leiden Manifesto, but their particular contribution is to link assessment criteria explicitly to the practices that uphold research integrity. Rather than only cautioning against the misuse of metrics, they propose positive things that institutions should look for and credit.

    The five principles

    The framework rests on five principles, each describing a dimension of responsible research that assessment should recognise:

    • Assess responsible research practices. Reward researchers for conducting rigorous work and reporting it transparently and completely, including the methods that make a study trustworthy.
    • Value complete reporting. Recognise full and transparent reporting of research, so that what was done and what was found can be understood and scrutinised, regardless of the direction of the results.
    • Reward open science. Credit the practices of open research, including open methods, open data, and open materials, that allow others to verify and build on the work.
    • Value a broad range of research and scholarship. Recognise forms of research beyond novel discovery, such as replication, innovation, translation, synthesis, and meta-research.
    • Recognise other contributions to responsible research. Acknowledge activities that sustain the research enterprise, such as peer review for grants and publications, mentoring, outreach, and knowledge exchange, not only authorship of articles.

    Rewarding what actually matters

    The common thread is a shift in what counts as evidence of a good researcher. Under conventional assessment, a long list of papers in well-regarded journals serves as a proxy for quality. The Hong Kong Principles argue that this proxy is weak and sometimes perverse: it can reward selective reporting, discourage the sharing of data and code, and overlook the many forms of work that make research robust. By asking assessors to look directly at responsible practices, the framework tries to make the proxy unnecessary.

    This connects to familiar machinery for describing contributions. Frameworks such as CRediT make it possible to record who contributed what to a piece of work, including roles like data curation, software, and methodology that rarely show up in an author list alone. Recognising those contributions is exactly the kind of broadened view the principles call for.

    What this asks of institutions

    Adopting the Hong Kong Principles is not a matter of swapping one metric for another. It requires institutions to rethink the questions they ask when evaluating people. Instead of asking only how many papers a candidate has published and where, an assessment committee might ask whether the candidate shares data and code openly, whether their reporting is complete and transparent, whether their methods are sound, and whether they contribute to the research community through review, mentorship, and the stewardship of FAIR data. These are qualitative judgements, and they take more effort than counting, but they are closer to what assessment is supposed to measure.

    Integrity by design

    The deeper argument is that integrity cannot be bolted on through codes of conduct alone. If the incentive structure rewards the wrong things, exhortations to behave well will struggle against the grain. By aligning the rewards of a research career with the practices that make research trustworthy, the Hong Kong Principles try to build integrity into the system rather than treating it as an afterthought. They put openness, transparency, and reproducibility on the side of career success rather than in tension with it.

    A practical starting point

    No single framework will reform research assessment overnight, and adopting the Hong Kong Principles is a gradual process rather than a one-off policy change. But they give institutions a concrete vocabulary for change and a checklist of practices worth rewarding. For an organisation revising its promotion criteria, or a funder reconsidering how it judges applicants, they offer a defensible answer to a question that has too often gone unasked: when we assess researchers, what are we actually trying to measure, and are our criteria measuring it? Shared vocabularies, such as those catalogued in the CASRAI data dictionary, can help describe the contributions this kind of assessment sets out to recognise.

  • The Replication Crisis in Psychology and the Open Science Response

    The replication crisis is the recognition that a substantial share of published findings, notably in psychology, fail to reproduce when independent teams repeat the studies. It prompted a wide-ranging reform movement built around transparency and pre-specified methods. Rather than discrediting the discipline, the crisis has driven psychology to strengthen the reliability of its evidence.

    The 2015 reproducibility project

    A landmark moment was the Open Science Collaboration’s Reproducibility Project: Psychology, published in 2015. A large network of researchers attempted to replicate 100 studies from leading psychology journals using high-powered designs. Although 97 percent of the original studies had reported statistically significant results, only about 36 percent of the replications did, and replication effect sizes were on average about half those originally reported. The result was a wake-up call: publication did not guarantee that a finding was robust. Crucially, the project was itself a model of open practice: its protocols were shared, its analyses were transparent, and its data were made public, so its own conclusions could be scrutinised and re-examined by others. It demonstrated that large-scale, coordinated replication was feasible, and it gave the reform movement a concrete, quantified anchor rather than anecdote. Subsequent multi-lab projects in psychology and adjacent fields extended the approach, confirming that the pattern was systemic rather than confined to a handful of studies.

    What drives non-replication

    Several interacting causes are now well understood:

    Cause | How it inflates false findings
    P-hacking | Flexible analysis choices made until results cross significance, producing false positives
    Publication bias | Journals favour positive, novel results, so null findings stay unpublished (the “file drawer”)
    Low statistical power | Small samples yield unstable estimates and exaggerated effect sizes
    Researcher degrees of freedom | Undisclosed choices in design and analysis enable selective reporting

    These pressures interact with weak measurement: instruments with poor reliability and validity add noise that low-powered studies are ill-equipped to handle.
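    The combination of low power and a significance filter can be seen in a small simulation. The sketch below is a minimal illustration, assuming normally distributed outcomes with unit standard deviation, a true standardised effect of d = 0.3, and a large-sample z approximation in place of an exact t-test; the function names are hypothetical. It counts how often studies reach significance and how large the effect looks among those that do.

```python
import random
from statistics import NormalDist, fmean


def _mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = fmean(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_d(n_per_group, true_d, rng):
    """One two-group study with unit-SD outcomes; returns Cohen's d (pooled SD)."""
    treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    m1, v1 = _mean_and_variance(treated)
    m0, v0 = _mean_and_variance(control)
    return (m1 - m0) / ((v1 + v0) / 2) ** 0.5


def significant_studies(n_per_group, true_d, studies=2000, alpha=0.05, seed=1):
    """Return (share of studies significant, mean d among significant studies).

    Significance uses a large-sample z approximation, SE(d) ~ sqrt(2 / n),
    and counts only results in the predicted (positive) direction.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5
    estimates = (simulate_d(n_per_group, true_d, rng) for _ in range(studies))
    hits = [d for d in estimates if d / se > z_crit]
    return len(hits) / studies, fmean(hits)


# The same modest true effect (d = 0.3) studied with small and large samples.
small_power, small_mean_d = significant_studies(n_per_group=20, true_d=0.3)
large_power, large_mean_d = significant_studies(n_per_group=200, true_d=0.3)
```

    With 20 participants per group only a small share of studies reach significance, and those that do overstate the true effect by more than a factor of two; with 200 per group most studies succeed and the average significant estimate sits close to 0.3.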

    The Open Science response

    The reform agenda answers each cause directly. Preregistration records hypotheses and analysis plans before data are seen, separating confirmatory tests from exploratory ones and curbing p-hacking. Registered Reports go further: a journal peer-reviews the introduction and methods and grants in-principle acceptance before results exist, so publication no longer hinges on whether the result is positive, which directly tackles publication bias. Data and materials sharing lets others reanalyse and reuse work, and adequately powered designs make a significant result more likely to reflect a real effect and less likely to overstate its size.

    The role of the Center for Open Science

    Much of this infrastructure is coordinated by the Center for Open Science, the non-profit behind the Open Science Framework, a platform for preregistration, data sharing and project management. By making transparent practice easy and rewarded—through badges, registries and tooling—it has helped shift norms across psychology and beyond. The movement aligns closely with CASRAI’s interest in reproducibility and clear research metadata.

    The difference between direct and conceptual replication

    Not all replications are the same, and the distinction matters for interpreting the crisis. A direct replication repeats the original method as closely as possible to test whether the same procedure yields the same result. A conceptual replication tests the same underlying idea using a different method or measure. Conceptual replications are valuable for generalisation, but they cannot substitute for direct ones: if a different method fails, it is ambiguous whether the original finding was false or the new method simply tapped a different construct. Part of what the reform movement restored was respect for direct replication, which had been undervalued by journals that prized novelty over verification.

    Beyond p-values: estimation and transparency

    A recurring theme is over-reliance on the binary question “is p below 0.05?”. A single significant p-value says little about how large or reliable an effect is, and the threshold is easy to cross by chance or by flexible analysis. Reformers therefore emphasise reporting effect sizes with confidence intervals, planning sample sizes in advance through power analysis, and distinguishing pre-specified confirmatory tests from exploratory ones. None of this forbids exploration; it simply asks researchers to label it honestly so readers can weight the evidence appropriately. These habits depend on sound measurement, since unreliable instruments undermine even a well-powered, preregistered design—linking the crisis back to reliability and validity.
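    A sample-size calculation and an effect size reported with its interval are both easy to sketch. The code below is a minimal example, assuming a two-group comparison of means, a normal approximation to the power calculation, and the common large-sample formula for the variance of Cohen's d; `n_per_group` and `d_with_ci` are hypothetical helper names, not a particular library's API.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist().inv_cdf  # standard normal quantile function


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    Exact t-test calculations give a slightly larger n.
    """
    return ceil(2 * ((_Z(1 - alpha / 2) + _Z(power)) / d) ** 2)


def d_with_ci(d, n1, n2, level=0.95):
    """Cohen's d with an approximate confidence interval.

    Uses the large-sample variance
    var(d) = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)).
    """
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = _Z(0.5 + level / 2)
    return d, (d - z * se, d + z * se)


print(n_per_group(0.5))        # medium effect → 63
print(n_per_group(0.2))        # small effect → 393
print(d_with_ci(0.5, 64, 64))  # interval of roughly (0.15, 0.85)
```

    The normal approximation slightly understates the requirement: an exact two-sample t-test calculation for d = 0.5 gives 64 per group. The wide interval around d = 0.5, even with 64 per group, is a reminder of why an estimate with its uncertainty says more than a lone p-value.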

    A cultural shift, not just a checklist

    The most durable change has been cultural. Open practices (sharing data, code and materials, posting preprints, and crediting replication work) are increasingly expected rather than exceptional, and a growing number of funders and journals reward them. Many psychology journals offer Registered Reports, and badges for open data and open materials have become common. The shift reframes transparency as a normal part of doing science well rather than an optional extra, and it has begun to spread to neighbouring fields facing similar pressures.

    What it means for everyday research practice

    The crisis has practical consequences for how studies are designed and read. Single, striking results deserve caution until replicated; effect sizes and confidence intervals matter more than a lone p-value; and vivid claims—the kind that circulate as popular psychology, such as strong readings of the Dunning-Kruger effect—warrant scrutiny against replication evidence. These habits sit alongside responsible assessment of the instruments a study relies upon.

    What the crisis does and does not imply

    It is important to state the limits of the lesson. A failed replication does not automatically prove the original effect is false; replications themselves can be underpowered, can differ subtly in method, or can be run on different populations. Equally, the crisis is not unique to psychology—medicine, economics and other empirical fields have confronted comparable problems—nor does it mean that nothing in psychology is true. Many core findings replicate robustly. The accurate reading is that the proportion of fragile results in the literature was higher than assumed, that publishing incentives rewarded surprising single studies over careful verification, and that the remedy is structural rather than a matter of individual blame. Framed this way, the crisis is a sign of a discipline maturing, not collapsing.

    Standards, terminology and authors

    Reproducibility also depends on mundane infrastructure: consistent terms, well-described methods and shareable metadata. Defining concepts in a controlled research dictionary reduces ambiguity across studies, and clear expectations for authors—preregister where possible, report all measures, share data—turn the lessons of the crisis into routine. The goal is not to publish less but to publish findings that hold up.

    Frequently asked questions

    What is the replication crisis?

    It is the finding that many published results, especially in psychology, do not reproduce when independent teams repeat the studies. It exposed weaknesses in research and publishing practices and sparked reform.

    What did the 2015 Open Science Collaboration project find?

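    The Reproducibility Project: Psychology attempted to replicate 100 studies. About 36 percent of the replications produced statistically significant results, compared with 97 percent of the original studies, and replication effects were on average about half the size of the originals.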

    What causes findings to fail replication?

    Key causes include p-hacking, publication bias against null results, low statistical power and undisclosed analytic flexibility, often compounded by measures with weak reliability and validity.

    What are preregistration and Registered Reports?

    Preregistration logs hypotheses and analysis plans before data collection. Registered Reports take this further, with journals accepting a study based on its methods before results are known, reducing publication bias.

  • Preregistration and Registered Reports Explained

    Preregistration is the practice of publicly specifying a study’s hypotheses, methods and analysis plan before any data are collected or examined. By fixing these decisions in advance and time-stamping them, preregistration draws a clear line between confirmatory tests planned ahead of time and exploratory analyses discovered along the way — a distinction that curbs questionable research practices and strengthens reproducibility. The plan is registered publicly so that it cannot be quietly revised once results are known, which is what gives the time stamp its force.

    The problem it addresses is well documented. When analysis choices are made after seeing the data, researchers can — often unconsciously — select the specification that yields a significant result, a practice known as p-hacking. Separately, studies with positive findings are more likely to be published than null results, producing publication bias that distorts the literature. Preregistration tackles the first; Registered Reports tackle both. The two practices grew out of the wider reproducibility movement, which found that a worrying share of published findings did not hold up when independent teams tried to repeat them — a problem driven in part by exactly these analytic and publication pressures. By making the research plan public and time-stamped before results exist, both practices restore a clear distinction between what was predicted and what was merely found.
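    How analytic flexibility manufactures false positives can be shown with a simulation in which no true effect exists. This sketch assumes each study measures five independent outcomes and tests each with a two-sided z-test; it compares testing only the preregistered primary outcome with reporting whichever outcome reaches p < 0.05. The function names are illustrative.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p-value for a z statistic."""
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))


def false_positive_rate(outcomes, flexible, studies=20_000, alpha=0.05, seed=7):
    """Share of studies with no true effect that report a 'significant' finding.

    Each simulated study measures `outcomes` independent outcome variables, all
    with zero true effect, so every test statistic is drawn from N(0, 1).
    flexible=False: only the preregistered primary outcome (the first) is tested.
    flexible=True:  whichever outcome happens to reach significance is reported.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        p_values = [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(outcomes)]
        tested = min(p_values) if flexible else p_values[0]
        if tested < alpha:
            hits += 1
    return hits / studies


preregistered = false_positive_rate(outcomes=5, flexible=False)
pick_best = false_positive_rate(outcomes=5, flexible=True)
theoretical = 1 - (1 - 0.05) ** 5  # about 0.226 for independent outcomes
```

    Testing one pre-specified outcome keeps the false positive rate near the nominal 5 percent, while picking the best of five pushes it towards 1 − 0.95⁵, roughly 23 percent. Real outcomes are usually correlated, which changes the exact figure but not the direction of the problem.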

    What preregistration involves

    A preregistration typically states the research question, the hypotheses, the sample size and stopping rule, the variables, and the precise analysis plan, lodged in a public registry with a time stamp. Templates and registries hosted on the Open Science Framework (OSF), maintained by the Center for Open Science, make this routine. Clinical trials have long used dedicated public registries for the same reason, and the practice has since spread across the social and life sciences. Because the plan is fixed, readers can verify that the reported confirmatory analysis is the one that was promised, and exploratory work is labelled as such rather than dressed up as a prediction. A good preregistration is specific enough that a third party could, in principle, run the planned analysis without further instruction.
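    The idea of a fixed, time-stamped plan against which reported analyses are checked can be sketched as a small data structure. This is a hypothetical model, not the schema of OSF or any other registry: the field names, the example study and the use of a SHA-256 digest to make later edits detectable are all assumptions for illustration. In practice the registry itself supplies the time stamp and keeps the frozen copy.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Preregistration:
    """Hypothetical minimal plan; the fields are illustrative, not a registry schema."""
    question: str
    hypotheses: tuple
    sample_size: int
    stopping_rule: str
    confirmatory_analyses: tuple
    registered_at: str  # ISO 8601 time stamp recorded when the plan is lodged

    def fingerprint(self) -> str:
        """SHA-256 digest of a canonical JSON encoding; any edit changes it."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def label_analyses(plan, reported):
    """Mark each reported analysis as confirmatory (planned) or exploratory."""
    planned = set(plan.confirmatory_analyses)
    return {name: "confirmatory" if name in planned else "exploratory"
            for name in reported}


plan = Preregistration(
    question="Does the intervention improve recall?",
    hypotheses=("H1: the intervention group recalls more words than controls",),
    sample_size=128,
    stopping_rule="stop once 64 participants per group have been tested",
    confirmatory_analyses=("two-sample t-test of recall by group",),
    registered_at=datetime(2024, 1, 15, tzinfo=timezone.utc).isoformat(),
)
registered_digest = plan.fingerprint()

labels = label_analyses(plan, [
    "two-sample t-test of recall by group",
    "recall by group, split by age",  # added after seeing the data
])
```

    An analysis missing from the plan is labelled exploratory rather than rejected, which mirrors the point that exploration is allowed as long as it is reported as such.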

    Confirmatory versus exploratory research

    The conceptual heart of preregistration is the distinction between confirmatory and exploratory research. Confirmatory research tests a specific, pre-stated hypothesis with a pre-specified analysis; its statistical guarantees — including the meaning of a p-value — depend on the analysis having been fixed in advance. Exploratory research, by contrast, searches the data for patterns and generates new hypotheses; it is valuable and necessary, but its findings are provisional and must be confirmed in fresh data. Problems arise when exploratory results are dressed up as confirmatory ones, lending them a false air of statistical rigour. Preregistration keeps the two honest by recording, with a time stamp, exactly which analyses were planned. Anything beyond that plan is legitimate exploration, simply labelled as such rather than presented as a prediction that came true.

    Registered Reports go further

    A Registered Report is a publication format in which the introduction, methods and analysis plan are peer-reviewed before data collection. If the question and design are judged sound, the journal grants in-principle acceptance — a commitment to publish the completed study regardless of whether the results are positive, negative or null, provided the authors follow the approved protocol. This decouples the publication decision from the outcome, directly attacking publication bias. A useful side effect is that reviewers can improve a study before it is run, when flaws can still be fixed, rather than critiquing an unchangeable design after the fact. This shifts peer review from gatekeeping to genuine quality improvement, and reduces the waste of running studies whose weaknesses only surface at submission.

    How each curbs bias

    Practice | Reviewed before data? | Mainly curbs
    Preregistration | No (registry only) | p-hacking, hidden analytic flexibility
    Registered Report | Yes (stage-one peer review) | p-hacking and publication bias

    The shared mechanism is timing: committing to decisions before outcomes are known removes the temptation, and the opportunity, to reshape a study around a desired result. This complements the rigour built into experimental designs such as the randomised controlled trial, where preregistered protocols make intention-to-treat (ITT) analysis and primary-outcome commitments verifiable.

    The two-stage Registered Report workflow

    What makes Registered Reports distinctive is their two-stage review. At stage one, reviewers evaluate the question’s importance and the soundness of the proposed methods and analysis before any data exist; sound proposals earn in-principle acceptance. At stage two, after the study is run, reviewers check that the authors followed the approved protocol and that conclusions match the registered plan — but they do not get to reject the paper simply because the results were null or unexciting. This sequencing is what severs the link between a study’s outcome and its publishability.

    Stage | What is reviewed | Decision
    Stage one | Question, methods, analysis plan | In-principle acceptance
    Data collection | Not reviewed; study conducted per approved protocol | None
    Stage two | Adherence to plan, valid conclusions | Publication regardless of result
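    The two-stage sequencing can be expressed as a small state machine. The sketch below is a simplification under stated assumptions: the stage names and method signatures are hypothetical, and a stage-two shortfall is modelled simply as a request for revision. Its one essential property is that the stage-two decision never consults whether the result was significant.

```python
from enum import Enum, auto


class Stage(Enum):
    SUBMITTED = auto()
    IN_PRINCIPLE_ACCEPTANCE = auto()
    DATA_COLLECTED = auto()
    PUBLISHED = auto()
    REJECTED = auto()


class RegisteredReport:
    """Toy model of the two-stage workflow (a simplification, not a journal's rules)."""

    def __init__(self):
        self.stage = Stage.SUBMITTED

    def _require(self, expected):
        if self.stage is not expected:
            raise ValueError(f"expected {expected.name}, currently {self.stage.name}")

    def stage_one_review(self, question_and_methods_sound):
        """Reviewers judge the question and design before any data exist."""
        self._require(Stage.SUBMITTED)
        self.stage = (Stage.IN_PRINCIPLE_ACCEPTANCE if question_and_methods_sound
                      else Stage.REJECTED)

    def collect_data(self):
        """Data are gathered only after in-principle acceptance."""
        self._require(Stage.IN_PRINCIPLE_ACCEPTANCE)
        self.stage = Stage.DATA_COLLECTED

    def stage_two_review(self, followed_protocol, conclusions_match_plan,
                         result_was_significant):
        """Check adherence and conclusions; returns True if the paper is published.

        result_was_significant is accepted but never consulted: the outcome
        cannot decide publication. A shortfall leaves the manuscript at
        DATA_COLLECTED awaiting revision (a simplification).
        """
        self._require(Stage.DATA_COLLECTED)
        if followed_protocol and conclusions_match_plan:
            self.stage = Stage.PUBLISHED
            return True
        return False


null_result = RegisteredReport()
null_result.stage_one_review(question_and_methods_sound=True)
null_result.collect_data()
published = null_result.stage_two_review(
    followed_protocol=True,
    conclusions_match_plan=True,
    result_was_significant=False,  # a null finding is still published
)
```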

    How they curb publication bias and p-hacking

    Publication bias arises when the literature over-represents positive findings because null results are harder to publish. By guaranteeing publication at stage one, Registered Reports ensure null and negative results enter the record, giving a more honest picture of the evidence. P-hacking — selecting the analysis that happens to reach significance — is curbed by both formats, because the analytic decisions are fixed and public before the data are seen. Together these mechanisms protect the integrity of confirmatory claims, much as the pre-specified primary outcomes of a randomised controlled trial protect its causal conclusions.
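    The distortion that publication bias introduces can be illustrated by simulating a literature on an effect that does not exist. This minimal sketch assumes a true effect of zero, 30 participants per group, and a large-sample normal approximation for each study's estimate; it compares a record that publishes every study, as Registered Reports aim to, with one that publishes only positive, significant results. The function name is hypothetical.

```python
import random
from statistics import fmean


def published_means(true_d=0.0, n_per_group=30, studies=5000, seed=3):
    """Size and average effect of the published record under two publication rules.

    Each study's estimate is drawn from N(true_d, se), with the large-sample
    standard error se = sqrt(2 / n_per_group) for a standardised mean difference.
    """
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5
    estimates = [rng.gauss(true_d, se) for _ in range(studies)]
    # Outcome-dependent publication: only positive, significant results appear.
    filtered = [d for d in estimates if d / se > 1.96]
    return {
        "publish_all": (len(estimates), fmean(estimates)),
        "publish_if_significant": (len(filtered), fmean(filtered)),
    }


record = published_means()
```

    Publishing everything recovers an average close to the true value of zero, while the filtered record contains only a few percent of the studies yet reports a sizeable average effect, roughly d = 0.6 under these assumptions.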

    Benefits and honest limits

    Preregistration improves transparency, makes exploratory work explicit and supports the reproducibility goals at the heart of the research lifecycle. It does not forbid exploration; it simply requires that exploratory findings be reported as such. Deviations from a plan are permitted when justified and disclosed, and preregistration cannot by itself guarantee a study is well designed — a poor plan, preregistered, is still a poor plan. Used alongside the standardised documentation described in the CASRAI dictionary and our guidance for authors, it makes the chain from hypothesis to result auditable.

    Frequently asked questions

    What is the difference between preregistration and a Registered Report?

    Preregistration time-stamps a plan in a public registry but is not peer-reviewed in advance. A Registered Report adds stage-one peer review and in-principle acceptance before data are collected, committing the journal to publish the results.

    How does preregistration reduce p-hacking?

    By fixing the hypotheses and analysis plan before the data are seen, it removes the ability to choose, after the fact, the specification that happens to produce a significant result.

    Does preregistration ban exploratory analysis?

    No. Exploration is encouraged, but it must be reported as exploratory rather than presented as a pre-planned confirmatory test. Justified deviations from the plan are allowed when disclosed.

    What is the Center for Open Science’s role?

    The Center for Open Science maintains the Open Science Framework, which hosts preregistration templates and registries and supports the Registered Reports format adopted by many journals.

  • Open science across the research lifecycle: from preregistration to preservation

    Open science is often encountered as a set of separate practices: a journal’s open-access policy, a funder’s data-sharing requirement, a colleague’s preregistered study. Treated piecemeal, each can feel like an isolated obligation. But open science is most powerful, and most coherent, when its practices are understood as connected stages in the arc of a single project, with openness running through the whole research lifecycle rather than appearing only at the end. Seen this way, preregistration, open data, open access and preservation are not unrelated requirements but successive expressions of one principle: that research is more trustworthy, more useful and more cumulative when it is conducted in the open. This article traces openness through each stage of that lifecycle, drawing on the research lifecycle domain of the CASRAI Dictionary.

    A global framework: the UNESCO Recommendation

    That open science is a connected whole rather than a collection of separate practices is reflected in the most significant international statement on the subject: the UNESCO Recommendation on Open Science, adopted by member states as a shared global framework. It treats open science not as a single act of sharing but as an integrated set of practices and values — open access to publications, open research data, open-source software, open infrastructures, open engagement with society — underpinned by transparency, equity and inclusion. Its scope is the point: it frames openness as a culture spanning the entire research process, not a box ticked at publication, and provides a common reference for understanding open science as a coherent lifecycle.

    The beginning: preregistration

    Openness can begin before any data are collected. Preregistration is the practice of specifying a study’s hypotheses, methods and analysis plan in advance, and recording that plan in a way that cannot be quietly changed later. Its purpose is to strengthen the integrity of research by making clear what was planned before the results were known, which guards against practices such as reshaping hypotheses to fit the data or selectively reporting only what worked. A particularly developed form is the registered report, in which a study’s plan is peer-reviewed and accepted in principle before the results exist, so that publication depends on the quality of the question and method rather than on whether the findings turn out to be striking. Preregistration makes the research process transparent from the outset and sets the foundation for everything that follows.

    The middle: open and FAIR data

    As a project generates data, openness shifts to how that data is managed and shared. The widely adopted FAIR principles hold that data should be Findable, Accessible, Interoperable and Reusable — properties that let data be discovered, understood and built upon by others rather than locked away or lost. Making data FAIR, and as open as is responsible, transforms it from a private by-product of one study into a lasting resource for the community. This stage connects backwards and forwards: data shared openly allows the results derived from it to be checked, and it allows the data itself to feed new research it was never collected for. Openness in the middle of the lifecycle is what gives a project value beyond its own conclusions.

    The output: open access

    When findings are written up, openness turns to open access — making the resulting publications freely available to read rather than locked behind paywalls. It can be achieved through different routes, including publishing in open-access venues and depositing accepted manuscripts in repositories, but the principle is constant: research that anyone can read can be verified, used and built upon by the widest possible audience. Open access is the most visible face of open science, but within the lifecycle it is one stage among several. A paper that is open but rests on hidden data and an undisclosed plan is less open than it appears; open access is most meaningful when it sits atop preregistration and open data.

    The long term: preservation

    The lifecycle does not end at publication, because outputs that are open today are worthless tomorrow if they vanish. Digital preservation is the work of ensuring that data, publications, software and other outputs remain accessible, intact and usable over the long term, against the threats of format obsolescence, link rot, storage failure and institutional change. There is little point making research open if it cannot be found or opened a decade later. Trusted repositories, persistent identifiers and active preservation practices are what keep the open record open over time, closing the loop so that the openness built earlier actually endures.
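    One routine practice behind keeping outputs intact is fixity checking: recording a checksum for each file and recomputing it later to detect silent corruption or loss. The sketch below assumes local files and SHA-256 checksums, and the function names are hypothetical; production preservation systems generally automate checks of this kind across their storage.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a checksum for each file at deposit time."""
    return {str(p): sha256_of(p) for p in paths}


def fixity_failures(manifest):
    """Return files that have gone missing or whose checksum no longer matches."""
    failures = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.exists() or sha256_of(path) != expected:
            failures.append(name)
    return failures


def demo():
    """Deposit a file, check it, corrupt it, and check again."""
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "dataset.csv"
        data_file.write_text("id,value\n1,42\n")
        manifest = build_manifest([data_file])
        before = fixity_failures(manifest)
        data_file.write_text("id,value\n1,43\n")  # simulated silent corruption
        after = fixity_failures(manifest)
    return before, after, manifest


before, after, manifest = demo()
```

    The manifest is the durable part: stored alongside the files (and ideally replicated), it lets a repository show years later that an output is still the one originally deposited.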

    The lifecycle as a connected whole

    The deeper point is that these stages reinforce one another. Preregistration makes the eventual open data and open publication more meaningful, because the plan they can be checked against is on record. Open data makes the open publication verifiable. Preservation makes all of it durable. Openness at one stage is weakened when a stage is missing — open access over secret data, or open data with no preservation, each falls short of the whole. This is why open science is best understood as a lifecycle rather than a checklist: its value is cumulative and connected, exactly the vision the UNESCO Recommendation articulates. Our learning resources explore each practice in more depth.

    A consistent vocabulary across the lifecycle

    For openness to connect across stages and systems, the information describing each stage must mean the same thing everywhere — the status of a preregistration, the access conditions of data, the licence on a publication, the preservation state of an output. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the open-science attributes of a project are understood identically across the systems that record them. And because contribution runs through every stage, the work done at each can be described in the same shared framework — the CRediT taxonomy and its full set of contribution roles. Open science is not a single act but a way of working across the whole life of a project; its power lies in the connection of its parts.

  • The replication crisis and large-scale replication projects: what systematic replication has taught us

    For most of the twentieth century, the published literature was treated, in practice, as a reasonably trustworthy record: if a finding appeared in a peer-reviewed journal, it was presumed to be real until something specific cast doubt on it. That presumption rested on an assumption rarely tested directly — that published results would reappear if someone repeated the study. Beginning in the early 2010s, a series of deliberate, large-scale efforts set out to test exactly that assumption by repeating published studies systematically, and what they found unsettled whole disciplines. The episode came to be called the replication crisis, and the work it provoked has reshaped how research thinks about its own reliability. This article looks at the major replication projects and the lessons they taught, drawing on the reproducibility domain of the CASRAI Dictionary.

    From unease to evidence

    Concerns that some published findings might be fragile were not new; what was new was the decision to measure the problem rather than merely worry about it. The crucial move was to treat replication itself as a research programme — to take a defined set of published studies, repeat them carefully using the original methods, and report honestly how many produced consistent results. This turned a diffuse anxiety into an empirical question rather than a matter of faith.

    The Reproducibility Project: Psychology

    The best-known of these efforts is the Reproducibility Project: Psychology, coordinated by the Open Science Collaboration and led through the Center for Open Science. A large group of researchers worked together to repeat a substantial sample of studies drawn from prominent psychology journals, following the original methods as closely as possible and, where they could, working with the original authors to get the protocols right. The headline finding was sobering: a considerable proportion of the replication attempts did not reproduce the original results, and where effects did appear again, they were often smaller than first reported. The project did not claim that the original findings were necessarily false — a failed replication can have many causes — but it demonstrated, at scale and in public, that a worrying share of published findings could not simply be taken on trust. It became a reference point for the entire debate.

    The Many Labs studies

    A complementary approach came from the Many Labs projects. Rather than each replication being attempted once by one team, Many Labs had numerous laboratories around the world each attempt the same set of studies using shared protocols. This answered a different question: not just whether a finding replicates once, but how consistent it is across many independent settings, samples and contexts. Some effects proved robust, reappearing reliably across nearly all the participating laboratories; others were inconsistent or largely absent. Many Labs also helped separate genuine variability in a phenomenon from the noise of any single replication attempt. The lesson was that replication is not a simple pass or fail but a way of mapping how dependable and how context-sensitive a finding really is.
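    One common way to quantify how consistent an effect is across laboratories is to pool the estimates and measure heterogeneity. This sketch uses fixed-effect inverse-variance weighting, Cochran's Q and the I² statistic; it is not presented as the Many Labs teams' own analysis, and the per-lab numbers are hypothetical values chosen only to illustrate a consistent and an inconsistent pattern.

```python
def pooled_estimate_and_i2(estimates, std_errors):
    """Fixed-effect pooled estimate, Cochran's Q and I^2 across labs.

    Each lab is weighted by the inverse of its sampling variance; I^2 is the
    share of total variation beyond what sampling error alone would produce.
    """
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical per-lab effect estimates and standard errors, for illustration only.
consistent = pooled_estimate_and_i2([0.40, 0.42, 0.38, 0.41], [0.10] * 4)
inconsistent = pooled_estimate_and_i2([0.05, 0.60, -0.10, 0.55], [0.10] * 4)
```

    An I² near zero means the labs disagree no more than sampling error would predict; a value near one means most of the spread reflects genuine differences between settings, samples or procedures.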

    Cancer biology and beyond

    The replication question was not confined to psychology. The Reproducibility Project: Cancer Biology, a collaboration involving the Center for Open Science and an independent laboratory network, set out to repeat key experiments from high-profile preclinical cancer studies. Replicating biological experiments proved genuinely difficult, often because the original papers lacked enough methodological detail to repeat the work without extensive back-and-forth with the original authors — and sometimes that detail could not be recovered at all. Where replications could be completed, the picture was mixed, with many original effects appearing weaker than first reported. The Brazilian Reproducibility Initiative extended the same spirit to biomedical research within a national research system, coordinating multiple laboratories to repeat a common set of experimental methods. Across these efforts a recurring finding emerged: incomplete reporting is itself a major obstacle to reproducibility, quite apart from whether the underlying result is real.

    What the projects taught

    Taken together, the large replication projects yielded several durable lessons:

    • The problem is real and measurable. A meaningful proportion of published findings do not straightforwardly replicate, and this can be demonstrated rather than merely asserted.
    • Reporting matters enormously. Many replication difficulties stem not from false results but from methods described too thinly to repeat.
    • Replication is informative, not punitive. A single failed replication rarely settles anything; replication is most valuable for estimating how robust and context-dependent an effect is.
    • Practices can be reformed. The findings spurred pre-registration, registered reports, open data and better reporting standards.

    The rise of metascience

    Perhaps the most lasting consequence is the maturing of metascience — the scientific study of science itself. The replication projects showed that the research process can be studied empirically: that questions about how reliable findings are, what practices improve reliability, and how incentives shape behaviour can be investigated with the same rigour applied to any other subject. Metascience has since examined publication bias, statistical practice and the effects of pre-registration. The replication crisis, in this light, was not an embarrassment to be buried but the beginning of research becoming more willing to examine its own foundations. Reproducibility ceased to be assumed and became something to be designed for, measured and improved.

    A shared vocabulary for reliability

    For reproducibility to be improved across disciplines, institutions and publishers, the concepts involved must be described consistently: what a replication is, what counts as the materials and methods needed to repeat a study, and how outputs such as data and protocols are identified and shared. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the elements underpinning reproducible research are understood the same way wherever they are recorded. And because conducting replications and sharing the data and methods behind them is genuine, recognisable work, it can be described in the same framework used for every other contribution, the CRediT taxonomy, whose full set of contribution roles covers investigation, data curation and the rest. Building replication into routine practice is therefore partly a task for research administration. The replication projects taught research to test its own claims; a shared vocabulary helps ensure the lessons travel.