Category: Guides & Explainers

Practical how-to guides, templates, checklists, and career pathways for research administrators, authors, and institutional teams.

  • Randomised Controlled Trials: The Gold Standard Explained

    A randomised controlled trial (RCT) is an experimental study in which participants are allocated to an intervention group or a comparison group purely by chance, so that the only systematic difference between groups is the treatment under test. By combining randomisation, a control or comparison arm and, where possible, blinding, the RCT isolates the effect of an intervention from confounding factors, making it the methodological gold standard for answering causal questions.

    The core insight is simple but powerful: if allocation is genuinely random and groups are large enough, known and unknown confounders tend to be balanced across arms. Any difference in outcome beyond what chance alone would produce can then be attributed to the intervention rather than to pre-existing differences between participants.

    Randomisation

    Randomisation is the process of assigning participants to groups by chance — for example, by computer-generated sequence. Its purpose is to balance characteristics such as age, severity and unmeasured risk factors across arms, removing selection bias from the comparison. Without it, sicker or healthier participants might cluster in one group, distorting the result.
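
    A computer-generated sequence can be sketched in a few lines. The example below uses permuted-block randomisation, one common variant that keeps arm sizes equal after every block; the text above does not specify a method, so the block size, arm labels, seed and the function name block_randomise are all assumptions made for illustration.

```python
import random

def block_randomise(n_participants, block_size=4,
                    arms=("intervention", "control"), seed=2024):
    """Return an allocation sequence built from shuffled, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)      # fixed seed so the sequence can be regenerated
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * per_arm   # equal numbers of each arm in every block
        rng.shuffle(block)             # order within the block is left to chance
        sequence.extend(block)
    return sequence[:n_participants]

allocation = block_randomise(12)
print(allocation)
print({arm: allocation.count(arm) for arm in ("intervention", "control")})
```

    In a real trial the sequence would be generated and held away from the people enrolling participants, which is the concern of allocation concealment.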

    Allocation concealment

    Allocation concealment ensures that those enrolling participants cannot foresee or influence which group a person will join. It is distinct from blinding: concealment protects the randomisation process at the point of assignment, whereas blinding operates after assignment. Poor concealment is one of the most consistently demonstrated sources of exaggerated treatment effects.

    Control and comparison

    A control or comparison arm provides the counterfactual — what would have happened without the intervention. Comparators may be a placebo, standard care or an active alternative. The placebo arm in particular controls for expectation effects, a topic explored in our article on the placebo and placebo effect.

    Blinding

    Blinding (or masking) prevents participants, clinicians or assessors from knowing group assignment, reducing conscious and unconscious bias. The mechanics of single, double and triple blinding, and the specific biases they address, are set out in our companion guide to double-blind studies and bias control.

    Intention-to-treat analysis

    Intention-to-treat (ITT) analysis evaluates participants in the groups to which they were randomised, regardless of whether they completed the assigned treatment. This preserves the benefits of randomisation and gives a realistic estimate of effectiveness in practice, where adherence is imperfect. The contrasting per-protocol analysis, which includes only those who followed the protocol, can reintroduce bias and is usually treated as secondary.

    Why the RCT is the gold standard

    For causal questions about whether an intervention works, the RCT’s design controls the main threats to validity in one structure. It sits at the heart of the confirmatory stage of drug development, as described in our overview of the pharmaceutical R&D pipeline, and underpins evidence-based decision-making across the research lifecycle.

    Anatomy of a well-conducted RCT

    A robust trial weaves these elements together rather than relying on any single one. The table below summarises the core components and the threat each addresses.

    Component | Purpose | Threat addressed
    Randomisation | Balance groups by chance | Confounding, selection bias
    Allocation concealment | Hide upcoming assignment | Manipulation of enrolment
    Control arm | Provide a counterfactual | Mistaking change for effect
    Blinding | Conceal group membership | Performance and detection bias
    Intention-to-treat | Analyse as randomised | Attrition and post-hoc selection

    Power, sample size and pre-specification

    Randomisation only balances groups reliably when the sample is large enough, which is why trials specify a target sample size derived from the smallest difference worth detecting. Too small a study may miss a real effect or produce an unstable estimate; an adequately powered one gives the result interpretive weight. Equally important is pre-specifying the primary outcome and analysis plan before the data are seen, so that a single confirmatory test is fixed in advance rather than chosen afterwards. This connects directly to the practice of preregistration and Registered Reports, which protects the trial’s confirmatory status from later analytic flexibility.
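
    As a rough illustration of how a target sample size follows from the smallest difference worth detecting, the sketch below applies the usual normal-approximation formula for comparing two means with equal allocation. The function name n_per_arm, the 5% two-sided significance level, 80% power and the example numbers are assumptions chosen for illustration; a real trial would take them from its protocol and normally confirm the figure with dedicated software.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-arm comparison of means.

    delta: smallest difference worth detecting; sd: outcome standard deviation.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance threshold
    z_beta = z.inv_cdf(power)             # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

print(n_per_arm(delta=5, sd=10))     # moderate difference
print(n_per_arm(delta=2.5, sd=10))   # half the difference to detect
```

    Halving the difference to be detected roughly quadruples the required sample, which is why the choice of the smallest worthwhile difference dominates trial size.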

    Where the RCT sits in the evidence hierarchy

    A single trial, however well conducted, is rarely the final word. Findings gain strength when they are replicated and when multiple RCTs are combined in systematic reviews and meta-analyses, which sit above the individual trial in the evidence hierarchy. Conversely, a well-designed observational study can sometimes be more informative than a flawed or under-powered RCT. The design is a powerful tool, not an automatic guarantee of truth, and its value depends on execution and transparent reporting.

    Internal versus external validity

    Two distinct questions decide whether a trial is useful. Internal validity asks whether the result is true for the participants studied — whether the design genuinely isolated the intervention’s effect from bias and confounding. External validity asks whether that result generalises to other people, settings and conditions. The RCT excels at the first: randomisation, concealment, control and blinding are precisely the tools that secure internal validity. It is weaker on the second, because the controlled conditions and selected participants that protect internal validity can make a trial less representative of routine practice. Strong evidence requires attention to both, and the two sometimes pull in opposite directions.

    Pragmatic versus explanatory trials

    This tension has produced two broad trial styles. Explanatory trials test whether an intervention can work under ideal, tightly controlled conditions — maximising internal validity and answering questions of efficacy. Pragmatic trials test whether it does work in everyday clinical settings with broader participants and fewer restrictions — favouring external validity and answering questions of effectiveness. Neither is superior in the abstract; the right choice depends on the question being asked. A regulator confirming a causal effect may want an explanatory design, while a health system deciding whether to adopt a treatment may learn more from a pragmatic one. Reporting which style a trial used helps readers interpret how far its findings should travel.

    Limits of the design

    RCTs are not universally applicable. They can be expensive, may exclude populations seen in routine practice, and are sometimes unethical or impractical: you cannot randomise people to harmful exposures. Tightly controlled conditions can also limit generalisability, opening a gap between efficacy (does it work in the trial?) and effectiveness (does it work in the real world?). Transparent reporting and good documentation, as encouraged in our guidance for authors, help readers judge how far a trial’s findings extend.

    Frequently asked questions

    What makes randomisation so important?

    Randomisation distributes both known and unknown confounders evenly across groups, so that observed differences in outcome can be attributed to the intervention rather than to pre-existing imbalances.

    How is allocation concealment different from blinding?

    Allocation concealment hides the upcoming assignment from those enrolling participants, protecting the randomisation itself. Blinding hides group membership after assignment to prevent biased behaviour and assessment.

    Why use intention-to-treat analysis?

    Analysing participants in their assigned groups preserves randomisation and gives a pragmatic estimate of effect under realistic adherence, avoiding bias introduced by excluding non-completers.

    When is an RCT not appropriate?

    When randomisation would be unethical, impractical or impossible — for example for harmful exposures or rare conditions — observational designs may be the only feasible option, accepting their greater vulnerability to confounding.

  • Large Language Models in Research: An Explainer

    A large language model (LLM) is a type of artificial-intelligence model, built on the transformer neural-network architecture, that is trained on very large quantities of text to predict and generate language. At its core, an LLM learns the statistical patterns of language by repeatedly predicting the next token in a sequence; after training on enough text, this simple objective yields a system that can answer questions, summarise, translate and draft prose. Understanding how LLMs work — and where they fail — is now essential for researchers who use or evaluate them.

    Transformers and tokens

    The transformer, introduced in 2017, is the architecture underlying modern LLMs. Its key innovation is the attention mechanism, which lets the model weigh the relevance of different parts of the input when processing each element, capturing long-range relationships in text efficiently and in parallel. This made it practical to train far larger models than earlier sequence architectures allowed.

    LLMs do not read words directly. Text is broken into tokens — units that may be whole words, parts of words or punctuation — and each token is converted into a numerical vector. The model processes sequences of these tokens and predicts the next one, assigning probabilities across its vocabulary. Generation proceeds token by token. Because models have a finite context window, the amount of text they can consider at once is bounded, which matters when working with long documents.
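
    The step of assigning probabilities across a vocabulary can be illustrated with a toy example. The sketch below assumes the common softmax step, which converts raw scores into probabilities; the four-token vocabulary and its scores are invented, and a real model works with tens of thousands of tokens and scores it has computed itself.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())    # subtract the max for numerical stability
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the token that follows "The trial was"
scores = {" randomised": 2.0, " blinded": 1.0, " small": 0.5, ".": -1.0}
probs = softmax(scores)

next_token = max(probs, key=probs.get)   # greedy choice: the most probable token
for token, p in probs.items():
    print(f"{token!r}: {p:.3f}")
print("chosen:", repr(next_token))
```

    Generation repeats this step, appending the chosen token to the input and predicting again, until a stopping condition is met.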

    Pretraining and fine-tuning

    LLMs are typically built in two stages. Pretraining exposes the model to a vast, broad corpus, during which it learns general language patterns through next-token prediction — this is the costly, compute-intensive stage. Fine-tuning then adapts the pretrained model to specific tasks or behaviours using smaller, targeted datasets. A widely used form of alignment further tunes models with human feedback so their responses are more helpful and follow instructions. This two-stage design is why a single pretrained base can be specialised for many downstream uses, connecting LLMs to the broader story of neural networks and deep learning.

    Capabilities and limitations

    LLMs are capable assistants for drafting, summarising, translating, extracting information and explaining concepts. But their limitations are intrinsic, not incidental, and researchers must keep them in view.

    Capability | Corresponding limitation
    Fluent, plausible text generation | Hallucination: confident but false statements
    Broad knowledge from training data | Knowledge cut-off; no awareness of newer events
    Summarising and synthesising sources | Weak provenance: cannot reliably cite where claims came from
    Following instructions | Sensitivity to phrasing; potential to reflect training-data bias

    The most important limitation for scholarship is hallucination: because an LLM generates statistically likely text rather than retrieving verified facts, it can produce fabricated references, false figures and incorrect claims stated with full confidence. It also lacks reliable provenance — it cannot, by default, tell you which source a statement came from. Outputs must therefore be independently verified, not trusted at face value.

    Responsible use and disclosure in research

    Used responsibly, LLMs can accelerate literature triage, drafting and coding. Used uncritically, they introduce errors, fabricated citations and undisclosed authorship concerns. Many journals and funders now require disclosure of generative-AI use in manuscripts, and most editorial policies hold that an LLM cannot be an author because it cannot take responsibility for the work. Good practice is to verify every factual claim and reference, keep a record of how the tool was used, and report that use transparently. Outputs produced or assisted by LLMs should be treated as research outputs subject to the same scrutiny and documentation as any other, described with consistent terminology. Our guidance for authors covers disclosure and documentation expectations, and reliable handling of model outputs intersects with sound data infrastructure and metadata practice.

    Frequently asked questions

    What is a token in a large language model?

    A token is the unit of text an LLM processes — a whole word, part of a word, or punctuation. Text is split into tokens and converted to numerical vectors; the model predicts the next token in sequence. A model’s context window limits how many tokens it can consider at once.

    What is the difference between pretraining and fine-tuning?

    Pretraining teaches a model general language patterns from a vast, broad corpus and is computationally expensive. Fine-tuning then adapts that pretrained model to specific tasks or behaviours using smaller, targeted datasets, so one base model can be specialised for many uses.

    Why do large language models hallucinate?

    Because they generate statistically likely text rather than retrieving verified facts. An LLM predicts plausible continuations, so it can state fabricated references or false figures with full confidence. Outputs must be independently verified, since the model has no built-in mechanism guaranteeing factual accuracy.

    Should I disclose using an LLM in my research?

    Yes. Many journals and funders require disclosure of generative-AI use, and most hold that an LLM cannot be a named author. Verify all claims and references, record how the tool was used, and report that use transparently in line with relevant editorial policy.

  • IEEE and AMA Citation Styles Explained

    IEEE citation uses bracketed numbers in the text that point to a numbered reference list, and is standard across engineering and computer science. AMA citation, used widely in medicine, uses superscript numbers instead. Both are numeric systems, but they differ in formatting, ordering and discipline.

    This guide explains how each style handles in-text markers and reference entries, with worked examples and a side-by-side table.

    IEEE: numbers in square brackets

    In IEEE style, each source is assigned a number the first time it is cited, in square brackets, and that number is reused for every later citation of the same source. References are listed in the order they first appear — not alphabetically.

    • In-text: Recent work on neural search has improved recall [1], and later studies confirmed it [2], [3].
    • Reused number: The original architecture [1] remains the baseline.
    • As a noun: As shown in [4], latency dropped sharply.
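
    The numbering rule can be sketched as a small routine: walk through the citations in reading order and give each source the next free number the first time it is seen. The citation keys and the function name number_citations are hypothetical, used only to illustrate the rule.

```python
def number_citations(cited_keys):
    """Map each source key to its number, in order of first citation."""
    numbers = {}
    for key in cited_keys:
        if key not in numbers:              # first time this source is cited
            numbers[key] = len(numbers) + 1
    return numbers

# Hypothetical citation keys in the order they occur in the running text
in_text = ["smith2021", "lee2019", "khan2020", "smith2021", "ortiz2022"]
numbers = number_citations(in_text)

# In-text markers: the repeated source keeps its original number
print(" ".join(f"[{numbers[key]}]" for key in in_text))

# Reference list: dicts keep insertion order, so this is first-appearance order
for key, n in numbers.items():
    print(f"[{n}] {key}")
```

    Because the list order falls out of the text order, moving a paragraph can renumber every later reference, which is one reason reference managers are normally left to do this step.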

    A reference-list entry reduces authors’ given names to initials and places the number in brackets:

    [1] J. Smith and A. Jones, “A scalable indexing method,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 4, pp. 110–128, 2021.

    AMA: superscript numbers

    AMA style places superscript numerals after the relevant text, again numbered in order of first appearance. The reference list follows the same numeric order. AMA dominates clinical and biomedical journals.

    • In-text: Adherence improved across the cohort.¹
    • Multiple sources: Several trials reported the same effect.²,³
    • Range: The pattern held across studies.⁴⁻⁶
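
    The pattern in these examples, commas between separate numbers and a hyphen across a consecutive run, can be sketched as follows. The threshold of three or more consecutive numbers for a hyphenated range is an assumption consistent with the examples above, and ama_marker is a hypothetical helper that returns plain text for the typesetter to raise as a superscript.

```python
def ama_marker(numbers):
    """Join reference numbers: hyphen for runs of three or more, commas otherwise."""
    nums = sorted(set(numbers))
    parts, start = [], 0
    for i in range(1, len(nums) + 1):
        # A run ends at the end of the list or where the next number is not consecutive
        if i == len(nums) or nums[i] != nums[i - 1] + 1:
            run = nums[start:i]
            if len(run) >= 3:
                parts.append(f"{run[0]}-{run[-1]}")
            else:
                parts.extend(str(n) for n in run)
            start = i
    return ",".join(parts)

print(ama_marker([2, 3]))
print(ama_marker([4, 5, 6]))
print(ama_marker([1, 3, 4, 5, 9]))
```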

    A reference entry uses journal abbreviations and a specific punctuation pattern:

    1. Smith J, Jones A. Outcomes in the treatment cohort. J Clin Res. 2021;12(3):110-128.

    IEEE versus AMA at a glance

    Feature | IEEE | AMA
    Discipline | Engineering, computer science | Medicine, biomedicine
    In-text marker | Square brackets [1] | Superscript ¹
    List order | Order of appearance | Order of appearance
    Author names | Initials before surname: J. Smith | Surname then initials: Smith J
    Title style | Article title in quotes | Article title, no quotes
    Journal name | Abbreviated, italic | Abbreviated, italic

    Why discipline drives style choice

    Numeric styles keep the running text uncluttered, which suits dense technical and clinical writing where a single sentence may lean on several sources. IEEE’s bracketed numbers double as compact cross-references to equations, figures and prior work; AMA’s superscripts keep medical prose readable at speed. Compare this with author-date approaches in our guide to Harvard referencing, where the author’s name carries into the sentence.

    For a wider map of the field, see citation styles compared, and for general technique, our practitioner guide to citing sources.

    Common pitfalls

    The most frequent IEEE error is alphabetising the reference list; it must follow first-appearance order. The most frequent AMA error is mixing in author-date phrasing (“Smith showed¹”) inconsistently; keep the superscript doing the work. In both styles, every entry in the list must be cited at least once in the text, and every in-text number must have a matching entry. Our guidance for authors covers reference hygiene before submission.

    How citation style fits research outputs metadata

    Citation style governs the visible reference; controlled vocabulary in our dictionary and contributor attribution through CRediT govern the structured metadata around it. Together they make a paper’s outputs machine-readable. Explore more in research outputs.

    Frequently asked questions

    Are IEEE and Vancouver the same?

    They are close cousins — both numeric, both ordered by appearance — but differ in formatting detail, and Vancouver is associated with biomedicine while IEEE is associated with engineering. AMA is itself a Vancouver-derived medical style.

    Do IEEE numbers go inside or outside punctuation?

    IEEE brackets typically sit before the full stop, treated as part of the sentence: “…confirmed the result [2].”

    Can I cite the same AMA source twice?

    Yes — reuse its original number every time it appears, just as in IEEE.

    Which style should a computer science thesis use?

    IEEE is the conventional default for computer science and electrical engineering, but always follow your department’s or publisher’s stated requirement.

  • ORCID for researchers: connecting your identifier to your contributions

    Most researchers now have an ORCID iD, often created in a hurry because a journal or funder asked for one. Far fewer have a record that actually does the work an identifier is meant to do. An ORCID iD that sits empty, or that you copy facts into by hand, delivers almost none of its value. The point of the identifier is connection — to your publications, your grants, your affiliations, and the wider identifier ecosystem — and that is what this guide is about. The foundational explainer lives at persistent identifiers for authors, and this article is the practical companion.

    What an ORCID iD actually solves

    An ORCID iD is a persistent, unique identifier for an individual researcher — a sixteen-digit number, expressed as an HTTPS URI, that stays with you across name changes, institution moves, and career stages. The problem it solves is name disambiguation: in a literature full of common surnames, initial variations, and transliterations, a string name cannot reliably tell two researchers apart, and cannot reliably tie one researcher’s scattered outputs together. The iD does both. It distinguishes you from every other researcher who shares your name, and it gathers your contributions under one unambiguous, machine-readable identity.
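
    Because the iD is machine-readable, a system can check that one is well formed before storing it. The sketch below goes slightly beyond the text above: it assumes the published structure of the identifier, in which the sixteenth character is a check character computed with the ISO 7064 MOD 11-2 scheme and may be the letter X. The function names are hypothetical, and 0000-0002-1825-0097 is a widely used sample iD.

```python
import re

# Accept either the bare iD or the full HTTPS URI form
ORCID_PATTERN = re.compile(
    r"^(?:https://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$"
)

def orcid_check_character(first_15_digits):
    """Compute the final character using the ISO 7064 MOD 11-2 scheme."""
    total = 0
    for digit in first_15_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(value):
    """Check the format of an iD and verify its check character."""
    match = ORCID_PATTERN.match(value.strip())
    if not match:
        return False
    characters = match.group(1).replace("-", "")
    return orcid_check_character(characters[:15]) == characters[15]

print(is_valid_orcid("https://orcid.org/0000-0002-1825-0097"))  # sample iD
print(is_valid_orcid("0000-0002-1825-0098"))                    # last character altered
```

    A passing check only shows that the string is well formed; it does not confirm that the iD is registered or that it belongs to the person who supplied it.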

    This is why funders and publishers increasingly require it. An ORCID iD on a submission or grant application means the work, the award, and the person can be linked without guesswork — the difference between a name a human must interpret and an identifier a system can resolve.

    Step 1: register and complete the core of your record

    Registration is free and takes minutes at orcid.org. The valuable part is what comes next: populating the record so it represents you. Add your employment and education affiliations, ideally selected from ORCID’s organisation lookup so they carry an organisation identifier rather than a free-typed string. Where the lookup is backed by ROR — the Research Organization Registry — your affiliation is anchored to a persistent organisation identifier, which is what lets systems reliably connect you to your institution. (For the organisation side of the ecosystem, see what is ROR.) Add alternative name forms and a short biography so that the record disambiguates you even where systems still rely on names.

    Step 2: let trusted organisations write to your record

    This is the step that turns a static profile into a living one, and it is the step most researchers skip. ORCID has a permissions model: you can grant a trusted organisation — a publisher, a funder, a repository, your institution’s research-information system — permission to read from and write to your record. Once granted, these systems can add works, grants, and affiliations for you, automatically and with provenance attached.

    • Authorise Crossref and DataCite auto-update so that when you publish an article or deposit a dataset with your iD, the output appears on your record without manual entry.
    • Grant your funders permission so that awards are written to your record from the authoritative source.
    • Connect your institution’s system so affiliations and outputs stay synchronised.

    The principle is enter-once, reuse-everywhere. A contribution asserted with your iD by a trusted source is more credible than one you typed yourself, because the assertion carries the provenance of the organisation that made it. The record stops being a CV you maintain and becomes a verified, auto-updating account of your work.

    The single highest-value action most researchers can take with ORCID is to turn on auto-update permissions for Crossref and DataCite. After that, publishing with your iD maintains your record for you.

    Step 3: use your iD everywhere it is asked for — and where it is not

    An identifier only disambiguates if it is attached at the moment of contribution. Enter your ORCID iD on every manuscript submission, every grant application, every dataset deposit, and every peer-review record. Each time you do, you create a verified link between the work and your identity that flows into the connected systems. Conversely, an output published without your iD is one your record cannot automatically claim, and one that name-based systems may attach to the wrong person.

    Step 4: connect ORCID to the rest of the identifier graph

    ORCID is one node in a connected ecosystem, and its value compounds when it is linked to the others. Your iD identifies you; ROR identifies your organisations; a DOI identifies your outputs; a grant identifier identifies your funding; and a project identifier such as RAID identifies the activity that ties them together. When your outputs carry your ORCID iD and your institution’s ROR ID, and your awards carry grant identifiers linked to your iD, the graph assembles itself: a query can move from you to your works to your funders to your institution without a single hand-typed reconciliation.

    This graph is also where contribution metadata lives. When a publisher records a CRediT statement and writes the relevant roles to your ORCID record alongside the publication, your iD begins to carry not just what you have published but what you did on each output — the richer, contribution-aware picture that responsible assessment depends on.

    A note on what ORCID will and will not do

    ORCID disambiguates and connects; it does not, by itself, validate the quality of a contribution or decide authorship. An auto-updated record is only as good as the assertions trusted sources write to it, and you remain responsible for reviewing your record and correcting errors. Keep the public-visibility settings deliberate, review incoming auto-updates periodically, and treat the record as something you curate, not something that runs entirely without you.

    Where shared vocabulary fits

    The identifier ecosystem works only when systems agree on what each identifier means and how they connect — what a “trusted organisation” permission grants, how an affiliation is asserted, how an output links to a person. A shared, federated vocabulary that defines these relationships and points back to ORCID and ROR for the authoritative infrastructure is what lets the graph hold together across systems. Supplying that definitional layer is the role the CASRAI dictionary is designed to play; the relevant terms sit in the persistent-identifiers domain.


  • Variance in Statistics: Definition and Formula

    Variance is a measure of how spread out a set of values is, defined as the average of the squared deviations of each value from the mean. A large variance means the data points are widely dispersed; a small variance means they cluster tightly around the mean. Because the deviations are squared, variance is always non-negative and is expressed in squared units of the original measurement.

    The definition of variance

    To calculate variance, you first find the mean of the data, then subtract the mean from each value to get the deviations. Squaring each deviation removes the sign (so positive and negative deviations do not cancel) and gives greater weight to values far from the mean. The average of these squared deviations is the variance.

    Variance is the foundation of many statistical methods, including the analysis of variance (ANOVA), regression diagnostics and the construction of confidence intervals. Reporting it transparently supports the goals set out in our reproducibility coverage.

    Population variance versus sample variance

    The formula depends on whether your data are the entire population or a sample drawn from it. For a population, you divide the sum of squared deviations by the number of values, N. For a sample of n values, you divide by n − 1 instead of n. This adjustment, known as Bessel’s correction, produces an unbiased estimate of the population variance: deviations are measured from the sample mean rather than the unknown population mean, which makes them slightly too small on average, and the smaller divisor compensates.

    Quantity | Symbol | Divisor
    Population variance | σ² | N
    Sample variance | s² | n − 1

    A worked conceptual example

    Suppose five replicate measurements give 4, 8, 6, 5 and 2. The mean is (4 + 8 + 6 + 5 + 2) / 5 = 5. The deviations from the mean are −1, 3, 1, 0 and −3. Squaring these gives 1, 9, 1, 0 and 9, which sum to 20. Treating the five values as a population, the variance is 20 / 5 = 4. Treating them as a sample, the variance is 20 / 4 = 5. The sample figure is slightly larger, reflecting Bessel’s correction.
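
    The same arithmetic can be checked with Python’s standard statistics module, where pvariance divides by N and variance divides by n − 1. This is a minimal sketch using the five values from the example; the variable names are illustrative.

```python
from statistics import mean, pvariance, variance

values = [4, 8, 6, 5, 2]

m = mean(values)                                      # (4 + 8 + 6 + 5 + 2) / 5
squared_deviations = [(x - m) ** 2 for x in values]   # 1, 9, 1, 0, 9
sum_of_squares = sum(squared_deviations)              # 20

population_variance = pvariance(values)   # divides by N = 5, giving 4
sample_variance = variance(values)        # divides by n - 1 = 4, giving 5

print(m, sum_of_squares, population_variance, sample_variance)
```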

    Variance and the standard deviation

    Variance and the standard deviation describe the same property of spread, but in different units. The standard deviation is simply the square root of the variance, which returns the measure to the original units of the data. In our worked example the population standard deviation is √4 = 2. Because the standard deviation is easier to interpret alongside the mean, it is often reported in papers; see our companion piece on the standard deviation for detail. Variance, however, has convenient mathematical properties, which is why it underlies so many statistical procedures.

    Interpreting variance correctly

    Because variance is in squared units, its absolute size is hard to interpret in isolation. A variance of 4 cm² is meaningful only relative to the scale of the measurement. Variance is also sensitive to outliers: squaring magnifies the effect of extreme values, so a single anomalous point can inflate the variance substantially. Always inspect your data distribution before reporting variance, and define the term consistently in your methods. The CASRAI dictionary and our author guidance encourage precise, reproducible statistical reporting.

    Frequently asked questions

    Why is variance squared rather than absolute?

    Squaring the deviations keeps the measure mathematically tractable and differentiable, which makes it the natural basis for least squares estimation and many other techniques. The absolute deviation is an alternative but lacks these convenient properties.

    When should I divide by n − 1 instead of n?

    Divide by n − 1 whenever your data are a sample used to estimate the variance of a wider population. Divide by N only when your data genuinely represent the entire population of interest.

    Is a high variance bad?

    Not inherently. High variance simply means greater spread. Whether that is good or bad depends on context: high variance in measurement error is undesirable, but natural biological variation may be expected and informative.

  • ANOVA (Analysis of Variance) Explained: Comparing Means Across Groups

    Analysis of variance (ANOVA) is a statistical method that tests whether the means of three or more groups differ by more than would be expected from random variation alone. It does this by comparing the variance between group means against the variance within groups, summarised in a single F-statistic. ANOVA is one of the most widely used inferential tests in experimental research, and reporting it transparently is central to reproducible analysis.

    Why ANOVA instead of multiple t-tests?

    A t-test compares two group means. When you have three or more groups, it is tempting to run a separate t-test for every pair. The problem is the family-wise error rate: each test carries its own chance of a false positive, and those chances accumulate. With three groups there are three pairwise comparisons; at a 5% significance level, and treating the tests as independent, the probability of at least one false positive rises to roughly 14% (1 − 0.95³ ≈ 0.143), and it climbs further as groups are added. ANOVA solves this by performing a single omnibus test that asks one question: are any of the group means different?
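
    The accumulation can be worked through directly. The sketch below counts the pairwise comparisons for a given number of groups and computes the chance of at least one false positive, treating the tests as independent, which is an approximation because pairwise tests share data. The function name familywise_error is illustrative.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one false positive).

    Assumes the tests are independent, which is only an approximation.
    """
    n_tests = comb(n_groups, 2)               # every pair of groups gets one test
    return n_tests, 1 - (1 - alpha) ** n_tests

for k in (3, 4, 5):
    tests, rate = familywise_error(k)
    print(f"{k} groups: {tests} pairwise tests, family-wise error about {rate:.1%}")
```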

    This control of error is why ANOVA underpins so much of experimental design. For a refresher on what significance thresholds mean in practice, see our explainer on p-values and statistical significance.

    The F-statistic and how it works

    ANOVA partitions the total variability in the data into two components. The between-groups variance reflects how far each group mean sits from the overall (grand) mean. The within-groups variance reflects the natural spread of observations inside each group. The F-statistic is the ratio of these two:

    F = between-groups variance / within-groups variance

    If the groups truly share a common mean, both quantities estimate the same underlying variability and F sits near 1. When real differences exist, the between-groups term grows and F rises. A large F, evaluated against the F-distribution with the appropriate degrees of freedom, yields a small p-value and signals that at least one mean differs.
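
    To make the ratio concrete, the sketch below computes a one-way F-statistic by hand for three small groups. The data are invented so the arithmetic is easy to follow, and one_way_f is a hypothetical helper; turning F into a p-value needs the F-distribution, which statistical packages provide and which is left out here.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for three groups (group means 5, 7 and 9)
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat, df_between, df_within = one_way_f(groups)
print(f_stat, df_between, df_within)
```

    Here the between-groups mean square is 24 / 2 = 12 and the within-groups mean square is 6 / 6 = 1, so F is 12 on 2 and 6 degrees of freedom.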

    One-way versus two-way ANOVA

    The design depends on how many factors you are manipulating.

    Feature | One-way ANOVA | Two-way ANOVA
    Number of factors | One independent variable | Two independent variables
    Example question | Does diet type affect plant growth? | Do diet type and watering frequency affect plant growth?
    Main effects | One | Two (one per factor)
    Interaction | Not assessed | Tests whether factors combine non-additively
    Output | Single F-statistic | F-statistic for each main effect plus interaction

    The key advantage of two-way ANOVA is the interaction effect: it reveals whether the influence of one factor depends on the level of another, something separate analyses would miss.

    Assumptions you must check

    ANOVA rests on three core assumptions. Observations should be independent. The residuals should be approximately normally distributed. And the groups should show roughly equal variances, a property called homogeneity of variance (homoscedasticity). When variances differ markedly, a Welch ANOVA is a robust alternative; when normality fails, a non-parametric Kruskal-Wallis test may be more appropriate. Stating which assumptions were tested, and how, is good practice and supports replication, as we discuss across our reproducibility coverage.
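    As a first descriptive look at the last two assumptions, the sketch below computes each group's sample variance, the ratio of the largest to the smallest, and the residuals (each observation minus its group mean). It is an illustration only: the function name and data are invented, no cut-off is implied, and formal checks such as Levene's test or a normality test on the residuals would normally come from a statistics package.

```python
from statistics import mean, variance


def assumption_summary(groups):
    """Descriptive checks only; not a formal hypothesis test."""
    group_variances = [variance(g) for g in groups]  # sample variance (n - 1)
    smallest = min(group_variances)
    # Guard against a group with zero spread
    ratio = max(group_variances) / smallest if smallest > 0 else float("inf")
    # Residuals are what the normality assumption refers to
    residuals = [x - mean(g) for g in groups for x in g]
    return {
        "group_variances": group_variances,
        "variance_ratio": ratio,
        "residuals": residuals,
    }


# Invented data: the second group is far more spread out than the first
summary = assumption_summary([[4, 5, 6], [2, 7, 12]])
print(summary["group_variances"], summary["variance_ratio"])
```

    A ratio far from 1, as in this invented example, is the kind of pattern that would prompt a closer look and possibly a Welch ANOVA.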

    Post-hoc tests: locating the difference

    A significant ANOVA tells you that some mean differs, but not which one. Post-hoc tests answer that follow-up while still controlling the family-wise error rate. Tukey’s HSD is the standard choice for all pairwise comparisons with equal sample sizes (the Tukey-Kramer variant handles unequal ones); Bonferroni correction is conservative and simple; Scheffé’s test is flexible for complex contrasts. Crucially, you should not revert to uncorrected t-tests after a significant ANOVA, as that reintroduces the inflated error the test was designed to prevent.
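    The Bonferroni correction is simple enough to sketch directly: divide the significance level by the number of comparisons and judge each p-value against that stricter threshold. The p-values below are invented for illustration and the function name is hypothetical.

```python
def bonferroni_decisions(p_values, alpha=0.05):
    """Flag each comparison as significant at the Bonferroni-adjusted level."""
    threshold = alpha / len(p_values)  # stricter per-comparison threshold
    return {label: p < threshold for label, p in p_values.items()}


# Hypothetical pairwise p-values; not real results
pairwise_p = {"A vs B": 0.030, "A vs C": 0.004, "B vs C": 0.200}
decisions = bonferroni_decisions(pairwise_p)
print(decisions)
```

    With three comparisons the threshold is 0.05 / 3, about 0.0167. The invented "A vs B" result (p = 0.030) would pass an uncorrected test at 0.05 but not the corrected one, which is exactly the inflation the correction guards against.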

    Equally important, statistical significance does not measure how large a difference is. Always pair ANOVA results with an effect size such as eta-squared, as covered in our companion piece on why effect size matters beyond significance. Authors planning a study should also budget adequate sample size and statistical power so a real effect can actually be detected.
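    Eta-squared is the share of total variability that sits between groups: the between-groups sum of squares divided by the total sum of squares. A minimal sketch with invented data and a hypothetical function name:

```python
from statistics import mean


def eta_squared(groups):
    """Proportion of total variability attributable to group membership."""
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    return ss_between / ss_total


# Invented example data: three groups of three observations
effect = eta_squared([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
print(round(effect, 3))  # 0.864
```

    For these invented numbers the between-groups sum of squares is 38 out of a total of 44, so group membership accounts for roughly 86% of the variability.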

    Frequently asked questions

    What does a significant ANOVA result actually tell you?

    It tells you that at least one group mean differs from the others by more than chance would explain. It does not identify which groups differ or how large the difference is; you need post-hoc tests and effect sizes to answer those questions.

    Can ANOVA be used for only two groups?

    Yes. With two groups a one-way ANOVA gives results mathematically equivalent to an independent-samples t-test that assumes equal variances (F equals t squared, and the two-sided p-value is the same). ANOVA’s real value appears with three or more groups, where it prevents the error inflation of multiple t-tests.
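    The equivalence can be checked numerically. This sketch computes a pooled-variance (equal-variance) t-statistic and a one-way F for the same two invented samples; the function names are ours.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Independent-samples t-statistic assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def one_way_f(groups):
    """One-way ANOVA F-statistic for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (n_total - len(groups)))


# Invented data for two groups
a, b = [4, 5, 6], [6, 7, 8]
t_stat = pooled_t(a, b)
f_stat = one_way_f([a, b])
print(round(t_stat ** 2, 6), round(f_stat, 6))  # 6.0 6.0
```

    The identity does not hold for Welch's unequal-variance t-test, which uses a different standard error.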

    What is the difference between a main effect and an interaction?

    A main effect is the overall influence of one factor averaged across the others. An interaction means the effect of one factor changes depending on the level of another. Detecting interactions is the principal reason to use two-way rather than one-way designs.

    How should ANOVA results be reported for reproducibility?

    Report the F-statistic with both degrees of freedom, the p-value, an effect size, the post-hoc method used, and confirmation that assumptions were checked. The CASRAI Dictionary and our guidance for authors set out the metadata that makes such results auditable.
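    One way to keep those elements together is a small structured record that can also render a one-line summary. The sketch below is illustrative only: the class, its field names and the output format are our assumptions rather than a CASRAI or journal standard, and the values are invented.

```python
from dataclasses import dataclass


@dataclass
class AnovaReport:
    """Hypothetical container for the elements listed above."""
    f_statistic: float
    df_between: int
    df_within: int
    p_value: float
    eta_squared: float
    post_hoc_method: str
    assumptions_checked: list[str]

    def summary(self) -> str:
        # The layout here is an assumption, not a mandated reporting style
        return (
            f"F({self.df_between}, {self.df_within}) = {self.f_statistic:.2f}, "
            f"p = {self.p_value:.3f}, eta-squared = {self.eta_squared:.2f}; "
            f"post-hoc: {self.post_hoc_method}; "
            f"assumptions checked: {', '.join(self.assumptions_checked)}"
        )


# Invented values for illustration
report = AnovaReport(
    f_statistic=19.0,
    df_between=2,
    df_within=6,
    p_value=0.003,
    eta_squared=0.86,
    post_hoc_method="Tukey HSD",
    assumptions_checked=["independence", "normality of residuals", "equal variances"],
)
print(report.summary())
```

    Holding the values as named fields rather than free text makes it harder to omit one of them and easier to audit later.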

  • Ethics review and the IRB/REC process: what researchers should expect

    For research that involves people — their bodies, their behaviour, their data, their tissue — ethics review is not a bureaucratic hoop to clear before the real work begins. It is a substantive safeguard, the mechanism by which a community of researchers commits, in advance, that the people they study will be respected, protected and treated fairly. Researchers who approach it as a formality tend to find it frustrating; those who understand what it is trying to achieve usually find it navigable. This article explains what an ethics committee does, the review tiers a researcher will encounter, and the principles that underpin the whole system, drawing on the framework set out in the compliance and regulatory domain of the CASRAI Dictionary.

    What the committee is called, and what it does

    The body that conducts this review goes by different names in different places. In the United States it is the Institutional Review Board (IRB); in the United Kingdom and much of Europe it is the Research Ethics Committee (REC); in Australia it is the Human Research Ethics Committee (HREC). The names differ but the function is the same: an independent group, including both expert and lay members, that reviews proposed research involving human participants to ensure it is ethically acceptable before it proceeds.

    What the committee weighs is consistent across these systems. It assesses whether the risks to participants are reasonable in relation to the anticipated benefits; whether participants will give genuinely informed and voluntary consent; whether the selection of participants is fair; whether privacy and confidentiality are adequately protected; and whether any vulnerable groups involved have additional safeguards. The committee’s independence matters because it is precisely the people closest to a project — its own investigators — who are least able to judge its risks dispassionately.

    The tiers of review

    One of the most useful things a researcher can understand early is that review is not one-size-fits-all. Most systems operate graded tiers of review scaled to the risk a study poses, and knowing which tier applies sets realistic expectations for time and scrutiny.

    • Exempt review is for certain categories of low-risk research — for example some research using anonymised existing data, or certain educational and survey studies — that meet defined criteria. ‘Exempt’ does not mean no review at all; it usually means the committee, not the investigator, confirms that the exemption applies.
    • Expedited review is for research that poses no more than minimal risk and falls within specified categories. It is conducted by one or a few experienced reviewers rather than the full committee, which makes it quicker without lowering the standard for the questions asked.
    • Full board review is for research that involves more than minimal risk, vulnerable populations, or sensitive interventions. The whole convened committee considers it, and this is the most thorough — and necessarily the slowest — route.

    The single most common cause of frustration is a mismatch of expectation: submitting a higher-risk protocol and expecting an expedited timeline. Identifying the likely tier at the planning stage, and building the corresponding time into the project, prevents most of that friction.

    The Declaration of Helsinki and its lineage

    None of this arose in a vacuum. The modern ethics-review system rests on a series of foundational documents written in response to historical abuses. The Declaration of Helsinki, developed by the World Medical Association, is the central statement of ethical principles for medical research involving human subjects, and it is periodically revised to keep pace with new challenges. It articulates duties that have become the bedrock of review: the wellbeing of the individual participant takes precedence over the interests of science and society; participation must be voluntary and informed; risks must be minimised and justified; and research must be conducted by suitably qualified people under proper protocols.

    Alongside Helsinki sit other touchstones — in the United States, the principles articulated in the Belmont Report (respect for persons, beneficence and justice) and the federal Common Rule that operationalises them. A researcher does not need to memorise these documents, but understanding that the committee’s questions descend from them helps make sense of why it asks what it asks.

    Informed consent, done properly

    If one element sits at the centre of review, it is informed consent. Consent is not a signature on a form; it is a process by which a potential participant comes to understand what the research involves, what risks and benefits it carries, that participation is voluntary, and that they may withdraw without penalty. Committees scrutinise consent materials closely — for readability, completeness and honesty — and pay particular attention where consent is complicated: research with children, with adults who lack capacity, in emergency settings, or across cultural and language differences. The recurring expectation is that the participant genuinely understands and genuinely chooses, not merely that a box has been ticked.

    Working with the process, not against it

    Researchers get the most out of ethics review by treating the committee as a collaborator in protecting participants rather than as an obstacle. That means engaging early, before a protocol is locked; writing the application for an intelligent non-specialist, since lay members are part of the point; being candid about risks rather than minimising them, because a committee trusts an application that confronts its own weaknesses; and remembering that review continues after approval, through reporting of adverse events, amendments and, often, continuing review. Recording ethics approvals and their status as structured compliance metadata — alongside other obligations and the recognition of contributors through the CRediT taxonomy — helps keep this information visible across the research record rather than buried in a filing cabinet. The consistent vocabulary for describing ethics review, approval status and the wider compliance landscape is maintained in the CASRAI Dictionary.
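    As a rough illustration of what structured compliance metadata for an ethics approval might look like, the sketch below defines a small record. Every class and field name is hypothetical and the values are invented; this is not the CASRAI Dictionary's schema, which should be consulted for the actual vocabulary.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class ReviewTier(Enum):
    """The three review tiers described earlier in this article."""
    EXEMPT = "exempt"
    EXPEDITED = "expedited"
    FULL_BOARD = "full board"


@dataclass
class EthicsApproval:
    """Hypothetical record; field names are illustrative assumptions."""
    protocol_id: str
    committee: str  # for example an IRB, REC or HREC
    tier: ReviewTier
    approved_on: date
    expires_on: Optional[date] = None  # None if no expiry date applies

    def is_current(self, today: date) -> bool:
        """True if the approval has started and has not expired."""
        if today < self.approved_on:
            return False
        return self.expires_on is None or today <= self.expires_on


# Invented example record
record = EthicsApproval(
    protocol_id="EXAMPLE-001",
    committee="REC",
    tier=ReviewTier.EXPEDITED,
    approved_on=date(2024, 1, 15),
    expires_on=date(2025, 1, 14),
)
print(record.is_current(date(2024, 6, 1)))  # True
```

    Storing the approval as data rather than as a scanned letter means its status can be checked when a project is amended or a continuing review falls due.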