Category: Guides & Explainers

Practical how-to guides, templates, checklists, and career pathways for research administrators, authors, and institutional teams.

  • Citing Secondary Sources: The ‘As Cited In’ Rule

    A secondary citation occurs when you refer to a source you have not read yourself, having encountered it only through another author’s discussion. Scholarly convention requires you to be transparent about this using the “as cited in” (or “qtd. in”) formula. The guiding principle is simple: cite what you actually read, and wherever possible track down and cite the original source instead.

    Why the rule exists

    If author B quotes or summarises author A, and you have read only B, you cannot vouch for what A really said. B may have paraphrased loosely, quoted selectively or made an error. Citing A directly as if you had read it misrepresents your sources and risks propagating a mistake. The “as cited in” convention keeps the record honest by showing the reader the chain: the original idea came from A, but you read it in B. This honesty is part of the integrity of the scholarly record.

    Read the original where you can

    Secondary citation is a fallback, not a convenience. Before using it, try to obtain the original — through your library, interlibrary loan, or a DOI lookup. Reading the original lets you confirm the quotation, see its context and cite it directly. Use “as cited in” only when the original is genuinely unavailable (out of print, untranslated, lost).

    How major styles handle it

    The styles agree on the principle but differ in wording and in which source goes in the reference list. The general rule across styles is that the reference-list entry is for the work you actually read (the secondary source).

    Style | In-text form | Reference list
    APA | (Smith, 1999, as cited in Jones, 2020) | Jones (the source you read) only
    MLA | (qtd. in Jones 45) | Jones (the source you read) only
    Chicago (notes) | Smith, [work], quoted in Jones, [work] | Both may appear, with the relationship shown
    Harvard (author–date) | (Smith 1999, cited in Jones 2020) | Jones (the source you read) only

    Always confirm the exact punctuation against your specific style edition, as details vary between versions. A reference manager can format the entry, but secondary citations are a classic case where you must check the output by hand.
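
    For teams that generate citations programmatically, the in-text forms in the table can be sketched as a small helper. This is only an illustration: the function name and parameters are hypothetical, it covers the three parenthetical styles shown above, and the punctuation should still be checked against your style edition.

```python
def secondary_citation(style, original_author, original_year,
                       secondary_author, secondary_year, page=None):
    """Build an in-text secondary citation (hypothetical helper)."""
    style = style.lower()
    if style == "apa":
        return (f"({original_author}, {original_year}, "
                f"as cited in {secondary_author}, {secondary_year})")
    if style == "mla":
        # MLA names only the source you read, plus a page number if known.
        locator = f" {page}" if page is not None else ""
        return f"(qtd. in {secondary_author}{locator})"
    if style == "harvard":
        return (f"({original_author} {original_year}, "
                f"cited in {secondary_author} {secondary_year})")
    raise ValueError(f"Unsupported style: {style}")


print(secondary_citation("APA", "Smith", 1999, "Jones", 2020))
print(secondary_citation("MLA", "Smith", 1999, "Jones", 2020, page=45))
```

    Chicago notes are left out because they are written as full footnotes rather than short parenthetical forms.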

    Worked examples

    APA, in text: Early work on data reuse argued that incentives drive deposit (Smith, 1999, as cited in Jones, 2020). Only Jones (2020) appears in your reference list.

    MLA, in text: One critic calls the dataset “the backbone of reproducibility” (qtd. in Jones 45). Only Jones appears on the Works Cited page.

    In both cases the message to the reader is identical: the idea originates with Smith, but you read it in Jones, and Jones is what you can actually verify. For how this fits into building a full reference list, see our guide to compiling a bibliography.

    Good practice

    Minimise secondary citations; prefer originals; quote B’s reading of A accurately; and never silently cite A as if you read it. When you must use “as cited in”, be precise about page numbers from the source you read. These habits, alongside accurate reference checking, support honest scholarship. See our author resources, the dictionary and the research-outputs hub for more.

    Frequently asked questions

    What does ‘as cited in’ mean?

    It signals that you are citing an original source (the primary) that you encountered only through another work (the secondary) which you actually read. It keeps your sourcing honest by showing you did not read the original directly.

    Which source goes in my reference list?

    In most author–date styles, only the source you actually read — the secondary source — appears in the reference list. The original is named in the in-text citation but not listed, because you cannot verify it directly. Chicago notes style may show both.

    Is using ‘qtd. in’ the same as ‘as cited in’?

    Effectively yes. “Qtd. in” (quoted in) is the MLA wording, “as cited in” is the APA wording, and author–date Harvard variants typically use “cited in”. All indicate a secondary citation; use the form your style requires.

    When should I avoid secondary citation entirely?

    Whenever you can obtain the original. Reading the primary source lets you verify the quotation and context and cite it directly, which is always preferable. Reserve secondary citation for sources that are genuinely unavailable.

  • Big Data and the Vs of Data Explained for Research

    Big data refers to datasets so large, fast-moving or varied that traditional database tools cannot capture, store or analyse them within a reasonable time. It is defined less by an exact size threshold than by a set of characteristics, usually summarised as the “Vs”, and by the distributed computing methods needed to process it. In research, big data spans genomics, sensor networks, clinical records, social media and large-scale simulations.

    The defining Vs of big data

    The concept began with three Vs and has since expanded. The table below sets out the five most widely cited.

    Characteristic | Meaning | Research example
    Volume | The sheer quantity of data, from terabytes to petabytes and beyond | Whole-genome sequencing across cohorts
    Velocity | The speed at which data is generated and must be processed | Real-time readings from environmental sensors
    Variety | The mix of formats: structured, semi-structured and unstructured | Combining tables, images, text and audio
    Veracity | The trustworthiness, accuracy and completeness of the data | Cleaning noisy or missing clinical records
    Value | The usefulness of insights that can be extracted | Identifying disease risk factors at scale

    Volume, velocity and variety were the original three, capturing the scale, speed and heterogeneity that overwhelm conventional tools. Veracity was added to stress that more data is not automatically better data; noise, bias and gaps must be managed. Value reminds us that the point of all this effort is actionable insight, not collection for its own sake.

    Distributed processing: how big data is handled

    No single machine can hold or analyse a petabyte efficiently, so big data relies on distributed processing: spreading storage and computation across clusters of many machines that work in parallel. The foundational pattern was MapReduce, which splits a task into pieces, processes them across nodes, then combines the results. Frameworks such as Apache Hadoop and, later, Apache Spark made this approach mainstream, with Spark adding in-memory processing for far greater speed. Cloud platforms now offer this elasticity on demand, letting researchers scale resources to the dataset rather than the other way round.
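
    The split, process and combine pattern can be sketched on a single machine. The example below is a toy word count, not Hadoop or Spark code: the function names are illustrative, and in a real cluster each chunk would be mapped on a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk stands in for the slice of data held by one node.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


print(word_count(["big data", "Big models big"]))
```

    Because each chunk is mapped independently, the map step can run in parallel; only the shuffle and reduce steps need to bring results together.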

    Big data in research, and its pitfalls

    Used well, big data lets researchers detect patterns invisible at small scale, model complex systems and test hypotheses across enormous samples. But scale brings risks. Large datasets can be biased or unrepresentative despite their size, and the volume can lull analysts into ignoring how the data was collected. Crucially, big data does not suspend statistical thinking: with millions of observations, almost any difference becomes statistically significant, which is exactly why effect size matters more than ever, and why a small p-value on its own means little. Big data also fuels machine learning, where larger samples help guard against the overfitting that plagues models trained on too little.
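
    The point about significance and effect size can be illustrated with a short calculation. The sketch below assumes a two-sample z-test with equal group sizes and known unit variance, which is a simplification chosen for illustration; the function name is hypothetical.

```python
import math


def two_sample_p_value(effect_size_d, n_per_group):
    """Two-sided p-value for a standardised mean difference d.

    Assumes two equal-sized groups and known unit variance (z-test).
    """
    z = effect_size_d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))


# The same tiny effect (d = 0.01) at two very different sample sizes.
for n in (100, 1_000_000):
    print(n, two_sample_p_value(0.01, n))
```

    The effect is identical in both runs; only the sample size changes. That is why a p-value should be reported together with the effect size rather than in place of it.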

    Big data and FAIR principles

    The promise of big data depends on the data being usable, and that is where the FAIR principles (that data should be Findable, Accessible, Interoperable and Reusable) become essential. Findability requires rich metadata and persistent identifiers. Interoperability requires shared vocabularies, the kind standardised in the CASRAI dictionary, so that varied sources can be combined meaningfully. Reusability requires clear provenance and licensing. Without these foundations, a large dataset is merely a large liability. Our broader work on standards and metadata, including our guidance for authors and our reproducibility coverage, sets out how to make big research data dependable rather than just big.

    Frequently asked questions

    How big does data have to be to count as big data?

    There is no fixed size. Big data is defined by characteristics, the Vs, rather than a threshold. The practical test is whether traditional tools struggle to store or process it within a useful timeframe.

    What are the original three Vs?

    Volume, velocity and variety: the scale of the data, the speed at which it arrives, and the diversity of its formats. Veracity and value were added later to address quality and usefulness.

    Why is veracity important?

    Because size does not guarantee quality. Large datasets can contain errors, bias, duplicates and missing values. Veracity emphasises assessing and improving trustworthiness before drawing conclusions.

    How does big data relate to FAIR data?

    FAIR principles make big data usable by ensuring it is Findable, Accessible, Interoperable and Reusable. Shared vocabularies, such as those in the CASRAI dictionary, together with persistent identifiers let varied large datasets be combined and reused reliably.

  • Incidence vs Prevalence: Key Epidemiological Measures

    Incidence and prevalence are two foundational measures in epidemiology that answer different questions about how a condition affects a population. Incidence measures how many new cases of a condition arise in a population over a period of time, capturing the rate at which cases occur. Prevalence measures how many cases exist in a population at a point in time or over a defined period, capturing the burden present. Confusing the two leads to serious misinterpretation, so the distinction is a methodological essential rather than a matter of terminology.

    Both measures rest on the same underlying ideas of a case, a population at risk, and a time reference, but they assemble those ingredients differently. Getting the definitions right is the first step to choosing the correct measure for a given research or planning question.

    How incidence is calculated

    Incidence quantifies new cases relative to a population at risk over time, and it comes in two common forms. Cumulative incidence divides the number of new cases by the number of people at risk at the start of the period, giving a proportion that approximates the average risk of developing the condition over that period. Incidence rate, sometimes called incidence density, divides new cases by the total person-time at risk, which accounts for individuals being observed for different lengths of time and for people entering or leaving the population. Both forms require defining the population at risk precisely, excluding those who already have the condition, and stating the observation window clearly. The person-time approach is particularly useful in studies where people are followed for varying durations, because each individual contributes time at risk only for as long as they are observed and remain capable of developing the condition. Expressing the result, for example, as cases per 1,000 person-years makes the time dimension explicit and allows fair comparison between groups followed for different lengths of time.
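
    The two forms of incidence can be written as a pair of small functions. This is a minimal sketch with made-up figures; the function names and the default scaling of 1,000 are illustrative choices.

```python
def cumulative_incidence(new_cases, at_risk_at_start):
    """Proportion of the initially at-risk population that became cases."""
    if at_risk_at_start <= 0:
        raise ValueError("Population at risk must be positive")
    return new_cases / at_risk_at_start


def incidence_rate(new_cases, person_time, per=1000):
    """New cases per `per` units of person-time at risk."""
    if person_time <= 0:
        raise ValueError("Person-time must be positive")
    return new_cases / person_time * per


# Hypothetical cohort: 30 new cases among 1,000 people at risk,
# who together contributed 2,500 person-years of follow-up.
print(cumulative_incidence(30, 1000))   # proportion over the period
print(incidence_rate(30, 2500))         # cases per 1,000 person-years
```

    Both functions expect the denominator to exclude people who already have the condition, which has to be handled when the cohort is defined rather than in the arithmetic.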

    How prevalence is calculated

    Prevalence divides the number of existing cases by the total population, counting everyone who currently has the condition regardless of when it began. Point prevalence refers to a single point in time, answering how many cases exist right now, while period prevalence covers a defined interval and counts anyone who had the condition at any time during that interval. Because prevalence includes both long-standing and recently arisen cases, it reflects the accumulated stock of cases in the population rather than the flow of new ones.
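
    Point and period prevalence can be sketched in the same way. The figures are invented and the function names are illustrative; the period calculation assumes the case counts at the start and during the interval do not double-count the same person.

```python
def point_prevalence(existing_cases, total_population):
    """Share of the population with the condition at one point in time."""
    if total_population <= 0:
        raise ValueError("Population must be positive")
    return existing_cases / total_population


def period_prevalence(cases_at_start, new_cases_in_period, total_population):
    """Share of the population with the condition at any time in the period."""
    if total_population <= 0:
        raise ValueError("Population must be positive")
    return (cases_at_start + new_cases_in_period) / total_population


# Hypothetical population of 10,000: 400 existing cases on the survey date,
# plus 150 new cases arising over the following year.
print(point_prevalence(400, 10_000))
print(period_prevalence(400, 150, 10_000))
```

    Note that the denominator here is the whole population, including people who already have the condition, unlike the at-risk denominator used for incidence.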

    Incidence and prevalence compared

    Feature | Incidence | Prevalence
    What it counts | New cases arising | Existing cases present
    Time element | Over a period (flow) | At a point or period (stock)
    Denominator | Population at risk or person-time | Total population
    Best for | Studying causes and risk | Describing burden and planning

    Data sources and case ascertainment

    Both measures depend on how reliably cases are identified, a process known as case ascertainment. Cases may be captured through disease registers, routine health records, notification systems for certain conditions, or purpose-designed studies, and each source has its own coverage and biases. Incidence is especially sensitive to the timing and completeness of detection, because it counts new cases within a defined window; if detection is delayed or incomplete, new cases may be missed or assigned to the wrong period. Prevalence is sensitive to whether long-standing cases remain on the source from which counts are drawn. For both measures, a clearly stated and consistently applied case definition is essential, because changes in definition or in how actively cases are sought can move the numbers independently of any real change. This is why epidemiological reporting standards emphasise documenting the data source, the case definition and the ascertainment method together with the measure itself.

    The relationship between them

    Incidence and prevalence are linked, and the link is intuitive once framed as flow and stock. In broad terms, prevalence reflects both how quickly new cases arise (incidence) and how long cases persist (duration). When a condition lasts a long time, even a modest incidence can produce a high prevalence, because cases accumulate faster than they leave the population through recovery or death. When cases resolve quickly, prevalence stays low even if incidence is high, because cases flow out almost as fast as they arrive. This conceptual relationship explains why the two measures can move in different directions: a change that shortens how long cases persist can lower prevalence even while incidence is unchanged or rising. For that reason the two measures must never be used interchangeably.
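
    The flow-and-stock link can be made concrete with the standard steady-state approximation, in which the odds of prevalence equal the incidence rate multiplied by the average duration. That formula is an assumption added here for illustration, and it holds only when incidence and duration are stable over time; the function name is hypothetical.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Approximate prevalence when incidence and duration are stable.

    Uses prevalence odds = incidence rate x mean duration, with the
    rate and the duration expressed in the same time unit.
    """
    odds = incidence_rate * mean_duration
    return odds / (1 + odds)


# Chronic condition: low incidence (0.01 per person-year), lasting 10 years.
chronic = steady_state_prevalence(0.01, 10)
# Acute condition: high incidence (0.2 per person-year), lasting 0.05 years.
acute = steady_state_prevalence(0.2, 0.05)
print(chronic, acute)
```

    In this invented comparison the chronic condition has one twentieth of the incidence but the higher prevalence, because its cases persist two hundred times longer.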

    Common pitfalls in interpretation

    Because the two measures are so often reported side by side, several errors recur. Treating prevalence as if it indicated risk is a frequent mistake: a high prevalence may reflect that cases persist for a long time rather than that the condition arises frequently, so prevalence alone says little about the chance of developing a condition. Comparing an incidence figure from one study with a prevalence figure from another, as though they were the same quantity, produces meaningless conclusions. A further pitfall is failing to define the population at risk consistently; if people who already have the condition are not excluded from the incidence denominator, the calculated incidence will be understated. Finally, both measures are sensitive to how a case is defined and detected: broadening the case definition or improving detection can raise measured incidence or prevalence without any real change in the underlying occurrence, which is why the case definition should always be reported alongside the figure.

    When to use which

    Use incidence when studying the development of a condition, investigating its causes, or evaluating risk, because it captures the flow of new cases and is the natural measure for cause-and-effect questions. Use prevalence when describing the existing burden, planning services and resources, or characterising how widespread a condition is at a moment in time, because it reflects the total caseload a system must manage. Reporting which measure was used, together with its denominator and time frame, is critical, and reporting guidelines such as STROBE prompt exactly this kind of clarity for observational studies.

    Both measures depend on accurate population denominators, which come from a census or population register, underscoring their place in research data infrastructure. The same denominators underpin death rates. Consistent terminology drawn from the CASRAI dictionary helps keep these definitions stable across studies, and authors can consult the guidance for authors when reporting them.

    Frequently asked questions

    Can incidence be higher than prevalence?

    It can, particularly for conditions that resolve quickly. Because prevalence reflects cases that persist, a condition with short duration may show high incidence but low prevalence, since new cases leave the population almost as fast as they arrive and do not accumulate.

    Why is the denominator different for each?

    Incidence uses the population at risk or person-time, because only those who can newly develop the condition are relevant to counting new cases. Prevalence uses the total population, because it counts all existing cases regardless of when they arose.

    Which measure should a study report?

    It depends on the question. Studies of causation and risk report incidence; studies of burden, planning and service provision report prevalence. The chosen measure, its denominator and its time frame should always be stated explicitly so readers can interpret it correctly.

  • Eigenfactor and Altmetrics: Beyond the Impact Factor

    Altmetrics are indicators of the online attention research attracts — mentions, shares, saves and references across the web and social platforms — while the Eigenfactor and its companion Article Influence score weight citations by the standing of the journals that make them. Together they extend evaluation beyond the traditional impact factor, but they measure attention and influence, not the intrinsic quality of any single study.

    Both families of indicator emerged from dissatisfaction with a single citation average. The Eigenfactor refines the citation signal itself; altmetrics capture engagement that citations miss entirely. Neither replaces careful reading, and both invite misinterpretation if treated as scores of merit. They are best thought of as additional lenses on a body of work, each illuminating something a single citation count obscures, rather than as rival verdicts competing to crown a winner.

    The Eigenfactor and Article Influence score

    The Eigenfactor score treats the scholarly literature as a network and ranks journals by the influence of the citations they receive, using an eigenvector method conceptually similar to how web pages are ranked by the importance of the pages linking to them. A citation from a heavily cited, influential journal counts for more than one from a peripheral source. Because the raw Eigenfactor scales with journal size, the Article Influence score normalises it per article, giving a per-paper measure of average influence that is comparable across journals of different sizes. A further refinement is that journal self-citations, meaning citations from a journal to its own articles, are excluded, so a journal cannot inflate its standing simply by citing itself. This network logic is shared with the prestige-weighted journal metrics covered in our guide to CiteScore, SNIP and SJR.
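
    The eigenvector idea can be sketched with a few lines of power iteration. This is a simplified illustration, not the published Eigenfactor algorithm: the damping value, the uniform teleport term and the three-journal citation counts are all assumptions made for the example.

```python
def influence_scores(citations, damping=0.85, iterations=200):
    """Return one influence score per journal; the scores sum to 1.

    citations[i][j] is the number of citations journal j gives to journal i.
    """
    n = len(citations)
    # Ignore the diagonal so a journal gains nothing by citing itself.
    weights = [[0.0 if i == j else float(citations[i][j]) for j in range(n)]
               for i in range(n)]
    # Column-normalise: each citing journal shares out one unit of influence.
    column_totals = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    shares = [[weights[i][j] / column_totals[j] if column_totals[j] else 1.0 / n
               for j in range(n)] for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # A journal's score is the influence passed on by journals citing it.
        scores = [
            (1 - damping) / n
            + damping * sum(shares[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


# Hypothetical journals A, B and C (rows are cited, columns are citing).
citations = [
    [5, 10, 8],
    [2, 3, 1],
    [1, 1, 0],
]
print(influence_scores(citations))
```

    Journal A ranks highest here because most of the influence held by B and C flows to it, while the self-citations on the diagonal contribute nothing.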

    Why network weighting changes the picture

    Network weighting matters because not all citations are equal. A flat count treats a citation from a marginal, rarely read journal exactly the same as one from a central, heavily cited venue, yet the two clearly carry different evidential weight. The eigenvector approach behind the Eigenfactor and the Article Influence score captures this by letting influence flow through the citation network: a journal cited by influential journals inherits some of that influence, recursively. The effect is to surface journals that are central to the scholarly conversation rather than merely voluminous. It also dampens the impact of citation farming and self-citation, because citations from low-influence sources contribute little. This is the same insight that powers the prestige-weighted journal metrics, and it is one reason network measures are harder to game than raw counts.

    What altmetrics measure

    Altmetrics aggregate diverse online signals: news coverage, policy-document references, social-media mentions, reference-manager saves and blog discussion. Their strengths are speed and breadth — attention accrues within days, long before citations appear, and captures reach into audiences such as practitioners, policymakers and the public that citation counts overlook. A paper influencing clinical guidance or public debate may register strongly in altmetrics while accumulating citations slowly. This timeliness makes altmetrics valuable for spotting emerging work and for evidencing societal reach in ways the slow accrual of citations cannot, particularly for research whose primary audience lies outside academia.

    The risk of gaming and manipulation

    Every metric that carries reward eventually attracts manipulation, and attention-based measures are especially vulnerable. Social-media mentions can be inflated by coordinated promotion, and raw counts can be padded by automated accounts, so a high altmetric score is not by itself evidence of genuine influence. Network-weighted citation measures are more robust, because influence must be conferred by sources that are themselves influential, but they are not immune to citation rings. The practical defence is the same in both cases: never treat a single number as decisive, look at the underlying sources, and combine quantitative signals with expert judgement of the work itself.

    What altmetric signals do and do not capture

    It helps to be precise about which signals carry which meaning. Some altmetric sources hint at scholarly or societal influence; others are pure visibility. The table below sketches the spectrum.

    Signal | What it suggests | How to read it
    Policy-document citations | Uptake into practice or governance | Strong societal-impact hint
    Reference-manager saves | Scholarly interest from researchers | Early engagement signal
    News coverage | Public salience | Reach, not validity
    Social-media mentions | Topical attention | Volatile; controversy-prone

    Attention is not impact

    The central caution is that online attention and scholarly impact are different things. A paper can be widely shared because it is controversial, surprising or even flawed; volume of mentions says nothing about validity. Altmetrics are best read as a measure of reach and engagement, complementary to citations rather than a substitute. Conflating the two risks rewarding visibility over rigour, and can even create perverse incentives to court attention rather than do careful work. Authors evidencing the reach of their own work — for example in narrative impact statements — can find guidance in our resources for authors, which encourage describing influence in context rather than leaning on a single attention score.

    Where these metrics complement citation counts

    Used well, the Eigenfactor family and altmetrics fill different gaps left by a simple citation average. The Eigenfactor refines the citation signal itself, distinguishing influential citations from peripheral ones — a logic it shares with the prestige-weighted journal indicators in our guide to CiteScore, SNIP and SJR. Altmetrics, by contrast, capture timely engagement and societal reach that citations record only slowly, if at all. The two are most useful in combination: citations for scholarly influence over time, altmetrics for early and broader attention, neither standing in for a reading of the work.

    Reading these indicators responsibly

    Both the Eigenfactor family and altmetrics should be interpreted within a responsible-assessment framework. The principles of DORA and responsible research assessment, alongside the Leiden Manifesto, stress three things: quantitative indicators should support expert judgement, not replace it; users should be transparent about what each metric does and does not capture; and single-number rankings of people should be avoided. The longstanding critique of the journal impact factor applies equally here: an indicator’s value depends entirely on using it for the question it can actually answer. Our broader coverage of responsible assessment sets out how these tools fit together.

    Frequently asked questions

    What does the Eigenfactor add over a citation count?

    It weights citations by the influence of the citing journal, so a citation from a highly cited source counts for more, capturing standing within the citation network rather than a flat tally.

    Why normalise to the Article Influence score?

    The raw Eigenfactor grows with journal size. Dividing by the number of articles yields a per-paper average influence that can be compared fairly across large and small journals.

    Do altmetrics show that research is good?

    No. Altmetrics show attention and engagement, not quality. A paper may attract mentions because it is controversial or flawed, so altmetrics complement rather than replace careful evaluation.

    How should these metrics be used responsibly?

    Use them as context alongside expert judgement, be transparent about what each measures, and avoid reducing researchers or papers to a single number — the core of DORA and the Leiden Manifesto.

  • Preregistration and Registered Reports Explained

    Preregistration is the practice of publicly specifying a study’s hypotheses, methods and analysis plan before any data are collected or examined. By fixing these decisions in advance and time-stamping them, preregistration draws a clear line between confirmatory tests planned ahead of time and exploratory analyses discovered along the way — a distinction that curbs questionable research practices and strengthens reproducibility. The plan is registered publicly so that it cannot be quietly revised once results are known, which is what gives the time stamp its force.

    The problem it addresses is well documented. When analysis choices are made after seeing the data, researchers can — often unconsciously — select the specification that yields a significant result, a practice known as p-hacking. Separately, studies with positive findings are more likely to be published than null results, producing publication bias that distorts the literature. Preregistration tackles the first; Registered Reports tackle both. The two practices grew out of the wider reproducibility movement, which found that a worrying share of published findings did not hold up when independent teams tried to repeat them — a problem driven in part by exactly these analytic and publication pressures. By making the research plan public and time-stamped before results exist, both practices restore a clear distinction between what was predicted and what was merely found.
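
    The arithmetic behind p-hacking is easy to sketch. The function below assumes the candidate analyses are independent and that no real effect exists, which is a simplification; the function name is hypothetical.

```python
def chance_of_false_positive(alpha, n_analyses):
    """Probability that at least one of several independent analyses is
    'significant' at level alpha when there is no real effect."""
    return 1 - (1 - alpha) ** n_analyses


# How the risk grows as more analysis choices are tried after seeing the data.
for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_positive(0.05, k), 3))
```

    Fixing a single analysis in advance keeps the chance of a spurious finding at the nominal level, which is the protection that a time-stamped plan provides.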

    What preregistration involves

    A preregistration typically states the research question, the hypotheses, the sample size and stopping rule, the variables, and the precise analysis plan, lodged in a public registry with a time stamp. Templates and registries hosted on the Open Science Framework (OSF), maintained by the Center for Open Science, make this routine. Clinical trials have long used dedicated public registries for the same reason, and the practice has since spread across the social and life sciences. Because the plan is fixed, readers can verify that the reported confirmatory analysis is the one that was promised, and exploratory work is labelled as such rather than dressed up as a prediction. A good preregistration is specific enough that a third party could, in principle, run the planned analysis without further instruction.

    Confirmatory versus exploratory research

    The conceptual heart of preregistration is the distinction between confirmatory and exploratory research. Confirmatory research tests a specific, pre-stated hypothesis with a pre-specified analysis; its statistical guarantees — including the meaning of a p-value — depend on the analysis having been fixed in advance. Exploratory research, by contrast, searches the data for patterns and generates new hypotheses; it is valuable and necessary, but its findings are provisional and must be confirmed in fresh data. Problems arise when exploratory results are dressed up as confirmatory ones, lending them a false air of statistical rigour. Preregistration keeps the two honest by recording, with a time stamp, exactly which analyses were planned. Anything beyond that plan is legitimate exploration, simply labelled as such rather than presented as a prediction that came true.

    Registered Reports go further

    A Registered Report is a publication format in which the introduction, methods and analysis plan are peer-reviewed before data collection. If the question and design are judged sound, the journal grants in-principle acceptance — a commitment to publish the completed study regardless of whether the results are positive, negative or null, provided the authors follow the approved protocol. This decouples the publication decision from the outcome, directly attacking publication bias. A useful side effect is that reviewers can improve a study before it is run, when flaws can still be fixed, rather than critiquing an unchangeable design after the fact. This shifts peer review from gatekeeping to genuine quality improvement, and reduces the waste of running studies whose weaknesses only surface at submission.

    How each curbs bias

    Practice | Reviewed before data? | Mainly curbs
    Preregistration | No (registry only) | p-hacking, hidden analytic flexibility
    Registered Report | Yes (stage-one peer review) | p-hacking and publication bias

    The shared mechanism is timing: committing to decisions before outcomes are known removes the temptation, and the opportunity, to reshape a study around a desired result. This complements the rigour built into experimental designs such as the randomised controlled trial, where preregistered protocols make intention-to-treat (ITT) and primary-outcome commitments verifiable.

    The two-stage Registered Report workflow

    What makes Registered Reports distinctive is their two-stage review. At stage one, reviewers evaluate the question’s importance and the soundness of the proposed methods and analysis before any data exist; sound proposals earn in-principle acceptance. At stage two, after the study is run, reviewers check that the authors followed the approved protocol and that conclusions match the registered plan — but they do not get to reject the paper simply because the results were null or unexciting. This sequencing is what severs the link between a study’s outcome and its publishability.

    Stage | What is reviewed | Decision
    Stage one | Question, methods, analysis plan | In-principle acceptance
    Data collection | Conducted per approved protocol |
    Stage two | Adherence to plan, valid conclusions | Publication regardless of result

    How they curb publication bias and p-hacking

    Publication bias arises when the literature over-represents positive findings because null results are harder to publish. By granting in-principle acceptance at stage one, Registered Reports ensure that null and negative results enter the record, provided the approved protocol is followed, giving a more honest picture of the evidence. P-hacking, meaning selecting the analysis that happens to reach significance, is curbed by both formats, because the analytic decisions are fixed and public before the data are seen. Together these mechanisms protect the integrity of confirmatory claims, much as the pre-specified primary outcomes of a randomised controlled trial protect its causal conclusions.

    Benefits and honest limits

    Preregistration improves transparency, makes exploratory work explicit and supports the reproducibility goals at the heart of the research lifecycle. It does not forbid exploration; it simply requires that exploratory findings be reported as such. Deviations from a plan are permitted when justified and disclosed, and preregistration cannot by itself guarantee a study is well designed — a poor plan, preregistered, is still a poor plan. Used alongside the standardised documentation described in the CASRAI dictionary and our guidance for authors, it makes the chain from hypothesis to result auditable.

    Frequently asked questions

    What is the difference between preregistration and a Registered Report?

    Preregistration time-stamps a plan in a public registry but is not peer-reviewed in advance. A Registered Report adds stage-one peer review and in-principle acceptance before data are collected, committing the journal to publish the results.

    How does preregistration reduce p-hacking?

    By fixing the hypotheses and analysis plan before the data are seen, it removes the ability to choose, after the fact, the specification that happens to produce a significant result.

    Does preregistration ban exploratory analysis?

    No. Exploration is encouraged, but it must be reported as exploratory rather than presented as a pre-planned confirmatory test. Justified deviations from the plan are allowed when disclosed.

    What is the Center for Open Science’s role?

    The Center for Open Science maintains the Open Science Framework, which hosts preregistration templates and registries and supports the Registered Reports format adopted by many journals.

  • How to Write a Research Abstract

    A research abstract is a concise, self-contained summary of an entire study — usually 150 to 300 words — that lets a reader grasp the purpose, methods, findings and conclusion without reading the full paper. It is often the only part indexed, read or searched, so it carries disproportionate weight.

    Follow the steps below to write one that is accurate, complete and discoverable.

    Step 1: Decide structured or unstructured

    Two formats exist:

    • Structured — explicit labelled sections (Background, Methods, Results, Conclusion). Common in medicine and many sciences; easy to scan.
    • Unstructured — a single continuous paragraph covering the same ground without headings. Common in the humanities and some social sciences.

    Check the target journal’s instructions first; the choice is usually dictated, not free.

    Step 2: Cover the IMRaD content

    Whether structured or not, a strong abstract mirrors the IMRaD shape of the paper itself — Introduction, Methods, Results and Discussion. Map each to a sentence or two:

    IMRaD element | Abstract content
    Introduction | Background and the gap or question
    Methods | Design, participants, what was measured
    Results | Key findings, including direction of effect
    Discussion | What it means and the main conclusion

    For the full-paper version of this shape, see the anatomy of a journal article.

    Step 3: Respect the word limit

    Most journals set a limit between 150 and 300 words; conferences are often tighter. Write to the limit deliberately rather than trimming at the end — every sentence should earn its place. Cut background that the reader can infer, and never include citations, figures or undefined abbreviations.
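
    A quick automated check can catch the two easiest problems before submission. The sketch below counts words against a limit and flags text that looks like a citation. The patterns and the default limit of 300 are assumptions chosen for illustration; they will miss some citation formats, and they are no substitute for the journal's own instructions.

```python
import re

# Illustrative patterns only (assumptions): numeric citations such as [3]
# or [1, 2], and author-year citations such as (Smith, 1999).
CITATION_PATTERNS = [
    re.compile(r"\[\d+(?:[,\-]\s*\d+)*\]"),
    re.compile(r"\([A-Z][A-Za-z'\-]+(?: et al\.)?,? \d{4}\)"),
]


def check_abstract(text: str, word_limit: int = 300) -> list[str]:
    """Return a list of human-readable problems found in the abstract."""
    problems = []
    words = len(text.split())  # whitespace-separated tokens
    if words > word_limit:
        problems.append(f"{words} words exceeds the limit of {word_limit}")
    if any(pattern.search(text) for pattern in CITATION_PATTERNS):
        problems.append("contains what looks like a citation")
    return problems


print(check_abstract("Prior work (Smith, 1999) disagreed.", word_limit=3))
```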

    Step 4: Choose keywords

    Most journals ask for three to six keywords beneath the abstract. Choose terms a searcher would actually type, avoid repeating words already in the title where possible, and prefer recognised vocabulary. Controlled terms from our dictionary help here by aligning your keywords with terminology others use.

    Step 5: Write it last, edit it hardest

    Draft the abstract after the paper is complete, so it reflects what you actually found, then edit it more carefully than any other section because it is the most read. Read it aloud; if a sentence cannot stand alone, it is not abstract-ready. Our guidance for authors covers the final pre-submission pass.

    Common mistakes to avoid

    • Promising results in vague terms (“results are discussed”) instead of stating them.
    • Including information not present in the paper.
    • Adding citations or references — the abstract must stand alone.
    • Exceeding the word limit or padding to reach it.
    • Using undefined acronyms.

    Where your study uses an observational design, state it precisely; see cohort and case-control study designs for the terminology. And keep references in the body, formatted to your style; our comparison of citation styles covers the options.

    How the abstract fits the research output

    The abstract is the front door to your output’s metadata. Contributor roles via CRediT and controlled terms in our dictionary describe the rest, making the work discoverable and attributable. Browse more in research outputs.

    Frequently asked questions

    How long should an abstract be?

    Usually 150 to 300 words, but always follow the specific journal or conference limit, which can be shorter.

    Should I write the abstract first or last?

    Last. Drafting it after the paper is finished ensures it accurately reflects the methods and findings.

    Can I include references in an abstract?

    Generally no. An abstract must be self-contained, so avoid citations, footnotes and figures.

    What is the difference between structured and unstructured abstracts?

    A structured abstract uses labelled sections such as Background and Methods; an unstructured abstract covers the same content as a single flowing paragraph. The journal usually specifies which to use.

  • Reliability and Validity in Psychological Measurement

    Reliability is the consistency of a measurement, while validity is whether the measurement captures what it is intended to capture. Together they are the two pillars of psychometrics. A psychological test is only as trustworthy as these properties allow, and reporting them is a basic expectation of credible, reproducible research.

    The three faces of reliability

    Reliability concerns whether a measure gives consistent results. It comes in several forms depending on the source of consistency being examined:

    • Test-retest reliability: do the same people get similar scores when measured again after a delay? High test-retest reliability suggests the instrument captures a stable attribute rather than transient noise.
    • Inter-rater reliability: when human raters score the same behaviour, do they agree? Strong inter-rater reliability shows that the result reflects the thing observed, not the observer.
    • Internal consistency: do items on a scale that are meant to measure one construct correlate with each other? This is commonly summarised by Cronbach’s alpha, which indexes how well a set of items hangs together.
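
    As an illustration of the internal-consistency idea, the sketch below computes Cronbach's alpha from raw item scores using the usual variance-based formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The five respondents and three items are made-up numbers, not data from any real scale.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """responses: one row per respondent, one column per item."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))                  # transpose to columns
    item_variances = [variance(col) for col in items]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 5 respondents x 3 items on a 1-5 scale.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
print(round(cronbach_alpha(scores), 3))
```

    For this toy data the result is about 0.93, because respondents who score high on one item tend to score high on the others.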

    The three faces of validity

    Validity concerns meaning—whether the score corresponds to the intended construct. The main types are:

    • Construct validity: does the test actually measure the abstract concept it targets, such as anxiety or numerical ability? Evidence accumulates from how scores relate to other measures as theory predicts.
    • Content validity: do the items adequately sample the full domain? A maths test that only covered addition would have poor content validity for general numeracy.
    • Criterion validity: does the score predict or correspond to an external benchmark, such as later performance or an established gold-standard measure?

    Reliability and validity at a glance

    Property | Type | Key question
    Reliability | Test-retest | Are scores stable over time?
    Reliability | Inter-rater | Do different raters agree?
    Reliability | Internal consistency (Cronbach’s alpha) | Do items measure one thing together?
    Validity | Construct | Does it measure the intended concept?
    Validity | Content | Do items cover the whole domain?
    Validity | Criterion | Does it predict a relevant outcome?

    Why a measure can be reliable but not valid

    This is the most important conceptual point in psychometrics, and it is worth stating carefully. Reliability is necessary but not sufficient for validity. A bathroom scale that always reads three kilograms heavy is perfectly reliable—it gives the same answer every time—yet it is not a valid measure of weight, because it is consistently wrong. Likewise, a personality questionnaire can produce stable scores that nonetheless do not correspond to the trait it claims to assess. A measure cannot be valid without being reliable, but it can be reliable without being valid. Validity is therefore the higher bar. The practical implication is that demonstrating consistency is only the first step; an instrument must additionally be shown to track the construct it names before its scores can support any substantive claim.
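
    The bathroom-scale example can be made concrete with a few lines of arithmetic. In the sketch below the true weights are invented, and the scale is modelled, purely as an assumption, as adding exactly three kilograms every time.

```python
from statistics import mean

BIAS_KG = 3.0  # the scale always reads three kilograms heavy (assumed model)
true_weights = [58.0, 64.5, 71.2, 80.0, 93.4]  # hypothetical people

first_reading = [w + BIAS_KG for w in true_weights]
second_reading = [w + BIAS_KG for w in true_weights]

# Reliability: how far apart are two readings of the same person?
largest_disagreement = max(abs(a - b) for a, b in zip(first_reading, second_reading))

# Validity: how far are the readings from the truth, on average?
average_error = mean(r - w for r, w in zip(first_reading, true_weights))

print(f"largest disagreement between readings: {largest_disagreement:.1f} kg")
print(f"average error against true weight:     {average_error:.1f} kg")
```

    The repeated readings agree perfectly, which is what reliability asks for, yet every reading is about three kilograms off, which is what validity forbids.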

    How reliability is estimated in practice

    Each form of reliability has a characteristic study design. Test-retest reliability is estimated by administering the same measure to the same people twice and correlating the two sets of scores; the delay must be long enough that memory of the first sitting does not inflate agreement, but short enough that the trait itself has not genuinely changed. Inter-rater reliability is assessed by having two or more trained raters score the same material independently and computing their agreement, often with a coefficient that corrects for chance. Internal consistency is calculated from a single administration by examining how the items intercorrelate, with Cronbach’s alpha the most familiar summary. Reporting which coefficient was used, and its value, lets readers judge whether a measure is fit for purpose.
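
    One widely used chance-corrected agreement coefficient is Cohen's kappa; the text above does not name a specific coefficient, so treat this as one example rather than the required choice. The sketch compares observed agreement between two raters with the agreement expected if each rated at random according to their own category frequencies. The ratings are invented.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Hypothetical codings of ten observed behaviours.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
print(round(cohens_kappa(rater_a, rater_b), 3))
```

    Here the raters agree on 8 of 10 items, but because 52% agreement would be expected by chance, kappa is about 0.58 rather than 0.80.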

    A note on Cronbach’s alpha

    Alpha is ubiquitous but frequently misread. A high value does not by itself prove a scale measures a single construct; it is sensitive to the number of items, so long scales can post a respectable alpha even when their items are only loosely related. Conversely, a very high alpha may signal redundant, near-duplicate items rather than a well-rounded measure. Alpha is therefore best treated as one piece of evidence about internal structure, interpreted alongside the scale’s design and its factor structure, not as a single pass-or-fail threshold.
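
    The dependence on scale length is easy to demonstrate with the standardised form of alpha, which is a function of only the number of items k and the mean inter-item correlation r: alpha = k·r / (1 + (k − 1)·r). The sketch below holds r at a weak 0.2, a value picked for illustration, and varies k.

```python
def standardized_alpha(num_items: int, mean_inter_item_r: float) -> float:
    """Standardised Cronbach's alpha from scale length and mean correlation."""
    k, r = num_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)


# Same weak inter-item correlation, increasingly long scales.
for k in (5, 10, 20, 40):
    print(f"{k:>2} items, mean r = 0.2 -> alpha = {standardized_alpha(k, 0.2):.2f}")
```

    With items that correlate only 0.2 on average, five items give an alpha of about 0.56, but twenty give about 0.83 and forty about 0.91, without the items becoming any more closely related.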

    Validity is an accumulating argument

    Modern psychometrics treats validity less as a fixed property a test “has” and more as an evidence-based argument that builds over time. Construct, content and criterion evidence each contribute, and a measure earns confidence as independent studies show its scores behaving as theory predicts—correlating with related measures, diverging from unrelated ones and predicting relevant outcomes. This framing explains why a brand-new instrument cannot simply be declared valid; validity is demonstrated through replication, which ties measurement quality directly to the field’s reproducibility agenda.

    Implications for research and assessment

    These properties are not academic niceties; they determine whether a finding will replicate. Instruments with poor reliability add noise that can mask real effects or generate spurious ones, a concern at the heart of the field’s work on reproducibility. Many critiques of popular tools reduce to measurement questions; for example, the objections to the Myers-Briggs Type Indicator concern its reliability and construct validity. Sound responsible assessment requires that both properties be measured and disclosed.

    Reliability, error and the individual score

    Reliability has a direct, practical meaning for how much trust to place in a single person’s score. Every observed score can be thought of as a true score plus measurement error, and the lower the reliability, the larger that error band. The standard error of measurement translates a reliability coefficient into a margin of uncertainty around an individual’s result, which is why responsible test reports present scores as ranges rather than precise points. Ignoring this band is a common misuse: treating a one-point difference between two people as meaningful when it falls well within measurement error. For consequential decisions, the size of the error band can matter as much as the score itself, and it should be reported alongside the headline number.
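
    In classical test theory the standard error of measurement is SEM = SD × √(1 − reliability). The sketch below turns that into a score band. The standard deviation of 15, the reliability of 0.91 and the multiplier of 1.96 (which assumes roughly normal measurement error) are illustrative values, not properties of any named test.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """SEM under classical test theory: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, score_sd: float, reliability: float,
               z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band around an observed score (normal errors assumed)."""
    margin = z * standard_error_of_measurement(score_sd, reliability)
    return observed - margin, observed + margin


low, high = score_band(observed=100, score_sd=15, reliability=0.91)
print(f"SEM = {standard_error_of_measurement(15, 0.91):.1f}")
print(f"score of 100 reported as roughly {low:.0f} to {high:.0f}")
```

    With these inputs the SEM is 4.5 points, so a one-point gap between two people sits far inside the band of roughly 91 to 109.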

    Reporting psychometrics transparently

    Researchers should report which reliability and validity evidence supports each measure, ideally with the relevant coefficients. Consistent terminology helps: defining terms in a shared research dictionary lets readers compare studies, and clear guidance for authors turns good intentions into routine practice. Transparency about measurement is one of the cheapest ways to improve the reliability of the literature as a whole.

    Frequently asked questions

    What is the difference between reliability and validity?

    Reliability is consistency—getting the same result repeatedly—while validity is accuracy—measuring the intended construct. A test must be reliable to be valid, but reliability alone does not guarantee validity.

    Can a test be reliable but not valid?

    Yes. A scale that consistently reads three kilograms too heavy is reliable but not valid. The result is stable yet systematically wrong, so it does not measure true weight.

    What is Cronbach’s alpha?

    Cronbach’s alpha is a common index of internal consistency. It estimates how well the items on a scale that are meant to measure one construct correlate with one another.

    Why do reliability and validity matter for reproducibility?

    Measures with weak reliability or validity add noise and bias, making findings harder to replicate. Reporting these properties is part of producing reproducible, trustworthy research.

  • Confidence Intervals in Research, Explained Precisely

    A confidence interval is a range of values, calculated from sample data, that is designed to contain the true value of an unknown population parameter with a stated level of confidence. A 95% confidence interval is produced by a procedure that, over many repeated samples, would capture the true parameter in about 95% of those intervals. It conveys both an estimate of the parameter and the uncertainty around that estimate, expressed as the width of the interval.

    The correct interpretation

    The confidence level is a property of the long-run procedure, not of any single interval. Once a specific interval has been calculated, the true parameter either lies inside it or it does not; there is no probability left to assign. It is therefore incorrect to say there is a 95% probability that the parameter lies within a particular calculated interval. The accurate statement is that if the study were repeated many times and an interval computed each time, about 95% of those intervals would contain the true value. This frequentist interpretation is subtle but important, and misstating it is one of the most common errors in applied statistics.

    Statement | Correct?
    95% of intervals from repeated samples contain the true parameter | Yes
    There is a 95% probability this specific interval contains the parameter | No
    The interval shows a range of plausible values for the parameter | Yes, a reasonable informal reading
    95% of the data fall within the interval | No, that confuses it with a data range
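
    The long-run reading can be checked by simulation. The sketch below repeats an imaginary study many times, builds a 95% interval for the mean each time, and counts how often the interval captures the true value. To keep the arithmetic exact it assumes normally distributed data with a known standard deviation (a z-interval); the population values, sample size and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist, mean


def coverage(num_studies: int = 2000, sample_size: int = 30,
             true_mean: float = 50.0, sigma: float = 10.0,
             level: float = 0.95, seed: int = 1) -> float:
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(sample_size)
    hits = 0
    for _ in range(num_studies):
        sample = [rng.gauss(true_mean, sigma) for _ in range(sample_size)]
        centre = mean(sample)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / num_studies


print(f"share of intervals containing the true mean: {coverage():.3f}")
```

    The printed share should land close to 0.95. Each individual interval either contains 50 or it does not; the 95% describes the procedure across the whole set.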

    Width, precision and sample size

    The width of a confidence interval reflects the precision of the estimate. A narrow interval indicates a precise estimate; a wide one signals substantial uncertainty. Width depends chiefly on the variability in the data and on the sample size. Larger samples generally produce narrower intervals because the standard error shrinks as the sample grows. Raising the confidence level, say from 95% to 99%, widens the interval, because demanding greater confidence requires admitting a broader range of plausible values.
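
    For the simplest case, a mean with known standard deviation, the half-width of the interval is z × σ / √n, which makes both effects visible. The sketch below uses that formula; σ = 10 and the sample sizes are illustrative, and intervals built from an estimated standard deviation would differ slightly.

```python
import math
from statistics import NormalDist


def half_width(sigma: float, n: int, level: float = 0.95) -> float:
    """Half-width of a z-interval for a mean (known sigma assumed)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sigma / math.sqrt(n)


for n in (25, 100, 400):
    print(f"n = {n:>3}: 95% half-width = {half_width(10, n):.2f}, "
          f"99% half-width = {half_width(10, n, 0.99):.2f}")
```

    Quadrupling the sample size halves the width, and moving from 95% to 99% confidence widens the interval at every sample size.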

    Relationship to statistical significance

    Confidence intervals and significance tests are closely linked. For a comparison such as a difference between two means, if a 95% confidence interval for the difference excludes zero, the result is statistically significant at the 0.05 level (two-sided); if the interval includes zero, it is not. The interval therefore delivers the same significance verdict as the test while adding crucial context: the estimated size of the effect and the range of values compatible with the data.
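
    The link can be verified numerically. The sketch below takes an estimated difference and its standard error, then returns both the 95% interval and the two-sided p-value from the same normal approximation. The normal approximation is an assumption suited to large samples, and the example estimates are invented.

```python
from statistics import NormalDist


def z_test_summary(estimate: float, std_error: float,
                   level: float = 0.95) -> tuple[float, float, float]:
    """Return (lower, upper, two-sided p-value) under a normal approximation."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + level / 2)
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    p_value = 2 * (1 - normal.cdf(abs(estimate / std_error)))
    return lower, upper, p_value


for diff in (2.4, 1.5):
    lo, hi, p = z_test_summary(diff, std_error=1.0)
    excludes_zero = lo > 0 or hi < 0
    print(f"difference {diff}: CI ({lo:.2f}, {hi:.2f}), p = {p:.3f}, "
          f"excludes zero: {excludes_zero}")
```

    The first difference gives an interval above zero and a p-value below 0.05; the second gives an interval spanning zero and a p-value above 0.05. In both cases the interval also shows how large the effect might plausibly be.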

    Why intervals are often more informative

    Reporting a confidence interval communicates more than a bare p-value because it shows magnitude and precision together. A result may be statistically significant yet have an interval spanning only trivial effects, or be non-significant yet have an interval wide enough to include important ones. Many methodologists, including the authors of the American Statistical Association’s 2016 guidance on p-values, encourage reporting estimates with intervals rather than relying on significance thresholds alone. This practice supports clearer interpretation and stronger reproducibility, themes tracked in our reproducibility category. The underlying methods belong to the broader discipline of statistics, and consistent reporting terminology is documented in the CASRAI dictionary.

    Frequently asked questions

    What does a 95% confidence interval really mean?

    It means that the method used to build the interval would capture the true population value in about 95% of repeated samples. It is not a 95% probability that the true value lies in one particular calculated interval.

    Does a narrower interval always mean a better study?

    A narrow interval indicates a precise estimate, usually from a large or low-variability sample, but precision is not the same as validity. A precise estimate from a biased study can still be wrong. Width describes uncertainty from sampling, not freedom from bias.

    Should I report a confidence interval or a p-value?

    Where possible, report an effect estimate with its confidence interval, optionally alongside a p-value. The interval shows both the size and the precision of the effect, which is generally more informative for readers. See the CASRAI author guidance for reporting recommendations.

  • Correlation vs Causation in Research: Knowing the Difference

    Correlation describes the degree to which two variables move together, while causation means that a change in one variable actually produces a change in another. The central principle, often summarised as “correlation does not imply causation”, is that observing two things vary together is not sufficient to conclude that one causes the other. Distinguishing the two is one of the most important and most frequently neglected tasks in research.

    Measuring correlation with Pearson’s r

    The most common measure of linear correlation is Pearson’s correlation coefficient, written r. It ranges from minus one to plus one. A value of plus one indicates a perfect positive linear relationship, minus one a perfect negative linear relationship, and zero no linear relationship at all. Pearson’s r captures only the strength and direction of a straight-line association; it can miss strong but non-linear relationships, and it is sensitive to outliers. A high r tells you two variables track each other closely, but says nothing about why.

    Pearson’s r | Interpretation
    +1.0 | Perfect positive linear relationship
    0 | No linear relationship
    −1.0 | Perfect negative linear relationship
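
    A short implementation shows both what r captures and what it misses. The sketch below computes Pearson's r from its definition (covariance divided by the product of the standard deviations) and applies it to three tiny invented data sets, including a perfect but non-linear relationship.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return cov / (spread_x * spread_y)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))    # straight line, rising
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))    # straight line, falling
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))   # y = x squared
```

    The third data set follows y = x² exactly, so y is completely determined by x, yet r comes out as zero because the relationship is not a straight line.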

    Why correlation does not imply causation

    Two variables can be correlated for several reasons that have nothing to do with one causing the other. The direction of causation may be reversed, both may be driven by a third factor, or the association may simply be a coincidence in the data. The classic example is the correlation between ice cream sales and drowning incidents. Neither causes the other; both rise in hot weather, which is a confounding variable. A confounder is a variable associated with both the supposed cause and the supposed effect, creating a spurious link.

    Reason for correlation | Example
    Genuine causation | Smoking raises lung cancer risk
    Reverse causation | Assuming illness causes a behaviour when the behaviour causes the illness
    Confounding | Ice cream sales and drownings both driven by hot weather
    Coincidence | Two unrelated trends that happen to move together
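
    The ice cream example can be reproduced with synthetic data. In the sketch below temperature drives both variables and neither influences the other; every number is simulated under assumed coefficients, not taken from real sales or incident records. The raw correlation is then compared with the correlation that remains after the linear effect of temperature is removed from each variable.

```python
import math
import random


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                           * sum((y - my) ** 2 for y in ys))


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(42)
temperature = [rng.gauss(20, 5) for _ in range(2000)]           # the confounder
ice_cream = [3.0 * t + rng.gauss(0, 15) for t in temperature]   # depends on heat only
drownings = [0.5 * t + rng.gauss(0, 2.5) for t in temperature]  # depends on heat only

raw_r = pearson_r(ice_cream, drownings)
adjusted_r = pearson_r(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))
print(f"raw correlation:                 {raw_r:.2f}")
print(f"after adjusting for temperature: {adjusted_r:.2f}")
```

    The raw correlation is clearly positive, around 0.5 in this setup, while the adjusted correlation is close to zero, which is the signature of a link created entirely by the confounder.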

    Criteria for causal inference

    Because correlation alone is insufficient, researchers use additional reasoning to assess causation. In epidemiology the Bradford Hill considerations, set out by Austin Bradford Hill in 1965, offer a widely cited framework. They include the strength of the association, its consistency across studies, specificity, the correct temporal sequence (the cause must precede the effect), a biological gradient or dose-response relationship, plausibility, coherence with existing knowledge, experimental evidence and analogy. These are considerations to weigh, not a checklist to tick mechanically, and no single one proves causation on its own.

    Randomisation and experiments

    The strongest evidence for causation usually comes from a randomised controlled experiment. By randomly assigning participants to conditions, randomisation tends to balance both known and unknown confounders across groups, so that a difference in outcomes can more credibly be attributed to the intervention. Where experiments are impossible, careful observational designs attempt to control for confounders statistically, but they remain more vulnerable to hidden bias. Extreme data points can also distort correlation estimates, which connects to the separate task of outlier handling.
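
    A simulation shows why random assignment matters. In the sketch below an intervention has a true effect of 2.0, and a hidden factor (labelled health here, a hypothetical name) also raises the outcome. The same comparison of group means is run twice: once with coin-flip assignment and once with participants self-selecting on the hidden factor. All coefficients are assumptions chosen for illustration.

```python
import random

TRUE_EFFECT = 2.0


def estimated_effect(assign, n: int = 10_000, seed: int = 7) -> float:
    """Difference in mean outcome between treated and control groups."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)                 # hidden confounder
        is_treated = assign(health, rng)
        outcome = TRUE_EFFECT * is_treated + 3.0 * health + rng.gauss(0, 1)
        (treated if is_treated else control).append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)


randomised = estimated_effect(lambda health, rng: rng.random() < 0.5)
self_selected = estimated_effect(lambda health, rng: health > 0)

print(f"true effect:              {TRUE_EFFECT:.2f}")
print(f"randomised estimate:      {randomised:.2f}")
print(f"self-selected estimate:   {self_selected:.2f}")
```

    With randomisation the hidden factor is balanced across groups and the estimate sits near 2.0. With self-selection the healthier participants end up in the treated group, and the same calculation overstates the effect severalfold.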

    Sound causal reasoning draws on the wider discipline of statistics and on transparent reporting of methods, both essential for reproducible findings. Related concepts such as statistical significance describe whether an association is unlikely under chance, but significance is still not causation. For terminology, see the CASRAI dictionary, the reproducibility category and the author guidance.

    Frequently asked questions

    What does Pearson’s r actually measure?

    Pearson’s r measures the strength and direction of a linear relationship between two continuous variables, on a scale from minus one to plus one. It does not capture non-linear relationships and does not establish that one variable causes the other.

    What is a confounding variable?

    A confounder is a third variable associated with both the supposed cause and the supposed effect. It can create a correlation between two variables that are not causally linked, which is why controlling for confounders is central to causal inference.

    How can researchers establish causation?

    Randomised controlled experiments provide the strongest evidence by balancing confounders across groups. Where experiments are not feasible, frameworks such as the Bradford Hill considerations, combined with careful adjustment for confounders, help build a case for causation, though no single study proves it conclusively.

  • CRISPR-Cas9: How Gene Editing Works as a Research Tool

    CRISPR-Cas9 is a programmable gene-editing system that uses a short guide RNA to direct the Cas9 enzyme to a matching DNA sequence, where Cas9 makes a precise cut so the sequence can be altered. As a research tool, it lets laboratories target specific genes for study; the foundational work on harnessing it as a programmable system is associated with Jennifer Doudna and Emmanuelle Charpentier.

    This article describes the mechanism and its use as a research method. It makes no clinical or therapeutic claims; the framing throughout is how CRISPR works as a laboratory tool and how its use is documented and governed.

    The bacterial origin of CRISPR

    CRISPR originates as a natural defence system in bacteria. The acronym stands for clustered regularly interspaced short palindromic repeats — segments of DNA that, together with associated (Cas) proteins, help bacteria recognise and cut the DNA of invading viruses. Researchers adapted this natural recognise-and-cut machinery into a programmable laboratory tool by supplying a custom guide RNA.

    How the guide RNA and Cas9 work together

    The system has two essential parts. The guide RNA is a short RNA sequence designed to match a chosen DNA target. The Cas9 enzyme is the molecular scissors that binds the guide RNA, locates the matching DNA, and introduces a cut at that site.

    Component | Role
    Guide RNA | Programmable sequence that directs the system to a specific DNA target
    Cas9 enzyme | Binds the guide RNA and cuts the DNA at the targeted site
    Target DNA | The genomic sequence selected for study or modification

    Because the guide RNA can be reprogrammed simply by changing its sequence, the same Cas9 enzyme can be directed to many different targets. That programmability is what makes CRISPR a flexible research method, and the precise notation of target sequences relies on standard conventions like those in the CASRAI dictionary.
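
    The programmability described above can be pictured as a sequence search. The sketch below is a deliberately simplified model: it looks for exact matches of a guide sequence on either strand of a DNA string. Real Cas9 targeting also depends on the sequence context beside the target and can tolerate some mismatches, neither of which is modelled here, and the sequences are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_target_sites(dna: str, guide: str) -> list[tuple[int, str]]:
    """Return (0-based position, strand) for each exact match of the guide."""
    dna = dna.upper()
    guide = guide.upper().replace("U", "T")   # accept the guide written as RNA
    sites = []
    for strand, query in (("+", guide), ("-", reverse_complement(guide))):
        start = dna.find(query)
        while start != -1:
            sites.append((start, strand))
            start = dna.find(query, start + 1)
    return sites


dna = "TTGACCTGAAGCTTACGGATCCAGGTCAA"          # hypothetical sequence
print(find_target_sites(dna, "GACCUGAAGC"))    # one guide
print(find_target_sites(dna, "ACGGATCCAG"))    # same function, new target
```

    Changing only the guide string redirects the search to a different site, which mirrors how one Cas9 enzyme can be pointed at many targets.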

    CRISPR as a research method

    In the laboratory, CRISPR-Cas9 is used to investigate gene function — for example, by disabling a gene and observing the result. Treating CRISPR as a method places it firmly within the research lifecycle: it must be planned, documented, executed and reported like any other experimental technique. Recording the exact guide-RNA sequences, target sites and reagents used is essential for others to interpret the work.

    Reproducibility and governance considerations

    Reproducibility depends on complete reporting. Independent researchers can only repeat or build on a CRISPR experiment if the guide-RNA design, target sequence, delivery method and verification approach are fully described. This connects CRISPR reporting to the standards-led thinking across our reproducibility coverage and to method-reporting frameworks such as those discussed in our guide to gene-expression reporting standards.
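
    One way to make complete reporting routine is to capture the required details in a structured record and check it before submission. The sketch below is a hypothetical structure, not an established reporting standard; the field names simply follow the items listed above, and the example values are placeholders.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CrisprMethodRecord:
    """Hypothetical minimum record for one CRISPR experiment."""
    guide_rna_sequence: str = ""
    target_site: str = ""
    delivery_method: str = ""
    verification_method: str = ""
    reagents: list[str] = field(default_factory=list)


def missing_fields(record: CrisprMethodRecord) -> list[str]:
    """Names of fields that are still empty and would block replication."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


draft = CrisprMethodRecord(
    guide_rna_sequence="GACCUGAAGC",          # placeholder sequence
    target_site="example locus, exon 2",      # placeholder description
)
print(missing_fields(draft))
```

    The draft above would be flagged for its missing delivery method, verification method and reagents.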

    Governance is the second consideration. Research use of gene editing is subject to institutional oversight and ethical review, and provenance — what was edited, how and under what approvals — should be documented. The same governance discipline appears in our coverage of stem-cell research registries and governance, and stable identifiers help link methods to outputs as set out in our note on persistent identifiers in 2026. For documentation practice, see our guidance for authors.

    Frequently asked questions

    How does CRISPR-Cas9 work?

    A short guide RNA is designed to match a chosen DNA sequence. The Cas9 enzyme binds the guide RNA, finds the matching DNA, and cuts it at that site, allowing the targeted sequence to be studied or altered in the laboratory.

    Where does CRISPR come from?

    CRISPR originates as a natural defence system in bacteria that recognises and cuts the DNA of invading viruses. Researchers adapted this recognise-and-cut machinery into a programmable laboratory tool by supplying a custom guide RNA.

    Who is associated with developing CRISPR as a tool?

    The foundational work on harnessing CRISPR-Cas9 as a programmable gene-editing system is associated with Jennifer Doudna and Emmanuelle Charpentier.

    What needs to be reported for a CRISPR experiment to be reproducible?

    Complete reporting should include the guide-RNA design and sequence, the target site, the delivery and verification methods, and the reagents used, so that independent researchers can interpret and repeat the work.