Category: Guides & Explainers

Practical how-to guides, templates, checklists, and career pathways for research administrators, authors, and institutional teams.

  • How FDA Drug Recalls Work: Classes and Process

    A drug recall is the removal or correction of a marketed medical product that is defective or potentially harmful. Recalls are a core mechanism of post-marketing safety: they let problems discovered after approval be addressed quickly. This article explains how recalls are classified and managed from a standards and process perspective; it is not clinical or regulatory advice.

    Who initiates a recall

    Most recalls in the United States are voluntary, carried out by the manufacturer either on its own initiative or at the FDA’s request. The FDA also has authority to require recalls of certain products. Either way, the agency oversees the process, classifies the recall by risk, and monitors that it is carried out effectively.

    The three recall classes

    Class | Level of hazard
    Class I | A reasonable probability that use will cause serious harm or death.
    Class II | Use may cause temporary or medically reversible harm, or the probability of serious harm is remote.
    Class III | Use is unlikely to cause harm but the product violates regulations (for example, a labelling or quality defect).

    Two related actions sit outside the three classes. A market withdrawal removes or corrects a product with a minor violation that would not be subject to FDA legal action, while a medical device safety alert warns of a risk posed by a device.

    What triggers a recall

    Recalls can be prompted by manufacturing defects discovered through quality control, contamination, incorrect labelling or dosing information, stability failures, or safety signals detected through pharmacovigilance and adverse-event reporting. The ability to act quickly depends on traceability — the GMP requirement that every batch can be identified and followed through the supply chain.
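
    To illustrate why batch-level traceability speeds up a recall, the sketch below looks up every consignee that received a given batch. The record layout, batch numbers and consignee names are hypothetical and are not drawn from any FDA or GMP specification.

```python
from collections import defaultdict

# Hypothetical distribution records: (batch_number, consignee, units_shipped).
SHIPMENTS = [
    ("B-1001", "Distributor North", 500),
    ("B-1001", "Pharmacy Chain A", 200),
    ("B-1002", "Distributor South", 300),
    ("B-1001", "Distributor North", 100),
]


def consignees_for_batch(shipments, batch_number):
    """Return the total units of one batch shipped to each consignee."""
    totals = defaultdict(int)
    for batch, consignee, units in shipments:
        if batch == batch_number:
            totals[consignee] += units
    return dict(totals)


affected = consignees_for_batch(SHIPMENTS, "B-1001")
print(affected)
```

    Without the batch number on every shipment record, the same question could only be answered by contacting every consignee.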

    How recalls are communicated and tracked

    The FDA publishes recalls in its Enforcement Report and issues public notices for higher-risk cases. Manufacturers must notify distributors and, where appropriate, the public, and demonstrate that affected product has been retrieved or corrected. The recall is closed only when the agency is satisfied it has been effective.

    Why recalls illustrate good record-keeping

    A recall is only possible because of disciplined documentation: batch records, distribution data and a quality system that links a defect back to its source. This is the same logic of provenance and traceability that underpins reproducible research and trustworthy metadata. For the manufacturing-quality foundations, see our Good Manufacturing Practice explainer; for the regulator’s wider role, see the FDA and drug approval.

    Frequently asked questions

    What is a drug recall?

    A drug recall is the removal or correction of a marketed medicine that is defective or potentially harmful, used to address safety or quality problems found after a product reaches the market.

    What are the FDA recall classes?

    Class I covers products that could cause serious harm or death; Class II covers products that may cause temporary or reversible harm; and Class III covers products unlikely to cause harm but that breach regulations.

    Are recalls usually ordered by the FDA?

    Most recalls are voluntary actions by manufacturers, taken on their own initiative or at the FDA’s request. The agency oversees, classifies and monitors recalls, and has authority to require them for certain products.

    How does traceability make recalls possible?

    Good Manufacturing Practice requires that every batch be identifiable and traceable through the supply chain, so a defect can be tracked to its source and affected product retrieved efficiently.

  • Anatomy of a Journal Article: The IMRaD Structure

    A journal article reporting empirical research is most often organised according to the IMRaD structure — Introduction, Methods, Results, and Discussion — a standardised arrangement that lets readers locate any part of the argument quickly and lets the work be evaluated and reproduced. Wrapped around this core are the abstract, the references and the metadata that make the article discoverable and citable.

    This guide walks through each component, explaining what it contains, how to read it efficiently and how to write it well.

    The abstract: the article in miniature

    The abstract is a concise summary, usually a single paragraph, that states the question, the approach, the key findings and their significance. It is the most-read part of any paper and is what appears in databases and search results, so it must stand alone. When reading, start here to decide whether the full paper is relevant; when writing, draft it last, once the rest of the article is settled.

    Introduction: why the work matters

    The Introduction sets the context, reviews relevant prior work, identifies the gap or problem the study addresses, and states the research question or hypothesis. A common and effective shape is a funnel: from the broad field, to the specific gap, to the precise aim of this study. Readers use it to understand motivation; writers should make the contribution and its novelty explicit by the end of the section.

    Methods: how it was done

    The Methods section describes the materials, participants, procedures and analyses in enough detail that a competent peer could reproduce the study. This section is the backbone of reproducibility and is scrutinised closely during peer review. When reading critically, this is where you judge whether the conclusions are actually supported; when writing, prioritise completeness and precision over narrative flair, and cite data and code where they are deposited.

    Results: what was found

    The Results section presents the findings — figures, tables and statistics — without interpretation. The discipline of separating results from their interpretation is what keeps the evidence distinguishable from the argument built upon it. Readers should compare the results against the stated aims; writers should report findings neutrally and let the Discussion do the interpretive work.

    Discussion: what it means

    The Discussion interprets the results, relates them back to the original question and the prior literature, acknowledges limitations, and considers implications and future directions. A frequently used shape mirrors the Introduction in reverse: from the specific findings outward to their broader meaning. This is where authors make their case, and where readers weigh whether the interpretation is justified by the evidence in Results.

    Section | Question it answers | Read it to…
    Abstract | What is this, in brief? | Decide relevance
    Introduction | Why does it matter? | Understand motivation
    Methods | How was it done? | Judge rigour and reproducibility
    Results | What was found? | See the evidence
    Discussion | What does it mean? | Weigh the interpretation

    References and metadata

    The references list the sources cited, anchoring the work in the existing literature and enabling citation indexing — the very links that systems such as Web of Science record. Surrounding the article is its metadata: title, authors, affiliations, the journal’s ISSN, and persistent identifiers. Most published articles carry a DOI that makes them permanently citable and resolvable, while author identifiers such as ORCID and organisation identifiers such as ROR — part of the wider PID stack — disambiguate who and where.

    Authorship and contribution are increasingly recorded with the CRediT taxonomy, which assigns standardised roles to each contributor. Guidance on preparing manuscripts is available on our for-authors page, and definitions of the structural terms used here appear in the CASRAI dictionary.

    Beyond IMRaD

    IMRaD is the dominant pattern for empirical reports, particularly in the sciences, but it is not universal. Review articles, theoretical papers and humanities scholarship often use different structures. Even so, the underlying logic — context, approach, evidence, interpretation — tends to persist in some form, which is why understanding IMRaD helps in reading almost any scholarly article.

    Frequently asked questions

    What does IMRaD stand for?

    IMRaD stands for Introduction, Methods, Results, and Discussion — the four core sections of a typical empirical research article, usually preceded by an abstract and followed by references.

    In what order should I write the sections?

    Many writers draft Methods and Results first, since they are most concrete, then the Discussion and Introduction, and finally the abstract once everything else is fixed. Reading order and writing order need not match.

    Why are Results and Discussion kept separate?

    Separating them keeps the evidence (Results) distinct from its interpretation (Discussion), so readers can evaluate the findings independently of the authors’ conclusions about what those findings mean.

    Do all journal articles follow IMRaD?

    No. IMRaD suits empirical studies, especially in the sciences, but reviews, theoretical pieces and humanities work often use other structures. The IMRaD logic, however, frequently underlies them in adapted form.

  • Citation Styles Compared: APA, MLA, Chicago, Vancouver

    Citation styles are standardised systems that prescribe how to format in-text citations and reference entries so that sources are credited consistently and can be retrieved reliably. The major styles — APA 7th edition, MLA 9th edition, the Chicago Manual of Style 17th edition, and Vancouver — share the same underlying data elements but differ in how those elements are ordered, punctuated and signalled in the text.

    Choosing a style is rarely a free decision: each discipline has settled on conventions, and journals, publishers and institutions specify which to use. The skill is applying the chosen style consistently, not memorising all of them.

    The four major styles at a glance

    Style | Typical disciplines | In-text format | End-of-text list
    APA 7 | Psychology, education, social sciences | Author–date: (Author, Year) | References, alphabetical
    MLA 9 | Humanities, languages, literature | Author–page: (Author 14) | Works Cited, alphabetical
    Chicago 17 | History, arts, some social sciences | Notes-bibliography (footnotes) or author–date | Bibliography or References
    Vancouver / ICMJE | Medicine, biomedical sciences | Numeric: [1] or superscript | References, by order of citation

    APA 7th edition

    APA is an author–date style dominant in psychology and the social sciences. In-text citations carry the author surname and year, with a page number for direct quotations. The end list is titled “References”, alphabetised by surname, with a strong emphasis on the publication year because currency matters in empirical fields. DOIs are included as full https links.

    MLA 9th edition

    MLA serves the humanities, where the location of a phrase within a work often matters more than the year. Its in-text citation is author–page — (Author 14) — and the end list is titled “Works Cited”. MLA 9 organises an entry around a template of “core elements” (author, title of source, title of container, and so on), which makes it adaptable to non-traditional sources.

    Chicago Manual of Style 17th edition

    Chicago is distinctive for offering two complete systems:

    • Notes–bibliography. Used in history and the arts. Citations appear as numbered footnotes or endnotes, with a full bibliography at the end. This suits narrative disciplines where discursive notes add value.
    • Author–date. Used in the sciences and some social sciences, this variant works like APA — an in-text (Author Year) marker keyed to an alphabetical reference list.

    The existence of two Chicago systems is the most common source of confusion; always confirm which one a publisher expects.

    Vancouver and ICMJE

    Vancouver is the numeric style of medicine and the biomedical sciences, aligned with the recommendations of the International Committee of Medical Journal Editors (ICMJE). Sources are numbered in order of first appearance and listed in that order. This keeps dense clinical text uncluttered (a paper may cite dozens of sources per page) at the cost of hiding authorship behind a number. The mechanics of numeric versus author–date markers are covered under in-text citations versus the reference list.
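
    The numbering rule can be sketched in a few lines. This is a minimal illustration, assuming citations are written in the draft as {key} placeholders; the placeholder syntax and the keys are hypothetical, not part of any style guide.

```python
import re


def number_citations(text):
    """Replace {key} markers with [n], numbering keys by first appearance."""
    order = []  # citation keys in order of first appearance

    def replace(match):
        key = match.group(1)
        if key not in order:
            order.append(key)
        return f"[{order.index(key) + 1}]"

    numbered = re.sub(r"\{(\w+)\}", replace, text)
    return numbered, order


draft = "First finding {smith2020}. Second finding {lee2018}. The first again {smith2020}."
numbered, reference_order = number_citations(draft)
print(numbered)
print(reference_order)
```

    A source cited twice keeps the number it received on first appearance, and the returned key order is the order of the reference list.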

    Choosing the right style

    The decision usually follows a simple hierarchy:

    • Follow the journal or publisher first. Their author guidelines override personal preference.
    • Then follow your discipline. Social sciences default to APA, humanities to MLA, history to Chicago notes-bibliography, medicine to Vancouver.
    • Then follow your institution. A department or supervisor may mandate a house style.
    • Apply it consistently. Mixing styles within one document is itself an error, regardless of which you choose.

    Whatever style applies, the underlying data — author, year, title, container, persistent identifier — stays the same; only the presentation changes. Recording those elements accurately, and disambiguating authors with an ORCID iD, is what makes switching styles painless. This consistency also supports research integrity by keeping every claim traceable to a retrievable source. Practical help for applying a style is in our resources for authors.
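
    The point that the data stays fixed while the presentation changes can be shown with one record rendered four ways. This is a simplified sketch for a single-author source; the record and its field names are hypothetical, and real styles have further rules (multiple authors, page ranges, quotations) that are not handled here.

```python
# Hypothetical reference record; the field names are illustrative.
record = {"surname": "Okafor", "year": 2021, "page": 14}


def in_text(record, style, number=None):
    """Format an in-text citation in one of the four styles compared above."""
    surname, year, page = record["surname"], record["year"], record["page"]
    if style == "apa":
        return f"({surname}, {year})"
    if style == "mla":
        return f"({surname} {page})"
    if style == "chicago-author-date":
        return f"({surname} {year})"
    if style == "vancouver":
        return f"[{number}]"  # position in the order-of-citation list
    raise ValueError(f"unknown style: {style}")


for style in ("apa", "mla", "chicago-author-date"):
    print(style, in_text(record, style))
print("vancouver", in_text(record, "vancouver", number=3))
```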

    Frequently asked questions

    What is the difference between APA and MLA?

    APA uses author–date in-text citations and is standard in the social sciences, emphasising the year of publication. MLA uses author–page citations and is standard in the humanities, emphasising the location of material within a source. Their end lists are “References” and “Works Cited” respectively.

    Why does Chicago have two systems?

    Chicago offers a notes-bibliography system, suited to history and the arts where footnotes carry discursive comment, and an author–date system suited to the sciences. The two serve different writing cultures, so always confirm which variant a publisher requires.

    Which citation style should I use?

    Use the style your target journal, publisher or institution specifies; if none is mandated, follow your discipline’s convention. The most important rule is to apply a single style consistently throughout the document.

    Do all styles include a DOI?

    The major styles all accommodate persistent identifiers such as DOIs, because they make references durable and retrievable. The exact placement and formatting differ by style. See the CASRAI dictionary for standardised term definitions.

  • AI Model Documentation: Datasheets and Model Cards

    Model cards are short, structured documents that report what an AI model does, how it was evaluated, and the conditions under which it should and should not be used. Together with datasheets for datasets, which document the data a model is trained and tested on, they form the backbone of responsible-AI documentation. Both were proposed to bring the same rigour to AI artefacts that established disciplines bring to materials and reagents, and both directly support reproducibility, accountability and the integrity of the research record.

    Model cards (Mitchell et al. 2019)

    Model cards were introduced by Mitchell and colleagues in 2019 as a framework for transparent model reporting. A model card accompanies a trained model and records, in a consistent format, the essential facts a user needs to decide whether the model is appropriate for their purpose. Crucially, model cards emphasise disaggregated evaluation: reporting performance not only in aggregate but across relevant subgroups, so that uneven performance is visible rather than hidden behind a single headline number.
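
    A small example shows what disaggregated evaluation reveals. The evaluation rows and subgroup names below are invented for illustration, and accuracy stands in for whatever metric a real card would report.

```python
from collections import defaultdict

# Hypothetical evaluation rows: (subgroup, true_label, predicted_label).
ROWS = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 1),
]


def accuracy_by_group(rows):
    """Return overall accuracy and accuracy per subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in rows:
        total[group] += 1
        correct[group] += int(truth == predicted)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {group: correct[group] / total[group] for group in total}
    return overall, per_group


overall, per_group = accuracy_by_group(ROWS)
print(overall, per_group)
```

    Here the aggregate figure of 0.625 conceals a subgroup on which the model is right only a quarter of the time, which is exactly the kind of gap a headline number hides.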

    A typical model card covers model details (who built it, version, architecture), intended use and out-of-scope uses, evaluation data and metrics, performance across conditions, and ethical considerations, limitations and caveats. By stating intended and prohibited uses explicitly, a model card reduces the risk of a model being deployed in a context it was never validated for.
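
    One way to keep those sections consistent is to hold them in a structured record. The sketch below is an assumption about how that might look; the class, its field names and the example values are illustrative and do not reproduce the schema of Mitchell et al. or of any tool.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ModelCard:
    """Minimal container for the sections listed above (illustrative field names)."""
    name: str
    version: str
    architecture: str
    intended_use: str
    out_of_scope_uses: list = field(default_factory=list)
    evaluation_data: str = ""
    metrics: dict = field(default_factory=dict)
    limitations: list = field(default_factory=list)

    def missing_sections(self):
        """List sections left empty, so gaps are visible before release."""
        return [key for key, value in asdict(self).items() if not value]


card = ModelCard(
    name="example-classifier",  # hypothetical model
    version="0.1.0",
    architecture="logistic regression",
    intended_use="Illustration only",
    metrics={"accuracy": 0.625},
)
print(json.dumps(asdict(card), indent=2))
print(card.missing_sections())
```

    Serialising the card to JSON keeps it machine-readable, and the completeness check flags that out-of-scope uses, evaluation data and limitations have not yet been written.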

    Datasheets for datasets (Gebru et al.)

    Datasheets for datasets, proposed by Gebru and colleagues, apply the same documentation philosophy to data. A datasheet answers questions about a dataset’s whole life cycle: the motivation for creating it, its composition (what the instances represent, how many, whether sensitive data is present), the collection process, any preprocessing, cleaning or labelling, intended and discouraged uses, distribution terms, and arrangements for maintenance. Because so many problems in machine learning originate in the data, documenting it is often more consequential than documenting the model.

    Artefact | Documents | Key contents
    Model card | A trained model | Intended use, evaluation, disaggregated performance, limitations
    Datasheet for datasets | A dataset | Motivation, composition, collection, preprocessing, uses, maintenance

    How they support reproducibility and accountability

    Documentation turns an opaque artefact into an auditable one. A model card tells a future researcher exactly which model version and evaluation protocol produced a published result, while a datasheet records the data provenance needed to interpret or rebuild that result. This is the documentation layer that complements the engineering practices in our guide to reproducibility of machine learning research: code and seeds make a result re-runnable, while cards and datasheets make it interpretable and accountable.

    These artefacts also support the broader disclosure expectations now common in scholarly publishing. When generative AI features in a study, documenting the model and its data complements the editorial requirements covered in our explainer on generative AI and research disclosure norms and across our GenAI disclosure coverage.

    Embedding documentation in the research record

    For documentation to be useful it must be findable and citable as part of the scholarly record, not buried in a code repository. Treating model cards and datasheets as first-class research outputs supports proper credit assignment through frameworks such as CRediT and consistent description through the casrai.org research dictionary. Doing so recognises the substantial work of data curation and evaluation that these documents describe.

    Frequently asked questions

    What is a model card?

    A model card is a structured document, proposed by Mitchell et al. in 2019, that reports an AI model’s intended use, evaluation results (including across subgroups), limitations and ethical considerations, so users can judge whether it suits their purpose.

    What is a datasheet for datasets?

    A datasheet, proposed by Gebru et al., documents a dataset’s motivation, composition, collection and preprocessing, intended uses and maintenance, capturing the data provenance needed to interpret or reproduce results.

    How do model cards differ from datasheets?

    Model cards document a trained model; datasheets document the dataset behind it. Used together, they describe both the artefact and the data that shaped it.

    Why does AI documentation matter for reproducibility?

    It records which model version, evaluation protocol and data produced a result, turning an opaque artefact into an auditable one that others can interpret, scrutinise and rebuild.

  • Reporting Analytical Methods Reproducibly

    Reporting an analytical method reproducibly means describing the measurement in enough detail that an independent researcher could repeat it and obtain comparable results. A method that cannot be reproduced from its description undermines the findings built on it, however careful the original work. This article sets out what belongs in an analytical methods report: the instrument and its settings, calibration and reference materials, validation, and the reporting guidelines and protocol repositories that structure good practice. It is guidance on documentation, applicable across techniques.

    Why the methods section carries the weight

    Every measurement technique converts a property of a sample into a signal through a chain of physical steps, each governed by parameters the operator chooses. Whether the instrument is an MRI scanner, a mass spectrometer or a thermal cycler running a PCR, the result depends on settings that another laboratory cannot guess. The methods section is where those choices are recorded. If it omits them, the experiment is effectively unrepeatable, and the published result becomes a claim rather than a verifiable observation.

    What to record about the instrument and parameters

    A reproducible report identifies the instrument precisely and lists the settings that affect the measurement. The level of detail should be enough that a competent reader could configure their own equipment to match.

    Category | Examples of what to report
    Instrument | Make, model and relevant configuration of the apparatus
    Acquisition settings | The technique-specific parameters that govern signal generation
    Sample preparation | How the sample was prepared, stored and presented to the instrument
    Data processing | Software, versions, transforms and any filtering applied to raw data
    Environment | Conditions such as temperature that materially affect the result

    Reporting the data-processing chain matters as much as the acquisition. Many techniques apply substantial mathematical transformation between raw signal and reported value, and an undocumented processing step can change results as much as a hardware setting. Naming software and versions makes the analysis traceable.
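
    Where the processing is done in Python, software versions can be captured programmatically rather than typed by hand. The sketch below uses only the standard library; the package names passed in are examples, and the function name is hypothetical.

```python
import json
import platform
from importlib import metadata


def processing_environment(packages):
    """Collect interpreter and package versions for a methods report."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# The package names below are examples; list whatever the analysis imports.
report = processing_environment(["numpy", "scipy"])
print(json.dumps(report, indent=2))
```

    Saving this output alongside the processed data gives a reader the exact versions without relying on the author's memory.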

    Calibration and reference materials

    An instrument’s raw output is meaningful only against a known scale. Calibration ties the measurement to a reference, and reporting how and when calibration was performed lets others judge and reproduce the accuracy. Where certified reference materials (samples of known composition or known value) exist, citing them anchors a method to a community-agreed standard and allows cross-laboratory comparison. A report should state what was used to calibrate, the reference materials employed, and how often calibration was checked over the course of the work.
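
    As a concrete case, many methods calibrate by fitting a straight line through the responses of standards of known value. The sketch below assumes a linear response and uses invented standards; real methods may need weighted or non-linear fits.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical standards: known concentration (mg/L) and instrument response.
concentration = [0.0, 1.0, 2.0, 4.0]
response = [0.1, 2.1, 4.1, 8.1]

slope, intercept = fit_line(concentration, response)


def to_concentration(signal):
    """Convert a raw signal to concentration using the fitted calibration."""
    return (signal - intercept) / slope


print(slope, intercept, to_concentration(6.1))
```

    Reporting the fitted slope and intercept, the standards used and the date of the fit lets another laboratory see how raw signals were turned into reported values.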

    Validation and uncertainty

    Validation establishes that a method does what it claims under the conditions of use. Depending on the technique this may include assessing the limit of detection, the range over which response is reliable, repeatability between runs, and the method’s sensitivity to small changes in conditions. Reporting these characteristics, together with an honest statement of measurement uncertainty, tells readers how much weight a number can bear. A value quoted without any indication of its uncertainty invites overinterpretation and is difficult to reproduce meaningfully.
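
    Two of these characteristics reduce to short calculations. The sketch below estimates a limit of detection as 3.3 times the standard deviation of blank measurements divided by the calibration slope, which is one widely used convention rather than the only one, and expresses repeatability as percent relative standard deviation. All measurements are invented.

```python
import statistics


def limit_of_detection(blank_signals, slope):
    """Estimate LOD as 3.3 * (standard deviation of blanks) / calibration slope."""
    return 3.3 * statistics.stdev(blank_signals) / slope


def relative_std_dev(values):
    """Repeatability expressed as percent relative standard deviation."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical replicate measurements.
blanks = [0.10, 0.12, 0.08, 0.11, 0.09]
replicates = [4.98, 5.02, 5.00, 5.01, 4.99]

lod = limit_of_detection(blanks, slope=2.0)
rsd = relative_std_dev(replicates)
print(round(lod, 4), round(rsd, 3))
```

    Whichever convention is used, the report should name it, since the same data can yield different detection limits under different definitions.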

    Reporting guidelines and protocol repositories

    Researchers rarely have to design a reporting structure from scratch. Many fields maintain community reporting guidelines that enumerate the minimum information a methods section should contain for a given type of study, reducing the risk of leaving out a critical parameter. Alongside these, protocol repositories such as protocols.io let authors publish a step-by-step procedure as a citable object, with a persistent identifier, separate from the constraints of a paper’s word limit. Linking a manuscript to a deposited protocol gives readers the full operational detail and a stable reference. Using a recognised guideline and depositing the detailed protocol together address the two failure modes of methods reporting: omission and lack of granularity.

    Consistent terminology supports all of this; the CASRAI dictionary standardises the vocabulary used to describe research outputs and processes, and our reproducibility coverage explores related practices. Practical author-facing guidance is collected in our guidance for authors.

    Frequently asked questions

    How much detail is enough?

    The working test is whether a competent independent researcher could repeat the measurement from the description alone and expect comparable results. If any parameter that materially affects the outcome would have to be guessed, the report is incomplete. Depositing a full protocol alongside the paper is a reliable way to reach that bar.

    Why report data processing as well as acquisition?

    Many techniques transform raw signal substantially before producing a reported value, through Fourier transforms, baseline corrections, filtering or thresholding. An undocumented processing step can alter results as much as a hardware change, so software, versions and transforms should be recorded as part of the method.

    What role do reference materials play?

    Certified reference materials provide a known value against which an instrument can be calibrated and across which laboratories can compare. Citing them anchors a method to a shared standard, which is central to making measurements comparable and reproducible across sites.

    Where do reporting guidelines and protocols.io fit?

    Reporting guidelines define the minimum information a methods section should contain, guarding against omission. Protocol repositories such as protocols.io let authors publish granular, citable, versioned procedures that exceed a paper’s space limits. Used together they cover both completeness and detail, as discussed across our research lifecycle coverage.

  • Quantum Computing: Principles and Research Implications

    Quantum computing is a model of computation that uses quantum-mechanical phenomena — chiefly superposition and entanglement — to represent and manipulate information using quantum bits, or qubits. Unlike a classical bit, which is definitively 0 or 1, a qubit can exist in a superposition of both states until measured, and groups of entangled qubits exhibit correlations with no classical analogue. These properties allow certain problems to be expressed in ways that, in principle, require fewer operations than the best known classical algorithms.

    Quantum computing does not make every computation faster, and it does not replace classical computers. It offers potential advantage on a narrow class of structured problems. Understanding which problems — and recognising current hardware limits — matters for any researcher assessing the field.

    Qubits, superposition and entanglement

    A qubit is the basic unit of quantum information. Where a classical bit holds one value, a qubit’s state is a combination of the basis states usually written as |0⟩ and |1⟩. Because of superposition, the state of a register of n qubits is a combination of 2^n basis states at once. Crucially, you cannot read all of those amplitudes out directly: measurement collapses the register to a single classical outcome, with a probability given by the squared magnitude of that outcome’s amplitude.
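
    A classical simulation makes the measurement rule concrete. The sketch below stores a state as a plain list of amplitudes and samples outcomes with probability equal to the squared magnitude of each amplitude (the Born rule); the function names are illustrative, and this is a toy simulation rather than code for quantum hardware.

```python
import math
import random


def measurement_probabilities(amplitudes):
    """Born rule: the probability of each outcome is |amplitude| squared."""
    probabilities = [abs(a) ** 2 for a in amplitudes]
    if not math.isclose(sum(probabilities), 1.0, rel_tol=1e-9):
        raise ValueError("state is not normalised")
    return probabilities


def measure(amplitudes, rng):
    """Sample one classical outcome (an index into the basis states)."""
    probabilities = measurement_probabilities(amplitudes)
    return rng.choices(range(len(amplitudes)), weights=probabilities)[0]


# Equal superposition of |0> and |1> for a single qubit.
plus_state = [1 / math.sqrt(2), 1 / math.sqrt(2)]
probabilities = measurement_probabilities(plus_state)

# A register of n qubits needs 2**n amplitudes to describe.
n_qubits = 3
state_size = 2 ** n_qubits

rng = random.Random(0)  # fixed seed so the run is repeatable
outcomes = [measure(plus_state, rng) for _ in range(1000)]
print(probabilities, state_size, sum(outcomes) / len(outcomes))
```

    Each measurement returns only 0 or 1; the amplitudes themselves are never observed, only the frequencies they produce over many runs.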

    Entanglement is a correlation between qubits such that the state of the whole system cannot be described as independent parts. Measuring one entangled qubit constrains the outcomes of others. Quantum algorithms exploit superposition and entanglement together with interference — arranging amplitudes so that wrong answers cancel and correct answers reinforce — to extract useful results from measurement.
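
    Both ideas can be seen in a few lines of arithmetic. The sketch below applies the Hadamard gate twice to show amplitudes cancelling, then lists the outcome probabilities of the Bell state (|00⟩ + |11⟩)/√2. It is a hand-rolled illustration with hypothetical helper names, not a quantum programming library.

```python
import math

# Hadamard gate as a 2x2 matrix.
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def apply(gate, state):
    """Multiply a 2x2 gate matrix by a single-qubit state vector."""
    return [sum(gate[row][col] * state[col] for col in range(2)) for row in range(2)]


zero = [1.0, 0.0]
once = apply(H, zero)    # equal superposition of |0> and |1>
twice = apply(H, once)   # the |1> amplitudes cancel, the |0> amplitudes reinforce

# Bell state (|00> + |11>) / sqrt(2): amplitudes ordered 00, 01, 10, 11.
bell = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]
bell_probabilities = {format(i, "02b"): abs(a) ** 2 for i, a in enumerate(bell)}

print(once, twice)
print(bell_probabilities)
```

    The Bell state never yields 01 or 10, so learning one qubit's outcome fixes the other's, and no pair of independent single-qubit states reproduces that pattern.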

    How it differs from classical computing

    Classical computers are deterministic machines built on bits and Boolean logic gates. Quantum computers use quantum gates that perform reversible, unitary operations on qubit states. The theoretical promise lies in specific algorithms: Shor’s algorithm factors large integers in polynomial time (with implications for cryptography), and Grover’s algorithm offers a quadratic speed-up for unstructured search. Quantum simulation — modelling molecules and materials whose behaviour is itself quantum — is widely regarded as the most natural near-term application.
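
    The size of Grover’s quadratic speed-up is easy to work out. The sketch below compares the average number of lookups for a classical item-by-item search with the approximate number of Grover iterations for a single marked item, (π/4)·√N. It counts oracle queries only and ignores every hardware cost, so it illustrates the scaling rather than predicting run time.

```python
import math


def classical_expected_queries(n_items):
    """Average lookups to find one marked item by checking items in turn."""
    return (n_items + 1) / 2


def grover_iterations(n_items):
    """Approximate Grover iterations for one marked item: (pi / 4) * sqrt(N)."""
    return math.floor(math.pi / 4 * math.sqrt(n_items))


for n_items in (1_000, 1_000_000):
    print(n_items, classical_expected_queries(n_items), grover_iterations(n_items))
```

    For a million items the query count falls from about half a million to under a thousand, a large but polynomial gain, unlike the exponential separation associated with Shor’s algorithm.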

    These advantages are problem-specific and proven only as algorithms; realising them at useful scale depends on hardware that does not yet exist. The distinction between theoretical algorithmic advantage and practical, demonstrated advantage is the single most common source of hype in the field.

    The NISQ era

    Today’s machines are described as NISQ — Noisy Intermediate-Scale Quantum — devices, a term coined by John Preskill in 2018. They have tens to a few hundred qubits, and those qubits are noisy: they lose coherence quickly and accumulate errors during gate operations. Fault-tolerant quantum computing, which uses quantum error correction to combine many physical qubits into fewer reliable logical qubits, remains a research goal rather than a deployed reality.

    Aspect | NISQ devices (today) | Fault-tolerant (goal)
    Qubit count | Tens to low hundreds (physical) | Many physical per logical qubit
    Error correction | Limited / partial | Full quantum error correction
    Coherence | Short; noise dominates depth | Long effective coherence via logical qubits
    Typical use | Experiments, benchmarks, hybrid algorithms | Shor-scale factoring, large-scale simulation

    Realistic research implications

    For most disciplines, the immediate implication of quantum computing is preparatory rather than transformative. Chemistry, materials science and condensed-matter physics have the clearest path to benefit through quantum simulation. Cryptography faces a long-horizon risk: because Shor’s algorithm threatens widely used public-key schemes, standards bodies have begun standardising post-quantum (quantum-resistant) cryptography now, even though a cryptographically relevant quantum computer does not yet exist. Research-data managers should track this as a future migration concern for long-lived encrypted archives.

    Quantum methods also intersect with machine learning, though claims of broad quantum-ML advantage remain unproven and contested. Researchers evaluating the field should treat results as research outputs requiring the same reproducibility scrutiny as any computational study, and describe quantum and classical components with consistent standardised terminology so claims can be compared. Sound documentation and metadata practice matters here exactly as it does for data infrastructure generally.

    Frequently asked questions

    Is quantum computing faster than classical computing?

    Only for specific, structured problems where a quantum algorithm exists. For most everyday computing tasks a quantum computer offers no advantage, and classical machines remain superior. Speed-ups are proven for particular algorithms, not for computing in general.

    What is the NISQ era?

    NISQ stands for Noisy Intermediate-Scale Quantum. It describes today’s devices, which have a modest number of error-prone qubits and lack full error correction. They support experiments and hybrid algorithms but cannot yet run large fault-tolerant computations.

    Should researchers worry about quantum computing breaking encryption?

    Not imminently, but it is a real long-term concern. Shor’s algorithm could break widely used public-key cryptography once sufficiently powerful, fault-tolerant machines exist. Migration to post-quantum cryptography is being standardised now to protect data that must stay secure for decades.

    How does this relate to machine learning?

    Quantum machine learning is an active research area, but broad advantages are unproven. For grounding in the classical methods quantum approaches are compared against, see our explainer on machine learning concepts and methods and the companion piece on supervised versus unsupervised learning.

  • Plagiarism Detection Software: How It Works and Its Limits

    A plagiarism checker is text-matching software that compares a submitted document against a large corpus of sources and reports where the wording overlaps. It produces a similarity score and highlights the matching passages. The essential point for responsible use: such software identifies matching text — it does not judge intent and it does not prove plagiarism. A human must interpret the report.

    How text-matching software works

    Tools such as Turnitin and iThenticate (described here neutrally) break a submission into fragments and compare them against a corpus that may include published articles, web pages, books and previously submitted student work. Where fragments match a source, the software records the overlap and aggregates it into a similarity percentage, with the matched passages colour-coded and linked to their apparent sources. The output is a similarity report, not a verdict.
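
    The fragment-and-compare idea described above can be sketched in a few lines. This is a minimal illustration, not how any named product works: the function names, the five-word fragment size and the tiny in-memory corpus are all assumptions made for the example.

```python
import re


def shingles(text, n=5):
    """Return the set of overlapping n-word fragments in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity_report(submission, corpus, n=5):
    """Match fragments of a submission against each source in a corpus.

    corpus maps a source label to its text (a hypothetical stand-in for a
    real index). Returns the percentage of submission fragments found
    anywhere in the corpus, plus the matched fragments grouped by source.
    """
    fragments = shingles(submission, n)
    if not fragments:
        return 0.0, {}
    matches = {}
    for label, source_text in corpus.items():
        overlap = fragments & shingles(source_text, n)
        if overlap:
            matches[label] = sorted(overlap)
    # A fragment that matches several sources is counted only once.
    matched = set().union(*matches.values()) if matches else set()
    return 100.0 * len(matched) / len(fragments), matches
```

    The percentage says only how many fragments overlapped; the per-source lists are what a human reader would actually inspect.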

    What the similarity score does and does not mean

    A high score is not automatically plagiarism, and a low score does not automatically mean originality. The score simply measures how much text matches the corpus. Legitimate writing produces matches all the time: quoted material, reference lists, common phrases, methods boilerplate and standard terminology all overlap with existing text. Conversely, paraphrased or translated copying may evade matching entirely. Interpreting the report requires reading which text matched and why.

    Sources of false positives

    Match type | Why it appears | Usually a concern?
    Quotations | Correctly quoted and cited material matches its source | No, if properly attributed
    Reference list | Bibliographic entries match other papers’ references | No
    Common phrases | Standard discipline terminology recurs everywhere | No
    Self-overlap | Your own earlier draft is in the corpus | Context-dependent
    Verbatim copying without citation | Unattributed reuse of someone else’s words | Yes; investigate

    Because quotations and references inflate the score, many tools let an assessor exclude them before reading the report. The judgement of whether a match represents misconduct is human, informed by the institution’s definition of plagiarism and how to avoid it.
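
    As a rough illustration of that exclusion step, the sketch below removes double-quoted passages and everything after a reference-list heading before any matching is done. The function name and the heading words it looks for are assumptions for the example; real tools apply their own rules.

```python
import re


def strip_quotes_and_references(text):
    """Drop double-quoted passages and anything after a References heading."""
    # Keep only the text before a line consisting of a reference-list heading.
    body = re.split(
        r"(?im)^\s*(references|bibliography|works cited)\s*$", text
    )[0]
    # Remove passages enclosed in straight or curly double quotation marks.
    return re.sub(r'["“][^"”]*["”]', " ", body)
```

    Whether excluding this material is appropriate remains the assessor’s call, since an unattributed quotation is still a concern.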

    What the software cannot do

    Text-matching cannot determine intent: it cannot distinguish a careless missed citation from deliberate deception. It cannot detect ideas that have been paraphrased into entirely new wording. It cannot, by itself, prove that copying occurred rather than coincidence or common knowledge. And its corpus is finite, so text copied from a source outside the corpus will not be flagged. These limits are why a similarity report is evidence to be weighed, not a decision.

    Responsible use

    Used well, text-matching supports research integrity by surfacing passages for human review and by helping students learn to quote and cite correctly. Used badly — treating a percentage as a guilty verdict — it produces unfair outcomes. Good practice is to exclude quotes and references when appropriate, read the matched passages in context, check whether matches are properly attributed, and apply institutional policy with due process. Sound citation habits, including using a DOI for durable links and verifying every generated reference, reduce avoidable matches. See our dictionary for definitions.

    Frequently asked questions

    Does a high similarity score mean I plagiarised?

    No. The score only measures how much text matches the corpus. Quotations, reference lists and common phrases all raise the score legitimately. A human must read the matched passages to judge whether anything is improperly attributed.

    Can plagiarism software prove plagiarism?

    No. It flags matching text; it cannot establish intent or prove misconduct. The report is evidence that an assessor interprets against the institution’s definition and policy, with due process.

    What is an acceptable similarity percentage?

    There is no universal threshold. Two papers with the same percentage can have very different reports — one full of properly cited quotations, another containing unattributed copying. The composition of the matches matters far more than the number.

    Can the software miss plagiarism?

    Yes. Heavily paraphrased copying, translated text and sources outside the tool’s corpus may not match. Text-matching is one aid among several, not a complete safeguard.

  • Mass Spectrometry in Research: An Explainer

    Mass spectrometry is a measurement technique that identifies and quantifies molecules by converting them into charged particles, sorting those particles according to their mass-to-charge ratio, and counting how many arrive at each value. The output is a mass spectrum, a plot of signal intensity against mass-to-charge ratio, from which the composition of a sample can be inferred. This article explains the physics and chemistry of how the measurement is produced. It is a methods explainer about the technique and its research uses.

    The three core stages

    Every mass spectrometer, whatever its design, performs three operations in sequence: ionisation, separation and detection. The sample molecules are first turned into ions, because the instrument manipulates particles using electric and magnetic fields, which act only on charged particles. The ions are then separated according to their mass-to-charge ratio, written m/z, the mass of the ion divided by its number of charges. Finally a detector records the ions arriving at each m/z value, producing the spectrum.

    Stage | Function | Physical basis
    Ionisation | Convert neutral molecules into ions | Adding or removing charge so fields can act on them
    Separation | Sort ions by mass-to-charge ratio | Response to electric or magnetic fields depends on m/z
    Detection | Count ions at each m/z | Charge collected and amplified into a signal

    Ionisation

    The ion source converts neutral molecules into gas-phase ions. The choice of method depends on the sample. Some methods impart a great deal of energy and tend to fragment molecules into characteristic pieces, which is informative for small, robust compounds. Others are gentle and produce intact ions of large, fragile molecules such as proteins, which would otherwise break apart. Soft ionisation methods made it possible to apply mass spectrometry to large biomolecules, vastly extending its reach. Whichever method is used, the result is a population of charged particles ready to be sorted.

    Separation by mass-to-charge ratio

    Once charged, ions are separated according to m/z. Different analyser designs achieve this through different physics. In a time-of-flight analyser, ions are accelerated by the same electric field, so ions of equal charge gain the same kinetic energy, and are then allowed to drift down a tube; ions of lower m/z travel faster and arrive sooner, so arrival time encodes m/z. In a quadrupole, oscillating electric fields allow only ions of a selected m/z to pass through at a given moment, and scanning the fields sweeps through the range. Other designs trap ions and measure the frequency at which they orbit or oscillate, which depends on m/z. In all cases the separating principle is that an ion’s motion in a controlled field depends on its mass and charge.
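
    The time-of-flight relationship can be made concrete with a short calculation. The sketch assumes an idealised instrument in which an ion of charge ze is accelerated through a potential V, gaining kinetic energy zeV, and then drifts a length L with no other fields, giving t = L * sqrt(m / (2 z e V)). The tube length and voltage defaults are illustrative values, not settings from any particular instrument.

```python
import math

DALTON_KG = 1.66053906660e-27          # kilograms per dalton
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per elementary charge


def flight_time_s(mz, tube_length_m=1.0, accel_voltage_v=20_000.0):
    """Idealised drift time in seconds for an ion of the given m/z."""
    mass_per_charge = mz * DALTON_KG / ELEMENTARY_CHARGE_C  # kg per coulomb
    return tube_length_m * math.sqrt(mass_per_charge / (2.0 * accel_voltage_v))


def mz_from_flight_time(time_s, tube_length_m=1.0, accel_voltage_v=20_000.0):
    """Invert the relation above: recover m/z from a measured drift time."""
    mass_per_charge = 2.0 * accel_voltage_v * (time_s / tube_length_m) ** 2
    return mass_per_charge * ELEMENTARY_CHARGE_C / DALTON_KG
```

    Because time scales with the square root of m/z, an ion with four times the m/z takes twice as long to arrive.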

    Detection and reading a spectrum

    The detector converts arriving ions into an electrical signal, typically amplifying the tiny charge of each ion into a measurable pulse and counting the pulses at each m/z. The accumulated counts form the mass spectrum. Each peak in the spectrum corresponds to an ion of a particular mass-to-charge ratio, and the height of the peak reflects how many such ions were detected, which relates to the abundance of that species. The pattern of peaks, including any fragment peaks, acts as a fingerprint that can identify a compound or, with calibration, quantify it. This conversion of a physical property into a counted signal places mass spectrometry alongside techniques such as PCR and ultrasound in the family of controlled-condition measurement methods.

    Common configurations and research uses

    Mass spectrometers are frequently coupled to a separation stage that feeds the sample in over time, such as a chromatography column that releases different components at different moments, so that the spectrometer analyses a mixture component by component. Tandem mass spectrometry chains two analysers so that a selected ion is fragmented and its pieces measured, giving structural detail. These configurations underpin major research fields. In proteomics, mass spectrometry identifies and quantifies the proteins in a sample by measuring peptide masses and fragments. In metabolomics, it profiles the small molecules of a biological system. It is equally central to environmental analysis, materials characterisation and many other areas.

    Because results depend heavily on instrument settings and calibration, reproducibility requires careful documentation, the subject of our guide on reporting analytical methods reproducibly. Standard terminology is held in the CASRAI dictionary, and the wider context appears across our research lifecycle coverage.

    Frequently asked questions

    Why must molecules be ionised first?

    A mass spectrometer separates particles using electric and magnetic fields, and those fields exert force only on charged particles. Neutral molecules would be unaffected and impossible to sort, so ionisation, adding or removing charge, is the necessary first step.

    What is mass-to-charge ratio and why is it the axis?

    Mass-to-charge ratio, m/z, is an ion’s mass divided by the number of charges it carries. The instrument separates ions by how they respond to fields, which depends on this combined quantity rather than mass alone, so m/z is the natural axis of a mass spectrum.
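
    A short calculation shows why the axis is m/z rather than mass. The sketch assumes positive ions formed by adding protons to a neutral molecule, which is one common case and not the only way ions form; the function name is hypothetical.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons (rounded)


def mz_protonated(neutral_mass_da, charges):
    """m/z of a molecule that has gained `charges` protons (assumed mechanism)."""
    if charges < 1:
        raise ValueError("charges must be a positive integer")
    ion_mass = neutral_mass_da + charges * PROTON_MASS_DA
    return ion_mass / charges
```

    Under this assumption a 10,000 Da molecule carrying ten charges and a 1,000 Da molecule carrying one charge both appear at about m/z 1001.007, so the instrument sorts them to the same position despite a tenfold difference in mass.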

    How does a peak’s height relate to the sample?

    The height of a peak reflects how many ions of that mass-to-charge ratio reached the detector, which in turn relates to the abundance of the corresponding species. With appropriate calibration, peak intensities can be used to quantify amounts.
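
    One common way to read peak heights is to express each as a percentage of the tallest peak in the spectrum. The sketch below does only that normalisation; the peak list is a made-up input, and converting intensities into actual amounts would still need calibration against standards.

```python
def relative_intensities(peaks):
    """Scale raw ion counts so the tallest peak is 100.

    peaks maps an m/z value to the raw count recorded at that value.
    """
    if not peaks:
        return {}
    tallest = max(peaks.values())
    return {mz: 100.0 * count / tallest for mz, count in peaks.items()}
```

    For example, `relative_intensities({100.0: 50, 200.0: 200, 300.0: 20})` returns `{100.0: 25.0, 200.0: 100.0, 300.0: 10.0}`.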

    What does tandem mass spectrometry add?

    Tandem mass spectrometry selects a specific ion, fragments it, and measures the fragments. The fragmentation pattern gives structural information that a single mass measurement cannot, which is valuable in fields like proteomics. Reproducibility of such workflows is discussed in our reproducibility coverage and the author guidance.

  • Cloud Computing for Research Infrastructure

    Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources — networks, servers, storage, applications and services — that can be rapidly provisioned and released with minimal management effort. This definition follows the influential model published by the US National Institute of Standards and Technology (NIST Special Publication 800-145), which remains the standard reference for what does and does not count as cloud.

    For research, the appeal is straightforward: scalable compute and storage without owning hardware, accessible from anywhere, and paid for as used. But cloud also introduces reproducibility, cost and governance considerations that researchers must plan for deliberately.

    The five essential characteristics

    The NIST model defines five characteristics that distinguish genuine cloud computing from ordinary remote servers. On-demand self-service lets users provision resources automatically. Broad network access makes them available over standard networks. Resource pooling serves multiple tenants from shared infrastructure. Rapid elasticity allows capacity to scale up or down quickly to match demand. Measured service meters usage transparently, enabling pay-as-you-go billing. A platform that lacks any of them, such as a single rented server, is hosting, not cloud.

    Service models: IaaS, PaaS and SaaS

    Cloud services are commonly grouped into three service models that differ in how much the provider manages versus the user. Choosing the right level shapes control, effort and reproducibility.

    Model | What the provider manages | What you manage | Research example
    IaaS (Infrastructure as a Service) | Physical hardware, virtualisation, networking | Operating system, runtime, application, data | Virtual machines for a custom analysis pipeline
    PaaS (Platform as a Service) | Hardware plus OS, runtime and middleware | Application code and data | Managed notebook or database service
    SaaS (Software as a Service) | Entire stack including the application | Configuration and your data | A hosted survey or reference-management tool

    A useful rule of thumb: as you move from IaaS to SaaS, you trade control and configurability for convenience and reduced operational burden. Reproducible research workflows often favour IaaS or PaaS, where the computational environment can be captured and versioned.

    Cloud’s role in research computing and data

    Cloud computing has reshaped research computing by lowering the barrier to large-scale analysis. A team can spin up a cluster for a week-long genomics run and release it afterwards, paying only for what they used. Cloud storage hosts large datasets close to the compute that processes them, and managed services reduce the systems-administration overhead that once consumed researcher time. Many funders and institutions now run or subscribe to cloud-based data infrastructure for exactly these reasons.

    Cloud also supports reproducibility when used well. Infrastructure-as-code, container images and environment specifications let others recreate the exact computational setup behind a result. This complements broader good practice in capturing and describing computational methods, and aligns with the goals of standardised description and discovery promoted across the CASRAI dictionary.
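
    As a small illustration of an environment specification, the sketch below records the Python version, operating system and installed package versions using only the standard library. It is a starting point under the assumption that the analysis runs in Python; it does not replace a container image or infrastructure-as-code, and the function names are hypothetical.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment():
    """Snapshot the interpreter, OS and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def environment_json():
    """Serialise the snapshot so it can be versioned alongside code and data."""
    return json.dumps(capture_environment(), indent=2)
```

    Committing the output of `environment_json()` next to the analysis code gives collaborators a concrete record to compare their own setup against.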

    Cost and governance considerations

    Elasticity cuts both ways. Pay-as-you-go can be economical for bursty workloads but expensive for sustained ones, and unmonitored resources can accrue surprising costs. Data egress charges — fees to move data out of a provider — can dominate budgets for data-heavy projects. Governance questions also matter: where data physically resides affects legal and ethical obligations, particularly for sensitive or personal data, and vendor lock-in can make migration costly. Researchers should plan data management, budgeting and exit strategies before committing, and should record provider, region and configuration alongside other metadata so collaborators and reviewers understand the environment. Guidance on documenting outputs is available in our resources for authors.
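
    A back-of-envelope estimate makes these trade-offs visible before a project commits. Every rate in the sketch is a placeholder to be replaced with a provider’s published pricing, and the flat monthly fee stands in for whatever committed-use or owned-hardware alternative is being compared; none of these figures come from a real price list.

```python
def estimate_monthly_cost(compute_hours, hourly_rate, egress_gb,
                          egress_rate_per_gb, storage_gb=0.0,
                          storage_rate_per_gb=0.0):
    """Break a month's bill into compute, egress and storage components."""
    costs = {
        "compute": compute_hours * hourly_rate,
        "egress": egress_gb * egress_rate_per_gb,
        "storage": storage_gb * storage_rate_per_gb,
    }
    costs["total"] = sum(costs.values())
    return costs


def cheaper_option(compute_hours, hourly_rate, flat_monthly_fee):
    """Compare pay-as-you-go compute with a hypothetical flat monthly fee."""
    on_demand = compute_hours * hourly_rate
    return "pay-as-you-go" if on_demand <= flat_monthly_fee else "flat-rate"
```

    With the illustrative inputs of 200 compute hours at 0.50 per hour and 5,000 GB of egress at 0.09 per GB, egress comes to about 450 against 100 for compute, which is the pattern the paragraph above warns about.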

    Frequently asked questions

    What is the difference between cloud computing and a remote server?

    Cloud computing meets the five NIST characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service. A single rented remote server lacks elasticity and self-service provisioning, so it is hosting rather than cloud computing in the formal sense.
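
    The distinction can be expressed as a simple checklist. The sketch below is only a way of recording the comparison; deciding whether a given platform really offers each characteristic is a judgement the code cannot make, and the function names are hypothetical.

```python
NIST_CHARACTERISTICS = (
    "on-demand self-service",
    "broad network access",
    "resource pooling",
    "rapid elasticity",
    "measured service",
)


def missing_characteristics(offered):
    """List the NIST characteristics a platform does not offer."""
    offered = set(offered)
    return [name for name in NIST_CHARACTERISTICS if name not in offered]


def is_cloud(offered):
    """True only when all five characteristics are present."""
    return not missing_characteristics(offered)
```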

    Which service model should a research project use?

    It depends on the control you need. IaaS gives maximum control over the environment and suits custom, reproducible pipelines. PaaS reduces operational burden for application-focused work. SaaS is simplest when a ready-made tool already meets the need and the environment need not be captured.

    Does cloud computing help reproducibility?

    It can. Capturing environments as infrastructure-as-code or container images lets others recreate the exact setup behind a result. But reproducibility is not automatic — it requires deliberately versioning and sharing those specifications alongside data and code.

    What are the main governance risks?

    Key risks include unexpected costs (especially data egress), data residency and sovereignty constraints for sensitive data, and vendor lock-in. Address them with budgeting, a data-management plan, clear records of region and configuration, and a documented exit strategy.
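
    A record of region and configuration can be as simple as a small structured file kept with the project’s other metadata. The field names below are assumptions chosen for illustration rather than an established schema, and the example values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CloudRunRecord:
    """Hypothetical record of where and how a computation was run."""
    provider: str
    region: str
    instance_type: str
    image: str
    notes: dict = field(default_factory=dict)

    def to_json(self):
        """Serialise with stable key order so changes show up clearly in diffs."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = CloudRunRecord(
    provider="example-cloud",        # placeholder, not a real provider
    region="example-region-1",       # placeholder region name
    instance_type="8 vCPU / 32 GB",
    image="analysis-env:1.0",
    notes={"exit_strategy": "data exported monthly to institutional storage"},
)
```

    Calling `record.to_json()` yields text that can be stored in the repository or attached to a data-management plan.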

  • Citing Secondary Sources: The ‘As Cited In’ Rule

    A secondary citation occurs when you refer to a source you have not read yourself, having encountered it only through another author’s discussion. Scholarly convention requires you to be transparent about this using the “as cited in” (or “qtd. in”) formula. The guiding principle is simple: cite what you actually read, and wherever possible track down and cite the original source instead.

    Why the rule exists

    If author B quotes or summarises author A, and you have read only B, you cannot vouch for what A really said. B may have paraphrased loosely, quoted selectively or made an error. Citing A directly as if you had read it misrepresents your sources and risks propagating a mistake. The “as cited in” convention keeps the record honest by showing the reader the chain: the original idea came from A, but you read it in B. This honesty is part of the integrity of the scholarly record.

    Read the original where you can

    Secondary citation is a fallback, not a convenience. Before using it, try to obtain the original — through your library, interlibrary loan, or a DOI lookup. Reading the original lets you confirm the quotation, see its context and cite it directly. Use “as cited in” only when the original is genuinely unavailable (out of print, untranslated, lost).

    How major styles handle it

    The styles agree on the principle but differ in wording and in which source goes in the reference list. The general rule across styles is that the reference-list entry is for the work you actually read (the secondary source).

    Style | In-text form | Reference list
    APA | (Smith, 1999, as cited in Jones, 2020) | Jones (the source you read) only
    MLA | (qtd. in Jones 45) | Jones (the source you read) only
    Chicago (notes) | Smith, [work], quoted in Jones, [work] | Both may appear, with the relationship shown
    Harvard (author–date) | (Smith 1999, cited in Jones 2020) | Jones (the source you read) only

    Always confirm the exact punctuation against your specific style edition, as details vary between versions. A reference manager can format the entry, but secondary citations are a classic case where you must check the output by hand.

    Worked examples

    APA, in text: Early work on data reuse argued that incentives drive deposit (Smith, 1999, as cited in Jones, 2020). Only Jones (2020) appears in your reference list.

    MLA, in text: One critic calls the dataset “the backbone of reproducibility” (qtd. in Jones 45). Only Jones appears on the Works Cited page.

    In both cases the message to the reader is identical: the idea originates with Smith, but you read it in Jones, and Jones is what you can actually verify. For how this fits into building a full reference list, see our guide to compiling a bibliography.
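
    The in-text patterns shown in this article can be captured in a small helper, which is useful mainly as a check on what a reference manager produces. It reproduces only the forms given in the table above, the function name is hypothetical, and the output should still be confirmed against the style edition in use.

```python
def secondary_citation(style, original_author, original_year,
                       read_author, read_year, page=None):
    """Build an in-text secondary citation in the forms shown in the table."""
    style = style.lower()
    if style == "apa":
        return (f"({original_author}, {original_year}, "
                f"as cited in {read_author}, {read_year})")
    if style == "harvard":
        return (f"({original_author} {original_year}, "
                f"cited in {read_author} {read_year})")
    if style == "mla":
        # MLA names only the source actually read, with a page if there is one.
        locator = f" {page}" if page is not None else ""
        return f"(qtd. in {read_author}{locator})"
    raise ValueError(f"unsupported style: {style}")
```

    For instance, `secondary_citation("apa", "Smith", 1999, "Jones", 2020)` gives `(Smith, 1999, as cited in Jones, 2020)`, matching the worked example.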

    Good practice

    Minimise secondary citations; prefer originals; quote B’s reading of A accurately; and never silently cite A as if you read it. When you must use “as cited in”, be precise about page numbers from the source you read. These habits, alongside accurate reference checking, support honest scholarship. See our author resources, the dictionary and the research-outputs hub for more.

    Frequently asked questions

    What does ‘as cited in’ mean?

    It signals that the original source (the primary) reached you only through another work (the secondary), which is the one you actually read. It keeps your sourcing honest by showing you did not read the original directly.

    Which source goes in my reference list?

    In APA, MLA and Harvard, only the source you actually read, the secondary source, appears in the reference list. The original is named in the text but not listed, because you cannot verify it directly. Chicago notes style may show both.

    Is using ‘qtd. in’ the same as ‘as cited in’?

    Effectively yes. “Qtd. in” (quoted in) is the MLA wording, “as cited in” is the APA wording, and Harvard author–date styles typically use “cited in”. All indicate a secondary citation; use the form your style requires.

    When should I avoid secondary citation entirely?

    Whenever you can obtain the original. Reading the primary source lets you verify the quotation and context and cite it directly, which is always preferable. Reserve secondary citation for sources that are genuinely unavailable.