v2026.1714 entries · CC-BY 4.0
CASRAI

Plain-language explainers · 139 topics

Learn the basics

Plain-language explainers for the core concepts of modern research administration. If you know what these mean, you can navigate the rest of the site (and the rest of the field) confidently.

By topic

Start here

Explainer

What is research data management (RDM)?

RDM covers how research data are looked after from planning through to long-term preservation or disposal — capturing, documenting, storing, securing, sharing, and archiving them. It is usually framed around the data lifecycle and underpinned by a data management plan (DMP). The aim is to keep data well-organised, safe, and as FAIR as possible so results can be verified and the data reused. RDM is increasingly an expectation of funders and is supported by repositories, data stewards, and institutional services.

Read →

Explainer

What is metadata?

Metadata is "data about data" — structured descriptions that tell you what a resource is, who created it, when, in what format, and how it may be used. It is usually grouped into descriptive metadata (for discovery), administrative metadata (for management and rights), and structural metadata (how parts fit together). Metadata follows shared schemas and standards — such as Dublin Core, DataCite, and schema.org — so that machines as well as people can interpret it consistently. Good metadata is the foundation of discoverability and the "Findable" in FAIR.

Read →

Explainer

What is a transformative agreement?

A transformative agreement bundles two things that were once paid for separately — access to read a publisher’s subscription content and the cost of publishing the institution’s authors open access — into a single contract. The point is transition: to move money and journals away from the subscription model towards open access over the term of the deal. cOAlition S’s Plan S endorses transformative arrangements as a time-limited bridge, expecting them to demonstrably increase the share of open content rather than simply preserve publisher revenue.

Read →

Explainer

What is citizen science?

Citizen science describes scientific work carried out, in whole or in part, by members of the public collaborating with or alongside professional researchers. Participation ranges from contributing observations and classifying data through to co-designing projects. It can dramatically extend data collection across time and geography, and it raises important questions about recognition and credit for contributors. The European Citizen Science Association’s Ten Principles set out widely cited good practice, and platforms such as Zooniverse host large numbers of public-participation projects.

Read →

Explainer

What is text and data mining (TDM)?

TDM uses software to process large corpora — articles, datasets, web content — extracting and analysing information at a scale beyond manual reading. Because mining usually requires making copies of in-copyright works, it sits at the intersection of research and copyright law. The EU’s Digital Single Market (DSM) Directive created TDM exceptions (Articles 3 and 4), and the UK has its own statutory TDM exception, though scope differs — particularly around commercial use and rights-holder opt-outs. Licensing and the use of mined content as AI training data are active areas of debate.

Read →

Explainer

What is an article processing charge (APC)?

An article processing charge is a publication fee that funds open access at the version of record. Under the Gold open-access model the cost shifts from the reader (subscriptions) to the author side — usually the author’s institution or funder. APCs vary widely between journals. Transformative agreements bundle them into institutional deals, waivers reduce them for authors in lower-income countries, and Diamond open access avoids them entirely.

Read →

Explainer

What is the Research Excellence Framework (REF)?

The REF is a periodic peer-review-based assessment of research quality across UK universities, run jointly by the four UK higher-education funding bodies. Submissions are assessed by expert panels against three elements — the quality of research outputs, the impact of research beyond academia, and the research environment. Results inform how block-grant ("QR") funding is distributed. The next exercise is REF 2029, which is reforming its approach in line with responsible-assessment principles.

Read →

Explainer

What is open data?

Open research data is data made freely available for others to access, reuse, and redistribute, typically under an open licence such as CC BY or a public-domain dedication (CC0). Openness is distinct from FAIR: data can be well-described and machine-actionable (FAIR) while access is appropriately restricted. The guiding principle is "as open as possible, as closed as necessary", reflecting that some data (e.g. sensitive personal data) cannot be fully open. Many funders now require open-data sharing where ethically and legally possible.

Read →

Explainer

What is a research knowledge graph?

A research knowledge graph models the scholarly world as nodes (entities such as researchers, institutions, publications, datasets, funders, and grants) connected by typed relationships (authored, funded, cites, affiliated-with). Persistent identifiers — ORCID for people, ROR for organisations, DOIs for outputs and grants — give each entity a stable anchor, and open metadata supplies the connections. Examples include the DataCite PID Graph, OpenAlex, and the OpenAIRE Graph.

Read →

Explainer

What is Creative Commons licensing?

Creative Commons licences are standard, ready-made copyright licences from the non-profit Creative Commons. They let a rights holder keep copyright while granting reuse permissions through modular conditions — Attribution (BY), ShareAlike (SA), NonCommercial (NC), and NoDerivatives (ND) — combined into licences such as CC BY and CC BY-SA. CC0 is a separate public-domain dedication that waives rights entirely. cOAlition S’s Plan S requires the CC BY licence (with limited exceptions) for funded open-access articles.

Read →

Explainer

What are the CARE Principles?

CARE is an acronym for Collective benefit, Authority to control, Responsibility, and Ethics. The CARE Principles for Indigenous Data Governance were articulated by the Global Indigenous Data Alliance (GIDA) in 2019 to ensure data about Indigenous Peoples is governed in line with their rights and interests. CARE complements the FAIR principles rather than replacing them.

Read →

Explainer

What is DORA?

DORA is the San Francisco Declaration on Research Assessment, drafted at the 2012 ASCB annual meeting and released in 2013. It asks funders, institutions, publishers, and metric providers to eliminate journal-based metrics (notably the Journal Impact Factor) from hiring, promotion, and funding decisions, and to assess research on its own merits. Thousands of organisations and individuals have signed it.

Read →

Explainer

What are the FAIR4RS Principles?

FAIR4RS is the application of the FAIR principles — Findable, Accessible, Interoperable, Reusable — to research software. Software is not data: it is executable, has dependencies, and changes through versions, so the principles were re-articulated by a community working group convened across the Research Data Alliance (RDA), FORCE11, and the Research Software Alliance (ReSA), and published in 2022.

Read →

Explainer

What is a data availability statement?

A data availability statement (also called a data access statement) is a statement in a paper describing whether and how the data supporting the results can be accessed — typically pointing to a repository, a persistent identifier such as a DOI, or explaining any restrictions. It is now required by many publishers (e.g. through their data policies) and funders.

Read →

Explainer

What is CRediT?

CRediT is a standardised list of 14 contributor roles that scholarly authors use to credit who did what on a research paper. It was standardised as ANSI/NISO Z39.104-2022. Over 50 publishers and thousands of journals now require it.

Read →

Explainer

What is a narrative CV?

A narrative CV is a researcher CV format that emphasises a narrative account of contributions over publication lists. Versions include the UKRI Résumé for Research and Innovation (R4RI), the Royal Society Résumé for Researchers, the Wellcome Trust narrative CV, and the Dutch NWO Evidence-of-Activity format.

Read →

Explainer

What is an ORCID iD?

An ORCID iD is a free, persistent, unique identifier for you as a researcher. It distinguishes you from other researchers with similar names and connects your contributions across funders, publishers, repositories, and institutions. Most major funders have required it since the late 2010s.

Read →

Explainer

What is FAIR data?

FAIR (Findable, Accessible, Interoperable, Reusable) is a set of four principles for making research data machine-actionable. The principles were published by Wilkinson et al. in 2016 in the journal Scientific Data (Nature Portfolio) and are now near-universally required in funder data management plans (DMPs).

Read →

Explainer

What is a CRIS?

A Current Research Information System (CRIS) is the database of record at a university for everything research-related: publications, grants, people, and projects. It connects to ORCID, Crossref, funders, and repositories. Commercial products include Pure (Elsevier), Symplectic Elements (Digital Science), Worktribe, and Converis (Clarivate); VIVO is an open-source option.

Read →

Explainer

What is Plan S?

Plan S is a funder-driven open-access initiative requiring immediate CC BY open access for publications from funded research. It is backed by around 30 major funders (including UKRI, ERC, Wellcome, Gates, and Templeton). It launched in 2018, with implementation from 2021.

Read →

Explainer

What is open access?

Open access is the practice of making peer-reviewed research articles freely available online, typically under a Creative Commons licence (most often CC BY). It is distinct from "free to read": true open access also grants reuse rights.

Read →

Explainer

What is research integrity?

Research integrity is the practice of conducting research honestly, rigorously, transparently, and accountably. It covers fabrication, falsification, and plagiarism (FFP), as well as data management, authorship, peer review, and disclosure. Key frameworks and bodies include the ALLEA European Code of Conduct for Research Integrity, the Office of Research Integrity (ORI) in the US, the UK Research Integrity Office (UKRIO), and COPE for publishers.

Read →

Explainer

What is a persistent identifier?

A persistent identifier (PID) is a unique, durable code that identifies a research entity — such as a researcher, organisation, or dataset — and resolves to up-to-date metadata via a managed resolver. PIDs survive URL changes, enabling reliable citation, linking, and a connected research-information graph.

Read →

Explainer

What is ROR?

ROR (Research Organization Registry) is an open, free registry that assigns a persistent identifier to each research organisation. A ROR iD disambiguates institutional affiliations across publishers, funders, and repositories — doing for organisations what ORCID does for individual researchers.

Read →

Explainer

What is a DOI?

A DOI (Digital Object Identifier) is a persistent, unique identifier assigned to a research output such as an article, dataset, or software release. It resolves via doi.org to the current landing page, so citations remain stable even when the underlying URL changes. DOIs are issued by registration agencies including Crossref and DataCite.
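
As a rough illustration of how a DOI maps onto its doi.org resolver link, here is a minimal Python sketch. The function name `doi_to_url` is hypothetical, and the check that a DOI begins with "10." is a simplification rather than a full syntax validation. The example DOI is one quoted elsewhere on this page.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Turn a bare or 'doi:'-prefixed DOI into a doi.org resolver URL."""
    cleaned = doi.strip()
    # Drop an optional "doi:" label, as used in some citation styles.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Simplified sanity check: DOI names start with the "10." directory code.
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return "https://doi.org/" + quote(cleaned, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1007/s00799-005-0128-x"))
```

The citation stores only the DOI; the resolver decides where the link currently lands, which is why the reference stays stable when a publisher moves the page.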

Read →

Explainer

What is RAiD?

RAiD (Research Activity Identifier) is a persistent identifier for a research project or activity. It connects the contributors, organisations, outputs, and tools involved across a project's lifespan — filling the "project" gap alongside ORCID (people), ROR (organisations), and DOI (outputs). RAiD is published as ISO 23527:2022.

Read →

Explainer

What is JATS XML?

JATS (Journal Article Tag Suite) is a NISO-standard XML model — ANSI/NISO Z39.96 — for encoding scholarly journal articles and their metadata. It is the common format publishers and archives such as PubMed Central use to store and exchange full text, and it encodes contributors and their CRediT roles.

Read →

Explainer

What is a data management plan?

A data management plan (DMP) is a structured document setting out how a project's research data will be created, organised, stored, protected, shared, and preserved. Funders such as NIH, UKRI, and Horizon Europe require DMPs, usually FAIR-aligned. Machine-actionable DMPs (maDMPs) make these plans structured and exchangeable between systems.

Read →

Explainer

What is CoARA?

CoARA (Coalition for Advancing Research Assessment) is an international coalition, launched in 2022, whose members commit to reforming research assessment — recognising diverse outputs and contributions and reducing inappropriate use of metrics such as the Journal Impact Factor and h-index. It builds on, and is complementary to, the earlier DORA declaration.

Read →

Explainer

What is a preprint?

A preprint is a full research manuscript posted publicly before peer review, usually on a preprint server such as arXiv, bioRxiv, or medRxiv. It is citable and often carries a DOI, but it is not the peer-reviewed version of record. Most journals allow prior preprinting, and many funders accept it as an open-access route.

Read →

Explainer

What is peer review?

Peer review is the evaluation of research by independent experts (peers) to judge its validity, rigour, and significance before publication. Common models include single-anonymous, double-anonymous, open, and post-publication review. It is the main way the version of record is certified, though it has well-documented limitations.

Read →

Explainer

What are altmetrics?

Altmetrics are indicators of the online attention and engagement around a research output, such as news coverage, policy citations, social-media mentions, and saves in reference managers. Provided by services like Altmetric and PlumX, they complement citations but measure attention, not quality — a distinction the Leiden Manifesto stresses.

Read →

Explainer

What is a retraction?

A retraction is a published notice that formally withdraws an article because its main findings are unreliable — through honest error or misconduct. It sits at the severe end of a scale that includes corrections and expressions of concern. The Committee on Publication Ethics (COPE) provides the widely used guidelines for when and how to retract.

Read →

Explainer

What is an h-index?

The h-index measures a researcher's output and impact together: your h-index is the largest number h such that h of your papers each have at least h citations. Proposed by Jorge Hirsch in 2005, it is reported by Scopus, Web of Science, and Google Scholar, though the value differs between them because each covers a different set of publications, and responsible-assessment frameworks warn against over-relying on it.
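
The definition is easier to see worked through. The short Python sketch below ranks a list of per-paper citation counts from highest to lowest and finds the last rank at which the paper still has at least as many citations as its rank. The function name `h_index` and the sample counts are illustrative, not taken from any real researcher.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only fall from here, so no larger h is possible
    return h


if __name__ == "__main__":
    # Five papers: four of them have at least 4 citations, so h = 4.
    print(h_index([10, 8, 5, 4, 3]))
```

One very highly cited paper does not move the number: `[25, 8, 5, 3, 3]` gives 3, which is part of why the index says little about any single output.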

Read →

Explainer

What is a systematic review?

A systematic review answers a focused question by systematically finding, critically appraising, and synthesising all relevant studies, using transparent methods set out in advance in a protocol. Where data allow, it may include a meta-analysis. Systematic reviews are reported using the PRISMA statement and protocols are often registered in PROSPERO.

Read →

Explainer

What is a data repository?

A data repository is a service that curates and preserves research datasets and makes them discoverable, citable, and reusable — typically minting a DOI and rich metadata for each deposit. Repositories may be generalist (e.g. Zenodo, Dryad, Figshare) or domain-specific, and trustworthiness can be signalled by certification such as CoreTrustSeal.

Read →

Explainer

What is research impact?

Research impact is the effect or benefit that research produces, both within academia (academic impact, such as advancing a field) and beyond it (societal impact on policy, health, the economy, culture, or the environment). The UK's Research Excellence Framework (REF) assesses societal impact through case studies, distinct from the quality of the underpinning research.

Read →

Explainer

What is a protocol paper?

A protocol paper presents the rationale, objectives, and detailed methodology of a study that has not yet been completed. Unlike a preregistration entry in a registry, a protocol paper goes through full peer review at a journal, meaning the methods themselves are assessed for scientific rigour before any data collection begins. Journals such as BMJ Open, Trials, and JMIR Research Protocols publish protocol papers and assign them a citable DOI. This allows other researchers to engage with, replicate, or build on a study's design independently of its results.

Read →

Explainer

What is responsible research and innovation (RRI)?

RRI is a framework that asks research and innovation to be anticipatory, reflexive, inclusive, and responsive to societal needs. The Engineering and Physical Sciences Research Council (EPSRC) in the UK developed the AREA framework — Anticipate, Reflect, Engage, Act — as a practical tool for applying RRI principles. At EU level, the European Commission embedded RRI in Horizon 2020 under the heading "Science with and for Society," identifying six keys: public engagement, open access, gender equality, ethics, science education, and governance. RRI differs from research ethics in that it extends scrutiny to the purpose, direction, and societal desirability of research, not only to how it is conducted.

Read →

Explainer

What is Software Heritage?

Software Heritage archives source code from public repositories across the web and assigns each unique software artefact a SoftWare Heritage persistent IDentifier (SWHID) — a cryptographic, content-based identifier that allows researchers to cite specific versions of code precisely and permanently. Unlike a DOI, a SWHID is intrinsic: it is computed from the content itself, so the same code always produces the same identifier regardless of where it is stored. Software Heritage supports the FAIR principles for research software and is integrated with Zenodo, enabling researchers to archive code alongside their datasets and publications.
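
To make "computed from the content itself" concrete, here is a minimal Python sketch for a single file's contents. It assumes the SWHID scheme for content objects, which uses a Git-compatible blob hash (SHA-1 over a short header plus the bytes). Directories, revisions, and other object types use different rules that this sketch does not cover, and the function name `content_swhid` is hypothetical.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Compute a content-type SWHID (swh:1:cnt:...) for raw file bytes."""
    # Git-style blob header: the word "blob", the byte length, then a NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    source = b"print('hello')\n"
    # The same bytes always give the same identifier, wherever they are stored.
    print(content_swhid(source))
    print(content_swhid(source) == content_swhid(bytes(source)))
```

Because the identifier is derived from the bytes, anyone holding a copy of the file can recompute it and check it against a citation without contacting a registry.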

Read →

Explainer

What is a publisher embargo period?

When a researcher publishes in a subscription journal, the publisher typically requires that the author's accepted manuscript (the peer-reviewed version before professional typesetting) not be deposited in an open repository until the embargo period has elapsed. During the embargo, only those with institutional subscriptions can read the article. After the embargo lifts, the manuscript can be shared freely in repositories such as Europe PMC or an institutional repository. Embargo policies vary widely: STEM journals often impose 6 or 12 months, whilst humanities journals may impose 24 months or longer. Rights retention strategies, developed by cOAlition S and adopted by institutions, provide a legal mechanism for bypassing embargoes for funded research.

Read →

Explainer

What is grey literature?

Grey literature includes any research or scholarly output that is not distributed through conventional peer-reviewed publishing. Classic examples are government departmental reports, clinical trial registrations, regulatory submissions, local authority evaluations, NGO briefings, unpublished dissertations, and white papers produced by research institutes. Because such documents are often not indexed in major databases, they require specific search strategies to locate. In systematic reviews, grey literature is sought deliberately to capture evidence — particularly null or negative findings — that might otherwise remain unpublished and thus invisible to a literature synthesis.

Read →

Explainer

What is the Leiden Manifesto?

The Leiden Manifesto responds to a growing concern that research evaluation had become over-reliant on bibliometric indicators such as the journal impact factor and h-index, often applied without regard for context. Its ten principles call for quantitative metrics to complement qualitative expert assessment; for performance to be measured against an institution's own stated mission; for locally relevant research to be protected; for data and analysis to be transparent; for those being evaluated to be able to verify the data used about them; for field-specific variation in citation practices to be respected; for individual researchers to be assessed on the basis of a qualitative judgement of their portfolio; for false precision to be avoided; for evaluators to recognise that metrics reshape the behaviour they measure; and for indicators to be regularly reviewed and updated.

Read →

Explainer

What is open peer review?

The most widely adopted typology of open peer review was developed by Tony Ross-Hellauer in a 2017 systematic review published in F1000Research. Ross-Hellauer identified seven distinct traits that can independently define an OPR model: open identities (reviewers sign their reports), open reports (reviews are published alongside the article), open participation (anyone can contribute a review), open interaction (dialogue between authors and reviewers is published), open pre-review manuscripts, open final-version commenting, and open platforms. Journals mix and match these traits — no single combination is universally standard.

Read →

Explainer

What is a data paper?

Where a conventional research article focuses on the results and interpretation of an analysis, a data paper focuses on the dataset itself: how it was collected, what it contains, its quality checks and limitations, how it is structured, and where it is deposited. The dataset is hosted in a recognised repository, assigned a persistent identifier (typically a DOI via DataCite), and the data paper provides the rich context that makes the dataset FAIR — Findable, Accessible, Interoperable, and Reusable. Dedicated data journals include Earth System Science Data (ESSD, Copernicus), Scientific Data (Nature Portfolio), Data in Brief (Elsevier), and GigaScience (Oxford University Press).

Read →

Explainer

What is a scoping review?

The scoping review methodology was first systematically described by Arksey and O'Malley in 2005 (International Journal of Social Research Methodology) and has since been updated by Levac, Colquhoun, and O'Brien (2010) and further developed by the Joanna Briggs Institute (JBI), which published updated methodological guidance through Peters et al. in 2020 (JBI Evidence Synthesis). The reporting standard for scoping reviews is PRISMA-ScR — the PRISMA extension for Scoping Reviews — published in the Annals of Internal Medicine in 2018. Importantly, PROSPERO does not accept scoping review registrations; researchers wishing to register scoping review protocols should use the Open Science Framework (OSF).

Read →

Explainer

What is preregistration?

A preregistration is typically submitted through a public registry such as the Open Science Framework (OSF) or the simpler AsPredicted platform, where it is time-stamped and either immediately public or held under an embargo until publication. For systematic reviews, PROSPERO is the specialist registration platform. A registered report is an extension of preregistration that adds peer review of the protocol before data collection: the journal provisionally accepts the paper based on the preregistered design, guaranteeing publication regardless of the outcome. Preregistration is mandatory for clinical trials under the FDA Amendments Act 2007 (FDAAA) and WHO International Clinical Trials Registry Platform (ICTRP), and is an increasingly expected norm in psychology, social science, and ecology.

Read →

Explainer

What is an institutional repository?

An institutional repository is a digital platform operated by a university, hospital, or other research institution to store and share its research outputs — including journal articles, theses, datasets, reports, and conference papers. The two dominant open-source platforms are DSpace (maintained by Lyrasis) and EPrints (developed at the University of Southampton). Repositories expose metadata via OAI-PMH so their content can be harvested by aggregators such as BASE (Bielefeld Academic Search Engine). Publisher self-archiving policies are navigated using SHERPA/RoMEO, which summarises what versions authors may deposit and under what embargo conditions.

Read →

Explainer

What is the European Open Science Cloud (EOSC)?

EOSC was formally launched by the European Commission in 2018 following a high-level expert group report, and the EOSC Association — the legal entity responsible for governing EOSC — was established in 2020. The EOSC-Future project (2021–2023, funded under Horizon 2020) delivered the first integrated EOSC core platform and Exchange marketplace. EOSC is built around a four-layer Interoperability Framework covering technical, semantic, organisational, and legal dimensions. It operates an Authentication and Authorisation Infrastructure (AAI) to enable cross-institutional access. EOSC is closely linked to ESFRI (European Strategy Forum on Research Infrastructures) landmarks, which contribute services to the EOSC Exchange.

Read →

Explainer

What is OpenAlex?

OpenAlex indexes over 250 million scholarly works, 250 million authors, 100,000 institutions, and 55,000 sources, drawing on Crossref metadata, Unpaywall OA status data, ROR for institution disambiguation, and ORCID for author disambiguation. Unlike Scopus and Web of Science, which require institutional subscriptions, OpenAlex is entirely free and its underlying data can be downloaded in bulk from Amazon S3. It provides a REST API and supports advanced filtering, grouping, and citation analysis. Its concept taxonomy (now migrated to a topics system) enables cross-disciplinary discovery and bibliometric analysis at scale.
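
As a sketch of what a filtered request to the REST API can look like, the Python below only builds the URL and does not send it. The helper name `works_query_url` is hypothetical, and the filter keys shown (`publication_year`, `is_oa`) and the `per-page` and `mailto` parameters are assumptions based on the public OpenAlex documentation, which should be checked for current names.

```python
from urllib.parse import urlencode

OPENALEX_BASE = "https://api.openalex.org"


def works_query_url(filters, per_page=25, mailto=None):
    """Build a /works query URL from a dict of filter names and values."""
    params = {}
    if filters:
        # OpenAlex combines filters as comma-separated "name:value" pairs.
        params["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    params["per-page"] = per_page
    if mailto:
        # Optional contact address, assumed to be accepted as `mailto`.
        params["mailto"] = mailto
    # Keep ":" and "," readable instead of percent-encoding them.
    return f"{OPENALEX_BASE}/works?{urlencode(params, safe=':,')}"


if __name__ == "__main__":
    print(works_query_url({"publication_year": 2020, "is_oa": "true"}, per_page=5))
```

The resulting URL can be pasted into a browser or passed to any HTTP client; the response is JSON.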

Read →

Explainer

What is CoreTrustSeal?

CoreTrustSeal launched in 2017 through the merger of two predecessor certifications: the Data Seal of Approval (DSA), established in 2008 by DANS (Data Archiving and Networked Services) in the Netherlands, and the World Data System (WDS) certification managed by the International Science Council (ISC). The combined standard comprises 16 requirements across five domains: organisational background, digital object management, technology, security, and legal compliance. Major certified repositories include Zenodo, the UK Data Service, and PANGAEA. CoreTrustSeal is positioned as the mid-level certification in a tiered framework, sitting between informal quality guidelines and the full third-party audit standard ISO 16363.

Read →

Explainer

What is a FAIR Digital Object (FDO)?

The theoretical foundation for FDOs traces to Kahn and Wilensky's 2006 paper "A framework for distributed digital object services" (International Journal on Digital Libraries, doi:10.1007/s00799-005-0128-x). The FDO Forum, established in 2019, develops the FDO specifications and governance. An FDO has three essential components: a persistent identifier (PID), typically a handle or DOI; a type record (also called an FDO profile) that is machine-readable and defines the operations the FDO supports; and the bit sequences (the data or metadata payload). Operations on FDOs are carried out using the Digital Object Interface Protocol (DOIP), developed by the Corporation for National Research Initiatives (CNRI).

Read →

Explainer

What is DORA (the San Francisco Declaration on Research Assessment)?

DORA is a set of recommendations, first drafted at the December 2012 Annual Meeting of the American Society for Cell Biology (ASCB) and released in May 2013, calling for an end to the misuse of journal-based metrics — above all the Journal Impact Factor (JIF) — as a surrogate for the quality of individual articles or researchers. It contains one general recommendation plus targeted recommendations for funders, institutions, publishers, and organisations that supply metrics. More than 20,000 individuals and organisations across over 160 countries have signed it. DORA sits alongside the Leiden Manifesto (2015) and the Coalition for Advancing Research Assessment (CoARA, 2022) in the broader responsible-metrics and research-assessment-reform movement.

Read →

Explainer

What is a CRIS (Current Research Information System)?

A CRIS is a database and management system that records the research outputs and activities of an institution — publications, datasets, projects, grants and funding, researchers, and organisational units — and the relationships between them. The European standard for representing this information is CERIF (Common European Research Information Format), maintained by euroCRIS. Widely used CRIS platforms include Pure (Elsevier), Converis (Clarivate), Symplectic Elements (Digital Science), the open-source DSpace-CRIS, and Worktribe. A CRIS interoperates with institutional repositories and with persistent identifiers such as ORCID and ROR, and the broader practice of running such systems is known as research information management (RIM).

Read →

Explainer

What is SWORD (deposit protocol)?

SWORD stands for Simple Web-service Offering Repository Deposit. It is an interoperability standard that defines how one piece of software can deposit a package of content and metadata into a repository programmatically. The widely deployed version is SWORD v2; a newer specification, SWORDv3, was developed in collaboration with COAR and Jisc to modernise the protocol for current repository workflows. Typical use cases include publisher-to-repository deposit, CRIS-to-repository deposit, and depositing a single output into multiple repositories at once. SWORD is complementary to OAI-PMH (which harvests metadata) and aligns with COAR Notify for distributed repository communication.

Read →

Explainer

What is OAI-PMH?

OAI-PMH stands for Open Archives Initiative Protocol for Metadata Harvesting. It defines a simple, HTTP-based way for a "harvester" to request metadata from a repository ("data provider") using six verbs: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord. Every OAI-PMH repository must expose its metadata in unqualified Dublin Core, ensuring a common baseline format. Aggregators such as BASE, CORE, OpenAIRE, and OpenDOAR use OAI-PMH to harvest from thousands of repositories. It is distinct from SWORD (which is for deposit) and ResourceSync (which is for synchronisation), and although it dates from 2001 it continues to play a central role in repository discovery.
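
A harvesting request is just a URL with a `verb` parameter, which the following minimal Python sketch builds without sending it. The endpoint `https://repository.example.org/oai` is a placeholder and the helper name `oai_request_url` is hypothetical; `oai_dc` is the metadata prefix the protocol uses for unqualified Dublin Core.

```python
from urllib.parse import urlencode

# The six verbs defined by the protocol.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for one of the six verbs."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    # The verb comes first, followed by any verb-specific arguments.
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    base = "https://repository.example.org/oai"  # placeholder endpoint
    print(oai_request_url(base, "Identify"))
    print(oai_request_url(base, "ListRecords", metadataPrefix="oai_dc"))
```

A real harvester would also follow the resumption tokens that repositories return when a result list is split across several responses; that step is left out here.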

Read →

Explainer

What is the REF (Research Excellence Framework)?

The REF is a periodic, expert-review-based assessment of research in UK universities, run on behalf of the four UK higher-education funding bodies. It first ran in 2014 (replacing the RAE), again in 2021, and is next planned as REF 2029. Submissions are organised into Units of Assessment and judged on three elements — Outputs, Impact, and Environment — which in REF 2021 were weighted 60%, 25%, and 15%. Assessors award a star rating to submitted material on a scale from 4* (world-leading) down to 1* and unclassified. The results drive the allocation of quality-related (QR) funding by bodies including Research England, and submitted outputs must comply with the REF open-access policy. REF 2029 introduces a stronger focus on people, research culture, and environment.

Read →

Explainer

What is SciENcv?

SciENcv is an NCBI-hosted system that lets researchers create and store the biographical and professional information used to produce funder-mandated CV documents. It generates several formats, including the NIH Biographical Sketch, the NSF Biographical Sketch, and the NSF Current and Pending (Other) Support document. Researchers can link an ORCID record and import data into SciENcv, and can also push selected information outward, reducing duplicate data entry. Since 2023 the NIH and NSF have required the use of SciENcv to prepare these documents for many applications, making it a central piece of US grant-submission infrastructure. It is operated by the National Center for Biotechnology Information within the US National Library of Medicine.

Read →

Explainer

What is the EQUATOR Network?

The EQUATOR Network is an international collaboration dedicated to improving the reliability and value of published health research by promoting transparent and accurate reporting. Its central resource is a freely available library of reporting guidelines — structured checklists for different study types, such as CONSORT for randomised trials, PRISMA for systematic reviews, STROBE for observational studies, ARRIVE for animal research, SPIRIT for trial protocols, COREQ for qualitative studies, CARE for case reports, and TRIPOD for prediction models. The network helps authors choose and apply the right guideline, and supports journals and educators. Coordinated through centres including the UK EQUATOR Centre at the University of Oxford, its overarching goal is to reduce avoidable research waste.

Read →

Explainer

What is a reporting guideline?

A reporting guideline is an evidence-informed set of items — typically a checklist — that tells authors the minimum content needed to report a given study type transparently and completely. Different designs have their own guidelines: CONSORT for randomised controlled trials, PRISMA for systematic reviews and meta-analyses, STROBE for observational studies, ARRIVE for animal research, SPIRIT for trial protocols, TRIPOD for prediction-model studies, and CARE for case reports, among others. These guidelines are collected and indexed by the EQUATOR Network, and many journals now require authors to follow the relevant one and submit a completed checklist. A reporting guideline is distinct from a methodological standard: it governs how the study is described, not how it must be carried out.

Read →

Explainer

What is the DDI (Data Documentation Initiative)?

DDI, the Data Documentation Initiative, is a widely adopted metadata standard for describing social, behavioural, and economic science data. It is XML-based and supports detailed, variable-level documentation — capturing question wording, response categories, codes, and value labels — so that datasets such as surveys and microdata can be understood, reused, and preserved. There are two principal specifications: DDI Codebook, which documents datasets primarily at a single point (often after collection), and DDI Lifecycle, which models the whole data life cycle from study conception through to archiving and reuse. Major data services and archives — including the UK Data Service, ICPSR, GESIS, and members of the CESSDA network — rely on DDI. It is far richer than the fifteen-element Dublin Core for this kind of detailed survey and microdata description.

Read →

Explainer

What is the Crossref Grant Linking System?

The Crossref Grant Linking System (GLS) enables funders to register their grants as DOI-identified records and to connect research outputs to those grants through funding metadata. Each registered award receives a grant DOI, providing a persistent, citable identifier for the funding itself, and outputs deposited with Crossref can reference the awards that supported them. This linking lets funders track the outputs of their investments and feeds the broader PID graph that ties together people, organisations, outputs, and funding. The system works alongside Crossref's funder identification — historically the Open Funder Registry — and the announced direction of using the Research Organization Registry (ROR) to identify funding organisations, improving consistency across the scholarly metadata ecosystem.

Read →

Explainer

What is predatory publishing?

Predatory publishers charge authors fees, typically APCs, but do not deliver the rigorous peer review, editorial oversight, and indexing that a legitimate journal provides. Warning signs include unsolicited flattering emails, unrealistically fast acceptance, opaque fees, fake or unverifiable editorial boards, and false indexing claims. Authors can check a journal against the Directory of Open Access Journals (DOAJ), look for publisher membership of bodies such as the Committee on Publication Ethics (COPE) and OASPA, and use the Think. Check. Submit. checklist. The phenomenon is best understood as a spectrum of quality rather than a simple predatory-versus-legitimate binary.

Read →

Explainer

What is differential privacy?

Differential privacy provides a rigorous, quantifiable guarantee that the result of an analysis is almost the same whether or not any one individual is included in the dataset, which strictly limits what released statistics can reveal about any individual. It achieves this by injecting calibrated noise, with a parameter epsilon governing the trade-off between privacy (smaller epsilon, more noise, stronger protection) and utility (larger epsilon, less noise, more accuracy). The concept was formalised by Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. It has been deployed at scale by the US Census Bureau for the 2020 Census, and by companies including Apple and Google.
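To make the role of epsilon concrete, here is a minimal sketch of one common way to add calibrated noise, the Laplace mechanism, applied to a simple count. The function names are illustrative, a counting query is assumed to have sensitivity 1, and Python's `random` module is used only for demonstration (it is not suitable for production privacy systems).

```python
import random

def laplace_scale(sensitivity, epsilon):
    """Noise scale: a smaller epsilon gives a larger scale, hence more noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(true_count, epsilon, rng):
    # Assumption: adding or removing one person changes a count by at most 1.
    return true_count + laplace_noise(laplace_scale(1.0, epsilon), rng)

rng = random.Random(0)  # fixed seed so the demo is repeatable
strong_privacy = noisy_count(1000, epsilon=0.1, rng=rng)  # noise scale 10
weak_privacy = noisy_count(1000, epsilon=2.0, rng=rng)    # noise scale 0.5
```

Repeating the two calls many times would show the epsilon 0.1 results scattered far more widely around 1000 than the epsilon 2.0 results.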

Read →

Explainer

What is data anonymisation?

Anonymisation transforms personal data so that individuals can no longer be identified, directly or indirectly, after which the data fall outside the scope of the UK GDPR. It involves removing or altering both direct identifiers (such as names and identification numbers) and indirect identifiers (such as combinations of postcode, date of birth, and occupation that could single someone out). Techniques include aggregation, generalisation, suppression, and approaches such as k-anonymity and l-diversity. It differs from pseudonymisation, which is reversible with additional information and remains personal data. The UK Information Commissioner's Office (ICO) issues guidance, and the UK Anonymisation Network (UKAN) offers practical frameworks; re-identification risk must always be assessed.
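As a small illustration of k-anonymity, the sketch below finds the size of the smallest group of records sharing the same combination of indirect identifiers. The records, field names, and the `k_anonymity` helper are invented for the example; a real assessment would also consider other risks, such as what the sensitive values within each group reveal.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same
    combination of quasi-identifier values."""
    if not records:
        raise ValueError("no records to assess")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())

# Invented, already-generalised records (postcode district, birth decade).
records = [
    {"postcode": "AB1", "born": "1980s", "occupation": "nurse"},
    {"postcode": "AB1", "born": "1980s", "occupation": "teacher"},
    {"postcode": "CD2", "born": "1990s", "occupation": "nurse"},
    {"postcode": "CD2", "born": "1990s", "occupation": "nurse"},
]

k_two_fields = k_anonymity(records, ["postcode", "born"])
k_three_fields = k_anonymity(records, ["postcode", "born", "occupation"])
```

Here every postcode and birth-decade combination is shared by two people, but adding occupation singles out individual records, which is why combinations of indirect identifiers matter.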

Read →

Explainer

What is COUNTER?

COUNTER provides a Code of Practice that standardises how usage of online journals, books, databases, and other scholarly content is measured and reported, making statistics comparable between vendors. The current versions are Release 5 and the updated Release 5.1. It defines metric types such as Total Item Requests and Unique Item Requests, distinguishing raw activity from de-duplicated counts. Reports can be retrieved automatically using the SUSHI protocol, avoiding manual downloads. For institutional repositories, IRUS provides COUNTER-conformant statistics. Libraries, publishers, and consortia use COUNTER data to evaluate collections, demonstrate value, and inform purchasing.
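The difference between raw and de-duplicated counts can be sketched as follows. This is a deliberate simplification with invented session and item identifiers: the actual Code of Practice applies further processing rules (for example, filtering repeated clicks) that are not modelled here.

```python
# Each event is (session_id, item_id); all values are invented.
events = [
    ("session-1", "article-A"),
    ("session-1", "article-A"),  # same item requested again in the same session
    ("session-1", "article-B"),
    ("session-2", "article-A"),
]

# Raw activity: every request counts.
total_item_requests = len(events)

# De-duplicated: an item counts once per session, however often it is requested.
unique_item_requests = len(set(events))
```

In this toy log the repeat request for article-A in session-1 adds to the total but not to the unique count.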

Read →

Explainer

What is a paper mill?

A paper mill is a commercial operation that fabricates scientific manuscripts and sells authorship positions, allowing customers to acquire publications they did not earn. Mill-produced papers often show tell-tale signs such as tortured phrases (oddly paraphrased standard terms), fabricated or duplicated data and images, and coordinated citation manipulation. They are fuelled by intense "publish or perish" pressure on researchers. Detection and response efforts include the STM Integrity Hub, which pools publisher tools and signals, and guidance from the Committee on Publication Ethics (COPE). Exposure of paper-mill networks has led publishers to issue mass retractions.

Read →

Explainer

What is data citation?

Data citation means treating datasets as citable outputs in their own right, with their own entry in a reference list. The widely adopted framework is the FORCE11 Joint Declaration of Data Citation Principles (2014), which sets out eight principles covering importance, credit, evidence, unique identification, access, persistence, specificity, and interoperability. In practice a data citation includes the creators, a title, a year, a publisher or repository, and a persistent identifier — most commonly a DataCite DOI. Data citation underpins reproducibility, supports a data availability statement in the paper, and ensures the people who produce and curate data are recognised and rewarded for that work.
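As an illustration of how those elements fit together, the sketch below assembles a reference-list entry in the order creators, year, title, publisher, identifier. The helper name, the dataset details, and the DOI are placeholders invented for the example; individual journals and repositories may prescribe a different layout.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Assemble a simple data citation string (illustrative layout only)."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. https://doi.org/{doi}"

citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2024,
    title="Example survey dataset",
    publisher="Example Repository",
    doi="10.1234/example",  # placeholder, not a registered DOI
)
```

Expressing the identifier as a full `https://doi.org/` link keeps the citation resolvable for both people and machines.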

Read →

Explainer

What is the IGSN?

IGSN originally stood for International Geo Sample Number and is now described as the International Generic Sample Number, reflecting its expansion beyond geoscience to environmental and biological samples. It is a persistent, globally unique identifier for a physical specimen, allowing that object to be unambiguously cited across analyses, datasets, and papers. In 2021 IGSN e.V. partnered with DataCite so that IGSN IDs are registered as DataCite DOIs, inheriting that infrastructure's metadata and resolution. Samples are commonly registered through allocating agents and systems such as SESAR (the System for Earth Sample Registration). By giving physical objects a PID, the IGSN brings samples into the FAIR ecosystem alongside digital data.

Read →

Explainer

What are the Hong Kong Principles?

The Hong Kong Principles for assessing researchers emerged from the 6th World Conference on Research Integrity (Hong Kong, 2019) and were published by Moher and colleagues in PLOS Biology in 2020. They set out five principles: assess responsible research practices; value complete and transparent reporting; reward the practice of open science (such as open data and open methods); acknowledge a broad range of research activities (including replication, innovation, translation, and synthesis); and recognise essential other tasks such as peer review and mentoring. Their explicit aim is to align how institutions and funders evaluate researchers with the behaviours that produce trustworthy science. They complement the metric-focused critique of DORA, the reform agenda of CoARA, and the indicator guidance of the Leiden Manifesto.

Read →

Explainer

What is rights retention?

Rights retention means authors keep the rights they need to self-archive and share their work openly. The best-known mechanism is the cOAlition S Rights Retention Strategy (RRS), which asks Plan S-funded authors to apply a CC BY licence to the Author Accepted Manuscript (AAM) at submission. The UK Scholarly Communications Licence (UK-SCL) is a related institutional approach that grants the institution a non-exclusive licence over the AAM, again typically CC BY. Because the open licence is applied before any publisher agreement is signed, authors can deposit the accepted manuscript without an embargo. Rights retention therefore interacts directly with publisher copyright transfer and embargo policies, asserting the author's prior right to share.

Read →

Explainer

What is COAR Notify?

The COAR Notify Initiative, led by the Confederation of Open Access Repositories (COAR), defines a standard way for repositories and external services to notify one another about activities relating to their resources. It uses W3C Linked Data Notifications (LDN) together with the Activity Streams 2.0 vocabulary to send and receive structured, machine-readable messages. This enables distributed workflows — for example linking a deposited preprint to a peer review, an endorsement, or an overlay journal hosted on a separate service — without funnelling everything through one centralised platform. It supports the emerging "publish, review, curate" model, in which a work is shared first and then reviewed and curated by independent services that connect back to the original repository copy.
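To give a feel for such a message, the sketch below builds a simplified Activity Streams 2.0 "Announce" activity telling a repository that a review of one of its preprints exists. It is an assumption-laden illustration, not a conformant COAR Notify payload: all URLs are placeholders, the helper is hypothetical, and the additional context, types, and required fields defined by the COAR Notify specification are omitted.

```python
import json
import uuid

def build_review_announcement(preprint_url, review_url, repository_url, inbox_url):
    """Return a simplified AS2 activity as a dict (illustrative only)."""
    return {
        "@context": ["https://www.w3.org/ns/activitystreams"],
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": "Announce",
        # The thing being announced: the review.
        "object": {"id": review_url, "type": "Document"},
        # The resource the review relates to: the preprint.
        "context": {"id": preprint_url},
        # Who should receive it, and the LDN inbox to post it to.
        "target": {"id": repository_url, "inbox": inbox_url, "type": "Service"},
    }

notification = build_review_announcement(
    preprint_url="https://repository.example.org/preprint/123",
    review_url="https://reviews.example.net/review/456",
    repository_url="https://repository.example.org",
    inbox_url="https://repository.example.org/inbox",
)
payload = json.dumps(notification, indent=2)
```

Under Linked Data Notifications, a sender would deliver `payload` with an HTTP POST to the target's inbox; that network step is left out so the sketch stays self-contained.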

Read →

Explainer

What is a primary source?

A primary source is original, first-hand evidence created at the time of an event or study, before anyone has interpreted it — for example raw data, an original research article reporting new results, letters, photographs, or artefacts. It is the closest available record to the thing being studied. Whether a source counts as primary is discipline-dependent: the same document can be primary in one field and secondary in another.

Read →

Explainer

What is a secondary source?

A secondary source is a work that interprets, analyses, or summarises evidence drawn from primary sources, rather than presenting first-hand data itself; common examples include literature reviews, textbooks, biographies, meta-analyses, and journal articles that discuss other researchers’ findings. Secondary sources are valuable for synthesis and context, but they place an author’s interpretation between you and the original evidence, so key claims should ideally be traced back to the primary source.

Read →

Explainer

What is a tertiary source?

A tertiary source is a work that compiles, indexes, or summarises information drawn from primary and secondary sources, packaging it for quick reference or discovery; examples include encyclopaedias, dictionaries, abstracting databases, handbooks, and bibliographies. Tertiary sources are excellent starting points for orientation and for finding primary and secondary material, but they are rarely cited as evidence in original research because they sit several steps removed from first-hand data.

Read →

Explainer

What is an operational definition?

An operational definition defines a variable by the specific, observable operations used to measure or manipulate it, rather than by its abstract meaning: for example, defining "stress" as a score on a validated questionnaire, or "sleep" as minutes recorded by an actigraph. Operationalising a concept makes a study repeatable and its measurements transparent, because anyone can apply the same procedure and obtain comparable data.

Read →

Explainer

What is empirical research?

Empirical research answers a question using evidence gathered through direct observation, measurement, or experiment, rather than through reasoning or belief alone. Its conclusions rest on verifiable data that are collected systematically and analysed openly. Hallmarks include a clear question, defined methods, recorded data, and findings that others can scrutinise or attempt to reproduce. It contrasts with purely theoretical, philosophical, or anecdotal approaches.

Read →

Explainer

What is research bias?

Research bias is a systematic error — not random chance — that skews a study’s results consistently towards a particular outcome, making them unrepresentative of reality. It can enter at any stage: how participants are selected, how data are collected, how results are interpreted, or which findings get published. Common forms include selection, confirmation, recall, observer, and publication bias. Sound design, blinding, pre-registration, and transparent reporting are the main defences.

Read →

Explainer

What is construct validity?

Construct validity is the extent to which a measurement instrument genuinely measures the abstract concept, or construct, it is intended to measure. A depression questionnaire has strong construct validity if its scores really reflect depression and not, say, general fatigue. It is established by accumulating evidence that the measure behaves as the construct’s theory predicts. It is the most fundamental form of measurement validity and central to trustworthy operational definitions.

Read →

Explainer

What is internal validity?

Internal validity is the extent to which a study can justify a cause-and-effect claim — the confidence that the independent variable, and not some confounding factor, produced the observed effect. A study has high internal validity when alternative explanations have been ruled out through sound design, such as randomisation and control of confounds. It concerns the soundness of the causal conclusion within the study, distinct from whether findings generalise elsewhere.

Read →

Explainer

What is external validity?

External validity is the extent to which a study’s conclusions hold true beyond the specific sample, setting, and moment in which they were obtained — whether the findings generalise to other people, places, and times. A study has high external validity when its results can be reasonably applied to the wider population or to real-world conditions. It is distinct from internal validity, which concerns whether the cause-effect claim itself is sound.

Read →

Explainer

What is generalisability?

Generalisability is the degree to which findings from a particular sample or study can be extended to a broader population, setting, or context. A study generalises well when its sample fairly represents the population of interest and its conditions resemble those to which the conclusions will be applied. It is closely tied to external validity and depends heavily on sampling: representative, adequately sized samples support stronger, more defensible generalisation.

Read →

Explainer

What is epistemology?

Epistemology is the branch of philosophy concerned with the nature, sources and limits of knowledge — what counts as valid knowledge and how we know what we know. In research methods, your epistemological position determines what evidence you treat as legitimate, how you justify claims and which methods you consider trustworthy.

Read →

Explainer

What is ontology?

Ontology is the branch of philosophy concerned with the nature of being and reality — what exists and how the world is structured. In research methods, your ontological position states whether you assume a single objective reality independent of observers, or multiple realities constructed through social meaning, and that assumption underpins your whole approach.

Read →

Explainer

What is a research paradigm?

A research paradigm is a shared framework of basic beliefs that guides inquiry — combining assumptions about reality (ontology), knowledge (epistemology) and appropriate methods (methodology). Common paradigms include positivism, interpretivism/constructivism, pragmatism and critical/transformative approaches. The paradigm a researcher adopts shapes the questions they ask, the evidence they accept and the methods they use.

Read →

Explainer

What is positivism?

Positivism is a research paradigm asserting that genuine knowledge derives from observable, measurable facts about an objective reality that exists independently of the observer. It favours quantitative methods, hypothesis testing and the search for general laws, and prizes objectivity, replication and the separation of the researcher from what is studied. It contrasts sharply with constructivism.

Read →

Explainer

What is constructivism?

Constructivism, often called interpretivism, is a research paradigm holding that reality is socially constructed and that knowledge is created through interpretation rather than detached measurement. Researchers seek to understand the meanings, perspectives and contexts of participants, favouring qualitative methods such as interviews and observation. It stands in direct contrast to positivism’s objective, measurement-led stance.

Read →

Explainer

What is a scientific theory?

A scientific theory is a well-substantiated explanation of some aspect of the natural world, built on a body of evidence and repeatedly confirmed through observation and experiment. It is not a hunch or an unproven guess; in science, "theory" denotes one of the most reliable forms of knowledge, integrating facts, laws and tested hypotheses into a coherent, predictive framework.

Read →

Explainer

What is deductive reasoning?

Deductive reasoning is a top-down logic that moves from general premises to a specific conclusion that must be true if those premises are true. In research it underpins the theory-first approach: a researcher derives a hypothesis from existing theory, then collects data to test it. If the premises hold and the logic is valid, the conclusion is guaranteed.

Read →

Explainer

What is inductive reasoning?

Inductive reasoning is a bottom-up logic that moves from specific observations to a broader, probable generalisation. In research it underpins the data-first approach: a researcher gathers observations, identifies patterns, and builds toward theory. Its conclusions are likely rather than certain — they extend beyond the evidence — which makes induction powerful for discovery but inherently provisional.

Read →

Explainer

What is phenomenology?

Phenomenology is a qualitative research approach, rooted in philosophy, that studies lived experience — how people consciously perceive, interpret and give meaning to a particular phenomenon. Rather than measuring or explaining causes, it seeks the essence of an experience as it is lived. Researchers gather rich first-person accounts, usually through in-depth interviews, and analyse them for shared meaning.

Read →

Explainer

What is a case study?

A case study is a research method that investigates a single bounded case — a person, group, organisation, event or programme — in depth and within its real-world context. It draws on multiple sources of evidence, such as interviews, documents and observation, to build a rich, holistic understanding. Case studies suit "how" and "why" questions about contemporary phenomena the researcher cannot control.

Read →

Explainer

What is grounded theory?

Grounded theory is a qualitative research methodology in which theory is developed inductively from data rather than imposed beforehand. Through systematic coding and constant comparison, the researcher analyses data as they are collected, lets concepts and categories emerge, and builds an explanatory theory grounded in the evidence. Developed by Glaser and Strauss, it is a leading method for theory generation.

Read →

Explainer

What is face validity?

Face validity is whether a test or measure looks like it measures what it is supposed to, in the casual judgement of those who use or take it. It is assessed subjectively rather than statistically, which makes it the weakest form of validity evidence. A measure can look entirely plausible yet measure the wrong thing, and a sound measure can look unconvincing. Face validity matters mainly for buy-in — respondents are more likely to take a measure seriously if it seems relevant — but it is never sufficient on its own.

Read →

Explainer

What is content validity?

Content validity is whether the items of a test adequately and representatively sample the whole domain of the construct being measured. Unlike face validity, it relies on structured expert review rather than a casual impression — specialists check each item against a definition of the construct for relevance and coverage. A measure has poor content validity if it omits important facets (construct under-representation) or includes items tapping unrelated content (construct-irrelevant variance). It is often quantified, for example through a content validity index based on expert ratings.
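One common convention for such an index can be sketched as follows: experts rate each item's relevance on a 4-point scale, the item-level index is the proportion of experts giving a 3 or 4, and a scale-level figure averages the item indices. The ratings and function names are invented for illustration, and other scoring conventions exist.

```python
def item_cvi(ratings, relevant_from=3):
    """Proportion of experts rating the item as relevant (3 or 4 by default)."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(r >= relevant_from for r in ratings) / len(ratings)

def scale_cvi_average(items):
    """Average of the item-level indices across the whole instrument."""
    indices = [item_cvi(ratings) for ratings in items.values()]
    return sum(indices) / len(indices)

# Invented relevance ratings from four experts for three questionnaire items.
expert_ratings = {
    "item_1": [4, 4, 3, 4],
    "item_2": [4, 3, 2, 4],
    "item_3": [2, 1, 3, 2],
}

per_item = {name: item_cvi(r) for name, r in expert_ratings.items()}
overall = scale_cvi_average(expert_ratings)
```

An item such as `item_3`, which only one of four experts rates as relevant, would be an obvious candidate for revision or removal.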

Read →

Explainer

What is criterion validity?

Criterion validity is whether scores on a measure relate, as expected, to an external standard that reflects the construct of interest. The criterion can be assessed at the same time (concurrent validity) or in the future (predictive validity). Evidence typically takes the form of a correlation — a validity coefficient — between the measure and the criterion. Its great strength is that it ties the measure to a concrete, observable outcome rather than to theory alone; its main limitation is that you need a trustworthy criterion to validate against, which is not always available.

Read →

Explainer

What is predictive validity?

Predictive validity is whether scores on a measure predict a future criterion — an outcome the measure is supposed to anticipate. An admissions test has predictive validity if higher scores reliably go with better later performance; a clinical screen has it if it forecasts who will develop a condition. Evidence is usually a correlation between the early measure and the later outcome. It is a sub-type of criterion validity distinguished by its time gap, and it underpins any tool used to make forward-looking decisions about people.

Read →

Explainer

What is ecological validity?

Ecological validity is how well a study’s materials, tasks, and setting resemble the everyday situations the findings are supposed to generalise to. A lab experiment using artificial tasks under tight control may have strong internal validity yet low ecological validity, because the conditions bear little resemblance to real life. It is closely related to, but narrower than, external validity: external validity concerns generalisation broadly, while ecological validity focuses specifically on realism of context. Maximising one can come at the cost of the other.

Read →

Explainer

What is test-retest reliability?

Test-retest reliability is whether a measure gives consistent results when the same individuals are tested twice, separated by an interval. It is usually quantified by correlating the two sets of scores; a high correlation means the measure is stable over time. It is appropriate for traits expected to remain steady, such as personality or aptitude, and less so for states that genuinely fluctuate, such as mood. The chosen interval matters: too short invites memory effects, too long allows real change to masquerade as unreliability.
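Correlating the two sets of scores can be sketched with a plain Pearson correlation, written out here so that each step is visible. The scores are invented, and Pearson's r is only one option: other coefficients, such as the intraclass correlation, are also used for this purpose.

```python
import math

def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equally long lists with at least two scores")
    mean_1, mean_2 = sum(first) / n, sum(second) / n
    covariation = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    spread_1 = math.sqrt(sum((a - mean_1) ** 2 for a in first))
    spread_2 = math.sqrt(sum((b - mean_2) ** 2 for b in second))
    if spread_1 == 0 or spread_2 == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return covariation / (spread_1 * spread_2)

# Invented scores for the same five people, tested two weeks apart.
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

retest_r = pearson_r(time_1, time_2)
```

A value close to 1 would indicate that people kept roughly the same rank order and spacing across the two occasions.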

Read →

Explainer

What is inter-rater reliability?

Inter-rater reliability is how consistently different observers produce the same ratings when assessing the same material. It is essential wherever data depend on human judgement — coding behaviour, scoring essays, diagnosing, or screening studies for a systematic review. Agreement is quantified with statistics that correct for chance, such as Cohen’s kappa for two raters with categorical judgements, Fleiss’ kappa for several, and the intraclass correlation coefficient for continuous ratings. Low inter-rater reliability signals ambiguous criteria or inadequate rater training, not just human error.
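The chance correction in Cohen's kappa can be sketched in a few lines: observed agreement is compared with the agreement expected if each rater assigned categories independently at their own base rates. The screening decisions below are invented, and the function is a minimal illustration for two raters and categorical judgements.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must judge the same, non-empty set of items")
    # Proportion of items on which the raters actually agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is total")
    return (observed - expected) / (1 - expected)

# Invented screening decisions for four studies in a systematic review.
rater_a = ["include", "include", "exclude", "exclude"]
rater_b = ["include", "exclude", "exclude", "exclude"]

kappa = cohens_kappa(rater_a, rater_b)  # observed 0.75, expected 0.50
```

The raters agree on three of four studies, yet kappa is only 0.5, because half of that agreement would be expected by chance alone.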

Read →

Explainer

What is internal consistency?

Internal consistency is whether the individual items of a scale hang together — whether they are all tapping the same construct. It is estimated from a single test administration by examining how the items intercorrelate, most often via Cronbach’s alpha, with McDonald’s omega and split-half methods as alternatives. High internal consistency suggests the items form a coherent scale; very low values suggest a mix of unrelated items. Because alpha rises with the number of items, an extremely high value can also signal redundancy rather than quality.
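Cronbach's alpha compares the summed variance of the individual items with the variance of respondents' total scores. A minimal sketch, using invented responses and sample variances from Python's standard library:

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores: one list per item, each holding one score per respondent,
    with respondents in the same order in every list.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score for each respondent across all items.
    totals = [sum(responses) for responses in zip(*item_scores)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Invented responses: three items answered by five respondents.
items = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 3],
]

alpha = cronbach_alpha(items)
```

When items rise and fall together, the variance of the totals is large relative to the summed item variances, and alpha approaches 1.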

Read →

Explainer

What is selection bias?

Selection bias occurs when the process of choosing or keeping participants produces a group that is not comparable to the population the study aims to describe, or to the comparison group. It can arise at recruitment (who enters) or during follow-up (who drops out, known as attrition). Because the bias is built into who is studied, it cannot be fixed simply by collecting more data. Common forms include volunteer bias, healthy-worker effects, and loss to follow-up. Randomisation, careful sampling, and minimising attrition are the main defences.

Read →

Explainer

What is sampling bias?

Sampling bias occurs when the method of selecting a sample systematically over- or under-represents parts of the target population, so the sample is not representative. Estimates drawn from it are skewed in a predictable direction rather than scattered randomly. It is a sub-type of selection bias focused specifically on the sampling stage, and it stems from a flawed sampling frame or non-probability methods. Classic examples include convenience and self-selection samples. Probability sampling and a complete sampling frame are the principal safeguards.

Read →

Explainer

What is recall bias?

Recall bias occurs when study data depend on memory and that memory is systematically inaccurate — most damagingly when one group recalls the past differently from another. In case-control studies, for instance, people with a disease may search their memories harder for past exposures than healthy controls, exaggerating an apparent association. Because it distorts the comparison itself, recall bias is a form of information (measurement) bias. It is mitigated by using objective records, validated instruments, shorter recall periods, and designs less dependent on retrospection.

Read →

Explainer

What is response bias?

Response bias is any systematic tendency to respond to survey or interview questions on a basis other than their true content — for instance, to look good, to agree regardless of the question, or to favour extreme or middle options. Because it skews answers in a consistent direction, it threatens the validity of self-report measures rather than merely adding noise. Major forms include social-desirability bias, acquiescence (yea-saying), extreme and central-tendency responding, and demand characteristics. Careful question wording, anonymity, and balanced scales help reduce it.

Read →

Explainer

What is confirmation bias?

Confirmation bias is the inclination to look for and accept evidence that supports what we already believe, and to discount or overlook evidence that does not. In research it can shape which hypotheses are tested, how ambiguous data are read, which results are reported, and how findings are remembered. Because it operates largely unconsciously, it is not cured by good intentions alone. Safeguards include pre-registration, blinding, actively seeking disconfirming evidence, peer review, and structured methods that make the search for evidence explicit and even-handed.

Read →

Explainer

What is observer bias?

Observer bias occurs when those collecting or assessing data unintentionally let their expectations shape what they notice and record, skewing results in the direction they anticipate. It is closely tied to the observer-expectancy effect and is most dangerous in studies with subjective outcomes — clinical ratings, behavioural coding, or open assessment of an intervention. Because it is an information (measurement) bias rooted in the assessor, the standard defence is blinding: keeping observers unaware of group allocation or hypotheses, alongside standardised, objective measurement protocols.

Read →

Explainer

What is a pilot study?

A pilot study is a scaled-down version of a planned study, run in advance to check whether the proposed methods, instruments, recruitment, and logistics actually work. Its purpose is feasibility and refinement — surfacing ambiguous questions, faulty equipment, unrealistic timelines, or recruitment difficulties before committing full resources. It is not designed to test the main hypothesis or to estimate effect sizes for power calculations, and its small sample means its outcome findings are not reliable. A successful pilot improves the protocol; it does not prove the theory.

Read →

Explainer

What is reflexivity?

Reflexivity is the ongoing, critical self-examination through which researchers consider how their identity, values, prior beliefs, and relationship with participants influence what they study, how they gather data, and how they interpret it. Rather than pretending to be a neutral instrument, the reflexive researcher makes their positionality explicit and accounts for it. In qualitative research, reflexivity is central to trustworthiness — analogous to how validity and reliability function in quantitative work — and is often documented through a reflexive journal or a positionality statement.

Read →

Explainer

What is a variable?

A variable is any characteristic or attribute that can take on different values — across individuals, occasions, or experimental conditions. Examples include age, blood pressure, reaction time, and treatment group. Variables are classified two ways: by the role they play in a study (independent, dependent, confounding, control, extraneous) and by their type or level of measurement (categorical versus numerical, discrete versus continuous). Identifying each variable’s role and type is the first step in designing a study and choosing an appropriate analysis.

Read →

Explainer

What is a construct?

A construct is an abstract concept or trait that is not directly observable and must be inferred from things that are — for example intelligence, wellbeing, socioeconomic status, or customer satisfaction. Because a construct cannot be measured straightforwardly, researchers define it operationally and capture it through observable indicators or measures. The quality of that link is judged by construct validity: whether the chosen measure genuinely reflects the intended idea. A construct is the abstract concept; a variable is its measured representation.

Read →

Explainer

Dublin Core metadata

Dublin Core is a standard schema consisting of fifteen primary descriptors (such as Creator, Title, Date, Subject, and Format) used to catalog and discover digital assets. Originally designed in 1995, it became a cornerstone of open access repository systems because of its simplicity and flexibility. There are two primary versions: Simple Dublin Core (the core 15 elements) and Qualified Dublin Core (which adds refinements and vocabularies). It allows different repository platforms to share descriptions seamlessly via protocols like OAI-PMH.
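As a rough illustration of what a Simple Dublin Core description looks like in practice, the sketch below serialises a small record in the `oai_dc` XML wrapper used by OAI-PMH. The record values and the helper name `to_oai_dc` are made up for the example; only the fifteen element names and the two namespace URIs come from the standards.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The fifteen elements of Simple Dublin Core.
SIMPLE_DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_oai_dc(record):
    """Serialise {element: [values]} as an oai_dc XML string (hypothetical helper)."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        if element not in SIMPLE_DC_ELEMENTS:
            raise ValueError(f"not a Simple Dublin Core element: {element}")
        # Every element is optional and repeatable, so each value gets its own tag.
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative record; the values are invented for this example.
record = {
    "title": ["Example survey dataset"],
    "creator": ["Doe, Jane"],
    "date": ["2024-01-15"],
    "subject": ["Research data management"],
    "format": ["text/csv"],
}
xml_text = to_oai_dc(record)
print(xml_text)
```

Because every element is optional and repeatable, the record is modelled as a mapping from element name to a list of values rather than to a single string.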

Read →

Explainer

Controlled vocabulary

A controlled vocabulary is a standardized list of terms, subject headings, or codes used to describe resources in digital libraries, databases, and catalogs. By restricting descriptive fields to a predefined list, it eliminates natural-language issues like synonyms (different words with the same meaning), homographs (the same word with different meanings), and spelling variations. Common examples include the Library of Congress Subject Headings (LCSH) and Medical Subject Headings (MeSH). This consistency is vital for systematic reviews and scholarly database indexing.
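A minimal sketch of the idea, assuming a toy vocabulary: free-text variants are mapped to one preferred term, so synonyms and spelling variations collapse into a single heading. The variant lists below are invented for illustration and are not an extract of MeSH or LCSH; `to_preferred` is a hypothetical helper.

```python
# Toy vocabulary: preferred term -> accepted free-text variants (illustrative only).
VOCABULARY = {
    "Neoplasms": {"cancer", "cancers", "tumor", "tumour", "neoplasm"},
    "Myocardial Infarction": {"heart attack", "myocardial infarct"},
}

# Invert the vocabulary once so lookups are a single dictionary access.
VARIANT_TO_PREFERRED = {
    variant: preferred
    for preferred, variants in VOCABULARY.items()
    for variant in variants | {preferred.lower()}
}


def to_preferred(term):
    """Return the preferred heading for a free-text term, or None if unlisted."""
    return VARIANT_TO_PREFERRED.get(term.strip().lower())


for raw in ["Tumour", "heart attack", "neoplasms", "migraine"]:
    print(raw, "->", to_preferred(raw))
```

Returning `None` for unlisted terms, rather than passing the raw text through, mirrors the point of a controlled field: only terms on the predefined list are accepted.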

Read →

Explainer

What is a literature review?

A literature review is a structured, critical synthesis of existing research on a topic. It maps what is already known, identifies gaps and contradictions, situates a new study in context, and justifies the research approach. Types include narrative, systematic, scoping, integrative, and umbrella reviews. It is a core chapter in a dissertation or thesis and appears as a section of most journal articles, helping readers and reviewers assess how the study builds on prior work.

Read →

Explainer

What is plagiarism?

Plagiarism is the act of using another person's words, ideas, data, or creative output without appropriate acknowledgement, thereby implying they are your own original contribution. Types include direct copying, paraphrasing without citation, self-plagiarism (recycling your own previously published text without disclosure), patchwork or mosaic plagiarism, and ghost-writing. Detection software such as Turnitin and iThenticate identifies textual overlap, but intent and context are judged by institutional academic integrity panels.

Read →

Explainer

What is a research hypothesis?

A research hypothesis is a specific, testable, and falsifiable statement predicting the relationship between variables based on theory or prior evidence. The null hypothesis (H₀) states that no effect or relationship exists; the alternative hypothesis (H₁ or Hₐ) predicts the expected effect. Hypotheses may be directional (specifying which group scores higher) or non-directional (predicting a difference without specifying direction). A good hypothesis is grounded in existing knowledge, operationally clear, and answerable with available data.

Read →

Explainer

What is a research question?

A research question is the precise, answerable question that drives a study. It should be focused, feasible, relevant, and ethically acceptable. Types include descriptive (what is happening?), comparative (how do groups differ?), causal (does X cause Y?), and exploratory (what are the key factors?). In health sciences, the PICO or PICOTS framework structures questions around Population, Intervention, Comparator, Outcome, Time, and Setting. The FINER criteria — Feasible, Interesting, Novel, Ethical, Relevant — are a useful quality check.

Read →

Explainer

What is a scientific theory?

A scientific theory is a rigorously tested, explanatory framework supported by extensive evidence from multiple lines of inquiry. It explains why phenomena occur, not just that they occur. A theory is not a guess, a hunch, or a preliminary idea — in everyday speech "theory" can mean speculation, but in science it denotes the highest level of explanatory confidence. Theories are testable, predictive, parsimonious, and open to revision if contradicting evidence accumulates. Examples include evolution by natural selection, germ theory, general relativity, plate tectonics, and quantum theory.

Read →

Explainer

What is epistemology?

Epistemology addresses what counts as legitimate knowledge and how it can be acquired. Key positions include positivism (knowledge comes from observable, measurable facts, independent of the knower), interpretivism (knowledge is subjective and constructed through meaning-making), pragmatism (knowledge is judged by its practical utility), and critical realism (observable events are produced by underlying structures that are not directly observable). A researcher's epistemological stance shapes their methodology, data collection strategy, and the claims they can make about their findings.

Read →

Explainer

What is ontology in research?

Ontology in research refers to a researcher's assumptions about the nature of reality. Realism holds that reality exists independently of observers and can be studied objectively. Idealism holds that reality is mind-dependent or socially constructed. Nominalism holds that only particular things exist — abstract universals are labels, not real entities. In social research, objectivism (social phenomena exist independently of actors) and constructivism (social reality is produced through social interaction) are the dominant ontological positions, and they shape what methods and epistemologies are appropriate.

Read →

Explainer

What is a research paradigm?

A research paradigm is a set of philosophical assumptions that determines how a discipline approaches research — what reality is (ontology), how knowledge of it is produced (epistemology), and what methods are appropriate (methodology). Thomas Kuhn introduced the concept in The Structure of Scientific Revolutions (1962), showing that science operates within paradigms and periodically undergoes paradigm shifts. Major paradigms include positivism, post-positivism, interpretivism/constructivism, critical/transformative, and pragmatism. Paradigm choice shapes every aspect of a study from question formulation to analysis.

Read →

Explainer

What is the difference between inductive and deductive reasoning?

Deductive reasoning starts with an established theory, derives a testable hypothesis, collects data, and confirms or refutes the prediction — a top-down, theory-testing approach associated with quantitative and positivist research. Inductive reasoning starts with observations, identifies patterns, forms a hypothesis, and builds toward a broader theory — a bottom-up, theory-generating approach associated with qualitative and interpretivist research. A third approach, abductive reasoning, infers the best available explanation from incomplete evidence and is common in exploratory and case-study research.

Read →

Explainer

What is a conceptual framework?

A conceptual framework is a researcher-built map of the key concepts and their expected relationships in a specific study. It shows what the researcher thinks is happening — which variables are independent, which are dependent, which mediate or moderate the relationship, and what contextual factors may influence it. It differs from a theoretical framework (which adopts an existing theory wholesale) and from a literature review (which surveys existing knowledge). Examiners use it to assess whether the study's design is coherent and whether claims are anchored in a clear conceptual logic.

Read →

Explainer

What is validity in research?

Validity is the degree to which a research instrument, study design, or finding accurately represents the concept or population it is intended to capture. Key types include internal validity (confidence in causal inferences), external validity (generalisability to other settings and populations), construct validity (whether the measure operationalises the intended concept), content validity (whether items cover the full domain), and criterion validity (whether the measure predicts or correlates with an external standard). In qualitative research, Lincoln and Guba's parallel criteria — credibility, transferability, dependability, and confirmability — serve analogous functions.

Read →

Explainer

What is reliability in research?

Reliability is the consistency and stability of a measurement. Key types include test-retest reliability (the same participants produce similar scores at different times), inter-rater reliability (different observers or coders produce similar scores for the same data), internal consistency (items within a scale consistently measure the same underlying concept, commonly assessed with Cronbach's alpha, where values of 0.7 or above are conventionally treated as acceptable), and parallel/alternate forms reliability (two equivalent versions of a test produce similar results). Reliability is necessary but not sufficient for validity: a measure can be consistent yet still measure the wrong thing.
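To make internal consistency concrete, here is a small sketch of Cronbach's alpha using the standard formula α = k/(k−1) × (1 − Σ item variances / variance of total scores). The function name and the example scores are assumptions made for illustration; real analyses would normally use an established statistics package.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented responses: five respondents answering a four-item scale.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(responses), 3))
```

When items move together across respondents, the variance of the totals is large relative to the summed item variances and alpha approaches 1.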

Read →

Explainer

What is a variable in research?

A variable is any characteristic that can vary — across individuals, occasions, or experimental conditions. Researchers classify variables by the role they play in a study (independent, dependent, control, confounding, mediating, moderating) and by their level of measurement (nominal, ordinal, interval, ratio — Stevens, 1946). Operationalisation converts an abstract concept into a measurable variable. Identifying the role and level of measurement of each variable at the design stage determines which statistical tests are appropriate and what causal claims can legitimately be made.

Read →

Explainer

What is a block grant?

A block grant is a high-level funding allocation given to institutions or regional bodies rather than individual research projects. Unlike categorical grants (which target highly specific projects), block grants provide institutions with the autonomy to distribute funds across their own internal researchers, infrastructure, and strategic priorities. For example, block grants from UK Research and Innovation (UKRI) fund institutional Open Access (OA) publishing budgets, letting universities decide how to allocate money for article processing charges (APCs) internally.

Read →

Explainer

What is a case study?

A case study is an in-depth, intensive investigation of a single individual, group, event, organisation or phenomenon in its real-world context. It is used when researchers want to understand complex processes or test whether theories apply in specific settings. Case studies are associated with qualitative research but can incorporate quantitative data.

Read →

Explainer

What is a meta-analysis?

A meta-analysis is a statistical technique that combines the results of multiple independent studies on the same research question to calculate an overall, more precise effect size. Meta-analyses of well-conducted studies are usually placed at the top of the evidence hierarchy, above individual randomised controlled trials. Meta-analysis is typically part of a systematic review that has already identified and screened eligible studies.
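One common way to combine study results is fixed-effect inverse-variance pooling, sketched below: each study is weighted by 1/SE², so more precise studies count for more. This is only one of several pooling models (random-effects models are a frequent alternative), and the effect sizes and standard errors here are invented for illustration.

```python
from math import sqrt


def pool_fixed_effect(effects, standard_errors):
    """Return (pooled effect, pooled standard error) by inverse-variance weighting."""
    if len(effects) != len(standard_errors) or not effects:
        raise ValueError("need one standard error per effect")
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# Invented study results: effect estimates with their standard errors.
effects = [0.30, 0.10, 0.25]
standard_errors = [0.10, 0.20, 0.05]
pooled, pooled_se = pool_fixed_effect(effects, standard_errors)
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which the combined estimate is "more precise".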

Read →

Explainer

What is mixed methods research?

Mixed methods research combines quantitative and qualitative data collection and analysis in a single study to answer research questions that neither approach can answer alone. It draws on the strengths of both paradigms — the generalisability of quantitative methods and the contextual depth of qualitative methods. The term was popularised by Creswell and Plano Clark.

Read →

Explainer

What is statistical significance?

A result is statistically significant when it would be unlikely to occur by chance alone if the null hypothesis were true. It is assessed with the p-value: p < 0.05 means that, if the null hypothesis holds, there is less than a 5% probability of observing a result at least as extreme as the one obtained. Statistical significance does not equal practical importance.
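As a rough sketch of where a p-value comes from, the exact permutation test below compares two small groups. Under the null hypothesis the group labels are interchangeable, so the p-value is the share of all possible relabelings whose difference in means is at least as large as the observed one. The function name and the data are hypothetical, and exhaustive enumeration is only practical for small samples.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    as_extreme = 0
    relabelings = 0
    # Enumerate every way of choosing which units are labelled "group A".
    for indices in combinations(range(len(pooled)), n_a):
        relabelings += 1
        # Small tolerance guards against floating-point rounding in ties.
        if abs_mean_diff(sum(pooled[i] for i in indices)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / relabelings


# Invented measurements for two small groups.
treatment = [12.1, 14.3, 13.8, 15.0]
control = [10.2, 11.5, 12.0, 10.9]
print(permutation_p_value(treatment, control))
```

A small p-value says the observed difference would be rare under the null hypothesis; it says nothing about whether the difference is large enough to matter in practice.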

Read →

Explainer

What is content analysis?

Content analysis is a research method for systematically analysing the content of texts, images, audio or video. It examines words, themes or patterns either quantitatively (frequency counts) or qualitatively (interpreting meaning and themes). It is widely used in communication studies, sociology, political science, psychology and health research to analyse large volumes of documents.

Read →

Explainer

What is discourse analysis?

Discourse analysis is a qualitative research method that examines how language is used in texts and talk to construct meanings, identities and social realities. It goes beyond what is said to analyse how language use reflects and shapes power relations, ideology and social context. Key traditions include critical discourse analysis, conversation analysis and Foucauldian discourse analysis.

Read →

Explainer

What is a null hypothesis?

A null hypothesis (H₀) is a statement that there is no effect, no difference or no relationship between variables. It is the default position that a researcher tests. Statistical hypothesis testing determines whether there is sufficient evidence to reject H₀ in favour of the alternative hypothesis (H₁). A null hypothesis is never proved: a test either rejects it or fails to reject it.

Read →

Explainer

What is triangulation in research?

Triangulation in research is the use of multiple methods, data sources, researchers or theoretical perspectives to investigate the same phenomenon, increasing confidence in findings. Denzin (1978) identified four types: data triangulation, investigator triangulation, theory triangulation and methodological triangulation. It is especially common in qualitative and mixed-methods research to enhance credibility and reduce bias.

Read →

Explainer

What is action research?

Action research is a cyclical research methodology in which researchers and practitioners collaboratively identify a problem, plan an intervention, act, observe outcomes and reflect before repeating the cycle. It aims to improve practice while simultaneously generating knowledge. It is widely used in education, nursing, community development and organisational contexts.

Read →

Explainer

What is sampling in research?

Sampling in research is the process of selecting a subset of individuals, cases or units from a larger population to study. The goal is to obtain a sample from which findings can be inferred back to the population. There are two broad types: probability sampling (every unit has a known, non-zero chance of selection) and non-probability sampling (selection is not random, so selection chances are unknown).
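Two probability sampling schemes can be sketched in a few lines: simple random sampling, where every unit has the same chance of selection, and proportionate stratified sampling, where the same fraction is drawn at random within each stratum. The function names, strata and seed below are assumptions chosen for the example.

```python
import random


def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; each unit has chance n / len(units)."""
    rng = random.Random(seed)
    return rng.sample(list(units), n)


def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * fraction)
        sample[name] = rng.sample(list(units), n)
    return sample


# Hypothetical population of 100 unit IDs split into two strata.
strata = {
    "undergraduate": list(range(0, 60)),
    "postgraduate": list(range(60, 100)),
}
print(simple_random_sample(range(100), 10, seed=1))
print(stratified_sample(strata, 0.1, seed=1))
```

Fixing the seed makes the draw reproducible, which is useful when a sampling procedure has to be documented or audited.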

Read →

Explainer

What is thematic analysis?

Thematic analysis is a qualitative method for identifying, analysing and reporting patterns of meaning (themes) across a dataset. Braun and Clarke’s (2006) six-phase framework — familiarisation, generating codes, searching for themes, reviewing themes, defining and naming, and writing up — is the most widely cited approach. It is flexible, applicable across many theoretical frameworks, and suitable for large qualitative datasets.

Read →

Explainer

What is a research design?

A research design is the overall plan for how a study will be conducted — specifying the research questions, methods, data sources, analysis approach and strategies to ensure validity. It is the bridge between the research question and the evidence collected. The three broad categories are experimental, quasi-experimental and non-experimental designs.

Read →

The step most authors miss

Doing CRediT right? Don’t stop at the statement.

A CRediT statement credits you inside one paper. The recognition CRediT was built for happens when those roles are tied to you, persistently. Sign in with your ORCID — free — and claim your CRediT contributions on casrai.org, the home of the standard. They become a verified, portable part of your identity, not a line that disappears into one PDF.

Free: claim your contributions, then export a journal-ready CRediT statement, schema.org structured data, JATS XML, CSV or BibTeX — and preview your public profile. A membership publishes that profile publicly and verifies the journals you serve.

Referenced across the research world

  • University of Cambridge logo
  • Columbia University logo
  • University of Edinburgh logo
  • Harvard University logo
  • University of Oxford logo
  • Princeton University logo
  • Stanford School of Medicine logo
  • University College London logo
  • ORCID logo
  • Crossref logo

View CASRAI adoption →