Editorial category Track B
Research data infrastructure
Trusted repositories, EOSC, biobanks, data trusts, federated infrastructure.
- 21 June 2026
Identifiers for Things, Not Just Papers: IGSN and PIDINST
Persistent identifiers are familiar for articles, datasets, and people, but the physical objects of research (rock cores, water samples, and the instruments that measure them) have long lacked stable references. The IGSN for samples and the PIDINST work for instruments extend persistent identification to the physical world, making physical research objects findable, citable, and connectable to the data they produce.
- 21 June 2026
Anonymising research data: k-anonymity, differential privacy and the re-identification risk
Sharing data about people without exposing the people themselves is one of the hardest problems in research data management. This article distinguishes anonymisation from pseudonymisation, explains the privacy models researchers actually use (k-anonymity, l-diversity and differential privacy), and introduces the practical guidance from the UK Anonymisation Network (UKAN) and the ICO’s anonymisation code. It also confronts the uncomfortable reality that re-identification is often easier than it looks.
- 20 June 2026
Big Data and the Vs of Data Explained for Research
Big data describes datasets so large, fast or varied that traditional tools cannot handle them. This guide explains the defining Vs, from volume and velocity to veracity and value, how distributed processing copes, and what big data means for research and FAIR data.
- 20 June 2026
Cloud Computing for Research Infrastructure
Cloud computing delivers on-demand, elastic, measured computing resources over a network. This explainer defines it using the NIST model, distinguishes IaaS, PaaS and SaaS, and weighs its role in reproducible research alongside cost and governance considerations.
- 20 June 2026
Open Data in Public-Health Research: Standards
Open and FAIR data principles are reshaping how public-health research data are shared and reused. This guide explains data-sharing standards, anonymisation at a high level, metadata, persistent identifiers, and the governance that enables responsible reuse.
- 20 June 2026
Incidence vs Prevalence: Key Epidemiological Measures
Incidence counts new cases over time, while prevalence counts existing cases at a point or over a period. This guide defines each measure, shows how they are calculated, explains the relationship between them, and clarifies when to use which.
- 19 June 2026
Death Rate and Mortality Statistics: Definitions
A death rate measures deaths relative to a population, but a crude rate and an age-standardised rate answer different questions. This guide defines both, explains why standardisation is needed for comparison, and outlines cause-of-death coding and data sources.
- 18 June 2026
Genomic Data-Sharing Standards: GA4GH and Responsible Access Explained
Genomic data sharing relies on common standards for formats, metadata, consent and controlled access. This guide explains the role of the Global Alliance for Genomics and Health, FAIR principles and controlled-access archives in moving genetic data responsibly.
- 18 June 2026
Amino Acids: Notation, Protein Data and How Sequences Are Recorded
Amino acids are the chemical building blocks of proteins. This guide explains the 20 standard amino acids, their one- and three-letter notation, and how protein sequence and structure data are recorded and shared through UniProt and the Protein Data Bank.
- 18 June 2026
FAIR Principles for Research Data Explained
The FAIR principles make research data Findable, Accessible, Interoperable and Reusable. Published by Wilkinson et al. in 2016, they emphasise persistent identifiers and rich metadata. This explainer defines each principle and clarifies how FAIR differs from open data.
- 18 June 2026
Census and Population Data: Sources and Standards
A census is a complete enumeration of a population at a defined moment, the backbone of official population statistics. This guide explains how the data are collected and standardised, the roles of national statistics offices and the UN, and the shift toward register-based methods.
- 18 June 2026
Life Expectancy: How It Is Calculated and the Data Behind It
Life expectancy is the average number of years a person in a given population is expected to live, estimated from a life table built on mortality data. This guide explains the method, the difference between period and cohort measures, and the data sources involved.
- 15 June 2026
Open scholarly infrastructure and the POSI principles
The identifiers, registries and services that hold scholarship together are infrastructure — and infrastructure can fail, lock in, or disappear. The POSI principles set out how the organisations that run it can be made durable and accountable.
- 15 June 2026
FAIR data in practice: making research data findable and reusable
FAIR is widely cited and often misunderstood. What Findable, Accessible, Interoperable and Reusable actually require in practice, why FAIR is not the same as open, and the concrete steps that move a dataset from a hard drive to a genuinely reusable output.
- 15 June 2026
Licensing research data: CC-BY, CC0 and when to use each
A dataset without a clear licence is data nobody can confidently reuse. The difference between CC0 and CC BY for data, why software needs different licences, and how to choose a licence that makes your data genuinely reusable.
- 13 June 2026
Trusted repositories and the EOSC: where research data should live
FAIR data has to live somewhere, and not every place is fit to hold it. Trusted digital repositories, CoreTrustSeal certification and the European Open Science Cloud set out where research data should be deposited.
- 12 June 2026
Repository certification: CoreTrustSeal and the markers of a trustworthy repository
Where should research data live so that it is still usable in twenty years? Repository certification — CoreTrustSeal, the TRUST Principles, the nestor Seal and ISO 16363 — gives concrete, auditable answers about what makes a digital repository genuinely trustworthy.
- 11 June 2026
Data citation: giving datasets the credit they deserve
Datasets underpin findings but are rarely cited as first-class objects. How DataCite DOIs, the FORCE11 Joint Declaration of Data Citation Principles and the CRediT Data curation role together make data both citable and creditable.
- 11 June 2026
Data availability statements: what to write and where to deposit
A data availability statement is now a routine journal requirement, but “available on request” statements rarely deliver. What to write, how to make data genuinely FAIR, and how to choose between generalist and domain repositories.
- 10 June 2026
DataCite and the data-citation infrastructure
Articles have long had DOIs; datasets, software and other research outputs needed the same. DataCite provides persistent identifiers and a metadata schema that make data first-class, citable, connectable objects in the scholarly record.
- 31 May 2026
Finding research data: dataset discovery and data search engines
Depositing data in a repository is only half the battle — if no one can find it, it might as well not exist. Registries of repositories, dataset search engines and structured metadata are what turn a deposited dataset into a discoverable one.
- 26 May 2026
Federated analysis: bringing computation to the data
Some of the most valuable research data — health records, genomic data, sensitive registries — cannot easily be moved or pooled. Federated analysis flips the usual model: instead of bringing data to the computation, it sends the computation to the data, enabling collaboration without the data ever leaving home.
- 12 February 2026
EOSC Federation governance: what changed in 2026
The 2026 EOSC Federation governance refresh: new node-membership criteria, the sustainability funding model, and how integrators should reposition.
- 8 January 2026
FAIR data assessment frameworks: a buyer’s guide for institutions
RDA Maturity Model, F-UJI, FAIR-Aware, CESSDA, ARDC: which FAIR assessment tool to choose, when, and how to integrate with institutional repositories.
- 1 January 1970
Documenting social science data: the Data Documentation Initiative (DDI)
Survey and microdata are only reusable if every variable, code and question is documented. The Data Documentation Initiative (DDI) is the international metadata standard that makes that possible for social, behavioural and economic data. This article distinguishes DDI Codebook from DDI Lifecycle, explains variable-level metadata, shows why major archives such as the UK Data Service, ICPSR and GESIS built their holdings on it, and sets out how DDI fits the wider metadata landscape.
- 1 January 1970
Choosing Where to Put Your Data: re3data and the Landscape of Research Data Repositories
Depositing research data in a suitable repository is now an expectation of most major funders, but with thousands of repositories in existence (institutional, disciplinary, and general-purpose), choosing the right one is not straightforward. re3data, the Registry of Research Data Repositories, is the principal tool for navigating this landscape: a searchable, indexed registry of more than 3,000 research data repositories worldwide, operated under the auspices of DataCite. This article examines re3data, including its metadata schema and its badge system for signalling repository properties such as data access, licensing, and persistent identifier support; the main types of repository; how funders including the NIH, Wellcome, and Horizon Europe point researchers towards repository registries for repository selection; and the complementary role of FAIRsharing as a registry of databases, standards, and policies.
- 1 January 1970
FAIR Digital Objects: Making Research Data Machine-Actionable Beyond Metadata
FAIR principles have transformed expectations around research data sharing, but the original FAIR framework addresses metadata and findability rather than the machine-actionable operations needed to actually use data at scale. The FAIR Digital Object (FDO) framework, developed through the FDO Forum and built on the Digital Object Interface Protocol (DOIP) from the Corporation for National Research Initiatives, extends FAIR by wrapping data objects with typed, machine-executable operations. Where a standard DOI resolves to a landing page for human readers, an FDO exposes typed interfaces that allow software to retrieve, validate, and act on data without human intervention. This article examines the FDO framework, the CNRI Cordra software that implements it, the Research Data Alliance FAIR Digital Objects working group, and practical deployments in the European Open Science Cloud and biodiversity informatics.
- 1 January 1970
Linked Data and Knowledge Graphs in Scholarly Research: RDF, SPARQL, Wikidata and ORKG
Linked data technologies developed by the World Wide Web Consortium — principally the Resource Description Framework and the SPARQL query language — provide a principled way to connect research objects, express relationships between them, and make them queryable by machines as well as humans. Applied to scholarly research, these technologies underpin initiatives such as the Open Research Knowledge Graph at TIB Hannover, Wikidata’s role as a community-maintained scholarly knowledge base, and the widespread adoption of Schema.org’s Dataset markup to make research datasets discoverable by search engines. Understanding how linked data connects papers, datasets, authors, funders and institutions into a navigable web of scholarly knowledge helps research administrators and data infrastructure professionals appreciate the foundations of machine-readable research information systems.








