Author: MCP Service

FORCE11 Scholarly Communication Institute 2026: A Career Pathway for Research-Support Staff

The FORCE11 Scholarly Communication Institute (FSCI) is an annual week-long summer training programme, co-hosted by FORCE11 and the UCLA Library, that teaches researchers, librarians, publishers, funders, and research administrators the practical skills of open scholarly communication. For research-support professionals specifically, FSCI functions less like a one-off conference and more like a structured training pathway: a recognised route to build open-science, data-stewardship, and research-metrics competence that can be cited on a CV or used to justify a promotion case. FSCI 2026 runs 27–31 July 2026.

In one sentence: FSCI is a volunteer-run summer school in which each attendee takes one week-long "morning course" plus a rotation of shorter afternoon electives on topics such as FAIR data stewardship, persistent identifiers, peer review, and research metrics. It launched in 2017 and is modelled on the longer-running Digital Humanities Summer Institute in Victoria, British Columbia.

What is the FORCE11 Scholarly Communication Institute?
Who should attend FSCI as a career-development step?
How does the FSCI course structure work?
What does FSCI cost, and are scholarships available?
How does FSCI differ from a formal scholarly communication librarian role?
Frequently asked questions
What this means for research-support careers

What is the FORCE11 Scholarly Communication Institute?

FSCI is the training arm of FORCE11, the community that formed in 2011 around "the Future of Research Communications and e-Scholarship." Since 2017, FSCI has been co-organised with the UCLA Library and has run each summer, in recent years alternating between in-person, online, and hybrid formats. Course materials from FSCI 2020 through FSCI 2024 are archived openly on Zenodo and the Open Science Framework. Each institute therefore leaves a durable, citable training record instead of disappearing when the week ends.

FORCE11’s broader track record matters for credibility: the same community co-developed the FAIR Data Principles and the Joint Declaration of Data Citation Principles, two frameworks that underpin research-data policy at funders and repositories worldwide. FSCI teaches practitioners to apply that same body of work operationally, rather than simply reading about it.

Who should attend FSCI as a career-development step?

FSCI is explicitly multi-audience: researchers, librarians, publishers, funders, university research-administration staff, students, and postdocs all attend the same institute, choosing courses at introductory or advanced level. For a research-support professional — someone working in a research office, library scholarly-communication unit, or funder programme team — this cross-sector mix is the point.

Rather than training in isolation with only colleagues from one institution, attendees benchmark their skills against a global peer group. A 2018 Serials Review analysis of the institute (Rodriguez, 2018, DOI: 10.1080/00987913.2018.1555510) described FSCI as training people “not for where we’re at, but for where we’re going” — a framing that positions the institute as anticipatory skills-building rather than remedial catch-up.

Typical attendee profiles include:

Research administrators managing open-access compliance or data-management-plan review
Library staff moving into or already working in scholarly-communication roles
Early-career researchers who want to specialise in research infrastructure rather than bench/field research
Funder programme officers who need to understand practitioner-level workflows, not just policy text
Publishing and repository staff building peer-review, persistent-identifier, or metrics expertise

How does the FSCI course structure work?

Each attendee commits to one week-long morning course, which allows sustained, cohort-based depth on a single subject, and supplements it with shorter afternoon elective courses on adjacent topics. This structure is designed to produce both a depth credential (the morning course) and breadth exposure (the electives), which is unusual among short-format professional development options in the research-support field.

Topics have included FAIR data management and stewardship, persistent identifiers, peer-review innovation, new forms of publication, research-metrics literacy, and — in recent years — AI governance in scholarly communication. Plenary sessions, “do-a-thons,” and structured networking events run alongside the coursework, which is what distinguishes FSCI from a standard webinar series.

What does FSCI cost, and are scholarships available?

FSCI publishes its registration fees and scholarship terms on the official FORCE11 site ahead of each year’s institute, and pricing has varied by year and by in-person/online format. FORCE11 has consistently run a scholarship programme to support attendance from historically underrepresented regions; organisers have reported scholarship recipients from six continents, including documented career-changing participation from institutions in Nigeria and Pakistan. For a research-support professional building a career-development business case, the scholarship route is often the most persuasive argument to an institution reluctant to fund a full-fee place.

How does FSCI differ from a formal scholarly communication librarian role?

It is worth being precise about the distinction, because the two are often conflated in search results. A scholarly communication librarian holds a formal, salaried institutional role and is usually MLIS-qualified. Typical responsibilities include running an institutional repository, advising on copyright and open-access policy, or managing an "office of scholarly communication." FSCI is not that role. It is a training pathway that suits three groups: people already in such a role, people hoping to move into one, and research administrators, funder officers, or publishers who never intend to hold that job title at all.

Attribute	FSCI (FORCE11)	Formal scholarly communication librarian role
Format	One-week intensive summer institute	Ongoing salaried position
Entry route	Open registration; no degree prerequisite	Typically requires an MLIS or equivalent
Cost to individual	Course fee, which a scholarship may offset	None; it is paid employment
Output	Practical skills, network, open course materials	Institutional job title and remit
Best used as	A training pathway feeding into or alongside a role	The destination role itself

This distinction matters for career planning. Treating FSCI as a credential-building input — alongside, not instead of, formal qualifications, ORCID-linked professional profiles, and institutional experience — is the more accurate way to use it. Institutions considering whether to fund staff attendance should therefore evaluate FSCI as continuing professional development, comparable to funding attendance at ARMA, NCURA, or EARMA training events, rather than as a substitute for a formal library or research-office qualification.

Frequently asked questions

What is FSCI 2026 and when does it take place?

FSCI 2026 is the annual FORCE11 Scholarly Communication Institute, running 27–31 July 2026. It follows the institute’s established format of a week-long morning course paired with rotating afternoon electives on open-science and research-communication topics for a global, cross-sector audience.

How much does FORCE11 FSCI cost to attend?

Registration fees are set and published by FORCE11 for each year’s institute and vary by format and early registration. FORCE11 runs a dedicated scholarship programme that has supported attendees from underrepresented countries and regions, which materially lowers the effective cost for many participants.

Who should attend the FORCE11 Scholarly Communication Institute?

FSCI is designed for researchers, librarians, publishers, funders, and research administrators at any career stage, plus students and postdocs. Courses are offered at introductory and advanced levels, so attendees choose a track matched to their existing scholarly-communication knowledge.

Are FSCI course materials available after the event?

Yes. FORCE11 has archived FSCI course materials from 2020 through 2024 openly on Zenodo and the Open Science Framework, meaning the training content remains accessible as a reference resource even for people who did not attend that year’s live sessions.

What this means for research-support careers

For institutions, FSCI attendance is a low-cost, high-signal way to build in-house open-science capacity without hiring a new specialist role. For individuals, it is a documented, citable training credential that sits alongside — not in place of — formal qualifications and institutional experience. As open-access mandates, data-management requirements, and AI-governance expectations continue to expand across funders including UKRI and cOAlition S signatories, the practical skills FSCI teaches are becoming a standard expectation of research-support work rather than a specialist add-on.

Research offices, libraries, and funder teams weighing professional-development budgets in 2026 should treat FSCI as one input in a broader research-support career pathway: a way to keep staff current with FAIR data practice, persistent identifiers, and evolving scholarly-communication standards, while formal qualifications and institutional experience continue to do the work of defining the job itself.

July 4, 2026

CRediT Taxonomy Author Contributions Example: Trial Consortia

In a clinical trial consortium paper with more than 100 authors, it is rarely practical to attribute CRediT roles individually to every named contributor. Instead, most multi-site consortia assign full roles to a small "writing committee." They then credit the remaining site investigators and staff as a collective group. This is a workable but imperfect compromise between transparency and practicality.

The CRediT author-contribution examples that most journals publish are straightforward: one paper, a handful of authors, each ticking a few of the 14 roles. They fall apart at scale. Multi-site clinical trial consortia routinely publish primary-results papers with 50, 200, or even several hundred named contributors. These contributors are spread across dozens of hospitals, laboratories, and coordinating centres. Applying individual-level CRediT attribution to every one of them is rarely feasible, and the taxonomy itself offers no guidance on scaling. This article covers three things:

how consortia resolve that gap in practice;
where the "writing committee" shortcut helps and where it hides real accountability problems;
what research administrators should check before signing off on a consortium submission.

CASRAI originated the CRediT contributor role taxonomy in 2014. The standard is now stewarded by NISO as ANSI/NISO Z39.104-2022, an important distinction for any institution citing CRediT in policy documents.

What is the CRediT taxonomy and how is it meant to work?
Why does individual-level CRediT attribution break down above 100 authors?
How do multi-site consortia actually assign CRediT roles?
Answer-first questions on CRediT and large author groups
What this means for research administrators, funders, and publishers

What is the CRediT taxonomy and how is it meant to work?

The CRediT (Contributor Roles Taxonomy) is a standardised list of 14 role categories — including Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, and the two Writing roles — used to describe what each named contributor to a research output actually did. Under ANSI/NISO Z39.104-2022, any of the 14 roles can be assigned to more than one contributor, and any contributor can hold more than one role. The taxonomy was designed around conventional author lists of perhaps two to twelve people, where a corresponding author can realistically survey everyone and compile an accurate statement.
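For teams building contribution-statement tooling, the many-to-many rule above can be sketched in Python. This minimal sketch assumes statements arrive as a mapping from contributor name to role labels. The function name `validate_statement` and that input shape are hypothetical. The code checks labels against the 14 Z39.104-2022 role names. It deliberately applies no uniqueness constraint, because a role may be shared and a contributor may hold several roles.

```python
# The 14 CRediT role names (ANSI/NISO Z39.104-2022).
CREDIT_ROLES = frozenset({
    "Conceptualization",
    "Data curation",
    "Formal analysis",
    "Funding acquisition",
    "Investigation",
    "Methodology",
    "Project administration",
    "Resources",
    "Software",
    "Supervision",
    "Validation",
    "Visualization",
    "Writing – original draft",
    "Writing – review & editing",
})


def validate_statement(statement: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the statement passes.

    `statement` (a hypothetical input shape) maps each contributor's name
    to the roles they hold. No uniqueness checks are applied: one role may
    be held by several people, and one person may hold several roles.
    """
    problems = []
    for person, roles in statement.items():
        if not roles:
            problems.append(f"{person}: no roles assigned")
        for role in roles:
            if role not in CREDIT_ROLES:
                problems.append(f"{person}: unknown role {role!r}")
    return problems


if __name__ == "__main__":
    print(validate_statement({
        "A. Author": ["Conceptualization", "Writing – original draft"],
        "B. Author": ["Investigation", "Statistics"],  # "Statistics" is not a CRediT role
    }))
```

Exact string matching is a simplification. A production system would probably normalise dash and ampersand variants in role names before comparing them.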

CRediT deliberately does not define who qualifies as an author — that remains the domain of criteria such as those published by the International Committee of Medical Journal Editors (ICMJE). CRediT only describes contribution once authorship, or collaborator status, has already been decided elsewhere.

Why does individual-level CRediT attribution break down above 100 authors?

Multi-site clinical trial consortia — platform trials, adaptive-design mega-trials, and large international collaborative groups — routinely list hundreds of contributors: principal investigators at each site, research nurses, statisticians, data monitors, and a central coordinating team. Surveying every one of them individually against 14 role definitions, reconciling disagreements, and keeping the record current through a multi-year trial is an administrative task few coordinating centres can sustain.

Three practical failure points recur:

Collection burden. A corresponding author cannot manually chase 300 collaborators for role self-declarations before every manuscript revision.
Role granularity mismatch. Site-level staff often perform a genuinely narrow contribution (patient recruitment, sample handling) that maps to only one or two roles, making individual disclosure administratively disproportionate to its informational value.
Authorship-vs-collaborator ambiguity. Not every named contributor meets full authorship criteria, and CRediT provides no mechanism of its own for distinguishing the two — that decision is made upstream, under ICMJE or journal-specific rules.

The ICMJE’s Recommendations on the role of authors and contributors state plainly: “When a large multi-author group has conducted the work, the group ideally should decide who will be an author before the work is started and confirm who is an author before submitting the manuscript for publication.” In practice, that decision — not the CRediT assignment — is what most consortia spend their governance effort on.

How do multi-site consortia actually assign CRediT roles?

Three models are in active use across large trial consortia, and each trades transparency against administrative load differently. The dominant compromise is a named writing committee that receives individual CRediT attribution, combined with a collective collaborative group byline (for example, “The [Trial Name] Collaborative Group”) that carries the remaining contributors without a role-by-role breakdown for each person.

Model	How it works	Transparency	Administrative load
Full individual CRediT	Every named author, however many, completes a role disclosure form	Highest	Unsustainable above roughly 30–50 authors
Writing committee + collective group	A small writing committee gets full CRediT roles; remaining contributors are credited as a named collective group, often with individual names and site affiliations in a supplementary appendix	Moderate — accountable core, opaque periphery	Manageable; used by most platform and mega-trials
Hybrid tiered disclosure	Writing committee gets full CRediT roles; site principal investigators get a single broad role (e.g. Investigation); frontline staff are acknowledged rather than listed as authors	Higher than the pure collective model	Moderate; requires a pre-agreed authorship policy
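The hybrid tiered model in the table can be expressed as a small routing function. This is an illustrative sketch only. The tier labels, the `Contributor` record and `build_disclosure` are all hypothetical. Each person's tier is assumed to come from the consortium's pre-agreed authorship policy rather than be inferred by code.

```python
from dataclasses import dataclass, field

# Hypothetical tier labels taken from a pre-agreed consortium authorship policy.
WRITING_COMMITTEE = "writing_committee"
SITE_PI = "site_pi"
FRONTLINE = "frontline_staff"


@dataclass
class Contributor:
    name: str
    site: str
    tier: str
    roles: list[str] = field(default_factory=list)  # only used for the writing committee


def build_disclosure(contributors: list[Contributor]) -> dict:
    """Split contributors into the three outputs of the hybrid tiered model."""
    committee, site_investigators, acknowledged = [], [], []
    for c in contributors:
        if c.tier == WRITING_COMMITTEE:
            # Full, individual CRediT roles for the accountable core.
            committee.append((c.name, sorted(c.roles)))
        elif c.tier == SITE_PI:
            # One broad role per site PI instead of a full disclosure.
            site_investigators.append((c.name, c.site, ["Investigation"]))
        elif c.tier == FRONTLINE:
            # Named in the acknowledgements, not the author list.
            acknowledged.append(f"{c.name} ({c.site})")
        else:
            # Fail loudly so gaps in the policy are resolved by people, not code.
            raise ValueError(f"unknown tier {c.tier!r} for {c.name}")
    return {
        "writing_committee": committee,
        "site_investigators": site_investigators,
        "acknowledgements": acknowledged,
    }
```

Raising an error on an unknown tier is a deliberate design choice. It surfaces contributors the policy did not anticipate, and a person then has to resolve them.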

The ICMJE recommendations also clarify how this interacts with indexing: “the byline of the article identifies who is directly responsible for the manuscript,” and MEDLINE indexes as authors whichever names appear there, while non-author collaborators can still be individually listed and searchable if the journal provides an accompanying note. This means a consortium can preserve individual, searchable credit for site staff even when it does not extend full CRediT role disclosure to each of them — an option under-used by many trial groups.

A pre-agreed authorship and contribution policy, set before a multi-site trial begins recruitment rather than at the manuscript stage, is the single factor that most reliably prevents disputes later. Waiting until submission to decide who was an “author” versus a “collaborator” — and who gets which CRediT role — is the most common cause of delay and disagreement in large consortium publications.

Answer-first questions on CRediT and large author groups

What are examples of author contributions?

Typical author contributions include conceiving the study design, securing funding, recruiting patients, collecting or curating data, performing statistical analysis, writing the first draft, and critically reviewing the final manuscript. Under CRediT, each of these maps to one of 14 defined roles rather than a vague general description.
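A lookup table is one way to turn tasks like these into CRediT roles. The mapping below is an assumption, not an official crosswalk. Some editors would, for instance, treat study design as Methodology as well as Conceptualization. The task phrases and the `roles_for` helper are hypothetical.

```python
# One plausible (not official) mapping from plain-language tasks to CRediT roles.
TASK_TO_ROLES = {
    "conceived the study": ["Conceptualization"],
    "designed the study": ["Methodology"],
    "secured funding": ["Funding acquisition"],
    "recruited patients": ["Investigation"],
    "collected data": ["Investigation"],
    "curated data": ["Data curation"],
    "performed statistical analysis": ["Formal analysis"],
    "wrote the first draft": ["Writing – original draft"],
    "critically reviewed the manuscript": ["Writing – review & editing"],
}


def roles_for(tasks: list[str]) -> list[str]:
    """Translate one author's task list into a de-duplicated, ordered CRediT role list."""
    roles: list[str] = []
    for task in tasks:
        if task not in TASK_TO_ROLES:
            raise KeyError(f"no CRediT mapping defined for task: {task!r}")
        for role in TASK_TO_ROLES[task]:
            if role not in roles:  # two tasks can map to the same role
                roles.append(role)
    return roles
```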

What should substantial contributions include to be credited as an author?

ICMJE recommends authorship only for contributors who meet all four of its criteria:

a substantial contribution to the work's conception or design, or to the acquisition, analysis, or interpretation of data;
drafting the work or critically revising it for important intellectual content;
final approval of the version to be published;
agreement to be accountable for all aspects of the work.

Meeting only part of this, such as data collection alone, typically warrants acknowledgement rather than authorship.
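ICMJE requires all four of its criteria together, so the author-versus-acknowledgement decision reduces to a logical AND. The sketch below assumes people have already judged each criterion. The `Involvement` record and `credit_type` function are hypothetical. Code cannot make those judgements; it can only record them consistently.

```python
from dataclasses import dataclass


@dataclass
class Involvement:
    # The four ICMJE authorship criteria, each already judged by people.
    substantial_contribution: bool  # conception/design, or acquisition/analysis/interpretation of data
    drafted_or_revised: bool        # drafted or critically revised for important intellectual content
    approved_final_version: bool    # approved the version to be published
    accepts_accountability: bool    # agrees to be accountable for all aspects of the work


def credit_type(person: Involvement) -> str:
    """Return 'author' only when every criterion is met, otherwise 'acknowledgement'."""
    criteria = (
        person.substantial_contribution,
        person.drafted_or_revised,
        person.approved_final_version,
        person.accepts_accountability,
    )
    return "author" if all(criteria) else "acknowledgement"
```

For example, someone who only collected data, `Involvement(True, False, False, False)`, is routed to acknowledgement.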

How to write an author contribution in a case report?

A case report contribution statement should name each author against the specific tasks they performed — for example, clinical assessment, literature review, drafting, and supervision — using plain, specific language rather than the fuller 14-role CRediT set, which is more suited to larger, multi-method studies with a genuinely divided workload.

What this means for research administrators, funders, and publishers

Research offices supporting multi-site consortium trials should treat CRediT and authorship decisions as a governance item from the protocol stage, not a manuscript-stage formality. A written policy — agreed by the steering committee before recruitment starts — should specify who sits on the writing committee, what threshold of involvement earns collective-group inclusion versus acknowledgement-only, and how the supplementary collaborator list will be maintained and version-controlled across a multi-year trial.

Funders and institutions increasingly use CRediT statements as an input to research assessment, so an opaque “collective group” byline with no supplementary breakdown under-serves early-career site staff who did substantive work but receive no individually attributable, citable role. Publishers that support both a named writing committee and a searchable, named collaborator appendix — rather than a collective name alone — give institutions and funders a materially better evidence trail for exactly this reason.

The underlying tension is not going away: CRediT was built for conventional author teams, and large trial consortia will keep testing its edges. Until a scaling mechanism is formally added to the taxonomy, the writing-committee-plus-named-collaborator-appendix model remains the most defensible practical compromise between individual accountability and administrative reality.

July 4, 2026

CRediT Taxonomy at Cell Press vs STAR Methods

Cell Press embeds the CRediT taxonomy inside a highly formalised manuscript template — Summary, STAR★Methods, and a back-matter Author Contributions section — rather than treating it as a free-floating declaration bolted onto the end of a paper. The taxonomy itself sits in Author Contributions, not inside STAR★Methods, but both are governed by the same family-wide Cell Press formatting policy. That distinction matters for anyone comparing how publishers operationalise contributor-role reporting.

CRediT is a controlled vocabulary of 14 contributor roles used to describe who did what on a research output. It works the same way at Cell Press journals as everywhere else; those journals are Cell, Cell Reports, Molecular Cell, Cell Metabolism, and the rest of the family. What is unusual is the highly structured article architecture around it. Understanding where Cell Press places it, and why, is useful for research administrators, publishers, and developers building submission tooling.

What is the CRediT taxonomy at Cell Press?
Where does CRediT sit relative to the Summary and STAR★Methods?
How does this differ from the free-standing statement used elsewhere?
Answer-first questions on the CRediT taxonomy
Implications for administrators, publishers, and developers

What is the CRediT taxonomy at Cell Press?

CASRAI originated the CRediT contributor role taxonomy in 2014. The standard is now stewarded by NISO as ANSI/NISO Z39.104-2022. Cell Press adopted it early: Deborah Sweet, Cell Press’s Vice President of Editorial, announced in a June 2015 Cell Mentor post that the Author Contributions section — traditional or CRediT-formatted — was being introduced as an option across Cell Press journals.

At that point, per Sweet’s post, the section was optional unless a paper carried co-first authorship, in which case a contributions statement became necessary to clarify precedence. The taxonomy provides 14 discrete roles:

Conceptualization
Data curation
Formal analysis
Funding acquisition
Investigation
Methodology
Project administration
Resources
Software
Supervision
Validation
Visualization
Writing – original draft
Writing – review & editing

Cell Press has never claimed ownership of the taxonomy; its published guidance credits the originating collaboration and links out to the standard, consistent with an “originator, not owner” framing that has held since 2015.

Where does CRediT sit relative to the Summary and STAR★Methods?

This is the section most write-ups get wrong. Cell Press’s own manuscript-preparation guidance caps the front-matter Summary at 150 words, written as a single unstructured paragraph with no citations — it is not a labelled, IMRaD-style structured abstract. The structure that gives Cell Press its reputation lives further down the paper, in STAR★Methods (Structured, Transparent, Accessible Reporting), which replaces a conventional free-text Methods section with standardised subsections: a Key Resources Table, Resource Availability, Experimental Model and Subject Details, Method Details, and Quantification and Statistical Analysis.

CRediT itself does not sit inside STAR★Methods. It occupies its own Author Contributions block in the back matter, ordered — per the current Cell Press article template — after Acknowledgments and before Declaration of Interests and the reference list. The practical pattern is this: STAR★Methods standardises what was done and how; the CRediT-based Author Contributions statement, sitting immediately alongside it in the same standardised back matter, standardises who did it. Both are governed by one uniform, family-wide Cell Press formatting policy that applies identically whether a paper is submitted to Cell, Molecular Cell, or Cell Reports.

That is the genuinely distinct editorial pattern: not CRediT literally nested inside STAR★Methods, but CRediT folded into the same rigid, standardised template architecture that STAR★Methods represents — a single formatting regime covering resources, methods, and contributorship together, rather than an ad hoc statement appended wherever a given journal happens to put it.
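Developers of submission tooling can check the two template rules described in this section mechanically:

the Summary runs to at most 150 words and contains no citations;
Author Contributions sits after Acknowledgments and before Declaration of Interests and the references.

Below is a minimal sketch in Python. It assumes the manuscript has already been split into a Summary string and an ordered list of section headings. The citation regex is a rough heuristic, and all names are hypothetical. Other publishers would need their own expected order.

```python
import re

SUMMARY_WORD_LIMIT = 150

# Relative back-matter order described for the Cell Press template.
CELL_PRESS_BACK_MATTER = (
    "Acknowledgments",
    "Author Contributions",
    "Declaration of Interests",
    "References",
)

# Heuristic citation detector: numeric brackets like [3] or [2, 5] and
# author-year pairs like (Smith et al., 2020). Not a full citation parser.
CITATION_PATTERN = re.compile(
    r"\[\d+(?:[,–-]\s*\d+)*\]|\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}\)"
)


def check_summary(summary: str) -> list[str]:
    """Flag a Summary that is too long or appears to cite references."""
    problems = []
    words = len(summary.split())
    if words > SUMMARY_WORD_LIMIT:
        problems.append(f"Summary has {words} words (limit {SUMMARY_WORD_LIMIT})")
    if CITATION_PATTERN.search(summary):
        problems.append("Summary appears to contain a citation")
    return problems


def check_back_matter(headings: list[str],
                      expected: tuple[str, ...] = CELL_PRESS_BACK_MATTER) -> list[str]:
    """Check that whichever expected back-matter headings are present appear in order."""
    present = [h for h in headings if h in expected]
    ranks = [expected.index(h) for h in present]
    if ranks != sorted(ranks):
        return [f"Back matter out of order: {present}"]
    return []
```

Passing `expected` as a parameter reflects the point that tooling cannot assume one fixed position for the statement across publishers.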

How does this differ from the free-standing statement used elsewhere?

Many publishers treat the Author Contributions/CRediT statement as a genuinely free-standing element: a short paragraph or table inserted near the end of the manuscript with no other structural scaffolding around it. Cell Press’s family-wide template treats it as one governed component among several.

Feature	Cell Press pattern	Typical free-standing pattern
Summary/abstract	150-word unstructured paragraph, no citations	Varies by journal; often unstructured, no fixed cap
Methods reporting	Mandatory STAR★Methods with Key Resources Table	Free-text Methods, no standardised subsections
Author Contributions placement	Fixed back-matter slot after Acknowledgments, before Declaration of Interests	Placement varies; sometimes front matter, sometimes end matter
CRediT status (historically)	Optional unless co-first authorship (per 2015 policy)	Mandatory at many journals since 2016, e.g. Journal of Cell Science, per Company of Biologists policy
Governance	One family-wide policy across all Cell Press titles	Set independently per journal or per publisher imprint

The comparison matters for anyone auditing submission systems across publishers: a developer building CRediT-aware manuscript tooling cannot assume a single fixed position for the statement, nor assume it is mandatory everywhere. Journal of Cell Science, for instance, requires CRediT-tagged contributions during online submission and states plainly that the taxonomy does not itself determine who qualifies as an author — authorship is a separate editorial decision at every publisher, Cell Press included.

Answer-first questions on the CRediT taxonomy

What is the CRediT taxonomy?

The CRediT taxonomy is a controlled vocabulary of 14 contributor roles used to describe individual contributions to a research output, from conceptualization to writing – review & editing. It replaces a single vague “authorship” credit with a granular, role-by-role statement, and it is now formalised as ANSI/NISO Z39.104-2022.

What are the 14 roles of the CRediT taxonomy?

The 14 roles are Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, and Writing – review & editing. Any author may hold one or several roles on a single paper.

What does investigation mean in CRediT taxonomy?

Investigation, in CRediT terms, means conducting the research process itself — specifically performing experiments or carrying out data and evidence collection. It is distinct from Methodology (designing the approach) and from Formal analysis (applying statistical or computational techniques to the resulting data).

Implications for administrators, publishers, and developers

For research administrators, the Cell Press pattern is a reminder that CRediT compliance checks cannot be reduced to "is the statement present?" A co-first-authorship claim with no Author Contributions statement is a Cell Press-specific red flag, worth raising with authors before submission. Under the 2015 policy, the statement was optional in every case except that one.
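This red-flag check is simple enough to automate in a pre-submission checklist. The sketch assumes a hypothetical `Submission` record that a research office assembles from the manuscript. The rule encodes the 2015 policy, under which the statement was optional unless co-first authorship was claimed. It should be updated if current Cell Press guidance differs.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    # Hypothetical pre-submission record assembled by a research office.
    title: str
    co_first_authors: list[str]
    author_contributions: str  # empty string when no statement was supplied


def co_first_red_flag(sub: Submission) -> bool:
    """True when co-first authorship is claimed but no contributions statement exists.

    Under the 2015 Cell Press policy, a contributions statement was needed
    precisely to clarify co-first precedence.
    """
    return len(sub.co_first_authors) >= 2 and not sub.author_contributions.strip()
```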

For publishers and journal-system developers, the lesson is architectural: pairing a standardised contributorship statement with a standardised methods-reporting format, under one uniform policy, appears to reduce the drift that otherwise causes CRediT statements to vary wildly in placement and completeness across a publisher’s own journal family. As more publishers formalise their own STAR★Methods-style templates, expect more of them to fold CRediT into the same governed structure rather than leaving it as an isolated, easily skipped field.

The underlying taxonomy remains unchanged wherever it appears. What Cell Press demonstrates is that where and how rigidly a publisher enforces CRediT — not the 14 roles themselves — is where meaningful editorial variation still exists across the scholarly-publishing landscape.


July 4, 2026

FAIR Data Point: Making Data Machine-Actionable

A FAIR data point is a lightweight metadata server that exposes structured, standardised descriptions of a dataset — its identifier, creator, licence and access route — through a REST API, so software (not just people) can discover and assess it automatically. Under the GO FAIR implementation network, FAIR Data Points are the working infrastructure that turns the FAIR principles from a policy statement into a queryable service.

In formal terms, a FAIR Data Point (FDP) is a metadata repository that follows the DCAT2 vocabulary and organises records in a fixed hierarchy — repository, catalogue, dataset, distribution — using Linked Data Platform containers, as set out in the peer-reviewed FDP specification (da Silva Santos et al., 2023, Data Intelligence, MIT Press).

What Is a FAIR Data Point?
How Does the GO FAIR Initiative Use FAIR Data Points?
FAIR Data Point vs Machine-Actionable DMP: What Is the Difference?
How Is FAIRness Measured? The F-UJI Evaluator
Where Does DDI Fit Into the FAIR Data Point Stack?
Frequently Asked Questions
What This Means for Data Stewards and Developers

What Is a FAIR Data Point?

A FAIR Data Point separates metadata from data. It does not host the dataset itself; it hosts a machine-readable description of the dataset, reachable at a stable HTTP endpoint. This separation is what makes the metadata queryable independently of wherever the underlying files are actually stored.
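Because the FDP exposes metadata at an ordinary HTTP endpoint, a client can request an RDF serialisation through the standard `Accept` header. The sketch below uses only Python's standard library. It assumes the target FDP honours content negotiation for the listed media types, which should be checked for each deployment. The function names and the example URL are hypothetical.

```python
import urllib.request

# RDF serialisations an FDP may offer; support varies by deployment (assumption).
RDF_MEDIA_TYPES = {
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
    "rdfxml": "application/rdf+xml",
}


def fdp_metadata_request(url: str, fmt: str = "turtle") -> urllib.request.Request:
    """Build a GET request asking an FDP endpoint for machine-readable metadata."""
    return urllib.request.Request(url, headers={"Accept": RDF_MEDIA_TYPES[fmt]}, method="GET")


def fetch_fdp_metadata(url: str, fmt: str = "turtle", timeout: float = 10.0) -> str:
    """Fetch the metadata document as text (needs network access and a real FDP URL)."""
    with urllib.request.urlopen(fdp_metadata_request(url, fmt), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

A call such as `fetch_fdp_metadata("https://fdp.example.org/")` would return Turtle text for the repository layer, provided the hypothetical URL is replaced with a live FDP.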

The reference specification defines four nested layers, each exposed as its own resource:

Repository — the top-level FDP instance, describing the organisation or project running it
Catalogue — a themed grouping of related datasets
Dataset — the described research object, with identifier, creator, licence and rights statement
Distribution — the concrete access point (a download URL, an API, a query service)

Every layer is exposed via a REST API and encoded as RDF using the DCAT2 vocabulary, which is why an FDP can be crawled and indexed by external harvesters without bespoke integration work per institution.
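The four-layer hierarchy is what makes automated traversal possible: a harvester starts at the repository and follows links down to distributions. The sketch below walks an in-memory list of (subject, predicate, object) triples that stands in for parsed RDF. In practice those triples would come from an RDF parser such as rdflib. The terms `dcat:dataset`, `dcat:distribution`, `dcat:accessURL` and `dcat:downloadURL` are DCAT vocabulary. The repository-to-catalogue predicate shown is an assumption and should be checked against the FDP version in use.

```python
DCAT = "http://www.w3.org/ns/dcat#"
# Repository -> catalogue link: ASSUMED IRI; verify against your FDP version.
FDP_METADATA_CATALOG = "https://w3id.org/fdp/fdp-o#metadataCatalog"

# Parent-to-child predicate for each step down the hierarchy.
CHILD_LINKS = [
    FDP_METADATA_CATALOG,   # repository -> catalogue
    DCAT + "dataset",       # catalogue  -> dataset
    DCAT + "distribution",  # dataset    -> distribution
]


def objects(triples, subject, predicate):
    """All objects linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def access_points(triples, repository):
    """Walk repository -> catalogue -> dataset -> distribution and collect access URLs."""
    level = [repository]
    for predicate in CHILD_LINKS:
        level = [child for node in level for child in objects(triples, node, predicate)]
    urls = []
    for dist in level:
        urls += objects(triples, dist, DCAT + "accessURL")
        urls += objects(triples, dist, DCAT + "downloadURL")
    return urls
```

Because every layer uses the same link vocabulary, the same loop works against any conforming FDP without per-institution code.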

How Does the GO FAIR Initiative Use FAIR Data Points?

GO FAIR is a grassroots, community-run implementation network for the FAIR principles, not a standards body with formal ownership of a single specification. It organises its work around three self-described pillars — GO CHANGE (policy and culture), GO TRAIN (skills) and GO BUILD (technical infrastructure) — coordinated through the GO FAIR Foundation.

FAIR Data Points sit inside the GO BUILD pillar. GO FAIR pairs FDPs with FAIR Implementation Profiles (FIPs): a documented set of choices a specific research community makes about identifiers, vocabularies, access protocols and licensing terms. The FIP tells an FDP deployment which controlled vocabularies to use at the dataset and distribution layers, so that metadata from two unrelated institutions in the same domain remains interoperable rather than merely similar.
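A FIP's role can be illustrated as a conformance check. If two institutions' records both draw property values from the vocabularies the profile names, their metadata lines up term for term. The profile below is a toy, hypothetical example, not a published FIP; that applies to the property names, the namespaces and `fip_violations`. The Orphanet (ORDO) namespace is used only as a plausible domain vocabulary.

```python
# Toy FAIR Implementation Profile: allowed vocabulary namespaces per property.
# Property names and namespaces are illustrative assumptions, not a real FIP.
EXAMPLE_FIP: dict[str, tuple[str, ...]] = {
    "theme": ("http://www.orpha.net/ORDO/",),
    "license": ("https://creativecommons.org/licenses/",),
}


def fip_violations(record: dict[str, str], fip: dict[str, tuple[str, ...]]) -> list[str]:
    """List properties that are missing or use a vocabulary outside the profile."""
    problems = []
    for prop, allowed in fip.items():
        value = record.get(prop)
        if value is None:
            problems.append(f"missing {prop}")
        elif not value.startswith(allowed):  # str.startswith accepts a tuple of prefixes
            problems.append(f"{prop} uses a vocabulary outside the profile: {value}")
    return problems
```

A free-text theme such as "Marfan syndrome" may look similar to an ORDO term to a human reader, but it fails the check. That is the difference between "merely similar" and interoperable metadata.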

The combined goal is what GO FAIR calls the “Internet of FAIR Data & Services” — a distributed network of FDPs that automated agents can traverse to locate relevant data without a central index. A working example already in production is the European Joint Programme on Rare Diseases (EJP RD) Virtual Platform, whose index runs on a federated network of FDPs contributed by member registries across Europe, funded through the EU Horizon research programme.

FAIR Data Point vs Machine-Actionable DMP: What Is the Difference?

The two are frequently conflated because both are described as “machine-actionable,” but they describe different objects at different points in the research lifecycle. A machine-actionable Data Management Plan (maDMP) — built on the Research Data Alliance’s DMP Common Standard and served by tools such as DMPTool or DMPonline — describes intentions: what data a project will produce, where it will deposit it and under what licence. An FDP describes an already-deployed dataset that a machine can query right now.

Aspect	FAIR Data Point	Machine-Actionable DMP
Lifecycle stage	Post-deposit, dataset already exists	Pre-project, data not yet produced
Governing spec	GO FAIR / FDP specification (DCAT2, LDP)	RDA DMP Common Standard
Query interface	REST API over a live metadata service	JSON export or plan-management tool API
Granularity	Per dataset / per distribution	Per project or funding award
Typical operator	Data repository or institutional archive	Institution, funder, or research office

Confusing the two leads institutions to procure the wrong tool: an maDMP platform will not make a finished dataset crawlable, and an FDP deployment will not help a project plan its future data management obligations.
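The difference in lifecycle stage shows up directly in the data. Below is an abridged maDMP that follows the general shape of the RDA DMP Common Standard's JSON serialisation: a top-level `dmp` object with nested `dataset`, `distribution` and `license` entries. Several fields the standard requires are omitted, and the project content is invented for illustration. The hypothetical helper lists what the plan intends to publish. Nothing in it is a live, queryable dataset, which is exactly what an FDP would describe.

```python
import json

# Abridged maDMP shaped like the RDA DMP Common Standard JSON. Required fields
# such as contact, dmp_id, created and modified are omitted for brevity.
MADMP = """
{
  "dmp": {
    "title": "Example project DMP",
    "dataset": [
      {
        "title": "Interview transcripts",
        "distribution": [
          {
            "title": "Anonymised transcripts",
            "license": [
              {"license_ref": "https://creativecommons.org/licenses/by/4.0/",
               "start_date": "2027-01-01"}
            ]
          }
        ]
      }
    ]
  }
}
"""


def planned_outputs(madmp_json: str) -> list[tuple[str, str, str]]:
    """List (dataset, distribution, licence) entries the plan intends to produce."""
    plan = json.loads(madmp_json)["dmp"]
    rows = []
    for dataset in plan.get("dataset", []):
        for dist in dataset.get("distribution", []):
            for lic in dist.get("license", []):
                rows.append((dataset["title"], dist["title"], lic["license_ref"]))
    return rows
```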

How Is FAIRness Measured? The F-UJI Evaluator

F-UJI is an automated FAIR assessment tool developed under the Horizon 2020 FAIRsFAIR project. It scores a dataset’s exposed metadata — including metadata served by an FDP — against a fixed set of maturity indicators grouped under the four FAIR facets, returning a numeric FAIRness score rather than a binary pass/fail.

F-UJI can only evaluate what is machine-visible: it checks whether a licence, persistent identifier or access protocol is declared in the metadata, not whether the underlying data file is actually reusable in practice. This is precisely why the metadata layer an FDP provides matters — a well-structured FDP deployment is what allows a tool like F-UJI to detect FAIRness signals automatically, while a plain data-download page with no structured metadata will score poorly regardless of how well-organised the actual dataset is.
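The machine-visibility point can be illustrated with a toy scorer. This is not F-UJI’s metric set or code. The three checks below (persistent identifier, licence, access protocol) are a hypothetical, heavily simplified stand-in, and the sketch assumes metadata arrives as a flat dictionary with invented key names.

```python
# Hypothetical, simplified indicators loosely inspired by automated FAIR
# assessment; NOT the actual F-UJI / FAIRsFAIR metrics.
PID_PREFIXES = ("https://doi.org/", "https://hdl.handle.net/")

def score_metadata(meta: dict) -> float:
    """Return the fraction of simple machine-visible checks that pass."""
    checks = [
        # Findable: a resolvable persistent identifier is declared.
        str(meta.get("identifier", "")).startswith(PID_PREFIXES),
        # Reusable: a licence is declared in the metadata itself.
        bool(meta.get("license")),
        # Accessible: an access URL using a standard protocol is declared.
        str(meta.get("access_url", "")).startswith("https://"),
    ]
    return sum(checks) / len(checks)

structured = {
    "identifier": "https://doi.org/10.1234/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "access_url": "https://data.example.org/files/1",
}
download_page = {"title": "Our dataset (download below)"}

print(score_metadata(structured))     # 1.0
print(score_metadata(download_page))  # 0.0
```

The download page scores zero not because its data is poor but because nothing is declared in a form a machine can read.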

Where Does DDI Fit Into the FAIR Data Point Stack?

The Data Documentation Initiative (DDI) is an XML/RDF metadata standard maintained by the DDI Alliance for describing social, behavioural and economic science data at the variable level — survey questions, coding frames, sampling design. DCAT2, the vocabulary an FDP uses by default, describes a dataset at the catalogue-entry level; it was never designed to capture variable-level detail.

A research community whose FAIR Implementation Profile specifies DDI alongside DCAT2 gets both: FDP-level crawlability for discovery, and DDI-level granularity for reuse. Social-science archives affiliated with the Consortium of European Social Science Data Archives (CESSDA) and the UK Data Service already publish DDI metadata; wiring that metadata into an FDP endpoint is a genuine interoperability gain rather than duplicated effort.

Frequently Asked Questions

What is a FAIR data point?

A FAIR Data Point is a metadata repository that exposes a dataset’s identifier, licence, creator and access route through a REST API, structured according to the DCAT2 vocabulary. It publishes metadata about data, not the data itself, so automated tools can discover and evaluate the dataset without human involvement.

What does FAIR data mean?

FAIR data is data that meets the principles of Findability, Accessibility, Interoperability and Reusability, first formally published in 2016 by Wilkinson et al. in Scientific Data. The principles apply to metadata as much as to the underlying files, which is why machine-readable metadata infrastructure, such as an FDP, is required to satisfy them at scale.

What are the four pillars of the FAIR data principles?

The four pillars are Findable (a persistent identifier and rich metadata exist), Accessible (metadata is retrievable via an open protocol, even if the data itself is restricted), Interoperable (metadata uses a shared, formal vocabulary such as DCAT2), and Reusable (a clear licence and provenance are attached).

What This Means for Data Stewards and Developers

Deploying a FAIR Data Point is an infrastructure decision, not a documentation exercise. In practice it requires three steps: agreeing a FAIR Implementation Profile with the relevant research community, mapping local repository metadata onto DCAT2 at the dataset and distribution layers, and registering the resulting endpoint so external harvesters and tools such as F-UJI can find it.

Pair persistent dataset identifiers from DataCite with the FDP’s dataset layer so citation and discovery metadata stay consistent
Use ROR identifiers for the institutional agent fields rather than free-text organisation names
Treat the FDP as complementary to, not a replacement for, an maDMP — one documents intent, the other serves the finished product
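The DCAT2 mapping step, combined with the DataCite and ROR recommendations above, can be sketched as one small function. This is a minimal sketch that assumes a hypothetical local repository record with the keys shown. The output uses real DCAT and Dublin Core Terms properties (`dcat:Dataset`, `dct:identifier`, `dct:license`, `dct:publisher`, `dcat:distribution`, `dcat:accessURL`, `dcat:mediaType`), but which local field feeds which property is an assumption that a community’s FIP would settle.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

def to_dcat(record: dict) -> dict:
    """Map a hypothetical local repository record onto DCAT2 JSON-LD."""
    doi_url = f"https://doi.org/{record['doi']}"
    return {
        "@context": CONTEXT,
        "@id": doi_url,  # DataCite DOI doubles as the dataset IRI
        "@type": "dcat:Dataset",
        "dct:identifier": doi_url,
        "dct:title": record["title"],
        "dct:license": {"@id": record["licence_url"]},
        # ROR identifier instead of a free-text organisation name.
        "dct:publisher": {"@id": record["ror_id"]},
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": f["url"]},
                "dcat:mediaType": {
                    "@id": "https://www.iana.org/assignments/media-types/"
                    + f["media_type"]
                },
            }
            for f in record["files"]
        ],
    }

# Invented example record; the DOI and ROR ID are placeholders.
local = {
    "doi": "10.1234/example.5678",
    "title": "Example rare-disease registry extract",
    "licence_url": "https://creativecommons.org/licenses/by/4.0/",
    "ror_id": "https://ror.org/00example0",
    "files": [{"url": "https://repo.example.org/f/1.csv", "media_type": "text/csv"}],
}
print(json.dumps(to_dcat(local), indent=2))
```

Keeping the DOI as the dataset’s `@id` means the citation identifier and the discovery record cannot drift apart.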

Funders are moving in this direction: the UNESCO Recommendation on Open Science (2021) names FAIR data as a foundational pillar, and Horizon Europe grant conditions increasingly expect data to be discoverable by machines, not just listed in a repository catalogue. For institutions building research-data infrastructure now, a standards-conformant FAIR Data Point is a defensible way to demonstrate machine-actionability rather than assert it in a data management plan.

For related definitions and terminology, see the CASRAI dictionary and the research administration pillar.

July 4, 2026

What Is a Data Trust? Research Data Governance

A data trust is a legal and technical framework in which an independent trustee, bound by fiduciary duty, makes decisions about a pool of data on behalf of the people or organisations who contributed it. For research data, this offers a genuine alternative to depositing datasets individually in a repository: instead of each contributor negotiating access terms alone, a trustee stewards shared data collectively, with accountability built into the governance structure itself.

A data trust can be defined precisely. According to the Open Data Institute (ODI), which has developed and refined a working definition of the term since 2018, it is an independent steward that holds data under a formal duty of impartiality, prudence, transparency and undivided loyalty to the beneficiaries whose data it manages.

What is a data trust?
How does a data trust govern research data differently from repository deposit?
Data sharing agreement vs data processing agreement: where does a data trust fit?
What does a data trust mean for FAIR data stewardship?
Indigenous data sovereignty and the CARE Principles
Answer-first Q&A
Implications and outlook for research administrators

What is a data trust?

A data trust is a legal structure in which one party authorises an independent trustee to make decisions about data on their behalf, for the benefit of a defined group of stakeholders. The ODI, which published its first explainer on the concept in July 2018 and adopted a working definition later that year, models the idea on established asset trusts such as land trusts, transposing the same fiduciary logic onto data.

The clearest working example is UK Biobank, established in 2006 as a charitable company with trustees to steward genetic data and biological samples from around 500,000 participants. The ODI itself trialled the concept in practice with the UK Government’s Office for AI in April 2019, testing whether fiduciary stewardship could work as applied governance rather than theory alone. Separately, the University of Cambridge’s Data Trusts Initiative has examined data trusts as a mechanism for pooling individuals’ legal data rights into a single negotiating and stewardship entity.

How does a data trust govern research data differently from repository deposit?

Under the standard deposit model, a researcher or institution submits a dataset to a repository, which applies institutional policy and a licence to govern reuse — the repository itself owes no fiduciary duty to depositors. Under a data trust, an independent trustee holds ongoing decision-making authority over the pooled data and is legally obliged to act in the beneficiaries’ interests, not merely to apply a static licence at the point of deposit.

This distinction matters most for sensitive, re-identifiable, or commercially valuable research data, where a one-off licence cannot anticipate every future access request. A trust structure allows collective, ongoing renegotiation of terms as new uses arise, rather than requiring each depositor to individually vet every downstream request.

Feature	Data trust	Repository deposit
Legal basis	Formal trust or fiduciary agreement	Institutional policy plus a data licence
Decision-maker	Independent trustee(s) with ongoing authority	Depositor sets terms once, at submission
Fiduciary duty	Yes — legally binding to beneficiaries	No — repository is a custodian, not a fiduciary
Best suited to	Sensitive, re-identifiable, or contested data	Open, low-risk, citation-ready datasets

Data sharing agreement vs data processing agreement: where does a data trust fit?

A data sharing agreement sets out the terms under which two or more parties exchange data they each control. A data processing agreement, required under UK GDPR Article 28 wherever a processor handles personal data on a controller’s behalf, fixes the narrower relationship between a data controller and a processor that acts only on the controller’s instructions.

A data trust does not replace either instrument; it changes who holds the authority to agree them. Rather than each institution separately negotiating a data sharing agreement for every new research collaboration, the trustee negotiates and monitors compliance centrally, on behalf of all contributors, reducing duplicated legal effort across a research consortium.

What does a data trust mean for FAIR data stewardship?

The FAIR Principles — Findable, Accessible, Interoperable, Reusable, formalised by Wilkinson and colleagues in Scientific Data in 2016 — govern how research data should be described and made available, but they do not specify who decides access terms. A data trust supplies exactly that missing governance layer.

Findability and interoperability metadata can still be maintained in a conventional repository even where the trust governs access rights.
Accessibility becomes a trustee decision rather than a fixed licence, allowing tiered or conditional access for sensitive datasets that would otherwise be withheld entirely.
Reusability is strengthened where beneficiaries trust the stewardship arrangement enough to contribute richer, less redacted data in the first place.
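The difference between trustee-decided, tiered access and a fixed licence can be sketched as a decision function. Everything here is hypothetical: the tier names, request attributes and rules are invented for illustration. A real trust would encode its terms in legal documents, with trustees exercising judgement rather than running code.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    purpose: str               # e.g. "public-health research"
    ethics_approved: bool
    secure_environment: bool   # will analysis run in a trusted research environment?

# Hypothetical terms set by trustees. Unlike a licence fixed at deposit,
# trustees can revise this mapping as new uses arise.
TRUST_TERMS = {
    "open": {"requires_ethics": False, "requires_secure_env": False},
    "conditional": {"requires_ethics": True, "requires_secure_env": False},
    "restricted": {"requires_ethics": True, "requires_secure_env": True},
}
PROHIBITED_PURPOSES = {"marketing"}

def decide(tier: str, request: AccessRequest) -> str:
    """Apply the trust's current terms to one access request."""
    if request.purpose in PROHIBITED_PURPOSES:
        return "decline: purpose excluded by trust terms"
    terms = TRUST_TERMS[tier]
    if terms["requires_ethics"] and not request.ethics_approved:
        return "decline: ethics approval required"
    if terms["requires_secure_env"] and not request.secure_environment:
        return "refer: offer access via secure environment"
    return "approve"

req = AccessRequest("public-health research", ethics_approved=True, secure_environment=False)
print(decide("conditional", req))  # approve
print(decide("restricted", req))   # refer: offer access via secure environment

# Trustees later tighten the "conditional" tier; the same request is now referred.
TRUST_TERMS["conditional"]["requires_secure_env"] = True
print(decide("conditional", req))  # refer: offer access via secure environment
```

The last two lines are the point of the sketch: the terms can change after deposit, which a static licence cannot do.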

Institutions bound by research data management policy obligations, including UKRI’s Common Principles on Research Data, can use a data trust as a compliance mechanism that helps satisfy funder access requirements without forcing full open deposit of sensitive material.

Indigenous data sovereignty and the CARE Principles

The Global Indigenous Data Alliance published the CARE Principles — Collective Benefit, Authority to Control, Responsibility, and Ethics — in 2019, explicitly to complement FAIR by centring people and purpose rather than data alone. CARE was developed in direct response to concerns that FAIR-only stewardship could enable extraction of Indigenous data without consent or benefit-sharing.

A data trust structure is one of the few governance mechanisms that can operationalise CARE’s “Authority to Control” principle in practice: it gives a defined community, rather than a repository operator, the standing to appoint trustees and set binding terms. Generic data-trust explainers rarely cover this application, because most of them address corporate or civic data rather than research data sovereignty.

Answer-first Q&A

What is a data trust?

A data trust is a legal and technical structure that manages data on behalf of contributors through an independent trustee. The trustee holds a fiduciary duty — impartiality, prudence, transparency, and undivided loyalty — to the people or organisations whose data is pooled, rather than to any single commercial interest.

What is the data trust structure?

The structure places data under the control of a board of trustees who owe a fiduciary responsibility to the beneficiaries. Terms of access, use, and onward sharing are set collectively and can be renegotiated over time, unlike a fixed licence attached to a single dataset at deposit.

What is a public data trust?

A public data trust is governed by community, government, or non-profit board members committed to widening access to data affecting a defined population. In a research setting, this model supports population studies, public-health cohorts, and civic datasets where public benefit and consent are central governance concerns.

What is the role of a data trustee?

A data trustee manages, protects, and ensures the integrity and appropriate use of pooled data. Trustees identify sensitivity and risk, approve or decline access requests, and enforce the trust’s terms — a standing, ongoing role rather than a one-time licensing decision made at the point of deposit.

Implications and outlook for research administrators

For research administrators, the practical implication is that data trusts are not a substitute for repository infrastructure — findability, persistent identifiers, and metadata still depend on conventional deposit systems. What a trust adds is a governance layer above the infrastructure, suited to consortium data, population cohorts, and datasets involving Indigenous or otherwise sovereignty-sensitive communities.

Institutions weighing a data trust model should expect higher upfront legal cost than a standard repository licence, offset against lower recurring negotiation cost across a multi-year, multi-partner project. As FAIR-compliant infrastructure matures and CARE-aligned governance expectations grow, data trusts are likely to remain a minority but increasingly cited option for exactly the categories of research data — sensitive, collectively owned, or community-governed — that pure open deposit handles least well.

July 4, 2026

Materials Data Repository: NIST’s FAIR Approach

The NIST Materials Data Repository is a US federal, open-access archive that lets materials scientists deposit, describe and reuse research data files under the Materials Genome Initiative (MGI). It matters for research data management (RDM) because materials science has lagged biomedical and social-science fields in adopting FAIR data principles, and NIST’s infrastructure — built on the open-source DSpace platform — offers a concrete, working template for what FAIR looks like in a physical-science discipline.

A materials data repository is a structured digital archive purpose-built for storing, describing and sharing datasets specific to materials science: crystal structures, mechanical-property measurements, spectroscopy files, simulation outputs and processing metadata. Unlike a general-purpose institutional repository, it is organised around domain metadata schemas that make heterogeneous, often binary, materials data searchable and machine-actionable.

What is the NIST Materials Data Repository?
How does it support FAIR data principles?
How does it compare with other materials data infrastructure?
What does this mean for RDM programmes?
Answer-first Q&A
Where materials science RDM is heading

What is the NIST Materials Data Repository?

The NIST Materials Data Repository, hosted at materialsdata.nist.gov, is a file repository maintained by the US National Institute of Standards and Technology’s Material Measurement Laboratory. It accepts data in any format and pairs each deposit with descriptive metadata — title, author, ownership and, where available, richer domain fields — specifically to counter the “opacity” of binary materials files that would otherwise be unsearchable.

NIST states the repository was created to give the research community “a concrete mechanism for the interchange and re-use of research data on materials systems,” in direct support of the Materials Genome Initiative, the 2011 US federal effort to accelerate materials discovery through better data infrastructure. Content is organised into communities and collections that group related datasets, improving browsability for specific research teams or projects.

Technically, the repository runs on DSpace, an open-source repository platform widely used across academic libraries, which gives it three RDM-relevant capabilities out of the box: persistent identifiers for deposited files, a web-accessible API for machine-to-machine access, and federation with other repositories. NIST has used that API to feed repository references into the Materials Data Facility and a “root and rules” search algorithm, extending the data’s reach beyond the repository’s own interface.
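Machine-to-machine harvesting from DSpace repositories is commonly done over OAI-PMH, a protocol DSpace ships with. The sketch below parses an OAI-PMH `ListRecords` response in the `oai_dc` (simple Dublin Core) format using only Python’s standard library. It assumes, rather than confirms, that a given instance exposes an OAI-PMH endpoint. The sample record is invented, and a live harvester would fetch the XML over HTTP and follow `resumptionToken` paging.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response; a live harvester would request something like
# <base-url>?verb=ListRecords&metadataPrefix=oai_dc
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example alloy tensile test data</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:rights>CC0</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def parse_records(xml_text: str) -> list[dict]:
    """Extract identifier, title, creators and rights from oai_dc records."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue  # deleted records carry a header but no metadata
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS),
            "creators": [c.text for c in dc.findall("dc:creator", NS)],
            "rights": dc.findtext("dc:rights", namespaces=NS),
        })
    return records

print(parse_records(SAMPLE))
```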

How does the repository support FAIR data principles?

The FAIR data principles — Findable, Accessible, Interoperable, Reusable — were formalised in 2016 in Scientific Data by Wilkinson et al. as a shared standard for making research data machine-actionable, not just human-readable. NIST’s repository operationalises each element rather than treating FAIR as an abstract aspiration.

Findable: rich, mandatory metadata plus persistent identifiers make each dataset discoverable independent of where its underlying file happens to live.
Accessible: the majority of holdings are public and retrievable through a standard web browser or the repository’s API, with limited invitation-only collections reserved for pre-publication analysis.
Interoperable: structured metadata and DSpace’s federation capability let the repository exchange records with external systems such as the Materials Data Facility, rather than functioning as an isolated silo.
Reusable: depositor-selected licensing terms and descriptive context give downstream users the information they need to judge whether a dataset is fit for reuse in new research.

This matters because FAIR compliance in materials science carries a different technical burden than it does in genomics or clinical trials data. A single alloy characterisation dataset can combine imaging files, spectroscopy outputs and tabular composition data in incompatible native formats — which is precisely the interoperability problem a domain-specific repository, rather than a generic institutional one, is built to solve.

How does it compare with other materials data infrastructure?

NIST’s repository is one node in a small but growing international ecosystem of materials-specific data infrastructure. Research administrators advising physical-science departments should understand where each fits, since “materials data repository” covers genuinely different data types — deposited raw files versus computed, simulation-derived properties.

Repository	Steward	Data type	Notable FAIR feature
NIST Materials Data Repository	NIST (US federal)	Deposited experimental/research files, any format	Persistent IDs, API, DSpace federation
MDR (DICE)	National Institute for Materials Science, Japan	Data and publications, domain-tailored metadata	Metadata schemas tuned to materials disciplines
Materials Project	Lawrence Berkeley National Laboratory	Computed structure/property data	Open API for bulk computed-data queries
NOMAD	FAIRmat / open-source community	Simulation and computational materials data	Explicitly FAIR-by-design, free and open source

UK institutions have a domestic reference point too: the Henry Royce Institute, the UK’s national institute for advanced materials research, maintains a Digital Materials Foundry that curates links to major computational materials databases for UK researchers, positioning FAIR materials data as institutional infrastructure rather than a project-by-project afterthought.

Registries such as re3data.org — the DataCite-affiliated global registry of research data repositories — independently list the NIST repository, which gives it discoverability outside its own domain and is itself a small but real Findability signal under the FAIR framework.

What does this mean for RDM programmes?

Materials science RDM guidance remains thin relative to biomedical and social-science fields, where funder mandates, data-sharing plans and repository certification (CoreTrustSeal, for example) are comparatively mature. Research administrators supporting engineering and physical-science faculties can draw three practical lessons from NIST’s model.

Domain-specific metadata schemas matter more than generic institutional-repository templates for high-heterogeneity data such as materials characterisation files.
Persistent identifiers and API access are not optional extras — they are what converts a file dump into FAIR-compliant infrastructure.
Federation with discipline hubs (the Materials Data Facility, re3data.org) extends a dataset’s reach far beyond a single institutional URL.

For research administrators building data management plans that reference physical-science outputs, pointing PIs toward an established domain repository — rather than a generic institutional one — materially improves the odds that FAIR criteria in funder compliance reviews are actually met.

Answer-first Q&A

What is the purpose of a materials data repository?

A materials data repository exists to make heterogeneous, often binary materials science data — spectroscopy, imaging, composition and mechanical-property files — searchable, citable and reusable. It solves the specific problem that raw materials files are otherwise opaque to search engines and incompatible with generic institutional repository metadata schemas.

What are examples of materials data repositories besides NIST’s?

Beyond the NIST Materials Data Repository, notable examples include Japan’s NIMS MDR (via the DICE platform), the US Materials Project for computed structure data, and NOMAD, a European open-source repository explicitly built to FAIR specifications for computational materials science.

Is it costly to deposit data in a repository like NIST’s?

NIST’s Materials Data Repository is a federally funded, open-access service with no publicly advertised deposit fee, unlike some generalist commercial repositories that charge per gigabyte above a free tier. Costs for materials-specific deposit are therefore typically absorbed by the institution’s existing RDM infrastructure rather than billed per dataset.

What is the best materials data repository for FAIR compliance?

There is no single “best” repository — the right choice depends on data type. NOMAD and the Materials Project suit computed/simulation data, while NIST’s and NIMS’ MDR suit deposited experimental datasets; all four implement the core FAIR pillars but through different metadata and access mechanisms.

Where materials science RDM is heading

Materials science FAIR infrastructure is converging on the same architecture that biomedical and social-science RDM adopted earlier: persistent identifiers, API-level machine access, domain-tuned metadata and cross-repository federation. NIST’s Materials Data Repository, updated as recently as March 2025 according to its own programme page, demonstrates that a federal physical-science agency can build FAIR-compliant infrastructure without waiting for a universal cross-discipline standard to arrive first. For research administrators, the practical task now is steering physical-science principal investigators toward these domain repositories in data management plans, rather than defaulting to generalist options that were never built for materials data’s particular complexity.

July 4, 2026

Research Data Management Policy: €10.2bn Case

A research data management policy that treats FAIR compliance as a line-item cost, rather than a reuse and reputation asset, is the wrong accounting model. PwC estimated in a 2018 study for the European Commission that the absence of FAIR (Findable, Accessible, Interoperable, Reusable) research data costs the European economy at least €10.2 billion a year, largely through duplicated data collection and wasted researcher time. That figure is the strongest evidence available that under-investment in research data management (RDM) infrastructure is a false economy, not a saving.

A research data management policy is an institutional document setting out the responsibilities of researchers and the institution for planning, storing, securing, sharing and preserving research data across its lifecycle. Most UK universities — Southampton, Birmingham, Manchester, Edinburgh and others — already publish one. The argument here is narrower and more contentious: most are drafted, funded and governed as compliance paperwork, when the evidence says they should be funded as reuse and reputation infrastructure.

Why RDM policy gets treated as a cost centre
What the evidence actually says about FAIR and avoided cost
How funder compliance requirements are changing the calculus
The case for investing in data stewardship, not just policy text
Answer-first Q&A
Implications for institutional leaders

Why RDM policy gets treated as a cost centre

Institutional budgets typically classify research data management as overhead: storage costs, repository subscriptions, a data steward’s salary, training time. Each appears as a debit with no offsetting credit line, because savings from avoided duplication and faster reuse accrue diffusely, across future researchers and grants, not to the budget holder who paid for the infrastructure.

This accounting mismatch is compounded by how the data management plan (DMP) requirement is handled in practice. Most funders now mandate one, but research offices frequently treat it as a box-ticking exercise completed at proposal stage and never revisited, rather than a live operational document. That framing under-serves the researcher, who gets no practical reuse benefit, and the institution, which under-recovers the true cost of good RDM from grants that would pay for it.

UK Research and Innovation (UKRI) explicitly states that costs associated with research data management — storage, curation, repository deposit — are eligible for recovery under its funding. Institutions treating RDM as unfunded overhead are frequently leaving recoverable grant money unclaimed rather than avoiding a cost.

What the evidence actually says about FAIR and avoided cost

The FAIR data principles were formalised in 2016 by Wilkinson et al. in Scientific Data as a guide for making digital assets Findable, Accessible, Interoperable and Reusable by both humans and machines. FAIR data is not a compliance checkbox; it is a design standard for making data usable by someone who was not present when it was collected.

The clearest attributed cost estimate comes from PwC’s 2018 cost-benefit analysis for the European Commission, which put the annual cost of non-FAIR research data to the European economy at €10.2 billion, driven by researcher time lost searching for data, recreation of data that already exists, and lost interdisciplinary reuse. A separate, frequently cited illustration is the Minnesota Coronary Experiment, a 1968–73 University of Minnesota diet trial whose full raw data sat unanalysed in storage for decades before being recovered and reanalysed in 2016. It is a reminder that data loss is a recurring, avoidable event when retention and documentation are afterthoughts.

Three mechanisms explain where the savings actually come from:

Avoided duplication. Findable, well-described data lets a second researcher build on an existing dataset instead of re-running a costly collection exercise.
Faster reuse cycles. Interoperable data in standard formats with persistent identifiers can be integrated into new analyses without reformatting or re-negotiating access.
Preserved institutional memory. Deposit in a certified repository protects data against one of the most common loss vectors: staff turnover and undocumented local storage.

None of this shows up as a saving on a university’s annual accounts, which is precisely why RDM investment is chronically under-prioritised relative to its documented return.

How funder compliance requirements are changing the calculus

Funder mandates are steadily converting FAIR data from voluntary good practice into a hard compliance gate, which changes the institutional risk calculus even for leaders unconvinced by the reuse argument. UKRI’s Common Principles on Research Data, and the related Concordat on Open Research Data, require a data management plan for funded research and state that data should be made openly available with as few restrictions as necessary. Horizon Europe applies comparable requirements, and cOAlition S’s Plan S pushes the same expectations into journal-level open-access policy.

A comparison of how three major funders frame the requirement illustrates the convergence:

Funder / framework	Core RDM requirement	FAIR reference
UKRI	Data management plan for funded research; RDM costs eligible for recovery	Endorses FAIR via the Concordat on Open Research Data
Horizon Europe	DMP required within six months of project start, updated across lifecycle	“As open as possible, as closed as necessary,” explicitly FAIR-aligned
cOAlition S (Plan S)	Underlying data should accompany open-access publications	References FAIR principles for supporting data

Institutions that fund RDM only to the minimum needed for a single grant’s DMP template are exposed twice: to duplicated administrative cost when infrastructure is rebuilt project by project, and to compliance risk as funders move toward auditing DMP adherence rather than merely requiring its submission.
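Tracking the Horizon Europe six-month DMP deadline from the table is simple date arithmetic, with one edge case: adding months to a date near the end of a month. This minimal sketch clamps to the last day of the target month. That convention is an assumption; the authoritative deadline is whatever the grant agreement states.

```python
import calendar
from datetime import date

def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the target month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

def dmp_due(project_start: date) -> date:
    """Horizon Europe: DMP due within six months of project start."""
    return add_months(project_start, 6)

print(dmp_due(date(2026, 1, 1)))   # 2026-07-01
print(dmp_due(date(2026, 8, 31)))  # 2027-02-28
```

A research office auditing DMP adherence could run this over its grant portfolio to flag plans that are due or overdue.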

The case for investing in data stewardship, not just policy text

A policy document alone does not create FAIR data. That requires people: a data steward function — a dedicated role, a network of disciplinary data champions, or a research data service embedded in the library — able to advise researchers on repository choice, metadata standards and licensing at the point where those decisions are actually made, not after the fact.

Institutions that fund this role tend to route researchers toward standards-based infrastructure rather than ad hoc local storage: a research data repository registered in re3data.org, ideally holding CoreTrustSeal certification, with persistent identifiers (DOIs) and standard metadata attached to every deposit. This is the practical, unglamorous mechanism by which the costs behind the €10.2 billion estimate above are actually avoided: not through a policy PDF, but through a person and a repository that make FAIR operational.
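A data steward’s deposit check can start by confirming that every record carries the properties the DataCite Metadata Schema marks as mandatory for DOI registration: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The sketch assumes deposit records are flat Python dictionaries with hypothetical lower-case keys. It is not DataCite’s own validator, which works against the schema’s XML or JSON forms.

```python
# Mandatory properties in the DataCite Metadata Schema (4.x); the
# dictionary keys used here are a hypothetical local naming convention.
MANDATORY = ("identifier", "creators", "title", "publisher",
             "publication_year", "resource_type")

def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty."""
    return [key for key in MANDATORY if not record.get(key)]

# Invented example deposit with one property left out.
deposit = {
    "identifier": "10.1234/example.2026",
    "creators": ["Doe, Jane"],
    "title": "Field survey of coastal erosion, 2025",
    "publisher": "Example University",
    "publication_year": 2026,
}
print(missing_mandatory(deposit))  # ['resource_type']
```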

CASRAI’s relevance here is provenance and interoperability, not ownership. CASRAI helped develop and stewarded the CRediT contributor role taxonomy, introduced in 2014 and now maintained by NISO as ANSI/NISO Z39.104-2022. The underlying argument is the same in a different domain: standardising who-did-what reduces duplicated verification effort, just as standardising data description reduces duplicated data collection. Institutions weighing their research administration infrastructure should treat RDM policy, contributor attribution and open data reuse as one reputational and efficiency system, not separate obligations.
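Standardised attribution is what makes contributor data machine-checkable. The sketch below validates contributor statements against the 14 CRediT roles. The input format (a dictionary of names to role lists) is a hypothetical local convention, not the NISO standard’s own serialisation.

```python
CREDIT_ROLES = frozenset({
    "Conceptualization", "Data curation", "Formal analysis",
    "Funding acquisition", "Investigation", "Methodology",
    "Project administration", "Resources", "Software", "Supervision",
    "Validation", "Visualization", "Writing – original draft",
    "Writing – review & editing",
})

def invalid_roles(contributions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each contributor to any claimed roles that are not CRediT terms."""
    return {
        person: bad
        for person, roles in contributions.items()
        if (bad := [r for r in roles if r not in CREDIT_ROLES])
    }

statement = {
    "Jane Doe": ["Conceptualization", "Writing – original draft"],
    "Sam Lee": ["Data curation", "Coding"],  # "Coding" is not a CRediT role
}
print(invalid_roles(statement))  # {'Sam Lee': ['Coding']}
```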

Answer-first Q&A

What is a research data management policy?

A research data management policy is an institutional document defining responsibilities for planning, storing, securing, sharing, and archiving research data across its lifecycle. UK universities including Edinburgh and Manchester publish theirs publicly, typically requiring a data management plan at proposal stage and deposit in an approved repository after project completion.

What are the FAIR data principles?

The FAIR data principles — Findable, Accessible, Interoperable, Reusable — were published by Wilkinson et al. in 2016 in Scientific Data as guidance for making digital research assets usable by both humans and machines, through persistent identifiers, standard metadata, and clear licensing.

Do UK and EU funders require a data management plan?

Yes. UKRI requires a data management plan for funded research and treats RDM costs as eligible for recovery, while Horizon Europe requires a DMP within six months of project start under its “as open as possible, as closed as necessary” principle.

How much does poor research data management actually cost?

PwC’s 2018 analysis for the European Commission put the annual cost of non-FAIR research data to the European economy at €10.2 billion, driven primarily by duplicated data collection and researcher time lost searching for data that already exists elsewhere.

Implications for institutional leaders

The practical implication is a reframing exercise, not necessarily a large new budget line. Research offices should cost RDM infrastructure — repositories, data steward time, metadata training — against the funder-eligible recovery already available through DMP-linked grants, rather than absorbing it as unfunded overhead. Leaders reviewing their research data management policy should ask whether it funds a data steward with real authority over repository choice and metadata quality, or whether it is a document that satisfies a compliance checklist and stops there.
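The costing exercise can be sketched as a tally of RDM cost lines, each flagged as grant-eligible or not. All item names, figures and eligibility flags are hypothetical placeholders. Which costs a given funder actually allows is set by that funder’s guidance, not by this code.

```python
from dataclasses import dataclass

@dataclass
class CostLine:
    item: str
    annual_cost: float     # GBP; hypothetical figures
    grant_eligible: bool   # assumption: flag taken from funder guidance

def summarise(lines: list[CostLine]) -> dict[str, float]:
    """Split total RDM cost into grant-recoverable and unfunded parts."""
    total = sum(line.annual_cost for line in lines)
    recoverable = sum(line.annual_cost for line in lines if line.grant_eligible)
    return {"total": total, "recoverable": recoverable, "unfunded": total - recoverable}

budget = [
    CostLine("Active storage", 20_000.0, True),
    CostLine("Repository deposit and curation", 15_000.0, True),
    CostLine("Data steward time", 45_000.0, True),
    CostLine("Institution-wide RDM training", 10_000.0, False),
]
print(summarise(budget))
# {'total': 90000.0, 'recoverable': 80000.0, 'unfunded': 10000.0}
```

The “unfunded” figure, not the total, is the number a leadership team should weigh against the value of avoided duplication.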

The evidence points in one direction. It includes a €10.2 billion EU-wide cost estimate, UKRI’s funding eligibility for RDM costs, and Horizon Europe’s escalating DMP requirements. Institutions that keep treating FAIR compliance as a cost centre are choosing to keep paying the duplication tax that FAIR data was designed to eliminate.

July 4, 2026

DDI Metadata Standard: FAIR Data Checklist for Survey Archives

The DDI metadata standard (Data Documentation Initiative) is an international, XML-based specification for documenting surveys, censuses, and other social, behavioural, and economic science microdata at both the study and variable level. It is the metadata backbone that most social science data archives use to make survey data findable, accessible, interoperable, and reusable (FAIR) — turning a raw data file plus a PDF codebook into a machine-readable, citable, cataloguable research object.

DDI is not a government mandate or a funder requirement; it is a community-maintained documentation standard. The DDI Alliance, an international collaboration established in 2003, maintains the specification and its schemas. This guide explains what the standard covers, who uses it, how it maps onto the FAIR principles, and the practical steps a repository or research team needs to adopt it.

What is the DDI metadata standard?
Who maintains DDI and which archives use it?
How does DDI support the FAIR data principles?
DDI-Codebook vs DDI-Lifecycle vs DDI-CDI: which do you need?
A practical checklist for adopting DDI
Answer-first Q&A
What this means for research data repositories

What is the DDI metadata standard?

The Data Documentation Initiative is a metadata standard for describing the full lifecycle of a research data collection: study design, sampling, data collection, processing, variables, and access conditions. It was built specifically for social, behavioural, and economic sciences data — surveys, censuses, panel studies, and administrative microdata — rather than as a general-purpose schema.

Records are encoded in Extensible Markup Language (XML), which makes them machine-readable and harvestable. A DDI catalogue record typically documents three layers: the study description (bibliographic citation, scope, geography, time period, methodology), the data file description (format, structure, missing-data conventions, weighting), and the variable description (question text, value labels, codes). This granularity is what separates DDI from simpler discovery schemas such as Dublin Core, which describe a resource but not its internal variable structure.
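The three layers are easier to see in an example. The minimal Python sketch below builds a simplified DDI-Codebook-style record with the standard library's `xml.etree.ElementTree`. The element names (`codeBook`, `stdyDscr`, `fileDscr`, `dataDscr`, `var`, `qstnLit`, `catgry`) follow DDI-Codebook 2.5 conventions. The record is deliberately stripped down, however, and has not been validated against the official schema. The survey title, file name and variable are invented for illustration.

```python
import xml.etree.ElementTree as ET

# DDI-Codebook 2.5 namespace; confirm it against the schema version you target.
DDI_NS = "ddi:codebook:2_5"
ET.register_namespace("", DDI_NS)  # serialise DDI elements without a prefix


def q(tag):
    """Qualify a local element name with the DDI-Codebook namespace."""
    return f"{{{DDI_NS}}}{tag}"


def build_codebook(title, file_name, variables):
    """Build a simplified record containing the three DDI documentation layers."""
    root = ET.Element(q("codeBook"))

    # Layer 1: study description (here, only the bibliographic title).
    stdy = ET.SubElement(root, q("stdyDscr"))
    citation = ET.SubElement(stdy, q("citation"))
    titl_stmt = ET.SubElement(citation, q("titlStmt"))
    ET.SubElement(titl_stmt, q("titl")).text = title

    # Layer 2: data file description.
    file_dscr = ET.SubElement(root, q("fileDscr"), ID="F1")
    file_txt = ET.SubElement(file_dscr, q("fileTxt"))
    ET.SubElement(file_txt, q("fileName")).text = file_name

    # Layer 3: variable description (label, question text, value codes and labels).
    data_dscr = ET.SubElement(root, q("dataDscr"))
    for v in variables:
        var = ET.SubElement(data_dscr, q("var"), name=v["name"], files="F1")
        ET.SubElement(var, q("labl")).text = v["label"]
        qstn = ET.SubElement(var, q("qstn"))
        ET.SubElement(qstn, q("qstnLit")).text = v["question"]
        for code, label in v["categories"].items():
            catgry = ET.SubElement(var, q("catgry"))
            ET.SubElement(catgry, q("catValu")).text = str(code)
            ET.SubElement(catgry, q("labl")).text = label
    return root


if __name__ == "__main__":
    # Hypothetical single-variable survey, for illustration only.
    record = build_codebook(
        "Example Household Survey, Wave 1",
        "hh_wave1.sav",
        [{
            "name": "tenure",
            "label": "Housing tenure",
            "question": "Do you own or rent your home?",
            "categories": {1: "Own", 2: "Rent", 9: "Refused"},
        }],
    )
    print(ET.tostring(record, encoding="unicode"))
```

The variable layer is what a Dublin Core record has no place for. In this sketch, the question wording and the code-to-label pairs travel inside the same record as the citation.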

Who maintains DDI and which archives use it?

The DDI Alliance, an international collaboration of research institutions, statistical agencies, and data archives established in 2003, develops and maintains the specification. DDI is listed as a recognised research-data metadata standard in the Research Data Alliance Metadata Standards Catalog (entry m13), which documents its scope, schemas, and adoption.

According to the UK Data Service, DDI “is used by most social science data archives in the world” to structure catalogue records, and it forms the basis of the discovery metadata behind its own collection. The Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan and the members of CESSDA, the Consortium of European Social Science Data Archives, likewise build their cataloguing infrastructure on DDI, harvesting records via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) so aggregators can index them without direct database access.
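OAI-PMH harvesting works page by page. An aggregator sends a `ListRecords` request and keeps following the `resumptionToken` until the repository returns an empty one. The Python sketch below shows the two building blocks of that loop using only the standard library. Network calls are left out so that the parsing logic runs offline. The base URL and the `oai_ddi25` metadata prefix are hypothetical; each repository advertises its own prefixes through the OAI-PMH `ListMetadataFormats` verb.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so follow-up
    requests send it alone instead of repeating metadataPrefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    elif metadata_prefix:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        raise ValueError("the first request needs a metadataPrefix")
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Extract record identifiers and the resumptionToken from one response page.

    The token is None when the list is complete (element absent or empty)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None


if __name__ == "__main__":
    base = "https://repo.example.org/oai"  # hypothetical endpoint
    print(list_records_url(base, metadata_prefix="oai_ddi25"))
    page = (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        "<record><header><identifier>oai:example.org:study-1</identifier></header></record>"
        "<resumptionToken>page2</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )
    ids, token = parse_list_records(page)
    print(ids, token)
    if token:
        print(list_records_url(base, resumption_token=token))
```

A real harvester would fetch each URL, call `parse_list_records` on the body, and stop when the token comes back as `None`.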

How does DDI support the FAIR data principles?

The FAIR Guiding Principles — findable, accessible, interoperable, reusable — were formalised for the research community in 2016. DDI operationalises each principle for survey and social science data specifically, rather than leaving them as abstract goals.

Findable: structured study-level metadata (title, creators, keywords, abstract, coverage) makes records indexable by catalogues and search engines, and DDI records are commonly assigned persistent identifiers, including DOIs registered through DataCite.
Accessible: standardised access-condition fields tell a would-be reuser exactly how to request or download the data, and harvesting via OAI-PMH gives repositories a predictable retrieval protocol.
Interoperable: a shared XML vocabulary and controlled thesauri — the European Language Social Science Thesaurus (ELSST), maintained by CESSDA, is one widely used example — let metadata move between archives and languages without semantic drift.
Reusable: variable-level documentation (question wording, value labels, derivation logic) and provenance information are what actually let a second researcher re-run or extend an analysis, which is the point FAIR exists to serve.
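The mapping above can be turned into a simple pre-publication completeness screen. The Python sketch below groups record fields by the FAIR principle they support and reports which fields are missing or empty. The field names are hypothetical labels for a flattened record, not DDI element names. The check also only tests that a field is present; it cannot judge whether the metadata is good.

```python
from typing import Dict, List

# Hypothetical field names grouped by the FAIR principle they support,
# following the findable / accessible / interoperable / reusable mapping above.
FAIR_FIELDS: Dict[str, List[str]] = {
    "Findable": ["title", "creators", "keywords", "abstract", "coverage", "doi"],
    "Accessible": ["access_conditions", "oai_pmh_endpoint"],
    "Interoperable": ["controlled_vocabulary"],
    "Reusable": ["variables", "provenance"],
}


def fair_gaps(record: dict) -> Dict[str, List[str]]:
    """Return, per FAIR principle, the fields that are missing or empty in `record`."""
    gaps: Dict[str, List[str]] = {}
    for principle, fields in FAIR_FIELDS.items():
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


if __name__ == "__main__":
    draft = {"title": "Example Survey", "creators": ["A. Researcher"], "keywords": ["housing"]}
    for principle, missing in fair_gaps(draft).items():
        print(f"{principle}: missing {', '.join(missing)}")
```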

DDI-Codebook vs DDI-Lifecycle vs DDI-CDI: which do you need?

DDI is not a single schema. Three variants serve different documentation depths, and choosing the wrong one is the most common early adoption mistake.

Variant	Best for	Documents	Status
DDI-Codebook (DDI-C)	A single finished dataset	Study, file, and variable description for one deposit	Simpler, widely used legacy format
DDI-Lifecycle (DDI-L)	Longitudinal or multi-wave studies	The full research lifecycle: concept, instrument, collection, processing, archiving, reuse	Comprehensive, versioned in the 3.x series
DDI-CDI (Cross-Domain Integration)	Integrating structured data across statistical and research domains	Model-driven descriptions that link datasets, variables, and classifications across systems	Developed jointly by the DDI Alliance and the SDMX community

A single-wave survey deposited once needs only DDI-Codebook. A cohort study revisited over years — the kind of resource the UK Data Service and ICPSR both hold in volume — needs DDI-Lifecycle to capture instrument changes between waves. DDI-CDI is aimed at repositories that need to align microdata with aggregate statistics (for example, linking a survey to official statistics published under SDMX), which is an emerging rather than default requirement.
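This selection rule is compact enough to write down as a function. The Python sketch below is one hedged reading of the guidance above, not an official DDI Alliance decision procedure. It treats DDI-CDI as an additional, emerging need layered on top of the base variant rather than as a replacement for it.

```python
from typing import List


def suggest_ddi_variants(waves: int, links_to_aggregate_statistics: bool = False) -> List[str]:
    """Suggest DDI variants for a study (a rule of thumb, not a standard).

    waves: number of data-collection waves the study has or plans.
    links_to_aggregate_statistics: True if the microdata must be aligned with
    aggregate or official statistics (for example, SDMX-published series)."""
    if waves < 1:
        raise ValueError("a study needs at least one wave of data collection")
    # Single deposit -> Codebook; repeated or panel study -> Lifecycle,
    # so that instrument changes between waves can be captured.
    variants = ["DDI-Lifecycle" if waves > 1 else "DDI-Codebook"]
    if links_to_aggregate_statistics:
        # Treated here as an extra requirement, not a default one.
        variants.append("DDI-CDI")
    return variants


if __name__ == "__main__":
    print(suggest_ddi_variants(1))                                      # one-off survey
    print(suggest_ddi_variants(6))                                      # cohort study
    print(suggest_ddi_variants(6, links_to_aggregate_statistics=True))  # linked to official stats
```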

A practical checklist for adopting DDI

Repositories and research teams introducing DDI documentation for the first time should work through these steps in order:

Identify your lifecycle stage. A one-off dataset needs DDI-Codebook; a repeated or panel study needs DDI-Lifecycle.
Model metadata before ingest, not after. Capture study description, sampling, collection dates, and variable labels/codes at deposit time using a structured deposit form, as the UK Data Service does, rather than reverse-engineering them from a finished file.
Use a DDI-aware authoring tool (for example Colectica or Nesstar-derived CESSDA tooling) instead of hand-writing XML, which is error-prone at scale.
Register a persistent identifier. Crosswalk core fields to the DataCite metadata schema so the dataset gets a citable DOI alongside its DDI record.
Adopt a controlled vocabulary such as ELSST for subject keywords to keep records interoperable across languages and archives.
Enable OAI-PMH harvesting so catalogue aggregators and search services can index the record without bespoke integration work.
Validate against peer practice — check the record structure against the RDA Metadata Standards Catalog entry and against comparable ICPSR or CESSDA holdings before publishing.
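Step 4, the DataCite crosswalk, is mostly a field-mapping exercise. The Python sketch below maps a handful of study-level fields onto the six mandatory DataCite properties: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The input keys are hypothetical stand-ins for values a repository would read from DDI-C elements such as `titl`, `AuthEnty` and `distrbtr`. The output is simplified and only loosely modelled on DataCite's JSON, so check it against the schema version your DOI registration uses.

```python
from typing import Any, Dict

# Hypothetical flattened input keys for one study-level record.
REQUIRED_INPUT = ("doi", "title", "authors", "distributor", "year")


def to_datacite(study: Dict[str, Any]) -> Dict[str, Any]:
    """Crosswalk study-level fields onto DataCite's mandatory properties.

    Raises ValueError rather than emitting an incomplete record."""
    missing = [key for key in REQUIRED_INPUT if not study.get(key)]
    if missing:
        raise ValueError("cannot crosswalk, missing: " + ", ".join(missing))
    return {
        "identifier": {"identifier": study["doi"], "identifierType": "DOI"},
        "creators": [{"name": name} for name in study["authors"]],
        "titles": [{"title": study["title"]}],
        "publisher": study["distributor"],
        "publicationYear": str(study["year"]),
        "types": {"resourceTypeGeneral": "Dataset"},
    }


if __name__ == "__main__":
    # Every value below is invented for illustration, including the DOI.
    print(to_datacite({
        "doi": "10.1234/example-survey",
        "title": "Example Household Survey, Wave 1",
        "authors": ["Researcher, A."],
        "distributor": "Example Data Archive",
        "year": 2026,
    }))
```

The crosswalk deliberately fails loudly on missing fields. Minting a DOI against an incomplete record is harder to undo than fixing the deposit form.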

Answer-first Q&A

What is the metadata standard DDI?

DDI (Data Documentation Initiative) is an international metadata standard for documenting socioeconomic surveys, censuses, and microdata. It is maintained by the DDI Alliance, encoded in XML, and used by most social science data archives worldwide to capture study, file, and variable-level documentation in one structured record.

What is the best metadata standard for survey data?

For general resource discovery, Dublin Core (ISO 15836) is the simplest and most widely implemented option. For social science survey and microdata specifically, DDI is the domain standard, because it documents variables and methodology in a depth Dublin Core does not attempt.

How does DDI support the FAIR data principles?

DDI supports FAIR by pairing structured, machine-readable metadata with persistent identifiers for findability, standardised access fields for accessibility, a shared XML vocabulary and thesauri for interoperability, and variable-level provenance for reusability — the depth needed to re-run a secondary analysis.

What is the difference between DDI-Codebook and DDI-Lifecycle?

DDI-Codebook documents a single finished dataset. DDI-Lifecycle documents the entire research process — instrument design, fieldwork, processing, and archiving — across multiple waves, making it the correct choice for longitudinal and panel studies rather than one-off deposits.

What this means for research data repositories

Funder and journal data-sharing policies increasingly ask for FAIR-compliant deposits, but “FAIR” is a set of principles, not a file format. DDI is one of the few domain standards that translates those principles into a concrete, testable schema for survey and social science data — which is why it underpins the cataloguing infrastructure at the UK Data Service, ICPSR, and CESSDA member archives rather than being a niche archival choice.

Institutions building or upgrading a research data repository for social science holdings should treat DDI-Lifecycle adoption, ELSST keywording, and DataCite DOI registration as a single connected workflow rather than three separate projects. Repositories that skip variable-level documentation still get a catalogue entry, but they do not get reuse — and reuse, not deposit, is the actual measure of FAIR success. Institutional research administration and data management guidance should reference DDI explicitly wherever survey or microdata deposit is in scope.

July 4, 2026

Australian Research Data Commons: FAIR Model

The Australian Research Data Commons (ARDC) is Australia’s national research data infrastructure body: formed in 2018 by merging three earlier programmes, it gives researchers shared, FAIR-aligned access to data discovery, compute, and identifier services so individual universities do not have to build this capability alone.

The ARDC is a public company limited by guarantee that operates Australia’s national research data commons, formed on 1 July 2018 from the merger of the Australian National Data Service (ANDS), Nectar, and Research Data Services (RDS). For research administrators and institutional leaders comparing centralised national investment against distributed, institution-by-institution research data management (RDM), the ARDC is the clearest working example of the centralised model operating at national scale.

What is the Australian Research Data Commons?
How is the ARDC funded and governed?
What infrastructure does the ARDC actually operate?
Centralised vs distributed: what does the ARDC model mean for institutions?
Answer-first questions on the ARDC
What this means for institutions and funders
Outlook

What is the Australian Research Data Commons?

The Australian Research Data Commons consolidates three predecessor national programmes into a single body responsible for research data infrastructure across all disciplines. Before 2018, the Australian National Data Service (ANDS, established 2008), Nectar (established 2009), and Research Data Services (RDS) each managed a separate piece of the national e-research landscape: discovery, compute, and storage respectively.

Merging them removed the seams between discovery, storage, and compute that researchers previously had to navigate across three separately governed programmes. The ARDC’s stated aim, per its own site, is to enable Australian researchers and industry to access “nationally significant” digital research infrastructure, skills, and data collections rather than each institution replicating this from scratch.

How is the ARDC funded and governed?

The ARDC is funded primarily through the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS), the same mechanism that underwrote its predecessor programmes. ANDS was originally funded via a 2008 agreement between the (then) Department of Innovation, Industry, Science and Research and Monash University, with further funding arriving through the Education Investment Fund under the government’s Super Science Initiative.

Governance sits with a board overseeing a public company limited by guarantee, headquartered in Melbourne with staff across Canberra, Adelaide, Perth, Ballarat, Brisbane, and Sydney. This is a materially different governance shape from a distributed RDM model, where each university’s research office, library, and IT division independently funds and governs its own data services against the institution’s own budget cycle.

What infrastructure does the ARDC actually operate?

The ARDC’s core, user-facing service is Research Data Australia, a discovery portal giving access to metadata records from over 100 Australian research organisations, cultural institutions, and government agencies. It also runs the Nectar Research Cloud, a shared national compute facility, and coordinates three Thematic Research Data Commons that target long-term, discipline-specific infrastructure needs, including health and medical research and humanities, arts and social sciences (HASS) and Indigenous research.

Beyond discovery and compute, the ARDC’s remit extends to standards and skills work that a single institution would struggle to justify funding alone:

Coordinating Australia’s national persistent identifier (PID) strategy, encouraging consistent use of identifiers for people, organisations, and datasets
Publishing FAIR data guides and running structured training such as “FAIR Data 101”
Requiring FAIR-aligned practice from its own co-investment projects as a condition of funding
Operating the Nectar Research Cloud (roughly 50,000 compute cores serving around 20,000 users, per historical ARDC/Nectar reporting) alongside virtual laboratories for specific research communities

Centralised vs distributed: what does the ARDC model mean for institutions?

A centralised national commons like the ARDC amortises the cost of discovery infrastructure, identifier strategy, and large-scale compute across an entire research system rather than each institution paying separately. The trade-off is that institutions cede some control over roadmap priorities and must align local practice with a national standard rather than an internally chosen one.

Dimension	Centralised national model (ARDC)	Distributed institutional model
Funding source	National programme (NCRIS)	Individual institutional budgets
Discovery layer	One shared portal (Research Data Australia)	Separate institutional repositories
Compute/storage	Shared national cloud (Nectar)	Institution-specific procurement
Standards consistency	Single national PID and FAIR policy	Varies by institution
Duplication risk	Low — infrastructure built once	Higher — each institution rebuilds similar tooling
Local control	Lower — national roadmap governs priorities	Higher — institution sets its own priorities

Institutions weighing this trade-off are not choosing between “good” and “bad” infrastructure; they are choosing where duplication cost and local autonomy sit on a single spectrum. The ARDC demonstrates that a national commons can deliver FAIR-aligned discovery and compute without every institution independently re-solving the same identifier and storage problems.
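The duplication side of that spectrum is simple arithmetic. The Python sketch below compares building the same commodity service once per institution with building it once and sharing it. It is a minimal sketch under stated assumptions: the institution count and cost figures in the example are hypothetical, not ARDC data. It also ignores coordination overhead and the value of local control, which the table above weighs alongside cost.

```python
def compare_models(institutions: int, local_build_cost: float, shared_build_cost: float) -> dict:
    """Compare system-wide spend for distributed versus shared provision
    of one commodity service (e.g. a discovery portal or PID service)."""
    if institutions < 1:
        raise ValueError("need at least one institution")
    distributed_total = institutions * local_build_cost
    return {
        "distributed_total": distributed_total,
        "shared_total": shared_build_cost,
        "shared_cost_per_institution": shared_build_cost / institutions,
        # Positive when sharing is cheaper for the research system as a whole.
        "system_saving": distributed_total - shared_build_cost,
    }


if __name__ == "__main__":
    # Hypothetical: 40 institutions, each able to build a service for 250,000,
    # versus one shared national build costing 2,000,000.
    print(compare_models(40, 250_000, 2_000_000))
```

With those invented figures, the distributed model spends 10,000,000 in total. The shared build costs each institution 50,000, a system-wide saving of 8,000,000. The same function also shows where sharing stops paying off, which is when the shared build costs more than the sum of the local builds.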

Answer-first questions on the ARDC

What is Research Data Australia?

Research Data Australia is the ARDC’s national discovery portal, giving researchers a single point of access to metadata describing datasets held across more than 100 Australian research organisations, cultural institutions, and government agencies. It descends from the earlier ANDS Collections Registry and remains the ARDC’s principal public-facing discovery service.

How is the ARDC funded?

The ARDC is funded chiefly through the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS), following on from funding arrangements that originally supported its predecessor programmes, ANDS and Nectar, including money from the Education Investment Fund under the Super Science Initiative.

What did the ARDC replace?

The ARDC replaced three separately governed programmes on 1 July 2018: the Australian National Data Service (ANDS), Nectar (National eResearch Collaboration Tools and Resources), and Research Data Services (RDS), consolidating discovery, compute, and storage under one national body.

What this means for institutions and funders

For institutions and funders outside Australia, the ARDC is a working case study rather than a template to copy wholesale — national research systems differ in scale, federal structure, and existing infrastructure maturity. What generalises is the underlying logic: discovery metadata, persistent identifiers, and baseline compute are commodity infrastructure that gains value from being shared rather than re-procured by every institution.

Institutions currently investing in distributed RDM should ask which of their own services are genuinely differentiating (subject-specific curation, disciplinary expertise) versus which are commodity infrastructure better funded once, nationally or consortially, than dozens of times over.

Outlook

The ARDC’s roadmap continues to run through Australia’s National Research Infrastructure planning cycle, with persistent identifiers and FAIR-by-default practice as recurring priorities. More national and regional funders are assessing where to draw the line between centralised and distributed research administration infrastructure. For them, the ARDC offers a concrete reference point rather than an abstract framework. That reference point comprises its consolidation experience since 2018, which builds on predecessor programmes dating back to 2008, together with the data terminology and standards resources through which it operationalises the FAIR principles.

July 4, 2026

MRC Grants Awarded: How to Read the Register

MRC grants awarded data is published across three separate UKRI sources — Gateway to Research, the legacy Grants on the Web (GOTW) register, and MRC’s board and panel outcomes pages — and reading it correctly for benchmarking means matching each source to a different question: what was funded, who applied, and how competitive each specific panel meeting was.

The MRC grants awarded register is the collective term for the public funding-decision records that UK Research and Innovation (UKRI) publishes for the Medical Research Council, spanning historical award spreadsheets, a live searchable grants database, and meeting-by-meeting board and panel outcome listings. For research office staff building competitor intelligence or benchmarking their institution’s success against peers, the register is genuinely useful — but only if its structure and its stated caveats are understood before the numbers are used.

What is the MRC grants awarded register?
Where to find MRC grants-awarded data: three sources compared
How to read board and panel outcomes for benchmarking
How to benchmark success rates and competitor institutions correctly
Common questions on reading the MRC register
Implications for research offices and what happens next

What is the MRC grants awarded register?

There is no single document called the “MRC grants awarded register” — it is a set of linked publications UKRI maintains under its “What MRC has funded” pages. These cover awarded grants and fellowships from April 2006 to December 2019 as a downloadable spreadsheet, interactive Tableau dashboards for 2022–23 funding decisions, and rolling board and panel outcome listings for funding meetings from 2017 onward, with earlier records held in the UK Government Web Archive.

Before 2018, MRC referred to this material as “success rates”; UKRI has since folded the reporting into the wider board and panel outcomes format used across all seven research councils. Any benchmarking exercise therefore has to account for a terminology and format change partway through the period being analysed.

Where to find MRC grants-awarded data: three sources compared

Three distinct tools hold current MRC award data, alongside a frozen archive of historical spreadsheets, and each answers a different research-intelligence question. Confusing them is a common reading error when institutions build competitor comparisons.

Source	What it covers	Update pattern	Best use
Gateway to Research	Full award records once a grant has started, including principal investigator, institution and value, across all UKRI councils	Continuous, as grants start	Cross-council portfolio and competitor analysis
Grants on the Web (GOTW)	Legacy register of MRC-administered grants, fellowships and training grants, filterable by institution	Static; predates the UKRI merger	Institution-level historical lookups
Board and panel outcomes	Score out of ten and funding decision for every application discussed at a given meeting	Usually within four weeks of each meeting	Competitive positioning within a specific funding round
Archived spreadsheet and success-rate data	Award listings April 2006–December 2019 and pre-2018 success-rate summaries	Frozen, held on the UK Government Web Archive	Long-run trend analysis

For most benchmarking work, Gateway to Research and the board and panel outcomes pages should be the primary pair: the former gives the awarded portfolio, the latter gives the competitive context each award was won against.

How to read board and panel outcomes for benchmarking

MRC scores every application from one to ten, with ten the best, and this scoring structure applies across all types of MRC funding meeting. According to UKRI’s published board and panel outcomes guidance, applications are then listed in numerical order within blocks grouped by median score and funding decision.

Outcomes are usually published within four weeks of a meeting, though UKRI notes this can sometimes take longer. Crucially, applications that are unsuccessful after an earlier shortlisting stage are not discussed at the funding meeting and are therefore not included in board and panel outcomes at all — a data-quality point that matters enormously for anyone computing a success rate, since the visible denominator understates total submissions.
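The size of that denominator effect is easy to show. The Python sketch below contrasts the "visible" success rate with a fuller rate. The visible rate is computed only from applications listed in board and panel outcomes. The fuller rate adds back applications rejected at shortlisting. The shortlisting count is an assumption here: it does not appear in the outcomes data, so it would have to come from another source, such as an institution's own submission records.

```python
from typing import Tuple


def success_rates(funded: int, discussed: int, rejected_at_shortlist: int) -> Tuple[float, float]:
    """Return (visible_rate, full_rate) for one scheme or round.

    visible_rate: funded / applications discussed at the funding meeting,
    which is all that board and panel outcomes list.
    full_rate: funded / all submissions, adding back shortlisting-stage
    rejections supplied from a separate source."""
    if funded < 0 or rejected_at_shortlist < 0:
        raise ValueError("counts cannot be negative")
    if discussed <= 0 or funded > discussed:
        raise ValueError("need discussed > 0 and funded <= discussed")
    visible_rate = funded / discussed
    full_rate = funded / (discussed + rejected_at_shortlist)
    return visible_rate, full_rate


if __name__ == "__main__":
    # Hypothetical round: 6 funded of 20 discussed, 10 more rejected at shortlisting.
    visible, full = success_rates(6, 20, 10)
    print(f"visible {visible:.0%}, full {full:.0%}")
```

In this invented round the outcomes listing suggests a 30% success rate, while the rate against all submissions is 20%.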

Score and decision are recorded per application, not per institution, so institution-level rates must be aggregated manually.
Shortlisting-stage rejections are invisible in this dataset — factor this into any success-rate calculation.
Full award detail (value, abstract, classification) only appears on Gateway to Research once the grant has actually started.
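Aggregating per-application outcome rows into institution-level counts is a short script. The Python sketch below assumes the outcomes have been exported to CSV with hypothetical `institution` and `decision` columns, where "Funded" marks a successful application. Map these to the actual column headings in the UKRI listing before relying on the counts. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def institution_tallies(rows: Iterable[dict]) -> Dict[str, Tuple[int, int]]:
    """Aggregate per-application outcome rows into (funded, discussed) per institution."""
    counts = defaultdict(lambda: [0, 0])  # institution -> [funded, discussed]
    for row in rows:
        tally = counts[row["institution"].strip()]
        tally[1] += 1
        if row["decision"].strip().lower() == "funded":
            tally[0] += 1
    return {name: (funded, discussed) for name, (funded, discussed) in counts.items()}


if __name__ == "__main__":
    sample = io.StringIO(
        "application,institution,score,decision\n"
        "A1,University X,8,Funded\n"
        "A2,University X,6,Not funded\n"
        "A3,University Y,7,Funded\n"
    )
    print(institution_tallies(csv.DictReader(sample)))
```

These tallies still count only applications discussed at the meeting. Shortlisting-stage rejections remain outside the denominator.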

How to benchmark success rates and competitor institutions correctly

UKRI states explicitly that funding decisions are made “in circumstances unique to each panel meeting” and that the funding cut-off is dependent on the budget available at that specific meeting — not a fixed quality threshold. UKRI’s guidance is direct: institutions should not compare funding cut-off points made in different meetings, and UKRI will not consider challenges or enquiries based on such comparisons.

This has a practical consequence for benchmarking. A proposal scoring 7/10 that was funded in a well-funded round and a proposal scoring 8/10 that was declined in a tighter round are not evidence that the second panel was harsher. A robust competitor-analysis method therefore favours relative, within-round comparisons over any single cross-period success-rate percentage pulled from a headline figure. Examples include an institution’s share of awards made at a given meeting, or its share across a given scheme over several rounds.
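A within-round share is straightforward to compute once awards are grouped by meeting. The Python sketch below calculates an institution's share of the awards made at each meeting, plus a pooled share across the rounds supplied. The round labels, institution names and counts are hypothetical. Because each share is taken relative to the awards made at that meeting, differences in the budget between meetings do not distort the comparison the way a raw cut-off score would. Differences in how many applications each institution submitted are still not controlled for.

```python
from typing import Dict, Optional


def award_shares(awards_by_round: Dict[str, Dict[str, int]], institution: str) -> Dict[str, float]:
    """Institution's share of the awards made at each meeting.

    awards_by_round maps a round label to {institution: awards made}.
    Rounds with no awards are skipped rather than reported as 0%."""
    shares = {}
    for round_label, counts in awards_by_round.items():
        total = sum(counts.values())
        if total:
            shares[round_label] = counts.get(institution, 0) / total
    return shares


def pooled_share(awards_by_round: Dict[str, Dict[str, int]], institution: str) -> Optional[float]:
    """Share of all awards across the supplied rounds, or None if none were made."""
    total = sum(sum(counts.values()) for counts in awards_by_round.values())
    if not total:
        return None
    won = sum(counts.get(institution, 0) for counts in awards_by_round.values())
    return won / total


if __name__ == "__main__":
    rounds = {
        "Board meeting 1": {"University X": 2, "University Y": 6},
        "Board meeting 2": {"University X": 3, "University Y": 1},
    }
    print(award_shares(rounds, "University X"))
    print(pooled_share(rounds, "University X"))
```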

Combining Gateway to Research (what was funded), board and panel outcomes (how competitive that round was), and GOTW’s institution filter (a second, independent cross-check for MRC-specific awards) gives a defensible three-source method rather than a single-source snapshot.

Common questions on reading the MRC register

How do I search MRC grants awarded by institution?

Use Grants on the Web (GOTW), the legacy register hosted at gotw.nerc.ac.uk, and filter by “Institution > Medical Research Council (MRC)”; each project links to the full grant record, including principal investigator and value. For more current, cross-council records, Gateway to Research offers the same institution-level filtering.

Where can I find MRC board and panel outcomes?

UKRI publishes MRC’s board and panel outcomes in the “What MRC has funded” section of ukri.org, usually within four weeks of each funding meeting. Outcomes list every application discussed, its score out of ten and its funding decision, allowing panel-by-panel benchmarking rather than reliance on one headline figure.

Is there a live MRC grants search tool?

Gateway to Research is UKRI’s live, searchable database of funded projects across all seven research councils, updated continuously as grants start. Grants on the Web remains a parallel legacy tool for MRC-administered awards, useful for cross-checking older or training-grant records.

Can I compare MRC funding cut-off scores between panel meetings?

No — UKRI explicitly advises against this. Each meeting’s funding cut-off depends solely on the budget available at that specific meeting, not a fixed quality bar, so scores funded in one round and declined in another are not directly comparable as evidence of relative panel rigour.

Implications for research offices and what happens next

For research administration and funding-intelligence teams, the practical implication is that MRC grants-awarded data supports rigorous benchmarking only when the three sources are triangulated and UKRI’s own comparability caveats are respected. A single downloaded spreadsheet or a bare success-rate percentage, taken in isolation, will systematically misrepresent competitive position because of the shortlisting-stage exclusion and the meeting-specific funding cut-off.

UKRI last updated its board and panel outcomes guidance on 3 March 2026 and its “What MRC has funded” summary page on 29 September 2025, and continues to migrate historical reporting into Tableau-based dashboards — most recently for 2025 panel outcomes and attendance. Institutions building recurring funding-intelligence dashboards should expect this format to keep evolving, and should re-check source URLs each reporting cycle rather than hard-coding links to any single spreadsheet. Research administration teams that build this triangulated method once can reuse it across other UKRI councils, since board and panel outcomes reporting now follows a common structure council-wide.

July 4, 2026
