What machine-actionable compliance metadata means for research administration

Most research compliance lives in prose. An ethics approval is a letter; a data-protection assessment is a Word document; a material transfer agreement is a signed PDF in a filing system. Each is legally meaningful and administratively essential, and each is almost completely opaque to the systems that need to act on it. Machine-actionable compliance metadata is the idea that the salient facts in these documents — who approved what, under which regime, with what conditions and expiry — should also exist as structured, queryable data. This article explains what that means and why it is the next obvious step in reducing the administrative burden that CASRAI was founded to address. It draws on the vocabulary of the compliance and regulatory domain.

What “machine-actionable” actually means

A document is human-readable. A field is machine-actionable. The difference is whether another system can reliably extract a fact and act on it without a person reading the document first. “Ethics approval REC-2026-0481 granted by the University Research Ethics Committee on 2026-03-14, valid until 2029-03-14, covering protocol v2, conditions: annual review required” is the same information as the approval letter, but expressed so that a CRIS can check whether a project’s ethics approval is current, a repository can refuse a deposit whose approval has lapsed, and a funder report can be assembled without manual transcription.
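As a minimal sketch of that difference, the approval sentence above can be held as a typed record that another system queries directly. The class, field, and function names (`EthicsApproval`, `is_current`, `may_accept_deposit`) are illustrative assumptions, not part of any CASRAI dictionary or CRIS schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EthicsApproval:
    """Hypothetical structured form of the facts in an approval letter."""
    identifier: str
    approving_body: str
    granted: date
    valid_until: date
    protocol_version: str
    conditions: tuple[str, ...] = ()

    def is_current(self, on: date) -> bool:
        # Current means the check date falls inside the validity window (inclusive).
        return self.granted <= on <= self.valid_until


def may_accept_deposit(approval: EthicsApproval, on: date) -> bool:
    """Repository-side check (assumed policy): refuse a deposit if approval has lapsed."""
    return approval.is_current(on)


# The example approval from the text, expressed as fields.
approval = EthicsApproval(
    identifier="REC-2026-0481",
    approving_body="University Research Ethics Committee",
    granted=date(2026, 3, 14),
    valid_until=date(2029, 3, 14),
    protocol_version="v2",
    conditions=("annual review required",),
)
```

A CRIS, a repository, and a funder report could all call the same check against the same record, which is the point: no one has to read the letter to learn whether the approval is still valid.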

The model here is the machine-actionable data management plan (maDMP). The research-data community spent several years turning the DMP from a frozen narrative document into structured content that can be exchanged between systems, validated against funder requirements, and updated through the project lifecycle. Compliance metadata is the same move applied to the ethics, data-protection, and agreements layer.

Three worked examples

Ethics review

An ethics approval — from an Institutional Review Board (IRB) in the US idiom, a Research Ethics Committee (REC) in the UK and Europe, or a Human Research Ethics Committee (HREC) in Australia — carries a small number of facts that systems repeatedly need: the approving body, the approval identifier, the review category (exempt, expedited, or full-board review), the date granted, the expiry or renewal date, the protocol version covered, and any conditions. Today those facts are re-keyed by hand into grant systems, CRIS records, and journal submission forms, with all the transcription error that implies. As structured metadata, the approval is recorded once and read everywhere. A journal’s submission system could, in principle, verify an approval against the issuing institution rather than trusting a typed-in reference number.

Data protection

Under the General Data Protection Regulation (GDPR) and its equivalents, a project handling personal data has a defined structure: a data controller, possibly a data processor, a lawful basis for processing, and — for higher-risk processing — a completed Data Protection Impact Assessment (DPIA). These are exactly the fields a research office needs to answer “are we compliant for this project, and who is accountable?” Expressed as metadata attached to the project record, the controller/processor relationship and the DPIA status become queryable across a whole portfolio, rather than being rediscovered project by project. Note that machine-actionable does not mean the personal data itself is exposed — it is the compliance facts about the processing that are structured, not the protected data.
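To illustrate what "queryable across a whole portfolio" could look like, here is a small sketch. The record shape, the `DpiaStatus` values, and the project and organisation names are all invented for the example; the record holds only compliance facts about the processing, never the personal data itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DpiaStatus(Enum):
    # Assumed status values; a real vocabulary would define these centrally.
    NOT_REQUIRED = "not_required"
    REQUIRED_PENDING = "required_pending"
    COMPLETED = "completed"


@dataclass(frozen=True)
class ProcessingRecord:
    """Hypothetical compliance facts attached to one project record."""
    project_id: str
    controller: str
    processor: Optional[str]  # None when no separate processor is involved
    lawful_basis: str
    dpia_status: DpiaStatus


def projects_needing_dpia(portfolio: list[ProcessingRecord]) -> list[str]:
    """Projects where a DPIA is required but not yet completed."""
    return [r.project_id for r in portfolio
            if r.dpia_status is DpiaStatus.REQUIRED_PENDING]


def projects_by_controller(portfolio: list[ProcessingRecord]) -> dict[str, list[str]]:
    """Answer 'who is accountable?' by grouping project ids under each controller."""
    grouped: dict[str, list[str]] = {}
    for record in portfolio:
        grouped.setdefault(record.controller, []).append(record.project_id)
    return grouped


# Invented portfolio for illustration only.
portfolio = [
    ProcessingRecord("P-001", "University A", None, "public task", DpiaStatus.COMPLETED),
    ProcessingRecord("P-002", "University A", "Vendor B", "consent", DpiaStatus.REQUIRED_PENDING),
    ProcessingRecord("P-003", "University C", None, "public task", DpiaStatus.NOT_REQUIRED),
]
```

With records like these, the research office's two questions become one-line queries rather than a project-by-project trawl through assessment documents.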

Material transfer agreements

A material transfer agreement (MTA) governs the movement of physical research materials between institutions, and it typically carries conditions that have downstream consequences: permitted uses, publication restrictions, requirements to share derivatives, and obligations flowing back to the provider. When the MTA is a PDF in a drawer, those conditions are invisible to the systems where they matter — the repository that is about to publish a derivative, the technology-transfer office negotiating a licence. As structured metadata, the agreement’s key terms travel with the material and can be checked automatically before an action that the agreement restricts.
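A sketch of that automatic check, under stated assumptions: the `Action` categories, the `MtaTerms` fields, and the agreement and provider names below are hypothetical, and a real agreement's conditions would be richer than two sets.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Illustrative actions a downstream system might be about to take.
    INTERNAL_RESEARCH = "internal_research"
    PUBLISH_DERIVATIVE = "publish_derivative"
    COMMERCIAL_LICENCE = "commercial_licence"


@dataclass(frozen=True)
class MtaTerms:
    """Hypothetical key terms of an MTA, carried alongside the material."""
    agreement_id: str
    provider: str
    permitted_uses: frozenset[Action]
    requires_provider_notice: frozenset[Action] = frozenset()


def check_action(terms: MtaTerms, action: Action) -> tuple[bool, str]:
    """Return (allowed, reason) for an action the agreement may restrict."""
    if action not in terms.permitted_uses:
        return False, f"{action.value} is not a permitted use under {terms.agreement_id}"
    if action in terms.requires_provider_notice:
        return True, f"permitted, but notify {terms.provider} first"
    return True, "permitted"


# Invented agreement for illustration only.
terms = MtaTerms(
    agreement_id="MTA-EXAMPLE-01",
    provider="Provider Institute",
    permitted_uses=frozenset({Action.INTERNAL_RESEARCH, Action.PUBLISH_DERIVATIVE}),
    requires_provider_notice=frozenset({Action.PUBLISH_DERIVATIVE}),
)
```

A repository about to publish a derivative would call `check_action(terms, Action.PUBLISH_DERIVATIVE)` and learn that the provider must be notified; a technology-transfer office asking about `Action.COMMERCIAL_LICENCE` would be told the use is not permitted.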

Why this is a research-administration problem, not a paperwork problem

The administrative burden on researchers and research offices is, to a significant degree, a metadata problem in disguise. The same compliance facts are entered repeatedly into incompatible systems because no shared, structured representation exists. Every CRIS implementation invents its own ethics-status fields; every funder portal asks for the same approvals in a different shape; every journal’s submission form re-requests information the institution already holds. The cost is the multi-billion-pound annual burden the original CASRAI set out to reduce.

A shared, controlled vocabulary for compliance — ethics review body, ethics review outcome, review category, controller, processor, MTA condition — is the precondition for machine-actionable compliance. Without it, every system structures the facts differently and interoperability fails at the first join. With it, a compliance fact recorded once can be validated, exchanged, and acted on across the lifecycle. This is precisely the convening-and-defining role the CASRAI dictionary is built to play.

The jurisdiction caveat

A real risk in this work is false equivalence. An IRB, a REC, and an HREC are analogous but not identical; GDPR’s controller/processor model does not map cleanly onto every other data-protection regime; “informed consent” carries different statutory meaning in different countries. The goal of machine-actionable compliance metadata is not to flatten these differences into a single global schema that erases jurisdiction. It is to normalise the shared structure — there is an approving body, there is an outcome, there is an expiry — while preserving the jurisdictional specifics as values within that structure. A vocabulary that pretends a REC and an IRB are the same thing would be worse than no vocabulary at all.
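One way to picture "shared structure, jurisdictional values" is sketched below. The type and field names are assumptions made for the example, and the jurisdiction codes simply follow the US, UK, and Australian usage mentioned earlier in this article.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ReviewBodyType(Enum):
    # Kept as distinct values: analogous bodies, not interchangeable ones.
    IRB = "Institutional Review Board"
    REC = "Research Ethics Committee"
    HREC = "Human Research Ethics Committee"


@dataclass(frozen=True)
class ReviewRecord:
    """Shared structure: every regime has a body, an outcome, and an expiry."""
    body_type: ReviewBodyType
    jurisdiction: str  # illustrative codes, e.g. "US", "UK", "AU"
    outcome: str
    expires: date


def expiring_before(records: list[ReviewRecord], cutoff: date) -> list[ReviewRecord]:
    """A query on the shared structure works across every jurisdiction."""
    return [r for r in records if r.expires < cutoff]


def same_regime(a: ReviewRecord, b: ReviewRecord) -> bool:
    """Records are only equivalent when body type and jurisdiction both match."""
    return a.body_type is b.body_type and a.jurisdiction == b.jurisdiction


# Invented records for illustration only.
us = ReviewRecord(ReviewBodyType.IRB, "US", "approved", date(2026, 9, 30))
uk = ReviewRecord(ReviewBodyType.REC, "UK", "approved", date(2027, 6, 30))
au = ReviewRecord(ReviewBodyType.HREC, "AU", "approved", date(2026, 12, 31))
```

The expiry query treats all three records alike because expiry is part of the shared structure, while `same_regime(us, uk)` is false because the vocabulary never claims an IRB is a REC.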

What to do now

For research offices and CRIS owners: start capturing the structured facts behind compliance documents (body, identifier, category, dates, conditions) as fields, not just as attached PDFs, even before a shared standard exists. For standards and vocabulary work: prioritise the controlled vocabulary for ethics review, data-protection roles, and agreement conditions, deferring to the relevant legal frameworks for definitions rather than inventing normative content. The maDMP work is the proof of concept; compliance is the natural next layer.
