regulatory metadata – CASRAI Dictionary

Most researchers experience compliance as a sequence of forms: an ethics application here, a data-protection assessment there, a signed agreement before a reagent can cross an institutional boundary. Each is completed under deadline pressure, filed as a PDF, and rarely consulted again. Yet the facts those documents record — that a study was approved, that personal data are processed under a defined legal basis, that a material was transferred under stated conditions — are exactly the facts that funders, regulators, repositories, and downstream researchers later need to check. This article sets out how the main compliance frameworks fit together, and why representing them as structured metadata, drawing on the compliance and regulatory domain, turns a paperwork burden into reusable infrastructure.

Ethics review: one function, many names

The oversight of research involving human participants is a near-universal requirement, but its institutional form varies by jurisdiction, and the vocabulary trips people up. In the United States the reviewing body is the Institutional Review Board (IRB); in the United Kingdom and much of Europe it is the Research Ethics Committee (REC); in Australia it is the Human Research Ethics Committee (HREC). These are not different things so much as the same function under different names and statutes, and a controlled vocabulary should treat them as such rather than privileging one country’s term.

The review itself is not monolithic. Most regimes distinguish levels of scrutiny calibrated to risk: an exempt review for studies meeting defined exemption criteria, an expedited review for minimal-risk work, and a full board review for higher-risk studies considered by the whole committee. What lets a participant take part is informed consent: a documented agreement to participate, given after being told the nature and risks of the research. Variants include broad consent, which covers unspecified future use, and a waiver of consent, where the ethics committee approves omitting the standard consent procedure.
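As a minimal sketch of how these terms might be held as structured values rather than free text, the Python below maps the three committee names onto one concept and records review level and consent type as enumerations. The concept label `ethics_review_body`, the class names, and the approval references are illustrative assumptions, not published CASRAI terms.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical concept label (an assumption for illustration only).
ETHICS_REVIEW_BODY = "ethics_review_body"

# Jurisdiction-specific names from the text, all mapped to the same function.
COMMITTEE_NAMES = {
    "IRB": ETHICS_REVIEW_BODY,   # United States
    "REC": ETHICS_REVIEW_BODY,   # United Kingdom and much of Europe
    "HREC": ETHICS_REVIEW_BODY,  # Australia
}


class ReviewLevel(Enum):
    EXEMPT = "exempt"
    EXPEDITED = "expedited"
    FULL_BOARD = "full_board"


class ConsentType(Enum):
    INFORMED = "informed"
    BROAD = "broad"
    WAIVER = "waiver"


@dataclass(frozen=True)
class EthicsApproval:
    reference: str        # the committee's own approval reference
    committee_name: str   # local name as written on the approval
    review_level: ReviewLevel
    consent_type: ConsentType

    @property
    def committee_concept(self) -> str:
        # Raises KeyError for a name the vocabulary does not know.
        return COMMITTEE_NAMES[self.committee_name.upper()]


# Placeholder references, not real approvals.
us = EthicsApproval("EXAMPLE-001", "IRB", ReviewLevel.EXPEDITED, ConsentType.INFORMED)
uk = EthicsApproval("EXAMPLE-002", "REC", ReviewLevel.FULL_BOARD, ConsentType.BROAD)
```

Under these assumptions, `us.committee_concept == uk.committee_concept` holds even though the local names differ, which is the behaviour a jurisdiction-neutral vocabulary is meant to give.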

Data protection: GDPR’s defined roles

Where research touches personal data, ethics approval is necessary but not sufficient; data-protection law applies in its own right. In the European Union and, in near-identical form, the United Kingdom, the governing instrument is the General Data Protection Regulation (GDPR), and its precision is a feature, not an obstacle. GDPR assigns defined roles: a data controller is the entity that determines the purposes and means of processing, and a data processor processes personal data on the controller’s behalf. The distinction is not pedantic — it determines who carries which legal obligation — and a study that cannot say which party is controller and which is processor has a real gap, not merely a documentation one.

For higher-risk processing, GDPR requires a Data Protection Impact Assessment (DPIA): a structured analysis of the risks a processing activity poses to data subjects, and the measures taken to mitigate them. The DPIA is the data-protection analogue of the ethics application, and like it, it tends to be written once and shelved. Comparable regimes exist elsewhere — the United States governs protected health information under HIPAA — while the international ethical baseline for medical research remains the Declaration of Helsinki.
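The controller and processor roles and the DPIA requirement lend themselves to a simple completeness check. The sketch below is one possible shape, not a legal test: the class names, the `high_risk` flag, and the wording of the gap messages are all assumptions made for illustration, and deciding whether processing is high-risk remains a judgement made outside the code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Role(Enum):
    CONTROLLER = "controller"
    PROCESSOR = "processor"


@dataclass(frozen=True)
class Party:
    name: str
    role: Optional[Role] = None  # None means the role was never recorded


@dataclass
class ProcessingActivity:
    description: str
    parties: list[Party] = field(default_factory=list)
    high_risk: bool = False               # assumed to be set by a human assessor
    dpia_reference: Optional[str] = None  # hypothetical internal reference

    def gaps(self) -> list[str]:
        """List the structural gaps the record reveals."""
        found = [f"{p.name}: role not recorded" for p in self.parties if p.role is None]
        if not any(p.role is Role.CONTROLLER for p in self.parties):
            found.append("no controller identified")
        if self.high_risk and not self.dpia_reference:
            found.append("high-risk processing without a DPIA reference")
        return found


# Placeholder parties, not real organisations.
activity = ProcessingActivity(
    description="participant survey",
    parties=[Party("University A", Role.CONTROLLER), Party("Cloud host")],
    high_risk=True,
)
```

Here `activity.gaps()` reports that the cloud host has no recorded role and that no DPIA reference is attached. The check allows more than one controller, since joint controllership is possible.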

Material transfer: the agreement that governs the sample

The third pillar is contractual rather than regulatory. A Material Transfer Agreement (MTA) is a legal agreement governing the transfer of research materials — cell lines, reagents, datasets, biological samples — between institutions. The MTA sets out what the recipient may do with the material, what they may not, who owns any derivatives, and what acknowledgement or rights flow back to the provider. It is easy to treat the MTA as a one-off signature, but its terms persist for the life of the material and everything derived from it, which is precisely why its conditions need to travel as data rather than languish in a signed PDF.

An MTA whose terms are buried in a scanned signature page is a constraint nobody can check. The same agreement, recorded as structured conditions linked to the material and the receiving project, is a constraint a system can enforce — flagging, for instance, that a derivative dataset inherits a no-redistribution clause before it is deposited openly.
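The inheritance check described above can be sketched in a few lines of Python. The `Material` class, the `"no-redistribution"` condition string, and the function name are hypothetical; a real system would take its condition codes from a controlled vocabulary and would also need to guard against cycles in the derivation graph, which this sketch assumes do not occur.

```python
from dataclasses import dataclass, field


@dataclass
class Material:
    name: str
    # Conditions attached directly to this material by its MTA.
    conditions: set[str] = field(default_factory=set)
    # Materials this one was derived from (assumed acyclic).
    derived_from: list["Material"] = field(default_factory=list)

    def effective_conditions(self) -> set[str]:
        """Own conditions plus everything inherited from source materials."""
        result = set(self.conditions)
        for parent in self.derived_from:
            result |= parent.effective_conditions()
        return result


def can_deposit_openly(material: Material) -> bool:
    # Open deposit is blocked if any inherited clause forbids redistribution.
    return "no-redistribution" not in material.effective_conditions()


cell_line = Material("cell line", conditions={"no-redistribution"})
dataset = Material("derived dataset", derived_from=[cell_line])
```

With these records, `can_deposit_openly(dataset)` is `False` even though the dataset carries no conditions of its own, because the clause travels with the derivation link.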

Compliance does not stop there. Conflicts of interest, whether financial or personal, and the export-control status of a technology under regimes such as the EAR and ITAR are further disclosable facts, and they arrive as yet more forms asking for information the institution already holds somewhere.

Why this is, at bottom, a metadata problem

Here is the connection to CASRAI’s mission. Almost every compliance artefact records a structured fact: a study has an approval reference and an approving committee; a processing activity has a controller, a legal basis, and a DPIA; a material moves under an MTA with stated conditions. The burden researchers feel as oppressive is largely the burden of re-entering those facts, by hand, in each system’s format — ethics portal, funder report, repository deposit form, institutional register. A regime that demands manual re-keying manufactures exactly the transcription errors it later treats as compliance failures.

Anchoring compliance records in persistent identifiers changes this. An approval that is linked to its project by a RAiD, to its institution by a ROR ID, to its investigator by an ORCID iD, and to its funder by a registry ID becomes a checkable assertion rather than a typed paragraph. A repository can read that a dataset’s underlying study was approved and that its MTA permits deposit; a funder can verify that a DPIA exists for a project processing personal data. Neither has to reconstruct the paperwork from memory. This is the same dividend that structured grant and disclosure data pay throughout research administration: enter each fact once, read it everywhere.
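To make "checkable" concrete, the sketch below validates the one identifier in that list whose check digit can be verified offline: an ORCID iD, which ends in an ISO 7064 MOD 11-2 check character. The `ApprovalAssertion` class and its field names are assumptions for illustration, the RAiD and ROR values are placeholders, and the sketch only tests that they are present rather than resolving them.

```python
import re
from dataclasses import dataclass

ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


@dataclass(frozen=True)
class ApprovalAssertion:
    approval_reference: str
    project_raid: str        # placeholder; not resolved here
    institution_ror: str     # placeholder; not resolved here
    investigator_orcid: str

    def is_checkable(self) -> bool:
        # Every anchor must be present and the ORCID iD must be well formed.
        anchors_present = all(
            [self.approval_reference, self.project_raid, self.institution_ror]
        )
        return anchors_present and orcid_checksum_ok(self.investigator_orcid)


# 0000-0002-1825-0097 is the iD ORCID publishes for its fictitious example researcher.
assertion = ApprovalAssertion(
    approval_reference="EXAMPLE-001",
    project_raid="RAID-PLACEHOLDER",
    institution_ror="ROR-PLACEHOLDER",
    investigator_orcid="0000-0002-1825-0097",
)
```

A mistyped final digit fails the checksum, so a transcription error of the kind manual re-keying produces is caught at entry rather than surfacing later as a compliance failure.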

Where shared vocabulary fits

The terms in this area are easy to misuse and the cost of misuse is real: an IRB and an REC are the same function, not different standards; a controller is not a processor; a DPIA is a specific assessment, not a synonym for “we thought about privacy”. A shared, federated vocabulary that defines these precisely — pointing back to the relevant national regulators and to the Declaration of Helsinki for the ethical baseline rather than inventing its own meanings — is what lets a compliance assertion made in one system be understood and checked in another. Supplying that definitional layer is the role the CASRAI dictionary is designed to play.

What to do now

For researchers: maintain approvals, data-protection roles, and MTA conditions as structured records linked to the project, not as orphaned PDFs. For institutions: build compliance registers that generate funder and repository disclosures from authoritative, identifier-anchored facts rather than from re-keyed forms. For standards work: pin down the precise definitions that separate ethics review levels, controller from processor, and one agreement type from another, federating to the authoritative regulators for the normative content.


Research compliance as structured metadata: IRB/REC, GDPR and MTAs