v2026.1714 entries · CC-BY 4.0

Implementation checklist · Track C

Implementing the AI and ML research outputs vocabulary

Audience: repository managers, journal editorial offices, and CRIS administrators ingesting machine-learning artefacts alongside articles.

When to apply: when deposits start including model cards, datasheets, evaluation suites, or code-paper-data triples that the existing article schema cannot describe.

Before you start

Prerequisites

What needs to be in place before you operationalise AI and ML research outputs terminology in your CRIS or repository.

  • A repository or CRIS that can host non-article record types (DSpace, EPrints, Pure, Symplectic Elements, Worktribe, VIVO)
  • Familiarity with JATS, DataCite Metadata Schema 4.x, or Crossref schema for the article side of the triple
  • Ability to extend the local metadata profile with custom fields or define a new record type
  • Agreement with researchers on a minimum model-card template (Mitchell et al. 2019 is the de facto baseline)
  • A persistent-identifier strategy for models and datasets — typically DataCite DOI plus an internal handle

Deployment

Five steps to deploy

Each step is small enough to land in a single sprint or a single sitting with the relevant CRIS administrator. Follow the steps in order.

  1. Define a model-output record type

    Stand up a new record type or item-type (DSpace community, Pure custom type, Symplectic record category) distinct from "dataset" and "software", because evaluation provenance, intended use, and out-of-scope warnings have no clean home in either.

  2. Add the model-card and datasheet metadata fields

    At minimum: intended_use, out_of_scope_use, training_data_doi, evaluation_data_doi, evaluation_metrics, model_architecture, base_model, parameter_count, training_compute_estimate, ethical_considerations, license. Map each to existing crosswalks (HuggingFace model-card spec, datasheet-for-datasets schema) where possible.

  3. Wire the ingestion pipeline

    Configure your deposit form (DSpace submission step, Pure import profile, Symplectic Elements connector) to populate the new fields. If you accept HuggingFace model URLs, parse the model card YAML automatically rather than asking depositors to retype it.

  4. Add validation rules

    Require the training/evaluation DOI cross-references when an output is tagged as supervised or fine-tuned; require ethical-considerations free-text when intended_use mentions a regulated domain (health, hiring, lending, justice).

  5. Test with five real records

    Ingest a foundation-model release, a fine-tuned domain model, a benchmark suite, a paper-with-model bundle, and an archived weights-only deposit. Verify each emits a valid Crossref / DataCite payload and surfaces the model card as structured fields, not just an attached PDF.
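
Steps 2 and 4 can be sketched together as a record type plus a validation function. This is a minimal illustration, not any vendor's schema. The field names come from step 2. The `training_regime` tag and the keyword list used to detect a regulated domain are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: regulated domains are detected by simple keyword matching.
# A production rule set would be agreed locally and is likely to be richer.
REGULATED_KEYWORDS = ("health", "diagnosis", "hiring", "lending", "justice")


@dataclass
class ModelOutputRecord:
    """Minimum model-card and datasheet fields listed in step 2."""
    intended_use: str
    out_of_scope_use: str = ""
    training_data_doi: str = ""
    evaluation_data_doi: str = ""
    evaluation_metrics: Dict[str, float] = field(default_factory=dict)
    model_architecture: str = ""
    base_model: str = ""
    parameter_count: Optional[int] = None
    training_compute_estimate: str = ""
    ethical_considerations: str = ""
    license: str = ""
    # Hypothetical tag (not in the field list above), e.g. "supervised".
    training_regime: str = ""


def validate(record: ModelOutputRecord) -> List[str]:
    """Return blocking errors; an empty list means the record may proceed."""
    errors = []
    # Step 4, rule 1: DOI cross-references for supervised or fine-tuned outputs.
    if record.training_regime in ("supervised", "fine-tuned"):
        if not record.training_data_doi:
            errors.append("training_data_doi is required")
        if not record.evaluation_data_doi:
            errors.append("evaluation_data_doi is required")
    # Step 4, rule 2: ethical considerations for regulated domains.
    use = record.intended_use.lower()
    if any(word in use for word in REGULATED_KEYWORDS):
        if not record.ethical_considerations.strip():
            errors.append("ethical_considerations is required")
    return errors
```

Returning a list of errors rather than raising on the first one lets the deposit form show every missing field in a single pass.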

Worked example

Sample workflow

A realistic walk-through of a single record passing through the AI and ML research outputs pipeline once the checklist is in production.

A computer-science group submits a clinical-prediction model trained on a hospital dataset. The depositor pastes the HuggingFace repo URL into the new model-output form; the connector parses the model card YAML and pre-fills intended_use, evaluation_metrics, base_model, and license. The hospital dataset DOI is entered as training_data_doi and triggers a cross-reference check that confirms the dataset record exists and is access-controlled. Because the intended_use field mentions "diagnosis", the validation layer forces the ethical_considerations field to be non-empty before the record can be approved. After staff curation, the repository mints a DataCite DOI for the record, surfaces it as a DefinedTerm-style structured entity (not a PDF attachment), and posts a Crossref relationship to the linked paper so that DOI resolvers carry the model–paper–data triple to downstream indexers such as OpenAIRE and PubMed.
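
The pre-fill step in this walk-through could look roughly like the sketch below. It assumes the model card is a README whose metadata sits in a leading block delimited by `---` lines, and it reads only flat `key: value` pairs using the standard library. A real connector would use a full YAML parser. `PREFILL_MAP` and both function names are hypothetical, and which keys a given card actually carries varies by depositor.

```python
from typing import Dict

# Hypothetical mapping from model-card metadata keys to local field names.
PREFILL_MAP = {"license": "license", "base_model": "base_model"}


def split_front_matter(readme_text: str) -> Dict[str, str]:
    """Return flat key/value pairs from a leading '---' delimited block.

    Simplification: nested structures and lists are skipped, not parsed.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    pairs = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return pairs  # closing delimiter reached
        # Only top-level "key: value" lines; skip list items and comments.
        if ":" in line and not line.startswith((" ", "-", "#")):
            key, _, value = line.partition(":")
            value = value.strip().strip("'\"")
            if value:
                pairs[key.strip()] = value
    return {}  # no closing delimiter: treat as having no metadata block


def prefill(readme_text: str) -> Dict[str, str]:
    """Map recognised metadata keys onto deposit-form fields."""
    front = split_front_matter(readme_text)
    return {dst: front[src] for src, dst in PREFILL_MAP.items() if src in front}
```

Fields the parser cannot find are simply left out of the result, so the deposit form can still ask the depositor for them.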

Integration points

CRIS and repository systems

Vendor-specific notes on where this vocabulary fits in real research-information systems. Names appear here only where there is public field evidence — they are not vendor partnerships.

DSpace 8.x and DSpace-CRIS

Use a custom entity type via the configurable-entities framework; the DSpace-CRIS extension already ships with software and patent entities that can be cloned as a starting point.

Pure (Elsevier)

Register a custom research-output type via Pure Admin; the new fields go in a custom metadata template. Pure can ingest from HuggingFace, but you will have to build and maintain the connector locally.

Symplectic Elements

Create a new publication sub-type or extend the Dataset sub-type. The Elements API lets you push records back to ORCID and to local DSpace via the Repository Tools module.

VIVO

Extend the VIVO-ISF ontology with subclasses of vivo:Dataset and obo:IAO_software for model and benchmark; reuse Schema.org SoftwareApplication where applicable.

EPrints

Add a new item-type via the EPrints config; the model-card fields go in a custom workflow stage. EPrints Bazaar packages cover some of the DataCite mapping out of the box.

What goes wrong in the field

Common pitfalls

The patterns that show up repeatedly when this checklist is skipped or misapplied. Address these before they become entrenched.

  • Treating the model card as a free-form PDF attachment instead of structured, queryable metadata
  • Skipping the training-data versus evaluation-data DOI cross-references, breaking the reproducibility audit trail
  • Conflating "license of the weights" with "license of the underlying training data" — they are routinely different and both must be captured
  • Letting depositors enter intended_use as a single word like "research"; require a usable sentence
  • Forgetting to version model records when weights change — a new fine-tune is not a metadata edit
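
Several of these pitfalls can be caught mechanically at deposit time. The sketch below is one way to do that, under stated assumptions: the field names `weights_license`, `training_data_license`, and the five-word threshold are hypothetical choices for this example, and a changed SHA-256 digest of the weights file stands in for "the weights changed".

```python
import hashlib
from typing import Dict, List

# Assumption: fewer than five words is treated as too terse to be usable.
MIN_INTENDED_USE_WORDS = 5


def weights_digest(data: bytes) -> str:
    """SHA-256 hex digest of the weights content."""
    return hashlib.sha256(data).hexdigest()


def needs_new_version(previous_digest: str, new_digest: str) -> bool:
    """Changed weights call for a new versioned record, not a metadata edit."""
    return previous_digest != new_digest


def lint_deposit(record: Dict[str, str]) -> List[str]:
    """Return warnings for the pitfalls that can be checked from metadata."""
    warnings = []
    if len(record.get("intended_use", "").split()) < MIN_INTENDED_USE_WORDS:
        warnings.append("intended_use: write a full sentence, not a keyword")
    # Capture both licenses separately; they are routinely different.
    for key in ("weights_license", "training_data_license"):
        if not record.get(key):
            warnings.append(key + ": missing")
    return warnings
```

Whether these surface as warnings or as hard rejections is a local policy choice.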

Frequently asked

Implementation FAQ

Who maintains this checklist?
The AI and ML research outputs working group maintains the checklist alongside the dictionary terms in the same domain. It is reviewed each release cycle (March and September) and updated when a working-group consultation, a vendor product change, or a federation-partner schema update materially changes the operational guidance.
What if my CRIS or repository is not listed?
The integration points listed name the systems CASRAI has direct field experience with — Pure, Symplectic Elements, Worktribe, Converis, DSpace and DSpace-CRIS, EPrints, VIVO, Dataverse, Invenio-RDM. The CERIF mapping in the checklist is vendor-neutral and applies equally to other CRIS or repository products. If your system supports the underlying entities (Person, Project, Output, Funding, plus the domain-specific extensions), the steps transfer.
How do I validate my implementation?
There are three validation surfaces. First, the deposit form should refuse a record missing required fields rather than warn and accept. Second, the resulting metadata should round-trip through the federation layer your institution uses (OpenAIRE Guidelines 4.0 for European federation, DataCite Commons for DOI-anchored discovery, Crossref for article-anchored discovery) without upstream errors. Third, walk a real-world record through the sample-workflow path on this page and confirm the structured fields capture what the prose describes.
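
For the second surface, a local pre-flight check can catch obvious gaps before anything is sent upstream. The sketch below builds a DataCite-style attribute dictionary and checks that the mandatory properties are present. It is not a substitute for validation against the official schema. The function names are hypothetical, and the relation types chosen for the paper, training data, and evaluation data links are assumptions to be agreed locally.

```python
import json
from typing import Dict, List, Optional

# Properties DataCite treats as mandatory, in their JSON attribute form.
MANDATORY = ("doi", "creators", "titles", "publisher", "publicationYear", "types")


def build_attributes(doi: str, title: str, creator_names: List[str],
                     publisher: str, year: int,
                     paper_doi: Optional[str] = None,
                     training_doi: Optional[str] = None,
                     evaluation_doi: Optional[str] = None) -> Dict:
    """Assemble attributes carrying the model, paper, and data links."""
    # Assumption: these relation types are one reasonable choice among several.
    related = [
        (paper_doi, "IsSupplementTo"),
        (training_doi, "IsDerivedFrom"),
        (evaluation_doi, "References"),
    ]
    return {
        "doi": doi,
        "titles": [{"title": title}],
        "creators": [{"name": name} for name in creator_names],
        "publisher": publisher,
        "publicationYear": year,
        "types": {"resourceTypeGeneral": "Model"},
        "relatedIdentifiers": [
            {"relatedIdentifier": rid,
             "relatedIdentifierType": "DOI",
             "relationType": rel}
            for rid, rel in related if rid
        ],
    }


def missing_mandatory(attributes: Dict) -> List[str]:
    """List mandatory properties that are absent or empty."""
    return [key for key in MANDATORY if not attributes.get(key)]


def survives_json_round_trip(attributes: Dict) -> bool:
    """Check that serialising and re-reading leaves the payload unchanged."""
    return json.loads(json.dumps(attributes)) == attributes
```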
Where do I report errors in the checklist?
Open a comment via the dictionary-feedback flow at /dictionary/contribute. Editorial corrections — wrong vendor module names, deprecated standards, broken integration paths — are queued into the next release cycle. Substantive disagreements on the operational guidance are routed to the working group for review and may motivate a checklist revision.
Is this checklist enough to certify my implementation?
No. The checklist gives you the operational baseline; certification against federation profiles (CoreTrustSeal, OpenAIRE-compliant, COAR-aligned) is a separate process with its own audit. Treat the checklist as the engineering scaffolding and the certification as the institutional sign-off that the scaffolding is being used.

Adopted by research universities worldwide

  • University of Cambridge
  • Columbia University
  • University of Edinburgh
  • Harvard University
  • Massachusetts Institute of Technology
  • University of Oxford
  • Princeton University
  • Stanford School of Medicine
  • University College London
