When to apply
When deposits start to include model cards, datasheets, evaluation suites, or code-paper-data triples that the existing article schema cannot describe.
Before you start
Prerequisites
What needs to be in place before you operationalise AI and ML research outputs terminology in your CRIS or repository.
- A repository or CRIS that can host non-article record types (DSpace, EPrints, Pure, Symplectic Elements, Worktribe, VIVO)
- Familiarity with JATS, DataCite Metadata Schema 4.x, or the Crossref schema for the article side of the triple
- Ability to extend the local metadata profile with custom fields or define a new record type
- Agreement with researchers on a minimum model-card template (Mitchell et al. 2019 is the de facto baseline)
- A persistent-identifier strategy for models and datasets — typically DataCite DOI plus an internal handle
Deployment
Five steps to deploy
Each step is small enough to land in a single sprint or a single sitting with the relevant CRIS administrator. Follow the steps in order.
Define a model-output record type
Stand up a new record type or item-type (DSpace community, Pure custom type, Symplectic record category) distinct from "dataset" and "software", because evaluation provenance, intended use, and out-of-scope warnings have no clean home in either.
Add the model-card and datasheet metadata fields
At minimum: intended_use, out_of_scope_use, training_data_doi, evaluation_data_doi, evaluation_metrics, model_architecture, base_model, parameter_count, training_compute_estimate, ethical_considerations, license. Map each to existing crosswalks (HuggingFace model-card spec, datasheet-for-datasets schema) where possible.
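As a rough illustration of the minimum field set above, the sketch below models it as a Python dataclass. The class name, the types, the choice of which fields are optional, and the two-entry crosswalk are all assumptions made for illustration; your CRIS profile and the crosswalk you agree locally will differ.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ModelOutputRecord:
    """Hypothetical in-memory shape for the minimum model-card field set."""
    intended_use: str
    out_of_scope_use: str
    evaluation_metrics: dict[str, float]
    model_architecture: str
    license: str
    ethical_considerations: str = ""
    training_data_doi: Optional[str] = None
    evaluation_data_doi: Optional[str] = None
    base_model: Optional[str] = None
    parameter_count: Optional[int] = None
    training_compute_estimate: Optional[str] = None


# Assumed partial crosswalk: local field name -> model-card front-matter key.
# Only keys with an obvious one-to-one counterpart are listed here.
HF_CROSSWALK = {"license": "license", "base_model": "base_model"}


def unmapped_fields(crosswalk: dict[str, str] = HF_CROSSWALK) -> list[str]:
    """List local fields that have no crosswalk target and need manual entry."""
    return [f.name for f in fields(ModelOutputRecord) if f.name not in crosswalk]
```

Calling `unmapped_fields()` gives a quick inventory of what the deposit form still has to collect by hand.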
Wire the ingestion pipeline
Configure your deposit form (DSpace submission step, Pure import profile, Symplectic Elements connector) to populate the new fields. If you accept HuggingFace model URLs, parse the model card YAML automatically rather than asking depositors to retype it.
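The sketch below shows one way the automatic parsing could start: split the front-matter block off a model-card README and read its top-level keys. It assumes the card opens with a block delimited by `---` lines, and the parser is deliberately limited to flat `key: value` pairs and simple lists so that it runs on the standard library alone. A production pipeline would more plausibly hand the block to a full YAML parser. Both function names are hypothetical.

```python
from typing import Any


def split_front_matter(readme_text: str) -> tuple[str, str]:
    """Return (yaml_block, body) for a README that opens with a '---' block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", readme_text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", readme_text  # no closing delimiter: treat as no metadata


def parse_flat_yaml(yaml_block: str) -> dict[str, Any]:
    """Read top-level 'key: value' pairs and '- item' lists only.

    Nested mappings are out of scope for this sketch and are not
    parsed faithfully.
    """
    result: dict[str, Any] = {}
    open_key = None  # key whose value may continue as a list on later lines
    for raw in yaml_block.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith("- "):
            if open_key is not None:
                item = stripped[2:].strip().strip("'\"")
                result.setdefault(open_key, []).append(item)
            continue
        if raw[0].isspace():
            open_key = None  # indented non-list content: nested mapping
            continue
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        value = value.strip().strip("'\"")
        if value:
            result[key.strip()] = value
            open_key = None
        else:
            open_key = key.strip()
    return result
```

The parsed dictionary can then pre-fill the deposit form, leaving the depositor to confirm rather than retype.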
Add validation rules
Require both training_data_doi and evaluation_data_doi when an output is tagged as supervised or fine-tuned; require free text in ethical_considerations when intended_use mentions a regulated domain (health, hiring, lending, justice).
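The two rules above can be sketched as a single validation function. The record is assumed to arrive as a plain dictionary, `training_tags` is a hypothetical field holding the supervised or fine-tuned tag, and the regulated-domain check is a crude substring match that a real deployment would probably replace with a controlled vocabulary.

```python
REGULATED_DOMAINS = ("health", "hiring", "lending", "justice")
DOI_REQUIRED_TAGS = {"supervised", "fine-tuned"}


def validate_model_record(record: dict) -> list[str]:
    """Return a list of blocking errors; an empty list means the record passes."""
    errors = []

    # Rule 1: supervised or fine-tuned outputs need both DOI cross-references.
    tags = {tag.lower() for tag in record.get("training_tags", [])}
    if tags & DOI_REQUIRED_TAGS:
        for field in ("training_data_doi", "evaluation_data_doi"):
            if not record.get(field):
                errors.append(
                    f"{field} is required for supervised or fine-tuned outputs"
                )

    # Rule 2: regulated domains need an ethical-considerations statement.
    intended_use = record.get("intended_use", "").lower()
    if any(domain in intended_use for domain in REGULATED_DOMAINS):
        if not record.get("ethical_considerations", "").strip():
            errors.append(
                "ethical_considerations is required when intended_use "
                "mentions a regulated domain"
            )
    return errors
```

Returning errors rather than warnings lets the deposit form refuse the record outright when the list is non-empty.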
Test with five real records
Ingest a foundation-model release, a fine-tuned domain model, a benchmark suite, a paper-with-model bundle, and an archived weights-only deposit. Verify each emits a valid Crossref / DataCite payload and surfaces the model card as structured fields, not just an attached PDF.
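For the DataCite side of that check, a minimal smoke test might look like the sketch below. It builds a DataCite-style attributes dictionary from a local record and reports which mandatory attributes are empty. The local record keys (`doi`, `creators`, `title`, `publisher`, `publication_year`) are hypothetical, and the choice of `IsDerivedFrom` for training data and `References` for evaluation data is an assumption about how you want to express the relationship, not a rule from the schema. This is not a substitute for validating against the published schema.

```python
REQUIRED_ATTRIBUTES = (
    "doi", "creators", "titles", "publisher", "publicationYear", "types",
)

# Assumed mapping from local DOI fields to DataCite relationType values.
RELATION_FOR_FIELD = (
    ("training_data_doi", "IsDerivedFrom"),
    ("evaluation_data_doi", "References"),
)


def build_datacite_attributes(record: dict) -> dict:
    """Map a local model record to DataCite-style attributes."""
    related = [
        {
            "relatedIdentifier": record[field],
            "relatedIdentifierType": "DOI",
            "relationType": relation,
        }
        for field, relation in RELATION_FOR_FIELD
        if record.get(field)
    ]
    return {
        "doi": record["doi"],
        "creators": [{"name": name} for name in record["creators"]],
        "titles": [{"title": record["title"]}],
        "publisher": record["publisher"],
        "publicationYear": record["publication_year"],
        "types": {"resourceTypeGeneral": "Model"},
        "relatedIdentifiers": related,
    }


def missing_attributes(attributes: dict) -> list[str]:
    """Name every mandatory attribute that is absent or empty."""
    return [key for key in REQUIRED_ATTRIBUTES if not attributes.get(key)]
```

Running this over each of the five test records gives a fast first signal before the payload is sent to the registration service.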
Worked example
Sample workflow
A realistic walk-through of a single record passing through the AI and ML research outputs pipeline once the checklist is in production.
Integration points
CRIS and repository systems
Vendor-specific notes on where this vocabulary fits in real research-information systems. Names appear here only where there is public field evidence — they are not vendor partnerships.
DSpace and DSpace-CRIS
Use a custom entity type via the configurable-entities framework; the DSpace-CRIS extension already ships with software and patent entities that can be cloned as a starting point.
Pure
Register a custom research-output type via Pure Admin; the new fields go in a custom metadata template. Pure can ingest from HuggingFace, but you will have to build the connector locally.
Symplectic Elements
Create a new publication sub-type or extend the Dataset sub-type. The Elements API lets you push records back to ORCID and to local DSpace via the Repository Tools module.
VIVO
Extend the VIVO-ISF ontology with subclasses of vivo:Dataset and obo:IAO_software for model and benchmark; reuse Schema.org SoftwareApplication where applicable.
EPrints
Add a new item-type via the EPrints config; the model-card fields go in a custom workflow stage. EPrints Bazaar packages cover some of the DataCite mapping out of the box.
What goes wrong in the field
Common pitfalls
The patterns that show up repeatedly when this checklist is skipped or misapplied. Address these before they become entrenched.
- Treating the model card as a free-form PDF attachment instead of structured, queryable metadata
- Skipping the training-data versus evaluation-data DOI cross-references, breaking the reproducibility audit trail
- Conflating "license of the weights" with "license of the underlying training data" — they are routinely different and both must be captured
- Letting depositors enter intended_use as a single word like "research"; require at least one full sentence that says who will use the model and for what
- Forgetting to version model records when weights change — a new fine-tune is not a metadata edit
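The versioning pitfall can be guarded against mechanically. The sketch below assumes the repository stores a SHA-256 digest of the weights file alongside each record and compares it with the digest of the incoming file; the function names and the two action labels are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO


def weights_digest(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 digest in chunks so large weight files fit in memory."""
    digest = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


def deposit_action(stored_digest: str, new_weights: BinaryIO) -> str:
    """Allow an in-place metadata edit only when the weights are unchanged."""
    if weights_digest(new_weights) == stored_digest:
        return "edit-metadata"
    return "new-version"


# Usage with in-memory stand-ins for weight files.
stored = weights_digest(io.BytesIO(b"weights-v1"))
action = deposit_action(stored, io.BytesIO(b"weights-v2"))  # "new-version"
```

Any change to the weights, including a new fine-tune, therefore forces a new versioned record rather than an edit to the existing one.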
Frequently asked
Implementation FAQ
- Who maintains this checklist?
- The AI and ML research outputs working group maintains the checklist alongside the dictionary terms in the same domain. It is reviewed each release cycle (March and September) and updated when a working-group consultation, a vendor product change, or a federation-partner schema update materially changes the operational guidance.
- What if my CRIS or repository is not listed?
- The integration points listed name the systems CASRAI has direct field experience with — Pure, Symplectic Elements, Worktribe, Converis, DSpace and DSpace-CRIS, EPrints, VIVO, Dataverse, Invenio-RDM. The CERIF mapping in the checklist is vendor-neutral and applies equally to other CRIS or repository products. If your system supports the underlying entities (Person, Project, Output, Funding, plus the domain-specific extensions), the steps transfer.
- How do I validate my implementation?
- Check three validation surfaces. First, the deposit form should refuse a record missing required fields rather than warn and accept. Second, the resulting metadata should round-trip through the federation layer your institution uses (OpenAIRE Guidelines 4.0 for European federation, DataCite Commons for DOI-anchored discovery, Crossref for article-anchored discovery) without upstream errors. Third, walk a real-world record through the sample-workflow path on this page and confirm the structured fields capture what the prose describes.
- Where do I report errors in the checklist?
- Open a comment via the dictionary-feedback flow at /dictionary/contribute. Editorial corrections — wrong vendor module names, deprecated standards, broken integration paths — are queued into the next release cycle. Substantive disagreements on the operational guidance are routed to the working group for review and may motivate a checklist revision.
- Is this checklist enough to certify my implementation?
- No. The checklist gives you the operational baseline; certification against federation profiles (CoreTrustSeal, OpenAIRE-compliant, COAR-aligned) is a separate process with its own audit. Treat the checklist as the engineering scaffolding and the certification as the institutional sign-off that the scaffolding is being used.








