Tag: dual-use

  • Responsible release of machine-learning models: weights, licences and access tiers

    A trained machine-learning model has become a genuine research output, on a par with the article that describes it and the dataset it learned from. Yet releasing a model is not the simple act it might appear to be. Behind the casual phrase “we released the model” lies a series of consequential decisions: which artefacts are shared, under what licence, with what conditions, and to whom. A model released without its weights is a description rather than a usable system; one released without a licence leaves users guessing at what they may do; and a powerful one released with no thought to misuse can cause harm its creators never intended. Treating the released model as part of the research-output record means making these decisions deliberately and recording them, a concern that sits squarely within the AI and ML research-outputs domain of the CASRAI Dictionary.

    What is actually being released

    The first question is what a release contains, because “releasing the model” can mean very different things. The model weights — the learned parameters that constitute the trained model — are the heart of the matter: with them, others can run the model directly. Alongside the weights sit the architecture or code needed to load and execute them, and ideally the training and evaluation code, the data documentation and the configuration that would let someone reproduce or adapt the work. A release providing only a description, or only an interface to query the model, withholds the artefact that makes independent use and scrutiny possible. Being explicit about which artefacts are shared is the foundation of an honest release.
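
    One way to make the contents of a release explicit is to record them as a small manifest. The sketch below is illustrative only: the class names, field names and the three-way classification are assumptions made for this example, not an established schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ReleaseKind(Enum):
    # Hypothetical labels for the three situations described in the text.
    DESCRIPTION_ONLY = "description only"
    QUERY_INTERFACE = "query interface only"
    WEIGHTS = "weights released"


@dataclass(frozen=True)
class ReleaseManifest:
    """Hypothetical record of which artefacts a release actually contains."""
    weights: bool = False
    loading_code: bool = False
    training_code: bool = False
    evaluation_code: bool = False
    data_documentation: bool = False
    configuration: bool = False
    query_interface: bool = False

    def kind(self) -> ReleaseKind:
        # Weights take precedence: with them, others can run the model directly.
        if self.weights:
            return ReleaseKind.WEIGHTS
        if self.query_interface:
            return ReleaseKind.QUERY_INTERFACE
        return ReleaseKind.DESCRIPTION_ONLY

    def missing_for_reproduction(self) -> List[str]:
        # Artefacts that would let someone reproduce or adapt the work.
        wanted = [
            "weights",
            "loading_code",
            "training_code",
            "evaluation_code",
            "data_documentation",
            "configuration",
        ]
        return [name for name in wanted if not getattr(self, name)]
```

    Calling kind() on a manifest states plainly which of the three situations applies, and missing_for_reproduction() lists what a reader would still need to ask for.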

    Documenting the release: model cards

    A released model needs to travel with documentation, and the established instrument for this is the model card: a structured document describing what the model is, what it was trained on, how it performs, where it works well and where it does not, its known limitations and risks, and its intended and out-of-scope uses. Platforms such as Hugging Face have made model cards a routine expectation, attached to the model so that anyone obtaining the weights also obtains a clear account of what they are getting. The model card is what turns a bare file of parameters into a documented output, letting a downstream user judge whether the model suits their purpose and use it responsibly rather than blindly.
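
    The elements of a model card listed above can be treated as required sections, so that an incomplete card is caught before release. This is a minimal sketch in plain Python; the section titles and the render_model_card function are assumptions for illustration and do not reproduce any platform's template or API.

```python
# Section titles are an assumption derived from the description in the text.
REQUIRED_SECTIONS = (
    "Model description",
    "Training data",
    "Performance",
    "Known limitations and risks",
    "Intended uses",
    "Out-of-scope uses",
)


def render_model_card(name: str, sections: dict) -> str:
    """Render a plain-text model card, refusing to emit an incomplete one."""
    missing = [
        title for title in REQUIRED_SECTIONS
        if not str(sections.get(title, "")).strip()
    ]
    if missing:
        raise ValueError("model card incomplete, missing: " + ", ".join(missing))

    lines = ["Model card: " + name, ""]
    for title in REQUIRED_SECTIONS:
        # Sections are emitted in a fixed order so cards are easy to compare.
        lines += [title, str(sections[title]).strip(), ""]
    return "\n".join(lines).rstrip() + "\n"
```

    Raising an error on a missing section is a design choice: it makes the out-of-scope uses and known limitations as hard to omit as the performance figures.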

    Licensing: Responsible AI Licenses

    If model cards say what a model is, licences say what others may do with it. Conventional open-source licences were written for source code and do not always map neatly onto model weights or address the distinctive risks of capable models. In response, the Responsible AI Licenses (RAIL) family emerged, with the OpenRAIL-M variant — developed in the context of the BigScience open-science effort — among the best known. These licences are designed to permit broad access and reuse while attaching behavioural use restrictions: clauses prohibiting specified harmful uses, such as deploying the model to deceive or to discriminate unlawfully. The aim is to keep a model genuinely open and reusable while drawing a line around uses its creators judge unacceptable — a deliberate attempt to reconcile openness with responsibility rather than treating the two as opposites.

    Open, but by whose definition?

    The rise of behaviour-restricting licences has sharpened a question about what “open” means for AI. A licence that forbids certain uses is, by the classic test, not an open-source licence, because the Open Source Definition rejects restrictions on fields of endeavour. The Open Source Initiative has published an Open Source AI Definition (version 1.0 appeared in October 2024), setting out what is required for an AI system to be considered genuinely open, including meaningful access to the components needed to study, use, modify and share it. The result is a productive tension: some releases prioritise unrestricted openness, others responsible-use conditions, and the two cannot always be satisfied at once. For the research record, the important thing is to state the actual licence and its terms, so that no one mistakes a behaviourally restricted release for an unconditionally open one.

    Tiered and gated access

    Between fully open release and keeping a model entirely private lies a spectrum of tiered and gated access arrangements. A model may be gated so that users must agree to terms, identify themselves or be approved before obtaining the weights. Access may also be staged, with a smaller version released first and broader release following as risks are better understood. These mechanisms do not express hostility to openness; they are a way of matching the degree of access to the degree of risk. The key, again, is that the access conditions are part of what is being released and belong in the record of the output.
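
    A gating policy of this kind can be expressed as a simple check. The sketch below assumes, purely for illustration, that the tiers are cumulative (each stricter tier also requires everything the looser ones do); real platforms may combine the conditions differently, and every name here is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class AccessTier(Enum):
    # Hypothetical tiers, ordered from least to most restrictive.
    OPEN = "open"
    TERMS = "agree to terms"
    IDENTIFIED = "identify yourself"
    APPROVED = "approval required"
    PRIVATE = "not released"


@dataclass(frozen=True)
class AccessRequest:
    accepted_terms: bool = False
    identity_verified: bool = False
    approved_by_maintainer: bool = False


def may_obtain_weights(tier: AccessTier, request: AccessRequest) -> bool:
    """Decide whether a request satisfies the conditions of a tier.

    Assumption: tiers are cumulative, so APPROVED also requires
    accepted terms and a verified identity.
    """
    if tier is AccessTier.PRIVATE:
        return False
    if tier is AccessTier.OPEN:
        return True
    if not request.accepted_terms:
        return False
    if tier is AccessTier.TERMS:
        return True
    if not request.identity_verified:
        return False
    if tier is AccessTier.IDENTIFIED:
        return True
    return request.approved_by_maintainer
```

    Recording the tier as an explicit value, rather than leaving it implicit in how a download page behaves, is what allows the access conditions to be stored alongside the output.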

    Dual-use and the duty of care

    Underlying all of this is the recognition that capable models can be dual-use: the same system that does great good can, in the wrong hands, cause harm. Responsible release weighs the benefits of openness against the risk of enabling misuse, and chooses licence and access arrangements accordingly. There is no formula; the right balance depends on the model’s capabilities and the plausible harms, and good practice requires only that the judgement be made consciously and documented honestly. Because building and releasing a model is a genuine contribution, the work behind it can be described using the CRediT taxonomy alongside the other roles in a project, and the model recorded with the same rigour as any output in the research-administration record.

    A consistent vocabulary for model releases

    For a model release to be understood across repositories, institutions and funders, the elements that describe it — the artefacts shared, the licence and its terms, the access tier, the documentation — must mean the same thing everywhere. That consistency is what the CASRAI Dictionary works towards: a shared vocabulary so that what one platform records about a released model is read the same way by the next. A model is no longer a by-product but a first-class output, and releasing it responsibly is how it earns a trustworthy place in the scholarly record.
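
    What a shared vocabulary buys in practice can be shown with a small normalisation step that maps the labels different platforms might use onto one controlled set. The terms and synonyms below are invented for this example and are not taken from the CASRAI Dictionary.

```python
# Hypothetical controlled vocabulary for the access tier of a release.
CONTROLLED_TERMS = {"open", "gated", "staged", "private"}

# Hypothetical synonyms that individual platforms might use.
SYNONYMS = {
    "public": "open",
    "restricted": "gated",
    "phased": "staged",
    "closed": "private",
}


def normalise_access_term(label: str) -> str:
    """Map a free-text access label onto the controlled vocabulary."""
    # Collapse case and whitespace before looking the label up.
    key = " ".join(label.strip().lower().split())
    key = SYNONYMS.get(key, key)
    if key not in CONTROLLED_TERMS:
        raise ValueError("unrecognised access term: " + repr(label))
    return key
```

    Rejecting unknown labels, rather than passing them through, keeps two records from appearing to agree when they merely use similar words.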