CC0 for data means dedicating a dataset to the public domain with no attribution requirement, while CC-BY permits free reuse conditional on credit. For structured databases, neither Creative Commons tool may be the legally correct choice. Under the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable), a dataset is only fully reusable if it carries a clear licence that imposes minimal friction on machine and human reuse. CC0 is the tool many data repositories recommend by default, CC-BY is acceptable where attribution norms are strong, and bespoke institutional terms are usually a liability, not a safeguard.
CC0 (Creative Commons Zero) is a public domain dedication published by Creative Commons that waives copyright and related rights “to the fullest extent permitted by law”, allowing copying, modification, and commercial reuse without permission or credit.
- Why a data licence matters for FAIR reuse
- CC0 vs CC-BY: what actually differs
- Why custom institutional terms usually backfire
- Databases are a special case: ODC-By and ODbL
- A decision framework for choosing a licence
- Answer-first Q&A
Why a data licence matters for FAIR reuse
The FAIR Data Principles — Findable, Accessible, Interoperable, Reusable — treat licensing as a core reusability criterion, not an afterthought. A dataset can be technically accessible and still fail FAIR if its licence is ambiguous, restrictive, or silent on reuse conditions.
Without an explicit licence, the default legal position in most jurisdictions is “all rights reserved”, deterring reuse even when the depositor intended openness. Data repositories such as Dryad require a clear waiver precisely to remove this ambiguity.
- Findability is largely unaffected by licence choice, but reusability collapses when no licence is stated.
- Interoperability depends on whether the licence allows combination with other datasets under different terms.
- Reusability is maximised when the licence imposes the fewest conditions consistent with the depositor’s actual requirements.
CC0 vs CC-BY: what actually differs
CC0 removes all conditions, including attribution; CC-BY keeps commercial and derivative reuse rights but makes crediting the source a licence condition rather than a courtesy. The practical consequences are larger for data than for text or images.
| Aspect | CC0 | CC-BY 4.0 |
|---|---|---|
| Attribution required | No (legally); expected as scholarly norm | Yes, legally enforceable |
| Commercial reuse | Permitted | Permitted |
| Combining with other datasets | Frictionless | Can trigger “attribution stacking” |
| Recommended by | Dryad, GBIF, most genomics/biodiversity repositories | European Commission for some research data categories |
| Applies cleanly to non-copyrightable facts | Yes, designed for this case | Ambiguous; CC-BY presumes a copyright interest that may not exist in raw data |
The CESSDA Data Management Expert Guide notes that CC0 prevents attribution stacking — the compounding burden of citing every upstream source when a new dataset merges dozens of others. This is the strongest technical argument for CC0 over CC-BY in aggregated or long-tail scientific data. Dryad’s data-services team has explained that CC0 was “crafted specifically to reduce any legal and technical impediments… to the reuse of data” — a rationale FAIR later formalised as a reusability requirement.
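Attribution stacking is easier to see with a count. The sketch below is illustrative only: the `Dataset` class, the `required_credits` helper, and the `ATTRIBUTION_REQUIRED` set are hypothetical names invented for this example, and the licence strings are SPDX-style identifiers rather than anything a repository mandates. It walks a merged dataset's lineage and collects every upstream source whose licence makes credit a legal condition.

```python
from dataclasses import dataclass
from typing import Set, Tuple

# Assumption: licences (SPDX-style identifiers) that make attribution a legal condition.
ATTRIBUTION_REQUIRED = {"CC-BY-4.0", "ODC-By-1.0", "ODbL-1.0"}


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record: a dataset, its licence, and the datasets it was built from."""
    name: str
    licence: str
    sources: Tuple["Dataset", ...] = ()


def required_credits(dataset: Dataset) -> Set[str]:
    """Return the names of all datasets in the lineage that legally require credit."""
    credits: Set[str] = set()
    if dataset.licence in ATTRIBUTION_REQUIRED:
        credits.add(dataset.name)
    for source in dataset.sources:
        # Recurse so that sources of sources are counted too.
        credits |= required_credits(source)
    return credits


# Thirty upstream surveys merged into one dataset, once under CC-BY and once under CC0.
by_sources = tuple(Dataset(f"survey-{i}", "CC-BY-4.0") for i in range(30))
cc0_sources = tuple(Dataset(f"survey-{i}", "CC0-1.0") for i in range(30))

merged_by = Dataset("merged", "CC-BY-4.0", by_sources)
merged_cc0 = Dataset("merged", "CC0-1.0", cc0_sources)

print(len(required_credits(merged_by)))   # 31: the merged dataset plus 30 sources
print(len(required_credits(merged_cc0)))  # 0: citation is still a norm, not a condition
```

The model is deliberately simplified: it counts legal obligations only and says nothing about the scholarly citations a careful author would still provide under CC0.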
Does attribution disappear entirely under CC0?
No. CC0 removes the legal obligation to cite, but citation remains a scholarly and professional norm enforced through peer review, journal policy, and disciplinary ethics rather than licence terms. Most researchers continue citing CC0 datasets exactly as they would any other source, because academic integrity — not copyright law — is what drives the practice.
Why custom institutional terms usually backfire
Some institutions draft bespoke data-sharing agreements instead of adopting a standard licence, adding restrictions such as “non-commercial use only” or “notify us before reuse”. This creates three recurring problems.
- Machine unreadability: standard CC and Open Data Commons licences carry machine-readable metadata that repositories, indexers, and rights-clearance tools recognise automatically; bespoke legal text does not.
- Interoperability failure: a custom clause requiring prior notification or a specific attribution format is often legally incompatible with the standard licences used by the other datasets a researcher wants to combine it with.
- Enforcement uncertainty: institutions rarely have the resources to monitor or enforce bespoke terms, so the restriction deters legitimate reuse without stopping the misuse it was meant to prevent.
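The machine-unreadability problem can be shown with a minimal sketch of what an automated harvester might do with a licence field. The `KNOWN_LICENCES` table and the `recognise_licence` function are assumptions made for illustration, not any repository's actual implementation; the values are SPDX-style identifiers for the licences discussed in this article.

```python
from typing import Optional

# Assumption: a small lookup of standard licences, keyed by lower-cased identifier.
KNOWN_LICENCES = {
    "cc0-1.0": "CC0-1.0",
    "cc-by-4.0": "CC-BY-4.0",
    "odc-by-1.0": "ODC-By-1.0",
    "odbl-1.0": "ODbL-1.0",
}


def recognise_licence(raw: Optional[str]) -> Optional[str]:
    """Return a canonical identifier for a standard licence, or None if unrecognised."""
    if not raw:
        return None  # missing field: the harvester has nothing to work with
    return KNOWN_LICENCES.get(raw.strip().lower())


print(recognise_licence(" cc0-1.0 "))  # CC0-1.0
print(recognise_licence("Non-commercial use only; notify us before reuse"))  # None
```

A standard identifier resolves in one dictionary lookup, whereas bespoke legal text falls through to `None` and needs a human to read it.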
The University of California’s Office of Scholarly Communication has argued that CC-BY is “not always a good fit” for data, since its legal machinery was designed for copyrightable creative works rather than mixed factual content — and a custom clause layered on top compounds that mismatch rather than resolving it.
Databases are a special case: ODC-By and ODbL
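Raw facts are generally not copyrightable, but a database can attract separate rights: copyright in an original selection or arrangement and, in the EU, the sui generis database right, which protects substantial investment in compiling the contents. This is a genuine gap in most CC0-vs-CC-BY explainers: Creative Commons licences were written primarily for copyright (version 4.0 was the first international version to license sui generis database rights explicitly), and the Open Knowledge Foundation’s Open Data Commons suite exists specifically to cover database rights.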
- ODC-By (Open Data Commons Attribution License): permits copying, distribution, and commercial use of a database with attribution — the database-rights equivalent of CC-BY.
- ODbL (Open Database License): adds a share-alike condition, so derived databases must carry the same licence — the database-rights equivalent of CC-BY-SA.
- CC0 can still be applied to a database to waive both copyright and any sui generis database right simultaneously, which is why several major repositories default to it rather than layering ODC-By on top.
Joint guidance from Kehl University of Applied Sciences and IP specialists Maucher Jenkins explicitly separates content, software, and databases into three categories, rather than treating “data licensing” as one undifferentiated choice — a distinction most generic CC0-vs-CC-BY articles omit.
A decision framework for choosing a licence
Choosing correctly requires matching the licence to the data type and the reuse goal, not defaulting to whichever licence a template happens to include.
- Default to CC0 for raw observational data, measurements, or any dataset likely to be combined with others — this is the position taken by repositories including Dryad and GBIF and referenced in OpenAIRE’s data-sharing guidance.
- Use CC-BY where the deposited content includes substantial original creative or analytical framing (for example, a curated data paper’s narrative sections) and attribution is central to the scholarly reward system.
- Use ODC-By or ODbL where the artefact is genuinely a structured database and jurisdiction-specific database rights are a live concern, particularly for depositors working under EU law.
- Avoid bespoke terms unless a named legal, ethical, or funder requirement (such as personal or sensitive data restrictions) makes a standard open licence genuinely unsuitable — and even then, prefer a recognised restricted-access framework over ad hoc legal drafting.
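The framework above can be sketched as a small function. This is one possible encoding under stated assumptions, not a legal test: the `Deposit` fields, the order in which the rules are checked, the `wants_share_alike` switch between ODC-By and ODbL, and the `RESTRICTED` placeholder are all choices made for this example.

```python
from dataclasses import dataclass

# Placeholder, not a licence identifier: signals that a restricted-access
# framework is needed instead of a standard open licence.
RESTRICTED = "restricted-access-framework"


@dataclass
class Deposit:
    """Hypothetical description of a dataset about to be deposited."""
    has_named_restriction: bool = False         # legal, ethical, or funder requirement
    is_structured_database: bool = False
    database_rights_concern: bool = False       # e.g. depositor works under EU law
    wants_share_alike: bool = False
    substantial_creative_framing: bool = False  # e.g. narrative sections of a data paper
    attribution_central: bool = False


def suggest_licence(d: Deposit) -> str:
    """Return an SPDX-style identifier (or RESTRICTED) following the four rules above."""
    if d.has_named_restriction:
        return RESTRICTED
    if d.is_structured_database and d.database_rights_concern:
        return "ODbL-1.0" if d.wants_share_alike else "ODC-By-1.0"
    if d.substantial_creative_framing and d.attribution_central:
        return "CC-BY-4.0"
    return "CC0-1.0"  # default for raw observations and data likely to be combined


print(suggest_licence(Deposit()))  # CC0-1.0
print(suggest_licence(Deposit(is_structured_database=True, database_rights_concern=True)))  # ODC-By-1.0
```

Checking restrictions first is a design choice: a named legal or ethical constraint should override every openness preference, and CC0 is left as the fall-through default.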
Whichever licence is chosen, it must be declared unambiguously in the dataset’s metadata and in any accompanying data paper, since automated harvesters and data repository platforms increasingly reject or flag submissions with missing or non-standard licence fields.
Answer-first Q&A
Is CC0 free for commercial use?
Yes. CC0 places a work in the public domain to the fullest extent the law permits, so there is no restriction on commercial exploitation, modification, or redistribution. Any user, including a company building a commercial product, may use CC0 data without seeking permission, paying a fee, or providing credit, though citing the source remains good scholarly practice.
Are CC0 and public domain the same?
Not exactly. The Public Domain Mark is an informational label applied when a work is already believed to be out of copyright, while CC0 is an active legal waiver used by a rightsholder to voluntarily place their own work in the public domain. CC0 changes legal status; the Public Domain Mark only describes an existing one.
Do I have to cite CC0 data?
Legally, no — CC0 imposes no attribution requirement. In practice, researchers should still cite the original dataset because academic norms, journal policies, and reproducibility standards expect source attribution regardless of what the licence legally mandates.
Can data be subject to copyright?
Raw facts generally cannot be copyrighted, but a database’s original selection or arrangement can attract copyright, and in the EU a substantial investment in compiling it can attract a separate sui generis database right. This is precisely why database-specific licences such as ODC-By and ODbL exist alongside Creative Commons tools.
Implications for repositories and institutions
Repositories that mandate CC0 by default can expect fewer downstream reuse disputes and cleaner automated harvesting, because ambiguity is removed at the point of deposit. Institutions drafting data-management plans should specify the licence at policy level rather than per-project, and funders increasingly expect this decision to be documented, not left as “to be determined”.
Looking ahead
As FAIR compliance becomes a formal funder and publisher requirement rather than a voluntary aspiration, licence choice will keep moving from an afterthought to a mandatory, auditable field in data-management plans. CASRAI originated the CRediT contributor role taxonomy in 2014, and the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022 — a reminder that clear, jointly governed standards, rather than bespoke institutional terms, are what let research infrastructure scale across disciplines and borders.








