Definition · Plain-language

Dataverse

Dataverse is an open-source web application for sharing, preserving, citing, and analysing research data. Its development is led by the Institute for Quantitative Social Science (IQSS) at Harvard University.

A Distributed Network of Repositories

Unlike centralised repositories, Dataverse is a distributed system. Any university or research institution can download the open-source software and host its own installation. This lets institutions retain physical control over their storage, and customise storage policies and access controls to comply with local regulations, while still participating in a global network. Because all Dataverse installations expose the same standardised APIs, data can be queried in the same way on every installation, which makes it discoverable across platforms. The result is a cooperative ecosystem in which research data stays accessible, searchable, and secure across institutional and international borders.
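As a rough illustration of what "the same standardised APIs" means in practice, the sketch below builds the same Search API query for several installations. It only constructs URLs and sends nothing over the network. The second server name is a made-up placeholder, and the helper names `search_url` and `search_urls` are hypothetical; the `/api/search` path and the `q`, `type`, and `per_page` parameters are taken from the Dataverse Search API and should be checked against the version an installation runs.

```python
from urllib.parse import urlencode

# Illustrative list of installations. The second entry is a placeholder,
# not a real server.
INSTALLATIONS = [
    "https://dataverse.harvard.edu",
    "https://dataverse.example.edu",
]


def search_url(base_url: str, query: str, per_page: int = 10) -> str:
    """Build a Search API URL for one installation (no request is sent)."""
    params = urlencode({"q": query, "type": "dataset", "per_page": per_page})
    return f"{base_url.rstrip('/')}/api/search?{params}"


def search_urls(query: str) -> list[str]:
    """Build the same query for every installation in INSTALLATIONS."""
    return [search_url(base, query) for base in INSTALLATIONS]
```

Only the host changes between the generated URLs, so a client written against one installation should work against another with little or no modification.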

Dataverse and Datasets Hierarchy

The software employs a nested structure: a Dataverse is a container that can hold other Dataverses or Datasets. A Dataset consists of metadata describing the data, along with the actual data files and documentation. This flexible hierarchy allows departments, journals, and individual researchers to organise their collections according to their own administrative structures. For example, a university library can create a top-level Dataverse, with sub-Dataverses for specific departments and further subdivisions for individual labs. The same structure helps researchers manage their data and lets external users search and browse collections, so that large bodies of scientific records stay orderly and navigable.
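The nesting described above can be modelled as a small tree. The following is a minimal sketch of the idea, not Dataverse's internal data model: the class names, fields, and the `walk` helper are all hypothetical and exist only to show containers holding other containers or datasets.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Dataset:
    """A leaf node: descriptive title plus the names of its files."""
    title: str
    files: list[str] = field(default_factory=list)


@dataclass
class Dataverse:
    """A container that can hold other Dataverses or Datasets."""
    name: str
    children: list[Union["Dataverse", Dataset]] = field(default_factory=list)

    def add(self, child):
        """Attach a child and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, path=()):
        """Yield a slash-separated path for every Dataset below this node."""
        here = path + (self.name,)
        for child in self.children:
            if isinstance(child, Dataverse):
                yield from child.walk(here)
            else:
                yield "/".join(here + (child.title,))
```

Mirroring the library example: create `library = Dataverse("University Library")`, add `Dataverse("Physics Department")` to it, add `Dataverse("Optics Lab")` to that, and place a `Dataset` in the lab. Calling `list(library.walk())` then lists each dataset with its full path from the top-level container.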

Advanced Metadata and Citation Standards

Dataverse places a strong emphasis on data quality and citation. It automatically extracts metadata from tabular data files and supports international schemas such as DDI, Dublin Core, and Schema.org. When a dataset is published, the platform generates a formal citation containing a DOI, authorship details, and a deposit date, so researchers receive academic recognition for sharing their data. These detailed metadata records make datasets easier to find and cite, and help institutions track research impact. Support for domain-specific metadata keeps datasets understandable and reusable by other scientists, which promotes reproducibility and supports the wider move toward open, transparent research.
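To make the idea of a generated citation concrete, here is a small sketch that assembles a citation string from a dataset's metadata. The field order and punctuation are an illustrative assumption rather than the exact format Dataverse emits, the class and function names are hypothetical, and the DOI in the usage note is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class CitationMetadata:
    """Hypothetical subset of the metadata a citation draws on."""
    authors: list[str]
    year: int
    title: str
    doi: str
    publisher: str
    version: str


def format_citation(meta: CitationMetadata) -> str:
    """Join the metadata fields into one human-readable citation line."""
    authors = "; ".join(meta.authors)
    return (
        f'{authors}, {meta.year}, "{meta.title}", '
        f"https://doi.org/{meta.doi}, {meta.publisher}, {meta.version}"
    )
```

For example, passing authors `["Doe, Jane", "Roe, Richard"]`, year `2024`, a title, the placeholder DOI `10.5072/FK2/EXAMPLE`, a publisher name, and version `V1` produces a single line that starts with the authors and carries the DOI as a resolvable link. Because the DOI is part of the citation, anyone who reuses the data can point back to the exact deposited record.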

Key facts

At a glance

Dataverse is an open-source repository software application developed by Harvard's IQSS.
It operates as a distributed network of repositories hosted by institutions worldwide.
The software provides automatic data citation, complete with a persistent identifier such as a DOI.
It supports complex, domain-specific metadata schemas like DDI and Dublin Core.
Harvard Dataverse is a large, free public installation open to researchers worldwide.

Common misconceptions

What people often get wrong

Often heard: Dataverse is a single, centralised website where all data must be uploaded.

Actually: Dataverse is an open-source software application. Harvard hosts a major public installation, but many other institutions run their own separate installations.

Often heard: Uploading to Dataverse is only for quantitative, tabular data.

Actually: Dataverse accepts any file format, including qualitative transcripts, images, code scripts, and audio recordings, alongside quantitative tables.

Often heard: Dataverse repositories perform formal scientific review of all uploaded data.

Actually: The software provides curation workflows (such as draft stages and administrator reviews), but does not verify the scientific accuracy or validity of the research data.

Common questions

FAQ

What is the difference between Harvard Dataverse and Dataverse software?

Dataverse is the open-source software application developed by Harvard. The Harvard Dataverse is one specific public installation of this software, hosted by Harvard and open to researchers globally.

Does Dataverse support restricted access to datasets?

Yes. Dataverse allows researchers to restrict access to specific files within a dataset, requiring users to request access or agree to terms of use, which is helpful for sensitive data.
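File restriction can also be scripted. The sketch below prepares, but does not send, a request that would mark one file as restricted. It assumes the native API's `PUT /api/files/{id}/restrict` endpoint and the `X-Dataverse-key` token header; both should be verified against the API guide for the installation's version. The function name, server URL, file id, and token are hypothetical.

```python
import urllib.request


def build_restrict_request(server_url: str, file_id: int, api_token: str,
                           restrict: bool = True) -> urllib.request.Request:
    """Prepare (without sending) a request that restricts or unrestricts a file."""
    # The endpoint is assumed to take a bare "true" or "false" as its body.
    body = b"true" if restrict else b"false"
    return urllib.request.Request(
        url=f"{server_url.rstrip('/')}/api/files/{file_id}/restrict",
        data=body,
        method="PUT",
        headers={"X-Dataverse-key": api_token},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call. Separating construction from sending keeps the example safe to run and makes the request easy to inspect first.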
