v2026.1714 entries · CC-BY 4.0
CASRAI

Guide

Version control

Version control is a systematic approach to tracking, recording, and managing changes to files, datasets, and research code, allowing teams to collaborate and reverse modifications.



The importance of version control in research

In research workflows, version control replaces ad hoc file naming (such as draft_final_v2_final) with a structured history of changes. Each set of edits is recorded as a commit with a log message explaining what changed and why, so researchers can review their decisions months or years later. This history underpins open science: others can verify how data and manuscripts evolved over time. A complete audit trail also helps researchers demonstrate the integrity of their work, resolve discrepancies, collaborate with external co-authors, and satisfy journal replication requirements and institutional policies.
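The idea of a history in which every change carries a message can be sketched in a few lines of Python. This is an illustrative toy, not how Git stores data: the `Commit` class, the `commit` and `verify` helpers, and the hashing scheme are all assumptions made for this example. Each entry's identifier is derived from its parent, its message, and its content, which is why altering an earlier entry becomes detectable.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    """One recorded change (hypothetical structure, for illustration only)."""
    parent: Optional[str]   # identifier of the previous commit, None for the first
    message: str            # what changed and why
    content: str            # snapshot of the tracked text

    @property
    def commit_id(self) -> str:
        # Hash the parent id, message, and content together so that the
        # identifier depends on the whole history leading up to this commit.
        payload = "\n".join([self.parent or "", self.message, self.content])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def commit(history: List[Commit], message: str, content: str) -> List[Commit]:
    """Return a new history with one more commit appended."""
    parent = history[-1].commit_id if history else None
    return history + [Commit(parent, message, content)]


def verify(history: List[Commit]) -> bool:
    """Check that each commit points at the identifier of the one before it."""
    expected = None
    for entry in history:
        if entry.parent != expected:
            return False
        expected = entry.commit_id
    return True


history: List[Commit] = []
history = commit(history, "Add first draft of methods section", "We sampled 40 sites.")
history = commit(history, "Correct sample size after audit", "We sampled 42 sites.")

for entry in history:
    print(entry.commit_id[:8], entry.message)

# Reverting means reading an earlier snapshot back out of the history.
previous_text = history[0].content
```

If the content of the first entry were edited after the fact, its identifier would change and `verify` would return `False` for the chain, which is the property that makes such a history usable as an audit trail.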

Centralised versus distributed systems

Version control systems are either centralised or distributed. Centralised systems rely on a single server to store all files and their history, while distributed systems, like Git, copy the entire repository and history to every collaborator’s local machine. Because every copy holds the full history, the distributed model removes the server as a single point of failure and lets researchers commit and browse history offline. Hosting platforms like GitHub, GitLab, and Bitbucket build on this by offering remote collaboration, code hosting, and issue tracking. Researchers use these platforms to share their computational notebooks, scripts, and documentation openly, which supports collaborative development and peer review in the scientific community. Research groups that use these platforms consistently can streamline collaboration and keep code repositories that survive staff changes and project handovers.

Integrating version control with writing and data

Beyond software code, version control is increasingly used for research writing and data management. Overleaf, for example, offers Git integration for LaTeX projects, so authors can pull, edit, and merge changes with standard Git tools. In data science, tools such as DVC (Data Version Control) track changes to datasets alongside the code, so statistical analyses can be rerun on the exact historical version of the data. Linking each change in the data pipeline to the corresponding analytical output produces a reproducible workflow that external reviewers and collaborators can audit, from raw data collection through to the published figures.
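One way to picture what data versioning tools do is to identify each dataset version by a hash of its contents and store that hash beside the analysis output. The sketch below uses only the Python standard library. It is not DVC's API, and the function names, the file name `measurements.csv`, and the record layout are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return a SHA-256 digest of the file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_analysis(data_path: Path) -> dict:
    """Compute a mean and record which exact data version produced it."""
    # Skip the header row, then parse one number per line.
    values = [float(line) for line in data_path.read_text().splitlines()[1:]]
    return {
        "data_sha256": fingerprint(data_path),
        "mean": sum(values) / len(values),
    }


with tempfile.TemporaryDirectory() as workdir:
    data = Path(workdir) / "measurements.csv"  # hypothetical dataset

    data.write_text("value\n1.0\n2.0\n3.0\n")
    first = run_analysis(data)

    data.write_text("value\n1.0\n2.0\n6.0\n")  # the dataset is later revised
    second = run_analysis(data)

print(json.dumps(first, indent=2))
print("data changed between runs:", first["data_sha256"] != second["data_sha256"])
```

Because the digest is stored with the result, a reviewer who is handed a data file can recompute its hash and confirm whether it is the version that produced a given figure.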

Key facts

At a glance

  • Tracks file changes chronologically with details on who modified them
  • Prevents data loss by enabling researchers to revert to previous versions
  • Facilitates collaborative writing and coding through branching and merging
  • Git is the dominant distributed version control system in modern research
  • Supports transparency and reproducibility in open science workflows
  • Replaces unstructured naming schemes with a structured, recoverable edit history

Common misconceptions

What people often get wrong

Often heard: Version control is only useful for computer programmers writing code.

Actually: Version control is highly beneficial for writing research papers, managing structured datasets, and tracking changes to configuration files in any collaborative project.

Often heard: Using cloud storage like Google Drive is a complete version control system.

Actually: While cloud drives track file history, they lack structured commit messages, branching/merging features, and the detailed change-tracking capabilities of systems like Git.

Common questions

FAQ

What is the difference between Git and GitHub?

Git is open-source software that runs on your computer and tracks changes and version history. GitHub is a web-based hosting service that stores Git repositories online, adding tools for collaboration, issue tracking, and code sharing.

What is a merge conflict and how is it resolved?

A merge conflict occurs when two collaborators change the same part of a file in different ways and the version control system cannot automatically reconcile them. Researchers must open the file, decide which change to keep (or combine the two), and commit the resolved version.
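The logic behind a conflict can be sketched as a three-way comparison against the common ancestor of the two edited copies. The Python below is a deliberately simplified illustration, not Git's merge algorithm: it assumes all three versions have the same number of lines, the function name `merge_lines` is hypothetical, and the conflict marker is only loosely modelled on the markers Git writes into a file.

```python
from typing import List, Tuple


def merge_lines(
    base: List[str], ours: List[str], theirs: List[str]
) -> Tuple[List[str], List[int]]:
    """Merge two edited copies line by line against their common ancestor.

    Returns the merged lines and the indices of lines that conflict.
    Assumes (for illustration) that no lines were added or removed.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles versions of equal length")

    merged: List[str] = []
    conflicts: List[int] = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:            # both sides agree (same change, or no change)
            merged.append(o)
        elif o == b:          # only the other collaborator changed this line
            merged.append(t)
        elif t == b:          # only we changed this line
            merged.append(o)
        else:                 # both changed it differently: a person must decide
            merged.append(f"<<<<<<< {o} ======= {t} >>>>>>>")
            conflicts.append(index)
    return merged, conflicts


base = ["n = 40", "alpha = 0.05", "model = linear"]
ours = ["n = 42", "alpha = 0.05", "model = mixed"]
theirs = ["n = 40", "alpha = 0.01", "model = logistic"]

merged, conflicts = merge_lines(base, ours, theirs)
print(merged[:2])
print("conflicting line indices:", conflicts)
```

In this example the first two lines merge automatically because only one collaborator touched each of them, while the third line was changed by both and is flagged for manual resolution.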
