Where should research data for Computer Science & AI be deposited?

Research data generated in Computer Science & AI projects should ideally be deposited in approved repositories, such as: Zenodo (integrated with GitHub), Software Heritage, Figshare, and directory servers mapped in IEEE Xplore, ACM Digital Library & arXiv.

DFG Data Management Plan (DMP) Guide for Computer…

Q: What are the data sharing requirements for DFG grants?

Under guidelines set by the **Deutsche Forschungsgemeinschaft (German Research Foundation) (DFG)**, a formal DMP must be compiled and submitted for the **Computer Science & AI** project by Month 6. Research data must follow European open-science protocols and comply with the core principle of being "as open as possible, as closed as necessary", which allows proprietary discoveries to be protected.

1. Funder Policy & Open Data Compliance

In alignment with international open-science mandates, the Deutsche Forschungsgemeinschaft (German Research Foundation) requires all principal investigators to submit a comprehensive Data Management Plan (DMP) with their grant application. A robust DMP details how research data will be collected, processed, documented, stored, shared, and preserved both during and after the project.

Verified Funder Open-Science Portfolio

According to open bibliometric data from OpenAlex, the Deutsche Forschungsgemeinschaft (German Research Foundation) (DFG) is associated with more than 729,972 published research outputs under its funding catalog, which have accumulated more than 25,912,901 citations across the global scientific record. To protect the public's investment in this body of work, the funder enforces FAIR data management and open repository deposits, making compliance with this DMP protocol mandatory for all awarded grants.

For projects in the field of Computer Science & AI, managing data correctly is essential not only for compliance but also for peer-review validation and reproducibility. All DMPs must be submitted through the elan portal, following standard institutional guidelines.

2. Data Types, Formats, and Metadata Standards

A high-quality DMP must explicitly identify the types of data that will be generated and specify open, non-proprietary file formats to ensure long-term usability. For Computer Science & AI, datasets typically range from raw observational measurements to curated computational models.

Computational pipelines in **Computer Science & AI** should store raw input files alongside the exact containerized code (Docker/Singularity) and the statistical models, so that the full pipeline can be replicated for **DFG** audits.
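One way to make such a pipeline auditable is to record a checksum manifest next to the raw files and model weights. The following is a minimal sketch, not a DFG-prescribed format: the function names `sha256_of` and `build_manifest`, the manifest layout, and the `container_image` label are all assumptions chosen for illustration.

```python
from __future__ import annotations

import hashlib
import json
import platform
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(files: list[Path], container_image: str) -> dict:
    """Collect file checksums plus basic environment details.

    The layout is hypothetical; adapt the keys to your institution's needs.
    """
    return {
        # Image tag or digest that you record yourself (assumed label).
        "container_image": container_image,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "files": {str(p): sha256_of(p) for p in sorted(files)},
    }


def write_manifest(manifest: dict, target: Path) -> None:
    """Serialize the manifest as indented JSON for deposit with the data."""
    target.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

Depositing the resulting JSON file together with the data lets a reviewer recompute the digests and confirm that the archived files match those used in the experiments.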

To guarantee discoverability, datasets should be documented using standardised metadata schemas that map to the Mathematical Concepts branch of scholarly vocabularies, so that indexers and crawlers can identify research outputs accurately.

DMP Component	Custom Target Value for Computer Science & AI
Preferred File Formats	Python/R scripts (.py, .R), JSON (hyperparameters), PT/ONNX (neural weights), CSV (benchmarks)
Metadata Schema Standard	CodeMeta schema, Schema.org definitions, Software Ontology (SWO)
Target Scientific Repositories	Zenodo (integrated with GitHub), Software Heritage, Figshare, and directory servers mapped in IEEE Xplore, ACM Digital Library & arXiv
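As an illustration of the CodeMeta row in the table above, the sketch below assembles a small `codemeta.json` record. The helper name `make_codemeta`, the example values, and the choice of fields are assumptions; a real record would normally carry more properties (for example identifiers and funding information).

```python
from __future__ import annotations

import json


def make_codemeta(
    name: str,
    version: str,
    authors: list[str],
    license_url: str,
    repo_url: str,
) -> dict:
    """Build a minimal CodeMeta 2.0 style record (illustrative field subset)."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "license": license_url,
        "codeRepository": repo_url,
        "programmingLanguage": "Python",
        "author": [{"@type": "Person", "name": a} for a in authors],
    }


# Hypothetical project values, used only to show the shape of the output.
record = make_codemeta(
    name="example-benchmark",
    version="0.1.0",
    authors=["Ada Example"],
    license_url="https://spdx.org/licenses/MIT",
    repo_url="https://example.org/example-benchmark",
)
codemeta_json = json.dumps(record, indent=2)
```

Writing `codemeta_json` to a file named `codemeta.json` in the repository root keeps the metadata under version control alongside the code it describes.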

3. Step-by-Step DMP Construction Protocol

When preparing your DMP for a DFG proposal, structure your document around these core sections:

Data Collection and Generation:
Describe the methodology, instrumentation, or software used to collect or generate new data. Detail quality assurance and quality control measures implemented at your facility.
Documentation and Metadata:
Explain how the data will be documented, including accompanying read-me files, data dictionaries, and laboratory notebooks. Specify the metadata standards to be used, such as the CodeMeta schema, Schema.org definitions, and the Software Ontology (SWO).
Ethics, Intellectual Property, and Consent:
Address how sensitive or confidential datasets will be handled. Detail anonymisation processes, access controls, and compliance with institutional ethics boards.
Storage, Backups, and Security:
State where data will be stored during active research. Detail automated backup schedules, server redundancies, and access authorisation protocols.
Long-Term Preservation and Archiving:
Select the digital repository for post-project archiving, such as Zenodo (integrated with GitHub), Software Heritage, Figshare, or directory servers mapped in IEEE Xplore, ACM Digital Library & arXiv. Confirm that the repository supports persistent identifiers (handles/DOIs) and provides secure preservation.
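The five sections above can also serve as a simple completeness check on a draft. The sketch below is only an illustration: it assumes the draft is held as a dictionary that maps section titles to text, and the helper `missing_sections` is a hypothetical name.

```python
from __future__ import annotations

# Section titles taken from the list above.
REQUIRED_SECTIONS = (
    "Data Collection and Generation",
    "Documentation and Metadata",
    "Ethics, Intellectual Property, and Consent",
    "Storage, Backups, and Security",
    "Long-Term Preservation and Archiving",
)


def missing_sections(draft: dict[str, str]) -> list[str]:
    """Return the required sections that are absent or blank in the draft."""
    return [
        title
        for title in REQUIRED_SECTIONS
        if not draft.get(title, "").strip()
    ]


# Hypothetical draft with two sections filled in and one left blank.
draft = {
    "Data Collection and Generation": "Benchmarks are generated by scripted runs.",
    "Documentation and Metadata": "README plus codemeta.json.",
    "Storage, Backups, and Security": "   ",
}
todo = missing_sections(draft)
```

Here `todo` lists the sections that still need text, in the same order as the list above.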

Open Science Workflows, Data Curation & Repositories

Establishing a robust data management plan (DMP) for Computer Science & AI requires describing rigorous data collection methods and established data curation standards from day one. PIs can use structured DMPTool workflows to prepare these plans for review by the Deutsche Forschungsgemeinschaft (German Research Foundation). Meeting DFG requirements means detailing how raw files are cleaned, how researchers verify ongoing data integrity, and which tools handle automated data wrangling. A standardized data dictionary must also be compiled so that the metadata is unambiguous.

For active storage, the proposal should compare a relational data warehouse schema with an unstructured data lake model, weighing the benefits of each environment for general data analysis and for initial exploratory analysis of study outputs.

To ensure permanent access, datasets can be deposited in the Dryad data repository, published as Figshare datasets, or archived through a Zenodo upload, which enables inclusion in the Data Citation Index and satisfies both standard NSF data management plan expectations and local DFG requirements. To support replication, strict data versioning protocols should be established on the Open Science Framework (OSF) so that data sharing is reproducible and follows the FAIR data principles. When working with sensitive community records, the project must observe the CARE data principles and Indigenous data sovereignty guidelines to ensure ethical data stewardship in accordance with DFG rules. This explicit lifecycle structure meets the prerequisites set out in DFG project management guidelines.
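A data dictionary of the kind described above can be bootstrapped from a CSV header and then completed by hand. The following is a rough sketch under stated assumptions: the column names in the sample, the three type labels, and the helper names `infer_type` and `build_data_dictionary` are all illustrative, and the `description` field is left empty for the researcher to fill in.

```python
from __future__ import annotations

import csv
import io


def infer_type(values: list[str]) -> str:
    """Guess a coarse type label from the non-empty string values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    for caster, label in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return label
        except ValueError:
            continue
    return "string"


def build_data_dictionary(csv_text: str) -> list[dict]:
    """Return one entry per column: name, inferred type, missing count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        # Short rows yield None for absent cells; treat those as empty.
        values = [(row[column] or "") for row in rows]
        entries.append(
            {
                "column": column,
                "type": infer_type(values),
                "missing": sum(1 for v in values if v.strip() == ""),
                "description": "",  # to be written by the researcher
            }
        )
    return entries


# Hypothetical benchmark table with one missing accuracy value.
sample = "model,accuracy,epochs\nresnet,0.91,10\nvit,,20\n"
dictionary = build_data_dictionary(sample)
```

The inferred types are only a starting point; units, allowed ranges, and the meaning of each column still have to be documented manually.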

4. Frequently Asked Questions

Are we required to share all raw data from our research?

No, DFG policies generally recognise that some data cannot be shared publicly due to privacy, security, intellectual property, or commercialisation constraints. In such cases, your DMP must justify why certain datasets are restricted and describe how metadata will still be made discoverable.

Who owns the research data generated under this grant?

Data ownership is typically held by the host institution, subject to co-ownership clauses in collaborative projects. However, DFG guidelines require that data be made as openly available as possible under open licensing, such as Creative Commons or Open Data Commons.