Where should research data for Chemistry & Materials Science be deposited?

Research data generated in Chemistry & Materials Science projects should ideally be deposited in approved repositories such as PubChem, ChemSpider, Zenodo, and Figshare, or in directory servers indexed in Reaxys and CAS SciFinder.

NIH Data Management Plan (DMP) Guide for…

Q: What are the data sharing requirements for NIH grants?

To secure funding from the **National Institutes of Health (NIH)** in **Chemistry & Materials Science**, PIs must upload a detailed Data Management Plan (DMP), which NIH calls a Data Management and Sharing (DMS) Plan, into the **eRA Commons** system as part of the application. Under the NIH Data Management and Sharing Policy, the scientific data needed to validate and replicate findings should be deposited in an appropriate repository no later than the time of the associated publication or the end of the award period, whichever comes first.

1. Funder Policy & Open Data Compliance

In alignment with international open-science mandates, the National Institutes of Health (NIH) requires principal investigators to submit a comprehensive Data Management Plan (DMP) with their grant application. A robust DMP details how research data will be collected, processed, documented, stored, shared, and preserved both during and after the project.

Verified Funder Open-Science Portfolio

According to bibliometric data from OpenAlex, the National Institutes of Health (NIH) is associated with over 1,762,091 published research outputs in its funding catalog, which have accumulated over 106,474,500 citations across the scientific record. To protect the public's investment in this body of work, the funder expects FAIR data management and open repository deposits, and compliance with the approved DMP is mandatory for all awarded grants.

For projects in the field of Chemistry & Materials Science, managing data correctly is essential not only for compliance, but also to support peer-review validation and reproducibility. All DMPs must be submitted through the eRA Commons portal, in line with your institution's guidelines.

2. Data Types, Formats, and Metadata Standards

A high-quality DMP must explicitly identify the types of data that will be generated and specify open, non-proprietary file formats to ensure long-term usability. For Chemistry & Materials Science, datasets typically range from raw observational measurements to curated computational models.

Simulations and code-based datasets for **Chemistry & Materials Science** should be packaged alongside the exact processing scripts and a description of the execution environment, ideally as a container image. Documenting this configuration helps ensure that reviewers and **NIH** audits can reconstruct each analysis step.
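
One lightweight way to document an execution environment, short of a full container image, is to write a manifest of the interpreter, operating system, and installed package versions next to the processing scripts. The following is a minimal sketch using only the Python standard library; the function names and the manifest layout are illustrative assumptions, not an NIH-prescribed format.

```python
import json
import platform
import sys
from importlib import metadata


def environment_manifest() -> dict:
    """Collect interpreter, OS, and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def write_manifest(path: str) -> dict:
    """Write the manifest as JSON (hypothetical file layout) and return it."""
    manifest = environment_manifest()
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2)
    return manifest
```

Calling `write_manifest("environment.json")` from the analysis directory would store the manifest alongside the scripts, so that it can be deposited with the dataset. The file name is only an example.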

To support discoverability, datasets should be documented using standardised metadata schemas whose subject terms map to controlled scholarly vocabularies, such as the Chemical Actions and Uses branch of MeSH. This lets indexers and crawlers identify research outputs accurately.
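
As an illustration of a standardised metadata record, the sketch below serialises a small set of Dublin Core elements to XML with the Python standard library. The `metadata` wrapper element, the function name, and every field value are hypothetical placeholders; a real repository will define its own wrapper and required fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_xml(fields: dict[str, list[str]]) -> str:
    """Serialise {element: [values]} to a simple Dublin Core XML string."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
record = dublin_core_xml({
    "title": ["Example crystallography dataset"],
    "creator": ["Example, Researcher"],
    "subject": ["Chemical Actions and Uses"],
    "type": ["Dataset"],
    "format": ["CIF"],
})
```

Rejecting unknown element names early keeps typos out of the deposited record, which matters because harvesters typically ignore elements they do not recognise.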

DMP Component	Custom Target Value for Chemistry & Materials Science
Preferred File Formats	CIF (crystallographic details), JCAMP-DX (spectral data), XML (chemical profiles), CSV (reactions)
Metadata Schema Standard	ChEMBL schema, Dublin Core, IUPAC standards
Target Scientific Repositories	PubChem, ChemSpider, Zenodo, Figshare, and directory servers indexed in Reaxys and CAS SciFinder
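
Before deposit, it can help to check a dataset's files against the preferred formats listed in the table. The sketch below does this by file extension; the mapping from extensions to formats is an assumption made for illustration (the table names formats, not extensions), and the function name is hypothetical.

```python
from pathlib import Path

# Assumed extension-to-format mapping based on the preferred formats above.
PREFERRED_FORMATS = {
    ".cif": "CIF (crystallographic details)",
    ".jdx": "JCAMP-DX (spectral data)",
    ".dx": "JCAMP-DX (spectral data)",
    ".xml": "XML (chemical profiles)",
    ".csv": "CSV (reactions)",
}


def classify_files(paths):
    """Split paths into (accepted: {path: format}, flagged: [path])."""
    accepted, flagged = {}, []
    for raw in paths:
        suffix = Path(raw).suffix.lower()
        if suffix in PREFERRED_FORMATS:
            accepted[str(raw)] = PREFERRED_FORMATS[suffix]
        else:
            flagged.append(str(raw))
    return accepted, flagged
```

Flagged files are not necessarily wrong; they are candidates for conversion to an open format or for a justification in the DMP.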

3. Step-by-Step DMP Construction Protocol

When preparing your DMP for an NIH proposal, structure your document around these core sections:

Data Collection and Generation:
Describe the methodology, instrumentation, or software used to collect or generate new data. Detail quality assurance and quality control measures implemented at your facility.
Documentation and Metadata:
Explain how the data will be documented, including accompanying read-me files, data dictionaries, and laboratory notebooks. Specify the metadata standards to be used (by default, the ChEMBL schema, Dublin Core, and IUPAC standards).
Ethics, Intellectual Property, and Consent:
Address how sensitive or confidential datasets will be handled. Detail anonymisation processes, access controls, and compliance with institutional ethics boards.
Storage, Backups, and Security:
State where data will be stored during active research. Detail automated backup schedules, server redundancies, and access authorisation protocols.
Long-Term Preservation and Archiving:
Select the digital repository for post-project archiving (such as PubChem, ChemSpider, Zenodo, Figshare, or directory servers indexed in Reaxys and CAS SciFinder). Confirm that the repository supports persistent identifiers (handles/DOIs) and provides secure preservation.
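
For the storage, backup, and preservation sections, one common practice is to keep a checksum manifest so that backup copies and archived files can be verified later. The following is a minimal sketch using SHA-256 from the Python standard library; the function names and the manifest structure are illustrative assumptions rather than a required format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified since the manifest."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

Running `changed_files` against a backup copy with the manifest built from the primary copy gives a quick integrity check that can be cited in the DMP's backup description.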

Open Science Workflows, Data Curation & Repositories

When drafting a data management plan (DMP) to satisfy NIH guidelines, define systematic data collection methods and formal data curation standards. Institutional DMPTool workflows help build these administrative requirements in from the outset of the study. Investigators should outline procedures for post-collection data cleaning, audits of data integrity, and programmatic data wrangling that transforms raw outputs into clean, analysis-ready datasets. A descriptive data dictionary should also be provided to define the database schema.

Architecturally, teams can configure either a secure relational data warehouse or a lower-cost cloud-based data lake, weighing how each option supports formal data analysis and exploratory data analysis under NIH guidelines.

Upon completion, data can be submitted to the Dryad data repository, published as Figshare datasets, or uploaded to Zenodo, so that it can be registered in the Data Citation Index and satisfy NIH and, where applicable, NSF data management plan guidelines. The study should document clear data versioning protocols, for example on the Open Science Framework (OSF), to enable reproducible data sharing consistent with the FAIR data principles. Any community-engaged data must respect the CARE Principles for Indigenous Data Governance and support Indigenous data sovereignty, so that shared knowledge remains under local governance. Aligning the archiving schedule with NIH requirements helps protect the project's funding.
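
A data dictionary of the kind mentioned above can be drafted programmatically and then completed by hand. The sketch below infers a simple type and a missing-value count for each column of a CSV file; the function names, the type labels, and the sample columns are hypothetical and chosen only for illustration.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Return 'integer', 'float', 'string', or 'empty' for a column."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    for caster, label in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return label
        except ValueError:
            continue
    return "string"


def data_dictionary(csv_text: str) -> list[dict]:
    """Draft one dictionary entry per CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        values = [(row.get(column) or "") for row in rows]
        entries.append({
            "column": column,
            "type": infer_type(values),
            "missing": sum(1 for v in values if v.strip() == ""),
            "description": "",  # to be written by the research team
        })
    return entries


# Made-up sample rows, for illustration only.
sample = "compound_id,yield_percent,temperature_K\nC1,85.5,298\nC2,,310\n"
draft = data_dictionary(sample)
```

The inferred types are only a starting point; units, allowed ranges, and descriptions still need to be added by someone who knows the experiment.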

4. Frequently Asked Questions

Are we required to share all raw data from our research?

No, NIH policies generally recognise that some data cannot be shared publicly due to privacy, security, intellectual property, or commercialisation constraints. In such cases, your DMP must justify why certain datasets are restricted and describe how metadata will still be made discoverable.

Who owns the research data generated under this grant?

Data ownership is typically held by the host institution, subject to co-ownership clauses in collaborative projects. However, NIH guidelines require that data be made as openly available as possible under open licensing, such as Creative Commons or Open Data Commons.