v2026.1714 entries · CC-BY 4.0
CASRAI
Data Governance & Open Science

DMP Guide: CNPq for Linguistics & Cognitive Language

Learn how to design a Data Management Plan (DMP) that complies with the open-data policies of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). This guide covers recommended file formats, metadata mapping, and repository selection for Linguistics & Cognitive Language research data.

1. Funder Policy & Open Data Compliance

In alignment with international open-science mandates, Conselho Nacional de Desenvolvimento Científico e Tecnológico requires all principal investigators to submit a comprehensive Data Management Plan (DMP) with their grant application. A robust DMP details how research data will be collected, processed, documented, stored, shared, and preserved both during and after the project.

Funder-Specific Mandate Directive

The open-access mandate from **Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)** expects PIs to deposit their **Linguistics & Cognitive Language** findings in public repositories that assign persistent identifiers (such as DOIs or Handles) for citation. Plans must be prepared and submitted through the **Plataforma Lattes** interface.

Verified Funder Open-Science Portfolio

According to independent bibliometric data from OpenAlex, CNPq is credited as the funder of over 472,207 published research outputs, which together have accumulated over 9,674,359 citations across the global scientific record. To protect the public's investment in this large body of knowledge, the funder enforces FAIR data management and open repository deposits, making compliance with this DMP protocol mandatory for all awarded grants.

For projects in Linguistics & Cognitive Language, managing data correctly is essential not only for compliance but also for peer-review validation and reproducibility. All DMPs must be submitted through the Plataforma Lattes portal in line with standard institutional guidelines.

2. Data Types, Formats, and Metadata Standards

A high-quality DMP must explicitly identify the types of data that will be generated and specify open, non-proprietary file formats to ensure long-term usability. For Linguistics & Cognitive Language, datasets typically range from raw observational measurements to curated computational models.

In **Linguistics & Cognitive Language**, qualitative and archival data typically comprise digitised materials, text corpora, and spreadsheets. To ensure durability, the DMP should commit to saving all such files in non-proprietary formats, which satisfies standard **CNPq** digital preservation criteria.
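Before deposit, it can help to scan a project folder for files that are not yet in one of the open formats this guide recommends (WAV, TextGrid, XML, TXT). The sketch below is a minimal, hypothetical audit helper: the function name `audit_formats` and the choice to match on lower-cased file extensions (`.textgrid` for Praat TextGrid files) are assumptions, not part of any CNPq tooling.

```python
from pathlib import Path

# Open formats recommended in this guide. Matching on lower-cased file
# extensions (".textgrid" for Praat TextGrid files) is an assumption.
PREFERRED_EXTENSIONS = {".wav", ".textgrid", ".xml", ".txt"}


def audit_formats(root):
    """Return sorted relative paths of files whose extension is not a preferred open format."""
    root = Path(root)
    flagged = []
    for path in root.rglob("*"):
        # Only regular files are checked; directories are traversed by rglob.
        if path.is_file() and path.suffix.lower() not in PREFERRED_EXTENSIONS:
            flagged.append(path.relative_to(root).as_posix())
    return sorted(flagged)
```

Calling `audit_formats("corpus/")` on a (hypothetical) corpus folder returns, for example, `.mp3` or `.docx` files that should be converted before archiving. The helper only flags files rather than converting them, since the right conversion (for example, lossless WAV export from a compressed recording) is a methodological decision the DMP should document.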

To guarantee discoverability, datasets should be documented using standardised metadata schemas that map to the Language branch of scholarly vocabularies. This allows indexing services and web crawlers to identify research outputs accurately.

| DMP Component | Recommended Value for Linguistics & Cognitive Language |
|---|---|
| Preferred File Formats | WAV (audio phonetics), TextGrid (Praat annotations), XML (lexical corpora), TXT (transcripts) |
| Metadata Schema Standards | OLAC (Open Language Archives Community), Dublin Core |
| Target Scientific Repositories | TLA (The Language Archive), CLARIN, Zenodo; outputs indexed in LLBA (Linguistics and Language Behavior Abstracts) |
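To make the OLAC and Dublin Core recommendation concrete, the sketch below serialises a minimal record with Python's standard `xml.etree.ElementTree`. It is a simplified illustration, assuming the OLAC 1.1 namespace and the `olac:language` refinement with an ISO 639-3 code; the full OLAC schema defines many more elements and refinement types, so a real record should be validated against the OLAC metadata documentation and the target repository's requirements. The function name `build_olac_record` and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OLAC 1.1 records: Dublin Core elements plus OLAC refinements.
NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    # Register prefixes so the output uses olac:, dc:, and xsi: instead of ns0:.
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Return an ElementTree qualified name such as '{uri}title'."""
    return f"{{{NS[prefix]}}}{tag}"


def build_olac_record(title, creator, language_code, description, identifier):
    """Serialise a minimal OLAC/Dublin Core record as an XML string."""
    root = ET.Element(q("olac", "olac"))
    ET.SubElement(root, q("dc", "title")).text = title
    ET.SubElement(root, q("dc", "creator")).text = creator
    # Subject language, refined with an ISO 639-3 code via the olac:language type.
    subject = ET.SubElement(root, q("dc", "subject"))
    subject.set(q("xsi", "type"), "olac:language")
    subject.set(q("olac", "code"), language_code)
    ET.SubElement(root, q("dc", "description")).text = description
    ET.SubElement(root, q("dc", "identifier")).text = identifier
    return ET.tostring(root, encoding="unicode")
```

For a Brazilian Portuguese speech corpus, `build_olac_record("Corpus de fala", "Silva, A.", "por", "Interview recordings", "hypothetical-id-001")` yields a record whose subject language is coded `por`, which OLAC harvesters can match against language queries.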

3. Step-by-Step DMP Construction Protocol

When preparing your DMP for a CNPq proposal, structure your document around these core sections:

  1. Data Collection and Generation:
    Describe the methodology, instrumentation, or software used to collect or generate new data. Detail quality assurance and quality control measures implemented at your facility.
  2. Documentation and Metadata:
    Explain how the data will be documented, including accompanying read-me files, data dictionaries, and laboratory notebooks. Specify the metadata standards to be used (for this discipline, OLAC (Open Language Archives Community) and Dublin Core).
  3. Ethics, Intellectual Property, and Consent:
    Address how sensitive or confidential datasets will be handled. Detail anonymisation processes, access controls, and compliance with institutional ethics boards.
  4. Storage, Backups, and Security:
    State where data will be stored during active research. Detail automated backup schedules, server redundancies, and access authorisation protocols.
  5. Long-Term Preservation and Archiving:
    Select the digital repository for post-project archiving (such as TLA (The Language Archive), CLARIN, or Zenodo, with outputs indexed in LLBA (Linguistics and Language Behavior Abstracts)). Confirm that the repository supports persistent identifiers (Handles/DOIs) and provides secure preservation.
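The five core sections above can double as a completeness checklist while a draft is being written. The sketch below assumes, hypothetically, that the draft is held as a Python dictionary mapping section titles to text; Plataforma Lattes itself does not expose such a structure, so this is only a local pre-submission check.

```python
# The five core sections named in this guide's construction protocol.
REQUIRED_SECTIONS = (
    "Data Collection and Generation",
    "Documentation and Metadata",
    "Ethics, Intellectual Property, and Consent",
    "Storage, Backups, and Security",
    "Long-Term Preservation and Archiving",
)


def missing_sections(draft):
    """Return required section titles that are absent or blank in a draft.

    `draft` is a hypothetical in-memory representation: a dict mapping
    section titles to their text.
    """
    return [title for title in REQUIRED_SECTIONS if not draft.get(title, "").strip()]
```

A draft that only fills in "Data Collection and Generation" would return the remaining four titles, which makes gaps visible before the plan is uploaded.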

Open Science Workflows, Data Curation & Repositories

When drafting a DMP to satisfy CNPq guidelines, it is vital to define systematic data collection methods and formal data curation standards. Using an institutional DMP-authoring workflow (for example, a DMPTool template) builds these administrative requirements into the study from the outset. The plan should describe protocols for data cleaning, for validating data integrity with checksums, and for securely wrangling raw source files. Each output dataset must be documented with a data dictionary that explains its key metadata fields.

Architecturally, teams can configure either a secure relational data warehouse or a cost-effective cloud-based data lake. The DMP should explain how the chosen setup (data lake or data warehouse) supports both exploratory and formal data analysis under CNPq guidelines.

For public sharing, PIs can deposit datasets in the Dryad data repository, publish them as searchable Figshare datasets, or upload them to Zenodo, and track their reuse through the Data Citation Index in line with CNPq targets. The study should document clear data versioning protocols, for example on the Open Science Framework (OSF), so that shared data remain reproducible and consistent with the FAIR data principles. Any community-engaged data must also respect the CARE Principles for Indigenous Data Governance and uphold Indigenous data sovereignty, so that local communities retain governance of shared knowledge, including during CNPq audits. This explicit lifecycle structure meets the prerequisites set out in CNPq project management guidelines.
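Checksum validation is easiest to describe in a DMP when it is tied to a concrete fixity manifest. The sketch below, using only Python's standard `hashlib` and `pathlib`, records a SHA-256 digest for every file under a data folder and later reports files that changed or disappeared. The function names are hypothetical, and SHA-256 is one common choice rather than a CNPq requirement.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large audio files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's relative path (POSIX style) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return sorted relative paths that are missing or whose digest no longer matches."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel)
    return sorted(problems)
```

In practice the manifest would be saved (for example as JSON) outside the data folder, so that it does not list itself, and re-verified after each backup or transfer; a non-empty result from `verify_manifest` signals corruption or an undocumented change.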

4. Frequently Asked Questions

Are we required to share all raw data from our research?

No, CNPq policies generally recognise that some data cannot be shared publicly due to privacy, security, intellectual property, or commercialisation constraints. In such cases, your DMP must justify why certain datasets are restricted and describe how metadata will still be made discoverable.

Who owns the research data generated under this grant?

Data ownership is typically held by the host institution, subject to co-ownership clauses in collaborative projects. However, CNPq guidelines require that data be made as openly available as possible under open licensing, such as Creative Commons or Open Data Commons.

DMP Specifications

Funding Body: CNPq (Brazil)
Submission Tool: Plataforma Lattes
ROR Funder ID: 03swz6y49
Crossref Funder ID: 501100003593
Discipline Focus: Linguistics & Cognitive Language
Target Index DB: LLBA (Linguistics and Language Behavior Abstracts)
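The ROR and Crossref Funder IDs listed in the DMP Specifications are what repositories use to link a deposited dataset back to CNPq. As a sketch, assuming a DataCite-style `fundingReference` structure (field names as in the DataCite Metadata Schema 4.x, with Crossref Funder IDs expressed under the `10.13039` DOI prefix), the helper below builds one entry. The function name and any award number passed to it are hypothetical.

```python
# CNPq identifiers as listed in the DMP Specifications.
CNPQ_NAME = "Conselho Nacional de Desenvolvimento Científico e Tecnológico"
CNPQ_ROR_ID = "03swz6y49"
CNPQ_CROSSREF_FUNDER_ID = "501100003593"

# Resolvable identifier URLs, keyed by DataCite funderIdentifierType value.
IDENTIFIERS = {
    "ROR": f"https://ror.org/{CNPQ_ROR_ID}",
    "Crossref Funder ID": f"https://doi.org/10.13039/{CNPQ_CROSSREF_FUNDER_ID}",
}


def funding_reference(identifier_type="ROR", award_number=None):
    """Build one DataCite-style fundingReference dict for CNPq."""
    if identifier_type not in IDENTIFIERS:
        raise ValueError(f"unsupported identifier type: {identifier_type}")
    ref = {
        "funderName": CNPQ_NAME,
        "funderIdentifier": IDENTIFIERS[identifier_type],
        "funderIdentifierType": identifier_type,
    }
    if award_number is not None:
        ref["awardNumber"] = award_number  # the grant's process number, if known
    return ref
```

Using a resolvable identifier rather than the funder's name alone avoids ambiguity between the Portuguese name, the acronym, and English translations when citation indexes aggregate funding data.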

FAIR Principles

Your plan must align with the FAIR Principles:

  • Findable: Rich metadata and persistent DOIs.
  • Accessible: Free retrieval via standard protocols.
  • Interoperable: Open formats and vocabulary alignment (such as OLAC and Dublin Core metadata).
  • Reusable: Clear data licensing and reuse guidelines.
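The four FAIR criteria above can be turned into a rough self-check on a dataset record before deposit. The sketch below is illustrative only: the record keys (`doi`, `access_url`, `formats`, `license`) are hypothetical, HTTPS is used as one example of a "standard protocol", and the DOI test is adapted from the regular expression Crossref recommends for matching most modern DOIs. Passing it does not certify FAIR compliance.

```python
import re

# Adapted from Crossref's recommended pattern for matching modern DOIs.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
# Open formats recommended in this guide (lower-cased; an assumption).
OPEN_FORMATS = {"wav", "textgrid", "xml", "txt"}


def fair_gaps(record):
    """List FAIR gaps in a dataset record (a dict with hypothetical keys)."""
    gaps = []
    # Accept both bare DOIs and https://doi.org/ URLs (str.removeprefix needs Python 3.9+).
    doi = record.get("doi", "").removeprefix("https://doi.org/")
    if not DOI_PATTERN.match(doi):
        gaps.append("Findable: missing or malformed DOI")
    if not record.get("access_url", "").startswith("https://"):
        gaps.append("Accessible: no HTTPS access URL")
    formats = {fmt.lower() for fmt in record.get("formats", [])}
    if not formats or not formats <= OPEN_FORMATS:
        gaps.append("Interoperable: missing or non-open file formats")
    if not record.get("license"):
        gaps.append("Reusable: no license stated")
    return gaps
```

Each message is prefixed with the FAIR letter it concerns, so the output can be pasted directly into the relevant DMP section as a to-do list.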

Referenced across the research world

  • University of Cambridge logo
  • Columbia University logo
  • University of Edinburgh logo
  • Harvard University logo
  • University of Oxford logo
  • Princeton University logo
  • Stanford School of Medicine logo
  • University College London logo
  • ORCID logo
  • Crossref logo
