Tag: Data Stewardship

  • Research Data Governance: Where DMPs, FAIR and Institutional Policy Meet

    Research data governance is the institution-wide framework of policies, roles and standards that determines how research data is created, stored, protected, shared and retained across its lifecycle — distinct from the project-level task of managing a single dataset. It sits above data management plans (DMPs) and FAIR practice, translating funder and institutional policy into assigned accountability. The most common failure point is not the policy itself but the gap between what a DMP promises and what a principal investigator (PI) or data steward is actually resourced and empowered to deliver.

    Put simply: research data governance is the system of institutional authority, roles and control that determines who is accountable for a dataset at every stage of its life, from collection to eventual disposal or archiving.

    What is research data governance?

    Research data governance establishes the policies, roles and standards dictating how research data is ethically collected, stored, secured and shared, applied at the level of the whole institution rather than a single grant. It differs from research data management in scope: management is what a researcher does with one dataset; governance is how an organisation ensures every dataset is handled consistently and lawfully.

    Andrea Chiarelli’s 2023 analysis for Force11’s Upstream describes this as a shift “from individual projects or datasets to the way the organisation as a whole thinks and operates when it comes to research data.” A 2025 Data Science Journal paper by Odebrecht et al. argues governance requires a “system of cross-organisational” accountability, since ownership, stewardship and compliance obligations rarely sit with one office.

    In practice, governance frameworks typically assign roles across several functions:

    • Senior leadership — sets institutional strategy and secures infrastructure budget.
    • Data stewards or data champions — provide discipline-specific guidance and training.
    • Librarians and information professionals — curate data and advocate for open sharing.
    • Ethics and compliance officers — verify adherence to regulatory and funder requirements.
    • IT and information security teams — manage storage, backup and access control.
    • Principal investigators — remain directly responsible for their project’s data day to day.

    How do data management plans fit into research data governance?

    A data management plan is the project-level instrument; research data governance is the institutional context that shapes it. Governance sets the rules of the road — the DMP is the trip plan for a specific project, describing what data will be generated, how it will be stored, and what happens to it once funding ends. Most UK and EU funders now require a DMP at application stage, per the Digital Curation Centre’s funder-policy overview.

    UKRI’s Guidance on Best Practice in the Management of Research Data (2020) states research data should be “easily discoverable, accessible, assessable, intelligible, useable” — language drawn from the G8 Open Data Charter. That expectation only becomes operational once a governance framework specifies which repository, metadata schema and retention period satisfy it. Without that translation layer, a PI can write a technically compliant DMP the institution has no infrastructure to support.

    Where personal or sensitive data is involved, governance also requires a Data Protection Impact Assessment (DPIA) under UK GDPR before collection begins — a step outside most DMP templates, and frequently where research ethics and governance approval stalls.

    Where do FAIR principles sit in the governance stack?

    The FAIR Guiding Principles — Findable, Accessible, Interoperable and Reusable — were formally published in Scientific Data in 2016 (Wilkinson et al.) and have since become the default technical standard governance frameworks use to operationalise “good data practice.” FAIR is a set of design criteria for datasets; governance is the accountability structure that ensures those criteria are met at scale, not just described in policy.

    A governance policy might mandate persistent identifiers, controlled-vocabulary metadata and an approved repository — the mechanisms that make a dataset FAIR in practice. Funder mandates reinforce this: cOAlition S’s Plan S requires data underlying publications be made available in a FAIR-compliant repository, converting a technical principle into a compliance condition an institution’s governance office must monitor.
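
    As a rough illustration of how such a mandate could be checked mechanically, the sketch below tests one deposit record against three policy rules: a persistent identifier, an approved repository, and required metadata fields. The repository names, field names and record layout are assumptions made for the example, not any institution's actual policy.

```python
import re

# Hypothetical policy values; a real governance policy would define its own.
APPROVED_REPOSITORIES = {"institutional-repository", "zenodo"}
REQUIRED_METADATA = ("title", "creator", "licence", "subject_vocabulary")
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")  # common shape of a DOI


def policy_findings(deposit: dict) -> list[str]:
    """Return human-readable policy gaps for one deposit record."""
    findings = []
    if not DOI_PATTERN.match(deposit.get("doi") or ""):
        findings.append("missing or malformed persistent identifier")
    if deposit.get("repository") not in APPROVED_REPOSITORIES:
        findings.append("repository is not on the approved list")
    metadata = deposit.get("metadata") or {}
    for field in REQUIRED_METADATA:
        if not metadata.get(field):
            findings.append(f"metadata field '{field}' is empty")
    return findings


example = {  # illustrative record, not real data
    "doi": "10.1234/example.5678",
    "repository": "zenodo",
    "metadata": {
        "title": "Survey wave 3",
        "creator": "A. Researcher",
        "licence": "CC-BY-4.0",
    },
}
print(policy_findings(example))
```

    Returning a list of findings rather than a single pass/fail flag keeps the check useful to whoever is accountable for closing each gap.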

    Layer | What it governs | Primary owner
    --- | --- | ---
    Institutional research policy | Ownership, retention, ethical boundaries | Senior leadership / research office
    Research data governance framework | Roles, accountability, infrastructure standards | Data governance committee
    FAIR principles | Technical findability/reuse criteria for datasets | Data stewards, repository managers
    Data management plan | Project-specific application of the above | Principal investigator

    Where do responsibility gaps appear between data stewards and PIs?

    The most persistent governance failure is not absent policy but an accountability vacuum between those who write institutional standards and those who generate the data day to day. Force11’s Upstream analysis notes “research cultures value autonomy and independence,” making a standardised framework structurally difficult to enforce against individual research groups — a cultural, not merely technical, obstacle.

    The gap tends to open at predictable points:

    • Departure events — what happens to a dataset when a researcher leaves is, per Upstream, “one of the most common difficulties,” since ownership and access rights are rarely settled in advance.
    • Metadata quality — without an assigned data steward, a PI defaults to whatever documentation is fastest, not what is FAIR-reusable.
    • Sensitive data handling — a DPIA is approved at the outset, but ongoing access-control enforcement typically falls back to the PI’s lab, unsupported by IT.
    • Retention beyond project end — a retention period is set, but archiving budget and ownership after a grant closes is frequently unassigned.

    The University of Oxford’s data governance framework addresses this by “establishing roles, definitions, standards and procedures to help keep data accurate and fit for purpose” — an explicit attempt to move responsibility off the individual researcher and onto a named institutional function. Institutions without an equivalent role map leave every gap to default to the PI, regardless of whether they have the time, training or authority to close it.
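
    A role map of this kind can be kept as simple structured data. The sketch below is one minimal way to surface obligations that have an outcome but no named owner; the obligation and role names are illustrative assumptions, not drawn from Oxford's framework or any other institution's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obligation:
    name: str
    accountable_role: Optional[str]  # None: policy names an outcome but no owner


# Illustrative entries only; role names are assumptions.
ROLE_MAP = [
    Obligation("departure protocol", None),
    Obligation("metadata quality", "data steward"),
    Obligation("access-control enforcement", None),
    Obligation("post-grant archiving", "research data governance committee"),
]


def defaults_to_pi(role_map: list[Obligation]) -> list[str]:
    """List obligations with no named owner, which in practice fall to the PI."""
    return [o.name for o in role_map if o.accountable_role is None]


print(defaults_to_pi(ROLE_MAP))
```

    Anything this function returns is a gap of the kind described above: an obligation that exists in policy but will default to the PI.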

    Frequently asked questions

    What is data governance in research?

    Data governance in research is the exercise of institutional authority and control over how research data is created, secured, shared and retained. It aims to increase the value of research data while minimising risk, and covers ownership, quality, ethical compliance and long-term stewardship across every supported project.

    What are the four pillars of research data governance?

    Most frameworks converge on four pillars: policy (rules for ownership, access and retention), roles (stewards, ethics officers, IT, PIs), infrastructure (repositories, metadata standards, storage) and compliance monitoring (audits against funder and legal requirements). No pillar works in isolation: each one fails if the others are absent.

    What are the 5 C’s of data governance?

    The 5 C’s — clear vision, leadership commitment, collaboration, communication and continuous improvement — describe the cultural conditions a governance programme needs to survive contact with autonomous research groups. Without leadership commitment specifically, governance policy tends to remain aspirational rather than enforced.

    Will AI replace research data governance?

    No. AI tools can automate metadata tagging, anomaly detection and compliance checks, but they cannot assign accountability or resolve the ethical judgement calls that research ethics and governance committees make. AI changes the tooling of governance, not the underlying need for named, human-accountable roles.

    Implications for institutions

    For research administrators, the practical implication is that a DMP template or FAIR-compliance checklist is necessary but not sufficient. An institution needs a named governance owner — a research data governance committee or chief data steward function — whose remit spans the full lifecycle, not just the application stage a DMP covers.

    The Royal Society and British Academy’s joint review, Data Management and Use: Governance in the 21st Century, argued data governance should be treated as an organisational capability comparable to financial or ethical governance, not a bolt-on exercise assigned to whichever office has spare capacity. That framing is increasingly reflected in how EARMA, ARMA and INORMS member institutions structure research administration functions, positioning data governance alongside grants management and research integrity rather than beneath IT.

    Conclusion: closing the gap

    Research data governance, DMPs and FAIR practice describe the same problem from three altitudes: institutional accountability, project-level planning, and technical dataset design. The responsibility gaps undermining all three consistently form where policy assigns an outcome — FAIR metadata, secure retention, a departure protocol — without assigning a person. Institutions that name an accountable role for every governance obligation, rather than defaulting to the PI, close that gap before it becomes a compliance failure. For broader context on these roles within the wider research administration function, see CASRAI’s research administration standards resources.

  • FAIR Dataset Mandates Risk Becoming a Checkbox

    A FAIR dataset is one that meets the Findable, Accessible, Interoperable and Reusable principles published in Scientific Data in 2016 — but a funder mandate requiring deposit and a data management plan does not, on its own, guarantee this. Genuine FAIR compliance demands rich metadata, persistent identifiers and community-standard formats that most minimally compliant deposits skip entirely, because current incentive structures reward the act of depositing, not the work of curating.

    A FAIR dataset is a digital research object — data or its metadata — that satisfies the Findable, Accessible, Interoperable and Reusable principles first formalised by the FORCE11 community and published in Scientific Data in March 2016. The principles were designed to be applied in degrees, not as a pass/fail gate, which is precisely where funder policy and researcher practice have diverged.

    What does a FAIR dataset actually require?

    The FAIR principles set out four categories of requirement, each broken into specific sub-criteria. They are deliberately conceptual rather than prescriptive, which is a strength for cross-disciplinary adoption and a weakness for enforcement.

    • Findable — data and metadata carry a globally unique, persistent identifier and are indexed in a searchable resource.
    • Accessible — retrieval uses a standardised, open protocol, with metadata remaining accessible even when the underlying data cannot be.
    • Interoperable — data and metadata use a shared, formal language and vocabularies that follow FAIR principles themselves.
    • Reusable — data carry a clear licence, detailed provenance, and conform to domain-relevant community standards.

    The Research Data Alliance’s FAIR Data Maturity Model, published in 2020, decomposes these four principles into 41 discrete indicators covering both data and metadata. That granularity matters: a dataset can satisfy some indicators and fail most others while still being described, loosely, as “FAIR.” A funder checking only for repository deposit is verifying perhaps one or two of the 41.
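
    To make the "partially FAIR" point concrete, the sketch below scores a deposit against a handful of indicators and reports what fails. The indicator labels are hypothetical stand-ins written for this example; the RDA model defines its own 41 identifiers and priorities, which are not reproduced here.

```python
# Hypothetical indicator labels; the RDA model defines its own 41 identifiers.
ASSESSMENT = {
    "persistent identifier assigned": True,
    "indexed in a searchable resource": True,
    "open retrieval protocol": True,
    "controlled vocabulary used": False,
    "licence stated": False,
    "provenance documented": False,
    "community standard format": False,
}


def coverage(assessment: dict[str, bool]) -> float:
    """Fraction of assessed indicators that pass (0.0 to 1.0)."""
    if not assessment:
        return 0.0
    return sum(assessment.values()) / len(assessment)


def failed(assessment: dict[str, bool]) -> list[str]:
    """Names of the indicators that did not pass."""
    return [name for name, ok in assessment.items() if not ok]


print(f"{coverage(ASSESSMENT):.0%} of assessed indicators pass")
print("Failing:", ", ".join(failed(ASSESSMENT)))
```

    A deposit like this one is findable and retrievable, yet fails every indicator related to reuse, which is the pattern a deposit-only check cannot see.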

    Why do funder mandates default to minimal compliance?

    Funder FAIR requirements typically operationalise as two things: a submitted data management plan and a deposit in a recognised repository at the end of the project. Neither step audits metadata richness, vocabulary use, or licensing clarity. The result is a policy that can be satisfied without producing a dataset anyone outside the original team could actually reuse.

    Three structural gaps explain why:

    • Resourcing. Science Europe’s funders’ briefing on data management planning recommends budgeting roughly 5% of the total research budget for compliant curation. That figure is rarely built into grant awards, which leaves curation as unfunded overhead.
    • Recognition. Data curation is not weighted in hiring, promotion or tenure decisions in most institutions, so time spent enriching metadata competes directly with time spent on publications that do count.
    • Standards gaps. Many disciplines still lack the domain-relevant community vocabularies that Interoperability and Reusability depend on, so even willing depositors have nothing FAIR-compliant to conform to.

    Horizon Europe requires that all data produced under the programme be FAIR “by default,” which is the strongest funder-level statement of intent currently in force. Yet the European Commission’s own guidance materials acknowledge that FAIRness is a spectrum, not a binary condition — an admission that sits uneasily alongside a compliance model built around a single deposit checkpoint.

    The maturity gap: from “FAIR start” to genuine reusability

    The European Commission’s Joint Research Centre published FAIR Data Guidelines in 2025 that organise the RDA’s 41 indicators into five progressive maturity levels. The framework is useful precisely because it makes visible how far “minimally compliant” sits from “genuinely reusable.”

    Maturity level | What it requires
    --- | ---
    FAIR start | Published in a catalogue with mandatory metadata; data itself is not structured for machine reuse.
    FAIR play | Links added between datasets and related resources, with enriched provenance and cross-referencing.
    FAIR go | Data structured to community standards, with defined terminologies (not necessarily machine-readable).
    FAIR share | Machine-readable data models (JSON Schema, XML Schema, SHACL) with richly documented provenance.
    FAIRest of them all | Machine-readable model endorsed by the domain community; terms exposed via shared FAIR vocabularies.

    Most mandate-driven deposits land at “FAIR start” — indexed, licensed, discoverable, but not structured for genuine machine or cross-team reuse. The JRC guidelines are explicit that not every dataset needs the top tier, but they are equally explicit that FAIRness can degrade over time if metadata and platforms are not actively maintained. A one-off deposit satisfying a funder’s closeout requirement is not maintenance; it is a snapshot.
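
    Because the five levels are progressive, a closeout review could in principle report the highest level a dataset reaches without skipping an earlier one. The sketch below assumes each level can be reduced to a single yes/no criterion, which is a deliberate simplification for illustration; the criterion names are hypothetical and do not reproduce the JRC's own indicator mapping.

```python
# Level names follow the maturity table above. The boolean criteria are a
# simplified, hypothetical reading of each row, not the JRC's own mapping.
LEVELS = [
    ("FAIR start", "catalogued_with_mandatory_metadata"),
    ("FAIR play", "linked_to_related_resources"),
    ("FAIR go", "structured_to_community_standard"),
    ("FAIR share", "machine_readable_data_model"),
    ("FAIRest of them all", "community_endorsed_model"),
]


def maturity_level(dataset: dict[str, bool]) -> str:
    """Return the highest level reached without skipping an earlier one."""
    reached = "below FAIR start"
    for name, criterion in LEVELS:
        if not dataset.get(criterion, False):
            break  # levels are progressive, so stop at the first unmet one
        reached = name
    return reached


deposit = {  # illustrative mandate-driven deposit
    "catalogued_with_mandatory_metadata": True,
    "structured_to_community_standard": True,  # met, but "FAIR play" is not
}
print(maturity_level(deposit))
```

    Stopping at the first unmet criterion means a dataset cannot claim a higher tier while an earlier one is missing.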

    Rebuilding incentives for genuine data stewardship

    Treating FAIR as a compliance checkbox is a governance failure, not a researcher failure. Three changes would shift the incentive structure toward genuine stewardship rather than deposit-and-forget behaviour.

    1. Credit the labour. CASRAI originated the CRediT contributor role taxonomy in 2014, and the standard is now stewarded by NISO as ANSI/NISO Z39.104-2022. “Data curation” is one of its fourteen roles, offering institutions an existing, citable mechanism to formally recognise stewardship work in author contribution statements — a mechanism that remains inconsistently applied in promotion and tenure review.
    2. Fund it explicitly. Grant budgets should ring-fence curation costs at the level Science Europe’s own guidance recommends, rather than treating the data management plan as an unfunded compliance document.
    3. Audit maturity, not deposit. Funders and institutions should reference maturity models such as the RDA’s 41 indicators or the JRC’s five-level scale in closeout review, rather than accepting repository deposit as sufficient evidence of FAIR compliance.

    FAIR is also not a complete governance answer on its own. The CARE Principles for Indigenous Data Governance, released by the Global Indigenous Data Alliance in 2019, extend the framework to cover collective benefit, authority to control, responsibility and ethics — dimensions that a pure findability-and-format checklist does not touch. Institutions building data policy around FAIR alone are optimising for machine reuse while leaving governance and consent questions unaddressed.

    Frequently asked questions

    What is a FAIR dataset?

    A FAIR dataset satisfies the Findable, Accessible, Interoperable and Reusable principles published in Scientific Data in 2016. It carries a persistent identifier, standardised access, shared vocabularies, and clear licensing and provenance — not merely a repository listing.

    What does FAIR stand for with data?

    FAIR stands for Findable, Accessible, Interoperable and Reusable. The acronym describes a framework for data stewardship, not a certification; the Research Data Alliance breaks it into 41 measurable indicators rather than a single pass condition.

    What does FAIR stand for in data management?

    In data management, FAIR describes the target state a data management plan should work toward: identifiers, rich metadata, open protocols and community-standard formats. It guides curation decisions throughout a project, not just the final deposit.

    Why does FAIR data matter?

    FAIR data matters because it lets both humans and machines discover, verify and reuse research outputs without contacting the original authors. Poorly curated “FAIR” deposits undermine reproducibility and waste the public investment funders intended the mandate to protect.

    Implications and outlook

    Funder FAIR mandates have succeeded in one respect: deposit rates have risen sharply since 2016. They have not, on current evidence, produced datasets that are reliably machine-actionable or cross-team reusable at scale. That gap will not close through stricter wording in policy documents; it requires funders to resource curation at realistic cost, institutions to credit it in career progression via mechanisms such as CRediT’s Data curation role, and disciplines to finish building the community standards that Interoperability depends on. Until those three conditions are met, “FAIR by default” will remain a policy aspiration rather than a description of the average deposited dataset.

  • Professionalising Data Stewardship: Training, Roles, and Institutional Integration

    1. Introduction to the Role of Data Stewards in Scholarly Infrastructure

    Open research practice and institutional data governance both depend on robust, shared standards. Embedding professional data stewards in research organizations addresses long-standing problems in scholarly communication, administrative reporting, and metadata curation. This guide sets out the operational frameworks, specifications, and systemic requirements for data stewardship roles in 2026.

    As academic funders and research ministries worldwide enforce increasingly rigid compliance pathways, universities must move from ad-hoc administrative workflows to unified schemas built on persistent identifiers. Establishing data steward roles is not merely a technical adjustment; it is a strategic step that supports institutional research visibility, smooths metadata reporting, and increases the return on scientific investments.

    2. Technical Architecture and Core Specifications

    The work of data stewards rests on machine-actionable specifications designed to operate across diverse platforms, and on the stewards' core responsibility of supporting researchers across the data lifecycle. By establishing clear, standardized data exchange layers, organizations can avoid the siloed architectures that have long hampered research information networks.

    A key focus of these specifications is preserving the integrity of structural metadata. This is achieved by mapping metadata to recognized open vocabularies such as Dublin Core and Schema.org, typically serialized as JSON-LD. Every scientific output, whether a journal article, a software version, or an administrative record, then carries citable provenance metadata that indexing services such as OpenAlex and Crossref can harvest and cross-reference automatically.
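
    As a minimal sketch of that mapping, the function below turns a local repository record into a Schema.org `Dataset` description serialized as JSON-LD. The Schema.org property names are standard terms; the local record layout, its field names and the example values are assumptions made for illustration.

```python
import json


def to_schema_org(record: dict) -> dict:
    """Map a local repository record onto Schema.org Dataset terms."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": record["title"],
        "identifier": record["doi"],
        "creator": [{"@type": "Person", "name": n} for n in record["authors"]],
        "license": record["licence_url"],
        "datePublished": record["published"],
    }


local_record = {  # hypothetical local schema and values
    "title": "Example survey dataset",
    "doi": "https://doi.org/10.1234/example",
    "authors": ["A. Researcher", "B. Steward"],
    "licence_url": "https://creativecommons.org/licenses/by/4.0/",
    "published": "2026-01-15",
}

print(json.dumps(to_schema_org(local_record), indent=2))
```

    Keeping the mapping in one small function makes it easy to review when the local schema or the target vocabulary changes.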

    3. Institutional Challenges, Workflows, and Solutions

    While the administrative and scientific benefits of data stewards are widely recognized, deploying them across universities and libraries reveals significant hurdles. Major friction points include integrating data stewards within departments, drafting curricula, and aligning with national open-science training. Faculty reluctance, legacy software limitations (such as outdated CRIS databases), and the high administrative cost of manual curation are substantial barriers to widespread compliance.

    Overcoming these implementation bottlenecks requires a systemic, top-down commitment to administrative automation. Institutions need API middleware that coordinates data transfers between local systems and global public registries, removing redundant manual data entry. University promotion and tenure committees should also update their evaluation rubrics to formally credit researchers who follow these curation workflows, creating a positive feedback loop in research culture.

    4. Technical Evaluation and Integration Matrix

    Integration Domain | Primary Objective | Core Interoperability Standard | Friction Mitigation Strategy
    --- | --- | --- | ---
    Persistent Identification | Ensure permanent, citable links across registries. | Unique URI / DOI Resolver Systems | Implement automated metadata harvesting on ingest.
    Metadata Exchange | Frictionless transfer between CRIS and repositories. | JSON-LD / XML Schema Mapping | Deploy standardized REST APIs with OAuth 2.0.
    Compliance Auditing | Track, verify, and report on policy adherence. | Standardized SQL / GraphQL Querying | Generate real-time compliance scorecards for PIs.
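
    The compliance-auditing row above mentions per-PI scorecards. One very small way to compute them is sketched below, assuming a record counts as compliant when it has both a DOI and a licence; that rule, the field names and the sample records are all assumptions for illustration rather than a prescribed audit standard.

```python
from collections import defaultdict


def scorecards(records: list[dict]) -> dict[str, float]:
    """Share of each PI's records that have both a DOI and a licence."""
    totals: dict[str, int] = defaultdict(int)
    passing: dict[str, int] = defaultdict(int)
    for r in records:
        totals[r["pi"]] += 1
        if r.get("doi") and r.get("licence"):
            passing[r["pi"]] += 1
    return {pi: passing[pi] / totals[pi] for pi in totals}


records = [  # hypothetical repository export
    {"pi": "alice", "doi": "10.1234/a", "licence": "CC-BY-4.0"},
    {"pi": "alice", "doi": None, "licence": "CC-BY-4.0"},
    {"pi": "bob", "doi": "10.1234/c", "licence": "CC0-1.0"},
]
for pi, score in sorted(scorecards(records).items()):
    print(f"{pi}: {score:.0%}")
```

    A production scorecard would draw on richer criteria, but even this shape makes visible which research groups need steward support.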

    5. Five-Step Institutional Implementation Roadmap

    • Step 1: Institutional Alignment & Sign-off. Establish a cross-departmental committee representing the library, IT services, and the research office to draft the institutional charter for the data steward function.
    • Step 2: API & Schema Mapping. Audit existing repository databases and map local metadata schemas to the JSON-LD specifications that data stewards will work with.
    • Step 3: Middleware Integration & SSO. Configure enterprise middleware layers to handle automated data harvesting and synchronize access using Single Sign-On (SAML/Shibboleth).
    • Step 4: Training & Support Networks. Deploy interactive workshops, dedicated helpdesks, and online documentation to educate researchers, metadata curators, and administrative staff.
    • Step 5: Automated Verification & Auditing. Launch real-time validation checks and annual data-quality audits to measure compliance rates and automatically identify and correct orphaned records.

  • Applying the DCC Curation Lifecycle Model: A Guide to Professional Data Stewardship

    1. Introduction to the Role of the DCC Curation Lifecycle Model in Scholarly Infrastructure

    Open research practice and institutional data governance both depend on robust, shared standards. Adopting the DCC Curation Lifecycle Model addresses long-standing problems in scholarly communication, administrative reporting, and metadata curation. This guide sets out the operational frameworks, specifications, and systemic requirements for applying the DCC Curation Lifecycle Model in 2026.

    As academic funders and research ministries worldwide enforce increasingly rigid compliance pathways, universities must move from ad-hoc administrative workflows to unified schemas built on persistent identifiers. Implementing the DCC Curation Lifecycle Model is not merely a technical adjustment; it is a strategic step that supports institutional research visibility, smooths metadata reporting, and increases the return on scientific investments.

    2. Technical Architecture and Core Specifications

    Implementing the DCC Curation Lifecycle Model rests on machine-actionable specifications designed to operate across diverse platforms, and on the model's circular structure, which covers conceptualisation, curation, preservation, and transformation. By establishing clear, standardized data exchange layers, organizations can avoid the siloed architectures that have long hampered research information networks.
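
    The circular structure can be pictured as an ordered list of stages in which the last stage leads back to the first. The sketch below uses the sequential action names as they are commonly listed for the DCC model; both the names and the wrap-around from "Transform" back to "Conceptualise" should be treated as a simplification to verify against the DCC's own publication.

```python
# Sequential actions as commonly listed for the DCC Curation Lifecycle Model.
# Treat the exact names and the wrap-around as assumptions to verify.
SEQUENTIAL_ACTIONS = [
    "Conceptualise",
    "Create or Receive",
    "Appraise and Select",
    "Ingest",
    "Preservation Action",
    "Store",
    "Access, Use and Reuse",
    "Transform",
]


def next_action(current: str) -> str:
    """Return the action after `current`, wrapping from the last to the first."""
    i = SEQUENTIAL_ACTIONS.index(current)
    return SEQUENTIAL_ACTIONS[(i + 1) % len(SEQUENTIAL_ACTIONS)]


print(next_action("Store"))
print(next_action("Transform"))
```

    Modelling the stages as a cycle rather than a pipeline reflects the idea that transformed data re-enters curation instead of leaving it.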

    A key focus of these specifications is preserving the integrity of structural metadata. This is achieved by mapping metadata to recognized open vocabularies such as Dublin Core and Schema.org, typically serialized as JSON-LD. Every scientific output, whether a journal article, a software version, or an administrative record, then carries citable provenance metadata that indexing services such as OpenAlex and Crossref can harvest and cross-reference automatically.

    3. Institutional Challenges, Workflows, and Solutions

    While the administrative and scientific benefits of the DCC Curation Lifecycle Model are widely recognized, applying it across universities and libraries reveals significant hurdles. Major friction points include carrying out high-quality data curation checks (file translation, format verification, schema mapping) and long-term appraisal. Faculty reluctance, legacy software limitations (such as outdated CRIS databases), and the high administrative cost of manual curation are substantial barriers to widespread compliance.

    Overcoming these implementation bottlenecks requires a systemic, top-down commitment to administrative automation. Institutions need API middleware that coordinates data transfers between local systems and global public registries, removing redundant manual data entry. University promotion and tenure committees should also update their evaluation rubrics to formally credit researchers who follow these curation workflows, creating a positive feedback loop in research culture.

    4. Technical Evaluation and Integration Matrix

    Integration Domain | Primary Objective | Core Interoperability Standard | Friction Mitigation Strategy
    --- | --- | --- | ---
    Persistent Identification | Ensure permanent, citable links across registries. | Unique URI / DOI Resolver Systems | Implement automated metadata harvesting on ingest.
    Metadata Exchange | Frictionless transfer between CRIS and repositories. | JSON-LD / XML Schema Mapping | Deploy standardized REST APIs with OAuth 2.0.
    Compliance Auditing | Track, verify, and report on policy adherence. | Standardized SQL / GraphQL Querying | Generate real-time compliance scorecards for PIs.

    5. Five-Step Institutional Implementation Roadmap

    • Step 1: Institutional Alignment & Sign-off. Establish a cross-departmental committee representing the library, IT services, and the research office to draft the institutional charter for adopting the DCC Curation Lifecycle Model.
    • Step 2: API & Schema Mapping. Audit existing repository databases and map local metadata schemas to the JSON-LD specifications adopted for the DCC Curation Lifecycle Model rollout.
    • Step 3: Middleware Integration & SSO. Configure enterprise middleware layers to handle automated data harvesting and synchronize access using Single Sign-On (SAML/Shibboleth).
    • Step 4: Training & Support Networks. Deploy interactive workshops, dedicated helpdesks, and online documentation to educate researchers, metadata curators, and administrative staff.
    • Step 5: Automated Verification & Auditing. Launch real-time validation checks and annual data-quality audits to measure compliance rates and automatically identify and correct orphaned records.