Tag: data custodian vs data steward

  • Research Data Management Policy: €10.2bn Case

    A research data management policy that treats FAIR compliance as a line-item cost, rather than a reuse and reputation asset, is the wrong accounting model. PwC estimated in a 2018 study for the European Commission that the absence of FAIR (Findable, Accessible, Interoperable, Reusable) research data costs the European economy at least €10.2 billion a year, largely through duplicated data collection and wasted researcher time. That figure is the most widely cited quantitative evidence that under-investment in research data management (RDM) infrastructure is a false economy, not a saving.

    A research data management policy is an institutional document setting out the responsibilities of researchers and the institution for planning, storing, securing, sharing and preserving research data across its lifecycle. Most UK universities — Southampton, Birmingham, Manchester, Edinburgh and others — already publish one. The argument here is narrower and more contentious: most are drafted, funded and governed as compliance paperwork, when the evidence says they should be funded as reuse and reputation infrastructure.

    Why RDM policy gets treated as a cost centre

    Institutional budgets typically classify research data management as overhead: storage costs, repository subscriptions, a data steward’s salary, training time. Each appears as a debit with no offsetting credit line, because savings from avoided duplication and faster reuse accrue diffusely, across future researchers and grants, not to the budget holder who paid for the infrastructure.

    This accounting mismatch is compounded by how the data management plan (DMP) requirement is handled in practice. Most funders now mandate one, but research offices frequently treat it as a box-ticking exercise completed at proposal stage and never revisited, rather than a live operational document. That framing under-serves the researcher, who gets no practical reuse benefit, and the institution, which under-recovers the true cost of good RDM from grants that would pay for it.

    UK Research and Innovation (UKRI) explicitly states that costs associated with research data management — storage, curation, repository deposit — are eligible for recovery under its funding. Institutions treating RDM as unfunded overhead are frequently leaving recoverable grant money unclaimed rather than avoiding a cost.

    What the evidence actually says about FAIR and avoided cost

    The FAIR data principles were formalised in 2016 by Wilkinson et al. in Scientific Data as a guide for making digital assets Findable, Accessible, Interoperable and Reusable by both humans and machines. FAIR data is not a compliance checkbox; it is a design standard for making data usable by someone who was not present when it was collected.

    The clearest attributed cost estimate comes from PwC’s 2018 cost-benefit analysis for the European Commission, which put the annual cost of non-FAIR research data to the European economy at €10.2 billion, driven by researcher time lost searching for data, recreation of data that already exists, and lost interdisciplinary reuse. A separate, frequently cited illustration is a University of Minnesota diet trial from the late 1960s and early 1970s whose original data sat largely unexamined in storage for decades before being recovered and reanalysed: a reminder that data loss is a recurring, avoidable event when retention and documentation are afterthoughts.

    Three mechanisms explain where the savings actually come from:

    • Avoided duplication. Findable, well-described data lets a second researcher build on an existing dataset instead of re-running a costly collection exercise.
    • Faster reuse cycles. Interoperable data in standard formats with persistent identifiers can be integrated into new analyses without reformatting or re-negotiating access.
    • Preserved institutional memory. Deposit in a certified repository protects data against one of the most common loss vectors: staff turnover combined with undocumented local storage.

    None of this shows up as a saving on a university’s annual accounts, which is precisely why RDM investment is chronically under-prioritised relative to its estimated return.

    How funder compliance requirements are changing the calculus

    Funder mandates are steadily converting FAIR data from voluntary good practice into a hard compliance gate, which changes the institutional risk calculus even for leaders unconvinced by the reuse argument. UKRI’s Common Principles on Research Data state that publicly funded research data should be made openly available with as few restrictions as possible, most UKRI research councils require a data management plan for funded research, and the Concordat on Open Research Data sets out parallel expectations for institutions. Horizon Europe applies comparable requirements, and cOAlition S’s Plan S, although centred on open-access publication, encourages the same data practices in open-access publishing policy.

    A comparison of how three major funders frame the requirement illustrates the convergence:

    Funder / framework | Core RDM requirement | FAIR reference
    UKRI | Data management plan required by most research councils; RDM costs eligible for recovery | Endorses FAIR via the Concordat on Open Research Data
    Horizon Europe | DMP required within six months of project start, updated across the project lifecycle | “As open as possible, as closed as necessary,” explicitly FAIR-aligned
    cOAlition S (Plan S) | Underlying data encouraged to accompany open-access publications | References FAIR principles for supporting data

    Institutions that fund RDM only to the minimum needed for a single grant’s DMP template are exposed twice: to duplicated administrative cost when infrastructure is rebuilt project by project, and to compliance risk as funders move toward auditing DMP adherence rather than merely requiring its submission.

    The case for investing in data stewardship, not just policy text

    A policy document alone does not create FAIR data. That requires people: a data steward function — a dedicated role, a network of disciplinary data champions, or a research data service embedded in the library — able to advise researchers on repository choice, metadata standards and licensing at the point where those decisions are actually made, not after the fact.

    Institutions that fund this role tend to route researchers toward standards-based infrastructure rather than ad hoc local storage: a research data repository registered in re3data.org, ideally holding CoreTrustSeal certification, with persistent identifiers (DOIs) and standard metadata attached to every deposit. This is the practical, unglamorous mechanism by which the costs behind the €10.2 billion estimate above are actually avoided: not through a policy PDF, but through a person and a repository that make FAIR operational.
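    As a rough sketch of what "standard metadata attached to every deposit" can mean in practice, the check below tests a deposit record for the six properties the DataCite Metadata Schema treats as mandatory: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The flat dictionary layout, the snake_case key names and both helper functions are hypothetical simplifications; a real repository would validate against DataCite's full XML or JSON schema, and the DOI check here is deliberately loose.

```python
from typing import Any

# Simplified, hypothetical keys standing in for the six mandatory DataCite properties:
# Identifier, Creator, Title, Publisher, PublicationYear, ResourceType.
MANDATORY_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_mandatory_fields(record: dict[str, Any]) -> list[str]:
    """Return mandatory fields that are absent or empty in a flat deposit record."""
    missing = []
    for name in MANDATORY_FIELDS:
        value = record.get(name)
        # Treat None and empty strings or lists as missing; a numeric year counts as present.
        if value is None or (isinstance(value, (str, list)) and len(value) == 0):
            missing.append(name)
    return missing


def looks_like_doi(identifier: str) -> bool:
    """Loose sanity check: a bare DOI starts with the '10.' prefix and contains a '/'."""
    return identifier.startswith("10.") and "/" in identifier


if __name__ == "__main__":
    deposit = {
        "identifier": "10.1234/example-dataset",  # hypothetical DOI for illustration
        "creators": ["Example, Researcher"],
        "titles": ["Hypothetical survey dataset"],
        "publisher": "Example University Repository",
        "publication_year": 2024,
        "resource_type": "Dataset",
    }
    print(missing_mandatory_fields(deposit))  # []
    print(looks_like_doi(deposit["identifier"]))  # True
```

    Keeping the mandatory list as data rather than hard-coded conditions makes it easy for a data steward to extend the check with recommended properties (licence, subject, related identifiers) as local policy requires.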

    CASRAI (Consortia Advancing Standards in Research Administration Information) is relevant here for provenance and interoperability rather than ownership. CASRAI originated the CRediT contributor role taxonomy in 2014, now stewarded by NISO as ANSI/NISO Z39.104-2022, which applies the same underlying argument in a different domain: standardising who-did-what reduces duplicated verification effort just as standardising data description reduces duplicated data collection. Institutions weighing their research administration infrastructure should treat RDM policy, contributor attribution and open data reuse as one reputational and efficiency system, not separate obligations.

    Answer-first Q&A

    What is a research data management policy?

    A research data management policy is an institutional document defining responsibilities for planning, storing, securing, sharing, and archiving research data across its lifecycle. UK universities including Edinburgh and Manchester publish theirs publicly, typically requiring a data management plan at proposal stage and deposit in an approved repository after project completion.

    What are the FAIR data principles?

    The FAIR data principles — Findable, Accessible, Interoperable, Reusable — were published by Wilkinson et al. in 2016 in Scientific Data as guidance for making digital research assets usable by both humans and machines, through persistent identifiers, standard metadata, and clear licensing.

    Do UK and EU funders require a data management plan?

    Yes. Most UKRI research councils require a data management plan for funded research, and UKRI treats RDM costs as eligible for recovery; Horizon Europe requires a DMP within six months of project start under its “as open as possible, as closed as necessary” principle.

    How much does poor research data management actually cost?

    PwC’s 2018 analysis for the European Commission put the annual cost of non-FAIR research data to the European economy at €10.2 billion, driven primarily by duplicated data collection and researcher time lost searching for data that already exists elsewhere.

    Implications for institutional leaders

    The practical implication is a reframing exercise, not necessarily a large new budget line. Research offices should cost RDM infrastructure — repositories, data steward time, metadata training — against the funder-eligible recovery already available through DMP-linked grants, rather than absorbing it as unfunded overhead. Leaders reviewing their research data management policy should ask whether it funds a data steward with real authority over repository choice and metadata quality, or whether it is a document that satisfies a compliance checklist and stops there.

    The evidence — a €10.2 billion EU-wide cost estimate, UKRI’s funding eligibility for RDM costs, and Horizon Europe’s escalating DMP requirements — points one direction: institutions that keep treating FAIR compliance as a cost centre are choosing to keep paying the duplication tax FAIR data was designed to eliminate.

  • Research Data Steward Job Description and Skills

    A research data steward is the named individual within a university, institute, or funded project who takes operational responsibility for the quality, FAIR compliance, documentation, and lifecycle management of a defined set of research datasets — distinct from the data owner, who holds accountability and sign-off authority, and the data custodian, who runs the technical storage infrastructure. The role sits inside the institutional research data management (RDM) team, typically reporting through the research office or library, and exists specifically because generic corporate data-steward job descriptions do not map cleanly onto grant-funded, multi-investigator, publicly scrutinised research data.

    Corporate data stewardship (the model most job-description templates online describe) is built around commercial master data, customer records, and regulatory compliance such as GDPR. Research data stewardship is built around a different set of pressures: funder-mandated Data Management Plans (DMPs), the FAIR Guiding Principles, discipline-specific repositories, and long-term reuse by researchers who were not part of the original project. This article defines the research-specific version of the role, maps it against the data owner and data custodian, and shows exactly where it sits in an institutional RDM structure.

    What Does a Research Data Steward Do?

    A research data steward manages the day-to-day quality, description, and reuse-readiness of research datasets on behalf of a principal investigator, department, or institutional repository. The role carries operational responsibility rather than accountability: a data steward implements policy, while a data owner sets it.

    Core duties typically include:

    • Reviewing datasets against the FAIR Guiding Principles — Findable, Accessible, Interoperable, Reusable — before deposit in a repository.
    • Writing and maintaining metadata, codebooks, and data dictionaries so a dataset is comprehensible to someone outside the original research team.
    • Advising researchers on Data Management Plan (DMP) compliance during grant applications and at project milestones.
    • Coordinating with disciplinary or institutional repositories on deposit, embargo periods, and licence selection.
    • Liaising with the data custodian (IT/systems) on storage, backup, and access-control implementation.
    • Flagging data quality issues — missing consent documentation, inconsistent variable coding, broken file formats — before they reach publication or reuse.
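    The review and flagging duties above can be partly systematised as a repeatable pre-deposit checklist. The sketch below is hypothetical: the `Dataset` fields, the set of preferred open formats and the individual rules are illustrative assumptions rather than any repository's published criteria, and automated checks like these supplement, rather than replace, a steward's judgement on whether documentation is actually comprehensible to an outsider.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional

# Hypothetical institutional list of open, widely readable formats preferred for deposit.
PREFERRED_FORMATS = {".csv", ".txt", ".json", ".xml", ".pdf", ".tif"}


@dataclass
class Dataset:
    """Simplified, hypothetical description of a dataset awaiting deposit."""
    title: str
    files: list[str]
    licence: Optional[str] = None
    has_codebook: bool = False
    involves_human_participants: bool = False
    consent_documented: bool = False


def pre_deposit_issues(ds: Dataset) -> list[str]:
    """Return issues a data steward would want resolved before the dataset is deposited."""
    issues = []
    if not ds.licence:
        issues.append("No licence selected (affects Reusable).")
    if not ds.has_codebook:
        issues.append("No codebook or data dictionary (affects Reusable).")
    if ds.involves_human_participants and not ds.consent_documented:
        issues.append("Consent documentation missing for human-participant data.")
    for name in ds.files:
        # Compare file extensions case-insensitively against the preferred list.
        suffix = PurePath(name).suffix.lower()
        if suffix not in PREFERRED_FORMATS:
            issues.append(
                f"{name}: format '{suffix or 'none'}' is not a preferred open format (affects Interoperable)."
            )
    return issues


if __name__ == "__main__":
    example = Dataset(
        title="Hypothetical interview study",
        files=["responses.csv", "analysis.sav"],
        has_codebook=True,
        involves_human_participants=True,
    )
    for issue in pre_deposit_issues(example):
        print(issue)
```

    Flagging a proprietary file such as an SPSS `.sav` is not a rejection; it prompts the steward to ask whether an open export, such as CSV plus a codebook, should be deposited alongside it.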

    The Concordat on Open Research Data (2016), agreed by UK bodies including Research Councils UK (now part of UKRI), states that institutions are expected to have “clearly assigned responsibilities for the management of research data,” which is the direct policy basis most UK universities cite when creating dedicated data steward posts inside RDM or library services.

    Research Data Steward vs Data Owner vs Data Custodian

    These three roles are frequently conflated in generic data-governance content, but in a research setting they map to distinct, complementary functions. The data owner holds accountability; the data steward holds operational responsibility; the data custodian holds technical infrastructure responsibility.

    Role | Primary focus in RDM | Typical post-holder | Accountable for
    Data owner | Accountability and sign-off | Principal Investigator or Head of Department | Decisions on access, sharing, and retention of a specific dataset
    Data steward | Operational quality and FAIR compliance | Research data steward / RDM officer, often in the library or research office | Metadata, documentation, DMP compliance, deposit readiness
    Data custodian | Technical storage and access control | Research IT / systems administrator | Backup, encryption, storage infrastructure, access provisioning

    A single dataset can pass through all three roles: the PI (owner) approves that a dataset can be shared, the data steward prepares it to FAIR standard and selects the repository and licence, and the data custodian executes the technical transfer and sets the access permissions.
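    The owner, steward and custodian sequence described above can be read as an ordered workflow in which each role may act only once the previous step is complete. The sketch below models that ordering; the stage names, the `ROLE_FOR_STAGE` mapping and the `SharingWorkflow` class are hypothetical illustrations, not a feature of any particular repository or RDM system.

```python
from enum import Enum


class Stage(Enum):
    """Ordered stages a dataset passes through before it is shared (hypothetical)."""
    DRAFT = 0
    APPROVED_BY_OWNER = 1         # PI or head of department approves sharing
    PREPARED_BY_STEWARD = 2       # FAIR preparation done; repository and licence chosen
    TRANSFERRED_BY_CUSTODIAN = 3  # technical transfer done; access permissions set


# Which role is allowed to move the dataset into each stage.
ROLE_FOR_STAGE = {
    Stage.APPROVED_BY_OWNER: "owner",
    Stage.PREPARED_BY_STEWARD: "steward",
    Stage.TRANSFERRED_BY_CUSTODIAN: "custodian",
}


class SharingWorkflow:
    """Tracks one dataset through the owner, steward and custodian steps in order."""

    def __init__(self, dataset_id: str) -> None:
        self.dataset_id = dataset_id
        self.stage = Stage.DRAFT
        self.log: list[tuple[str, Stage]] = []

    def advance(self, role: str) -> Stage:
        """Move to the next stage, but only if `role` is responsible for that stage."""
        if self.stage is Stage.TRANSFERRED_BY_CUSTODIAN:
            raise ValueError(f"{self.dataset_id}: workflow already complete")
        next_stage = Stage(self.stage.value + 1)
        expected = ROLE_FOR_STAGE[next_stage]
        if role != expected:
            raise PermissionError(
                f"{next_stage.name} must be performed by the {expected}, not the {role}"
            )
        self.stage = next_stage
        self.log.append((role, next_stage))
        return next_stage


if __name__ == "__main__":
    wf = SharingWorkflow("hypothetical-dataset-001")
    for role in ("owner", "steward", "custodian"):
        print(role, "->", wf.advance(role).name)
```

    Raising an error rather than silently skipping a step mirrors the point of the three-way split: a steward cannot substitute for the owner's sign-off, and a custodian should not transfer data that has not yet been prepared.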

    What Skills and Qualifications Does the Role Require?

    Research data stewards need a blend of technical data-management skills and subject-domain fluency that generic corporate data-steward job descriptions rarely specify. Institutions increasingly treat this as a distinct career pathway rather than an IT-adjacent generalist role.

    • Working knowledge of the FAIR principles and metadata standards (Dublin Core, DDI, discipline-specific schemas).
    • Familiarity with persistent identifier infrastructure — DOIs assigned via DataCite, and researcher identifiers via ORCID — for correctly attributing and citing datasets.
    • Understanding of funder DMP requirements, including Horizon Europe’s requirement that funded research data be managed in line with the FAIR principles and cOAlition S’s encouragement of the same.
    • Basic data-cleaning and documentation skills (spreadsheet/database literacy, controlled vocabularies, version control).
    • Communication skills sufficient to negotiate data-sharing terms between researchers, ethics committees, and repository managers.

    Professional bodies including ARMA (Association of Research Managers and Administrators) and INORMS (International Network of Research Management Societies) now track research data stewardship as a recognised strand within the broader research-administration career pathway, reflecting its growing separation from generic corporate data governance.

    How Does This Differ from the CRediT “Data Curation” Role?

    The ANSI/NISO Z39.104-2022 CRediT taxonomy — originated by CASRAI in 2014 and now stewarded by NISO — includes “Data Curation” as one of fourteen contributor roles credited on a published paper. This is a per-publication authorship credit, not a job title or institutional post. A research data steward, by contrast, is an ongoing operational role that may perform data-curation work across many projects and papers, only some of which will formally credit them under the CRediT taxonomy. Conflating the two is a common error in job-description drafting.

    Where Does the Role Sit in the Institutional RDM Team?

    Research data stewards typically sit within one of three institutional homes: the library/research-data-services team, the central research office, or a departmental/faculty RDM function. Reporting lines vary, but the steward almost always works across, not inside, individual research groups.

    • Library-based model: data steward reports into research data services alongside repository managers and scholarly-communications staff — common where the institution treats RDM as an extension of open-access infrastructure.
    • Research-office model: data steward sits alongside grants and ethics administrators, closer to the DMP-compliance and funder-reporting workflow.
    • Departmental model: larger science faculties sometimes embed a data steward within a department, working directly with PIs on discipline-specific formats and repositories.

    In all three models, the data steward answers functionally to data owners at PI or departmental-head level, who set governance rules for their datasets within institutional policy, while collaborating operationally with IT-based data custodians on infrastructure. The four core stewardship areas found in institutional data-governance models (operational oversight; data quality; privacy, security and risk management; and policies and procedures) apply directly to this reporting structure.

    Answer-First Q&A

    What skills do you need to be a data steward?

    A data steward needs both technical and business-facing skills: metadata and data-modelling literacy, familiarity with data-quality tooling, and strong communication skills to translate governance policy into day-to-day research practice. In a research context, this also requires knowledge of FAIR principles, funder DMP requirements, and discipline-specific repository standards.

    What are the four main roles of an effective data stewardship model?

    An effective stewardship model groups responsibilities into four areas: operational oversight; data quality; privacy, security and risk management; and policies and procedures. Research data stewards typically own operational oversight and data quality directly, while collaborating with data owners and custodians on the remaining two areas.

    What makes a good data steward?

    A good data steward combines subject-domain credibility with disciplined documentation habits — able to identify data-quality problems early, communicate clearly with both researchers and technical staff, and apply governance rules consistently. In research settings, respect from the researcher community is essential, since the steward has no direct authority over the data owner.

    What is another title for a data steward?

    Common alternative titles include research data manager, data curator, RDM officer, and domain data steward. Institutions vary in naming, but the underlying responsibilities — FAIR compliance, metadata quality, and DMP support — remain consistent across these titles.

    Implications for Research Institutions

    As funders including UKRI, Horizon Europe, and cOAlition S tighten FAIR data requirements within grant conditions, institutions without a clearly defined research data steward role risk inconsistent DMP compliance and poor dataset discoverability after project closure. Writing a job description that borrows directly from generic corporate data-governance templates will under-specify the FAIR, DMP, and repository-liaison duties that make the research variant of the role effective.

    Institutions building or revising this post should draft the job description around the three-way split set out above — owner accountability, steward operations, custodian infrastructure — rather than treating “data steward” as a single undifferentiated data-governance title.