  • Crediting data stewards and curators: recognising RDM professionals

    Behind every well-managed research dataset there is usually a person whose name does not appear on the paper. That person organised the data so it made sense, wrote the documentation that explains what each variable means, checked the data for errors, chose appropriate formats, ensured it was deposited under the right licence, and made it findable and reusable. This is the work of data stewards and curators: demanding, skilled professional labour that turns a heap of files into an asset that can be trusted and reused. Yet because it does not fit the traditional shape of authorship, it is frequently invisible in the scholarly record. This article makes the case for recognising it properly, drawing on the CRediT-extensions domain of the CASRAI Dictionary.

    The work behind FAIR data

    The aspiration that research data should be FAIR — Findable, Accessible, Interoperable and Reusable — is now widely shared, but it is easy to forget that FAIR data is not a natural state. Data does not become findable, well-documented and reusable on its own; someone has to make it so. Achieving each FAIR principle is real work: findability requires good metadata and persistent identifiers; interoperability requires standard formats and vocabularies; reusability requires thorough documentation, clear licensing and quality checking. This is precisely the work data stewards and curators do. They are, in effect, the people who deliver FAIR in practice, translating an admirable principle into actual datasets that other researchers can find and use. Recognising their contribution is therefore not a courtesy; it is acknowledging the people who make one of open science’s central goals achievable at all.

    The recognition gap

    The difficulty is that the reward systems of research were built around a narrower idea of contribution. Recognition has long been anchored in authorship of articles and the metrics derived from them, and someone whose contribution is curating the data rather than writing the paper can find there is no obvious place for them. They may have spent months making a dataset usable, only to be absent from the byline and, at most, thanked vaguely in an acknowledgement. This invisibility has consequences beyond unfairness. It makes data-management careers harder to sustain, because contribution that cannot be pointed to cannot easily support promotion; and it weakens the incentive to do the work well, because diligent curation goes unrewarded while the data that depends on it is taken for granted. A research system that wants FAIR data but does not recognise the people who produce it works against its own aims.

    The CRediT Data curation role

    One of the most direct ways to close this gap already exists within the standard vocabulary of contribution. The CRediT taxonomy includes a role that names this work explicitly: Data curation, defined as management activities to annotate (produce metadata), scrub data and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later reuse. That definition is almost a job description for a data steward. By assigning the Data curation role, a contributorship statement records the steward’s or curator’s work in the same structured form used for every other contributor, in the same place readers and evaluators look. The work appears in the formal record as a recognised contribution rather than disappearing into a line of thanks. The broader question of how contribution taxonomies are being adapted and extended for roles like these is the concern of the CRediT-extensions domain, and the question of who counts as a contributor connects closely to the principles of authorship more generally.

    Beyond a single role

    It is worth being honest that a single role does not capture everything a data professional does. Their contribution often spans several activities, and a fair contributorship statement may list more than one role:

    • Data curation for the core work of annotating, cleaning and maintaining the data.
    • Methodology where they helped design how data would be captured and structured.
    • Software where they built tools or scripts to process or document the data.
    • Validation where they verified the integrity and quality of the data and its outputs.

    The point is not to inflate credit but to describe contribution accurately. Data professionals are not a single undifferentiated category; using the appropriate roles, and more than one where warranted, gives a truthful picture of skilled, multifaceted work — which is what honest recognition requires.
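    To make the idea of a structured, multi-role statement concrete, here is a minimal sketch in Python. It is illustrative only: the Contributor class, the helper functions and the contributor names are hypothetical and are not part of CRediT or the CASRAI Dictionary. Only the role names come from the CRediT taxonomy. The sketch records several roles for one data professional and rejects any role name that is not in the controlled list.

```python
from dataclasses import dataclass, field

# The 14 CRediT role names. Treat this set as a convenience copy and
# check it against the published taxonomy before relying on it.
CREDIT_ROLES = frozenset({
    "Conceptualization",
    "Data curation",
    "Formal analysis",
    "Funding acquisition",
    "Investigation",
    "Methodology",
    "Project administration",
    "Resources",
    "Software",
    "Supervision",
    "Validation",
    "Visualization",
    "Writing – original draft",
    "Writing – review & editing",
})


@dataclass
class Contributor:
    """Hypothetical record: one person and the roles they held."""
    name: str
    roles: list[str] = field(default_factory=list)


def unknown_roles(contributor: Contributor) -> list[str]:
    """Return any role names that are not in the controlled list."""
    return [r for r in contributor.roles if r not in CREDIT_ROLES]


def format_statement(contributors: list[Contributor]) -> str:
    """Build a plain-text contributorship statement.

    Raises ValueError if any contributor carries an unrecognised role,
    so free-text labels cannot slip into the record.
    """
    for c in contributors:
        bad = unknown_roles(c)
        if bad:
            raise ValueError(f"{c.name}: unrecognised role(s) {bad}")
    return "; ".join(f"{c.name}: {', '.join(c.roles)}" for c in contributors)


# Hypothetical example: a data steward credited with more than one role.
statement = format_statement([
    Contributor("Steward A", ["Data curation", "Methodology", "Software", "Validation"]),
    Contributor("Researcher B", ["Conceptualization", "Investigation"]),
])
print(statement)
```

    Checking role names against a fixed list is a design choice made for this sketch: it keeps the steward's entry in the same form as every other contributor's, and it stops an informal label from replacing a recognised role.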

    The professionalisation of research data management

    Recognition in individual outputs is part of a larger development: the professionalisation of research data management. Data stewardship is increasingly understood as a profession with its own expertise, training, standards and career structures, rather than a task done in spare moments by whoever is available. Dedicated data-steward and curator roles are appearing in institutions; training and competency frameworks for data professionals are maturing; and the field is acquiring the identity and standing that mark an established profession. This matters because recognition operates at two levels that reinforce each other. Crediting contributions in outputs makes individual work visible; building data management into a recognised profession makes it a viable career. Visible contributions strengthen the case for professional careers, and professional careers ensure there are skilled people to make the contributions. FAIR data depends on both being in place.

    A consistent vocabulary for data work

    For the contributions of data stewards and curators to be recognised consistently — across institutions, repositories, publishers and reporting systems — the way that work is described must mean the same thing everywhere. A Data curation role recorded in one system must be understood identically in another. That consistency is what the CASRAI Dictionary provides: a shared vocabulary so that the professional work of curating and stewarding data is understood and credited the same way wherever it appears. The recognition of data professionals is also a concern of research administration, where contributions, careers and the systems that record them come together. FAIR data is one of open science’s great ambitions; recognising the people who make data FAIR — in the record and in their careers — is how that ambition is sustained.