Research Data Management Glossary

The goal of this glossary is to gather all the key terms needed for collaborators to share a common understanding of the Research Data Management domain.

Our glossaries are intended as living documents and we actively welcome comment and suggestions for improvements.


Access: The continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed essential for the purposes for which the digital material was created and/or acquired. Users who have access can retrieve, manipulate, copy, and store copies on a wide range of hard drives and external devices.

Access control list: A list of permissions associated with a resource, matched against a requester's credentials to determine whether access is granted.
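
As an illustration, the following minimal Python sketch checks a request against an in-memory access control list. The ACL layout (resource, then principal, then permission set), the "group:" prefix and all names are hypothetical assumptions; real systems such as file systems and databases define their own ACL formats.

```python
from dataclasses import dataclass, field


@dataclass
class Credentials:
    """Hypothetical credentials: a user name plus the groups the user belongs to."""
    user: str
    groups: set[str] = field(default_factory=set)


# Hypothetical ACL: resource -> principal -> set of granted permissions.
# Principals prefixed with "group:" refer to groups rather than individual users.
ACL: dict[str, dict[str, set[str]]] = {
    "survey_2023.csv": {
        "alice": {"read", "write"},
        "group:analysts": {"read"},
    },
}


def is_permitted(creds: Credentials, resource: str, permission: str) -> bool:
    """Return True if any ACL entry matching the credentials grants the permission."""
    entries = ACL.get(resource, {})
    principals = {creds.user} | {f"group:{g}" for g in creds.groups}
    return any(permission in entries.get(p, set()) for p in principals)


print(is_permitted(Credentials("bob", {"analysts"}), "survey_2023.csv", "read"))   # True
print(is_permitted(Credentials("bob", {"analysts"}), "survey_2023.csv", "write"))  # False
```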

Access controls: Given a data object name, access controls define access relationships between the following metadata: data object name, a user name (or user group, or user role), and access permission. The information can be stored as metadata information associated with each data object. The information can be generated dynamically by applying the access controls of the collection that organizes the data objects.

Access workflow: A type of access entity that contains the services and functions which make the data object holdings and their information content and related services visible to data consumers.

Accession number: Numbers used by the National Center for Biotechnology Information (NCBI) that are unique and citable.

Analogue materials: Non-digital materials that have a physical presence (e.g., written and printed material).

Analogue signals: Continuous electronic signals.

Analytical quality control: Processes and procedures designed to ensure that the results of laboratory analysis are consistent, comparable, accurate and within specified limits of precision.

Analytics: The discovery of meaningful multidimensional patterns in data.

Anomaly: A rule, practice, or observation that is different from what is normal or usual.

Anonymity: A form of privacy that is not usually needed or wanted. There are occasions, however, when a user may want anonymity (for example, to report a crime). This need is sometimes met through a site called a remailer, which re-posts a message from its own address, thus disguising the originator of the message.

Application Vulnerability Description Language: An XML definition for exchange of information relating to security vulnerabilities of applications exposed to networks.

Applied science: The application of existing scientific and professional knowledge to develop practical applications in a scientific field (e.g., actuarial science, agriculture, biology, chemistry, forestry, meteorology, physics, planetary and earth sciences), scientific regulation, or patent.

Architecture: Fundamental organization of a system embodied in its components, their relationships to each other and to the environment, and the principles guiding its design and evolution. The term is not always used in normative or prescriptive ways. In some cases, the architecture may need to be flexible, acting more as an open framework than as a fixed set of components and services that is identical for everyone.

Archive: A place or collection containing static records, documents, or other materials for long-term preservation.

Archiving: A curation activity that ensures that data are properly selected, stored, and can be accessed, and for which logical and physical integrity are maintained over time, including security and authenticity.

At-risk data: Data that are at risk of being lost. At-risk data include data that are not easily accessible, have been dispersed, have been separated from the research output object, or are stored on a medium that is obsolete or at risk of deterioration; data that were not recorded in digital form; and digital data that are available but not usable because they have been detached from the supporting data, metadata, and information needed to use and interpret them intelligently.

Audit: An independent evaluation of an organization, system, process, project or product.

Authentication: The process of confirming the identity of a principal entity.

Authenticity metadata: A type of metadata that conveys information needed to link a data object to its original source.


Behavioural competencies: Observable and measurable knowledge, skills, abilities or personal characteristics needed to achieve performance output or outcome needs.

Best practice: A technique or methodology that, through experience and research, has proven to reliably lead to a desired result.

Big data: An evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that have the potential to be mined for information.

Bit sequence: A representation of digital content as an assembly of bits, the fundamental unit of digital information.

Bit stream: An unstructured sequence of bits that is identified as a unit (e.g., bits in a communication transmission). It may be stored as a unit or may exist as a pattern and be generated. A digital object may be represented as a bit stream of finite length that encodes its informational content.

Black box: Any system or service whose inner workings are not understood by or accessible to its user.

Blueprint: A design for a framework that can be re-used and re-purposed by applying minor changes that do not require changing the underlying design principles.

Born digital: Digital materials which are not intended to have an analogue equivalent, either as the originating source or as a result of conversion to analogue form.

Boundary value: A data value that corresponds to a minimum or maximum input or output value specified for a system or component.
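
Boundary values are typically exercised in testing by probing each limit and the values just inside and just outside it. The sketch below assumes a hypothetical validator whose specified input range is 0 to 120; the function names are illustrative only.

```python
MIN_AGE, MAX_AGE = 0, 120  # hypothetical specified input range


def accept_age(age: int) -> bool:
    """Hypothetical component under test: accepts ages in the specified range."""
    return MIN_AGE <= age <= MAX_AGE


def boundary_cases(lo: int, hi: int) -> list[int]:
    """Each boundary, plus the values just inside and just outside it."""
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]


results = {v: accept_age(v) for v in boundary_cases(MIN_AGE, MAX_AGE)}
print(results)
# {-1: False, 0: True, 1: True, 119: True, 120: True, 121: False}
```

Values just outside the range are rejected, while the boundaries themselves and the values just inside them are accepted.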

Bug: A coding error in a computer program which causes the program to perform in an unintended or unanticipated manner.


Canonical data collection: A data collection that has been normalized by some established criteria to allow for effective data management. Examples include: data files that belong to a certain experiment, all files that are created by one specific simulation, all files that belong to a specific observation (same day, same place, etc.).

Catalogue: A type of collection that describes and points to features of another collection.

Cataloguing: An intellectual process of describing objects in accordance with accepted library principles, particularly those of subject and classification order.

Causation: The capacity of one variable to influence another. The first variable may bring the second into existence or may cause the incidence of the second variable to fluctuate.

Certified product: A product that has been inspected, evaluated, tested, or otherwise determined to be in conformance or compliance with applicable or specified provisions of referenced standards, codes, or other requirements and certified by an authority which is recognized or has the legal power to grant such certification. Certified products imply a guarantee or warranty of product conformance and that the product is under the test and surveillance procedures of a specified certification system.

Change log: Tracks the progress of each change from submission through review, approval, implementation and closure. The log can be managed manually by using a document or spreadsheet, or it can be managed automatically with a software or Web-based tool.

Change management: A systematic approach to dealing with change, both from the perspective of an organization and on the individual level.

Checksum: A value computed from the contents of a data object and recorded as metadata; it is an important property of the data object that allows its identity and integrity to be verified.
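
A minimal Python sketch of computing and verifying a checksum with the standard library's `hashlib`, assuming SHA-256 as the algorithm (repositories also use MD5, SHA-1 and others). The function names are hypothetical.

```python
import hashlib


def sha256_checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def file_checksum(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(data: bytes, expected: str) -> bool:
    """Recompute the checksum and compare it with the stored value."""
    return sha256_checksum(data) == expected


stored = sha256_checksum(b"observation,42\n")  # recorded as metadata at deposit
print(verify(b"observation,42\n", stored))     # True: content unchanged
print(verify(b"observation,43\n", stored))     # False: content altered
```

Storing the digest alongside the object and recomputing it after transfer or storage is what allows integrity to be checked: any change to the bytes produces a different digest with overwhelming probability.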

Chief Data Officer: A corporate officer responsible for enterprise-wide governance and utilization of information as an asset, via data processing, analysis, data mining, information trading and other means.

Chief Digital Officer: A person who helps an organisation drive growth by converting traditional "analog" activities to digital ones, and who oversees operations in rapidly changing digital sectors such as mobile applications, social media and related applications, virtual goods, as well as "wild" web-based information management and marketing.

Chief Information Officer: The most senior executive in an enterprise responsible for the information technology and computer systems that support enterprise goals.

Chief Technology Officer: Typically at the same level as, or reporting directly to, the Chief Information Officer, the Chief Technology Officer is primarily concerned with long-term and "big picture" issues (while still having deep technical knowledge of the relevant field).

Citable data: A type of referable data that has undergone quality assessment and can be referred to as citations in publications and as part of research objects.

Client: Individuals, units or organizations using a service or product.

Cloud computing: A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, managed computing power, storage, platforms and services are delivered on demand to external customers over the Internet.

Cloud ecosystem: An ecosystem that includes not only traditional elements of cloud computing such as software and infrastructure, but also consultants, integrators, partners, third parties and anything in their environments that has a bearing on the other components.

Collection management identification: A type of data provenance that adds metadata to identify data collections.

Comma separated values: A file that contains the values in a table as a series of ASCII text lines organized so that each column value is separated by a comma from the next column's value and each row starts a new line.
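
Real CSV files need one rule the definition above does not mention: a value that itself contains a comma must be quoted. A minimal sketch using Python's standard `csv` module, with hypothetical column names and values:

```python
import csv
import io

# Hypothetical table: a header row followed by data rows.
rows = [
    ["site", "date", "temp_c"],
    ["A", "2023-06-01", "21.5"],
    ["B", "2023-06-01", "19.0, estimated"],  # contains a comma
]

# Write the rows as CSV text; the writer quotes values containing a comma.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)
# site,date,temp_c
# A,2023-06-01,21.5
# B,2023-06-01,"19.0, estimated"

# Read it back: quoted values are restored as single fields.
parsed = list(csv.reader(io.StringIO(text)))
assert parsed == rows
```

Reading the text back yields the original rows, including the value containing a comma as one field.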

Commit: The final step in the successful completion of a previously started database change as part of handling a transaction in a computing system.
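
The following sketch uses Python's built-in `sqlite3` module with a hypothetical `samples` table to show that a change becomes permanent only when committed, while a change that is rolled back is discarded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL)")
conn.commit()

# The INSERT opens a transaction; commit() makes the change permanent.
conn.execute("INSERT INTO samples (value) VALUES (?)", (1.5,))
conn.commit()

# This INSERT is rolled back instead of committed, so it is discarded.
conn.execute("INSERT INTO samples (value) VALUES (?)", (2.5,))
conn.rollback()

count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(count)  # 1
conn.close()
```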

Compliance: Conformance with a law or regulation.

Component: An entity with a discrete structure within a system considered at a particular level of analysis (e.g., an assembly or software module). Components may be characterized by the services they offer and the internal structures that are required to offer those services.

Compute intensive: Any computer application that requires a lot of computation, such as meteorology programs and other scientific applications.

Computer code: A series of computer instructions written in some human readable computer language, usually stored in a text file.

Computer intensive: Any computing application that requires the resources of a lot of computers, such as grid computing.

Concept: The smallest, unambiguous unit of thought that is uniquely identifiable.

Confidential information: Any information obtained by a person on the understanding that they will not disclose it to others, or obtained in circumstances where it is expected that they will not disclose it.

Confidentiality: The duties and practices of people and organizations to ensure that individuals' personal information flows from one entity to another only according to legislated or otherwise broadly accepted norms and policies.

Conformance: The state of having satisfied the requirements of some specific standard(s) and/or specification(s).

Consensus standard: A standard developed through the cooperation of all parties who have an interest in participating in its development and/or use.

Consumer data: The information trail customers leave behind as a result of their Internet use. This data, which sometimes comprises personal information, comes from such sources and channels as social media networks, marketing campaigns, customer service requests, call centre communications, online browsing data, mobile applications, purchasing history and preferences, and more.

Container: Something able to hold objects. A data repository can hold data objects and collections. In this case the data repository may be considered a type of data container.

Content information: The set of information that is the original target object that has been registered and is under preservation.

Content replication: A type of digital migration in which there is no change to the Packaging Information, the Content Information, or the Preservation Description Information (PDI). The bits used to represent these Information Objects are preserved in the transfer to the same or a new media instance.

Controlled vocabulary: A list of standardized terminology, words, or phrases, used for indexing or content analysis and information retrieval, usually in a defined information domain.

Corpus: A set of documents that has a scientific meaning. A corpus can be produced by an individual researcher's activity (including its archival materials), or by laboratory research, a field campaign, a science and culture heritage project, a survey, etc.

Correlation: A statistical measure that indicates the extent to which two or more variables fluctuate together. Correlation does not imply causation. There may be, for example, an unknown factor that influences both variables similarly.
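
To illustrate why correlation does not imply causation, the sketch below computes the Pearson correlation coefficient, one common correlation measure, for two hypothetical series that are both driven by a third variable (temperature). The data and names are invented for illustration.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical data: both series are driven by temperature, the unknown factor.
temperature = [10, 15, 20, 25, 30]
ice_cream_sales = [2 * t + 5 for t in temperature]
swimming_incidents = [t // 5 for t in temperature]

print(round(pearson_r(ice_cream_sales, swimming_incidents), 6))  # 1.0
```

The two series are perfectly correlated even though neither causes the other; both simply track temperature.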

Corrupt data: Data that have deteriorated as a result of some external agent, such as viruses, hardware or software incompatibilities, flaws or failures, power outages, dust, water, extreme temperatures, etc.

Creativity: The ability to take an innovative approach to research by creating new concepts, theories, approaches and/or solutions, or by modifying existing ones.

Cross-disciplinary: Explains aspects of one discipline in terms of another (e.g., the physics of music; the politics of literature).

Curation: The activity of managing and promoting the use of data from their point of creation to ensure that they are fit for contemporary purpose and available for discovery and reuse.

Curation workflow: A type of workflow that includes active steps to curate data as an aid to on-going management of data through its lifecycle.


Dark data: Operational data that are not being used, such as information assets that organizations collect, process and store in the course of their regular business activity, but generally fail to use for other purposes.

Darwin information typing architecture: A document creation and management specification that builds content reuse into the authoring process.

Data: Facts, measurements, recordings, records, or observations about the world collected by scientists and others, with a minimum of contextual interpretation.

Data access protocol: A system that allows outsiders to be granted access to databases without overloading the system.

Data acquisition: The process of acquiring data from some source. For example, data may be acquired by download from a repository, transfer from a data logger, data capture, etc.

Data analysis: A data lifecycle stage that involves the techniques that produce synthesized knowledge from organized information.

Data archive: An archival service providing the long-term permanent care and accessibility for digital objects with research value.

Data availability: The state when data are in the place needed by the user, at the time the user needs them, and in the form needed by the user.

Data capture: The process or means of obtaining and storing external data, particularly images or sounds, for use at a later time.

Data catalogue: A curated collection of metadata about datasets and their data elements.

Data centre: A facility providing IT services, such as servers, massive storage, and network connectivity.

Data citation: The practice of referencing data in publications, offering proper recognition to authors as well as permanent identification through the use of global persistent identifiers in place of URLs, which can change frequently.

Data cleaning: The detection and correction of errors and inconsistencies in data; a continuous process that requires corrective actions throughout the data lifecycle.

Data completeness: The degree to which all required measures are known. Values may be designated as "missing" in order not to have empty cells, or missing values may be replaced with default or interpolated values. In the case of default or interpolated values, these must be flagged as such to distinguish them from actual measurements or observations.
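
The flagging requirement above can be sketched as follows, assuming linear interpolation for interior gaps and a simple three-valued flag; both choices, and all names, are hypothetical. Leading and trailing gaps are left as "missing" because there is nothing to interpolate between.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    value: Optional[float]
    flag: str  # hypothetical flags: "measured", "interpolated" or "missing"


def fill_gaps(values: list[Optional[float]]) -> list[Reading]:
    """Linearly interpolate interior gaps and flag every value by its origin."""
    out = [Reading(v, "measured" if v is not None else "missing") for v in values]
    known = [i for i, v in enumerate(values) if v is not None]
    # Fill only the gaps between two known values; edge gaps stay "missing".
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            out[i] = Reading(values[a] + frac * (values[b] - values[a]), "interpolated")
    return out


def completeness(values: list[Optional[float]]) -> float:
    """Fraction of values that were actually observed (before any filling)."""
    return sum(v is not None for v in values) / len(values) if values else 0.0


for r in fill_gaps([1.0, None, 3.0, None]):
    print(r.value, r.flag)
# 1.0 measured
# 2.0 interpolated
# 3.0 measured
# None missing
print(completeness([1.0, None, 3.0, None]))  # 0.5
```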

Data compliance: Data compliance consists of the ongoing processes to ensure adherence of data to both enterprise business rules (government department, university, industry, or agency), and to legal, regulatory and accreditation requirements.

Data container: A software stack that chunks digital objects at a physical layer. Typical containers are file systems, database management systems, content management systems, clouds, etc.

Data curation: A managed process, throughout the data lifecycle, by which data and data collections are cleansed, documented, standardized, formatted and inter-related. This includes versioning data, forming a new collection from several data sources, annotating with metadata, and adding codes to raw data (e.g., classifying a galaxy image with a galaxy type such as "spiral").

Data custodian: A data custodian is an IT individual or organization responsible for the IT infrastructure providing and protecting data in conformance with the policies and practices prescribed by data governance.

Data de-noising: Removing noise from data.

Data destruction: The process of destroying data stored on tapes, hard disks and other forms of electronic media so that it is completely unreadable and cannot be accessed or used.

Data dictionary: A collection of descriptions of the data objects or items in a data model.

Data dredging: A data mining practice in which large volumes of data are analyzed seeking any possible relationships between data. The traditional scientific method, in contrast, begins with a hypothesis and follows with an examination of the data.

Data driven decision management: An approach to governance that values decisions that can be backed up with data that can be verified. The success of the data-driven approach is reliant upon the quality of the data gathered and the effectiveness of its analysis and interpretation.

Data driven disaster: A serious problem caused by one or more ineffective data analysis processes. In addition to the financial burden, problems with data quality and analysis can have a serious impact on security, compliance, project management and human resource management, among others.

Data element: A unit of data for which the definition, identification, representation (term used to represent it), and permissible values are specified by means of a set of attributes.

Data entity: An object, event, or phenomenon about which data are stored in a database and which has intermediate representation in a Data Model.

Data exploration: Data exploration involves summarizing the main characteristics of a dataset using visualization and should be the first step in data analysis.

Data file format: The layout of a file in terms of how the data within the file are organized. A program that uses the data in a file must be able to recognize and possibly access data within the file.

Data governance: The exercise of authority, control and shared decision making (planning, monitoring and enforcement) over the management of data assets.

Data harmonization: In the context of epidemiology: Making data from different sources comparable. The processes involved in producing inferentially equivalent data.

Data hygiene: The collective processes conducted to ensure the cleanliness of data. Data are considered clean when they are relatively error-free.

Data identifier: An identifier that uniquely distinguishes one set of data from all others.

Data ingestion: The process of obtaining, importing, and processing data for later use or storage in a database. This process often involves altering individual files by editing their content and/or formatting them to fit into a larger document.

Data integration: Combining diverse datasets from disparate sources into one unified dataset or database. Data are accessed and extracted, moved, validated, cleaned, transformed and loaded.

Data integrity: The assurance that information can only be accessed or modified by those authorized to do so.

Data item: A type of data element that expresses a proposition that binds one or more property values to some data entity.

Data librarian: Data experts who have a librarian background. Data librarians often carry out curation and metadata related work. There is much overlap between data librarians, data managers, and data stewards.

Data lifecycle: Refers to all the stages in the existence of digital information from creation to destruction. A lifecycle view is used to enable active management of the data objects and resource over time, thus maintaining accessibility and usability.

Data linkage: The process of bringing together data from two or more different sources that relate to the same individual, family, place or event.
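
A minimal sketch of deterministic (exact-match) linkage in Python, assuming two hypothetical sources that share a person's name and date of birth. In practice, linkage is often probabilistic and subject to privacy controls; this example only shows the basic matching step.

```python
def match_key(record: dict) -> tuple[str, str]:
    """Hypothetical match key: case- and whitespace-normalised name plus date of birth."""
    return (" ".join(record["name"].lower().split()), record["dob"])


def link(source_a: list[dict], source_b: list[dict]) -> list[dict]:
    """Join records from two sources that share the same match key."""
    # If source_b contains duplicate keys, the last record with that key wins.
    index = {match_key(r): r for r in source_b}
    linked = []
    for rec in source_a:
        match = index.get(match_key(rec))
        if match is not None:
            extra = {k: v for k, v in match.items() if k not in ("name", "dob")}
            linked.append({**rec, **extra})
    return linked


# Hypothetical sources describing overlapping sets of individuals.
hospital = [
    {"name": "Ana Silva", "dob": "1980-02-01", "admissions": 2},
    {"name": "Ben Okafor", "dob": "1975-11-30", "admissions": 1},
]
registry = [
    {"name": "ana  silva", "dob": "1980-02-01", "postcode": "AB1"},
    {"name": "Chen Wei", "dob": "1990-05-05", "postcode": "CD2"},
]
print(link(hospital, registry))
# [{'name': 'Ana Silva', 'dob': '1980-02-01', 'admissions': 2, 'postcode': 'AB1'}]
```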

Data management: The activities of data policies, data planning, data element standardization, information management control, data synchronization, data sharing, and database development, including practices and projects that acquire, control, protect, deliver and enhance the value of data and information.

Data management infrastructure: An infrastructure used to provide data management and enforce data management policies. A data management infrastructure would include resources such as a data repository and an information catalogue.

Data management plan: A formal statement describing how research data will be managed and documented throughout a research project and the terms regarding the subsequent deposit of the data with a data repository for long-term management and preservation.

Data management policy: A written document backed by management describing policy and providing guidance to ensure that appropriate standards, consistent guidelines, and common strategies are used, providing linkages to and consistency with other similar systems, and fostering a true network across an organization producing data.

Data mart: A repository of data designed to serve a particular community of knowledge workers. The goal of a data mart is to meet the particular demands of a specific group of users.

Data migration: The process of transferring data between storage types, formats, information technologies, or computer systems. A data migration project is usually undertaken to replace or upgrade servers or storage equipment, for a website consolidation, to conduct server maintenance or to relocate a data center.

Data mining: The process of analyzing multivariate datasets using pattern recognition or other knowledge discovery techniques to identify potentially unknown and potentially meaningful data content, relationships, classification, or trends.

Data model: A model that specifies the structure or schema of a dataset. The model provides a documented description of the data and thus is an instance of metadata. It is a logical, relational data model showing an organized dataset as a collection of tables with entity, attributes and relations.

Data modeling: Data modeling formalizes and documents existing processes and events. It captures and translates complex system designs into easily understood representations of the data flows and processes, creating a blueprint for construction and/or re-engineering.

Data munging: A series of potentially destructive or irrevocable changes to a piece of data or a file. Common munging operations include removing punctuation or HTML tags, data parsing, filtering, and transformation.
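
The sketch below applies two of the munging operations named above, stripping HTML tags and removing punctuation, using only the Python standard library. The regular expression is a simplification that is adequate for this hypothetical input but not for arbitrary HTML, which calls for a real parser.

```python
import html
import re
import string

# Simplified tag pattern: fine for this input, not a general HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Remove HTML tags, then decode entities such as &deg;."""
    return html.unescape(TAG_RE.sub("", text))


def remove_punctuation(text: str) -> str:
    """Delete ASCII punctuation characters (string.punctuation)."""
    return text.translate(str.maketrans("", "", string.punctuation))


raw = "<p>Site A: <b>21.5</b> &deg;C, sunny!</p>"
cleaned = strip_tags(raw)
print(cleaned)                      # Site A: 21.5 °C, sunny!
print(remove_punctuation(cleaned))  # Site A 215 °C sunny
```

Removing punctuation also removes the decimal point, turning 21.5 into 215: an example of why munging operations are described as potentially destructive.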

Data organization: Denotes the complexity of measures that are used by a repository to form aggregations of data objects (including collections and metadata) to describe the properties of data objects, to register PIDs, to build the PID records, to link between all components, and to set up the containers (software stack) that are used to store all …

Data policy: An organization's stated data/information management processes designed to assist and protect the organization's data research assets. It is a set of high-level principles that establish a guiding framework for data management. A data policy can be used to address strategic aspects such as data access, relevant legal matters, data stewardship issues and custodial duties, data …

Data preprocessing: Any type of processing performed on raw data to prepare it for another processing procedure. Preprocessing may include: data sampling, data transformation, de-noising, data normalization, data standardization, or feature extraction.

Data processing: A generic concept referring to all kinds of procedures being executed on data at any point in the data life cycle.

Data production: Includes all activities involved in the planning, collecting, processing, analysis and maintenance of data in the original research project. Among these activities are selecting a study design, constructing instruments for data collection, conducting data collection/creation, performing data editing/verification/validation, analyzing data, backing up data versions and preparing and tagging metadata.

Data profiling: The statistical analysis and assessment of the quality of data values within a dataset for consistency, uniqueness and logic. The data profiling process cannot identify inaccurate data; it can only identify business rule violations and anomalies. The insight gained by data profiling can be used to determine how difficult it will be to use existing …
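
A minimal profiling sketch for a single column, assuming that `None` and the empty string both count as missing. The statistics chosen and all names are illustrative. It can flag duplicates and gaps, but it cannot tell whether a well-formed value such as "S3" is accurate.

```python
from collections import Counter


def profile_column(values: list) -> dict:
    """Basic profile of one column: size, gaps, distinct values and duplicates."""
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "duplicates": sum(c - 1 for c in counts.values() if c > 1),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical sample-ID column that should be unique and complete.
sample_ids = ["S1", "S2", "S2", "", None, "S3"]
print(profile_column(sample_ids))
# {'count': 6, 'missing': 2, 'distinct': 3, 'duplicates': 1, 'most_common': ('S2', 2)}
```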

Data publication: The release of research data, associated metadata, accompanying documentation, and software code (in cases where the raw data have been processed or manipulated) for re-use and analysis in such a manner that they can be discovered on the Web and referred to in a unique and persistent way. Data publishing occurs via dedicated data repositories …

Data quality: The reliability and application efficiency of data. It is a perception or an assessment of a dataset's fitness to serve its purpose in a given context. Aspects of data quality include: Accuracy, Completeness, Update status, Relevance, Consistency across data sources, Reliability, Appropriate presentation, Accessibility. Within an organization, acceptable data quality is crucial to operational and transactional …

Data recovery: The process of restoring data that have been lost, accidentally deleted, corrupted or made inaccessible for any reason. The data recovery process may vary, depending on the circumstances of the data loss, the data recovery software used to create backups, and backup target media. In some cases, end users may be able to restore lost …

Data reduction: The process of reducing the amount or size of stored data. This may be achieved by eliminating redundant copies of data files, deduplicating data files by removing redundant records, or by compressing the data files.
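
Two of the techniques named above, removing redundant records and compressing, can be sketched with the Python standard library as follows. The records are hypothetical, and `zlib` stands in for whatever compression a storage system actually uses.

```python
import zlib


def deduplicate(records: list[str]) -> list[str]:
    """Keep the first occurrence of each record, preserving order."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


# Hypothetical records containing redundant copies.
records = ["A,21.5", "B,19.0", "A,21.5", "C,20.1", "B,19.0"]
unique = deduplicate(records)
print(unique)  # ['A,21.5', 'B,19.0', 'C,20.1']

# Lossless compression of highly repetitive data.
raw = ("site,temp_c\n" + "A,21.5\n" * 1000).encode("utf-8")
compressed = zlib.compress(raw)
assert zlib.decompress(compressed) == raw
print(len(raw), len(compressed))  # the compressed size is far smaller
```

Deduplication discards the repeated records, whereas `zlib` compression is lossless: decompressing returns the original bytes exactly.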

Data reference model: A framework whose primary purpose is to enable information sharing and reuse across the federal government via the standard description and discovery of common data and the promotion of uniform data management practices.

Data registration: A curation process on a data object by which it receives a persistent object identifier (PID) from a trusted registration authority. Registration must be accompanied by the step(s) to upload the data object to a persistent repository. RELATED TERM. Repository; Persistent identifier. REFERENCE. Research Data Alliance; NISO (2004) Understanding Metadata. Bethesda, MD: NISO Press, …

Data repository management: A type of data management using repositories. It is the set of policies that govern the organization, control, and properties of the repository such as: required file formats, access control restrictions, integrity, replication, retention, disposition, etc.

Data representation: An object describing the context of the data, including provenance, description, structural, and administrative information.

Data rescue: Recovery and/or transformation and digitization of dark data and at-risk data so that they can be preserved, accessed, shared, and used. Data rescue also involves the addition of rich metadata to make the content understandable and more easily re-usable.

Data residency: The physical or geographic location of an organization's data or information. Data residency also refers to the legal or regulatory requirements imposed on data based on the country or region in which it resides. Cloud computing, which allows organizations to deliver hosted services over the Internet, can create data residency concerns. Users need to know …

Data retention policy: An organization's established protocol for retaining information for operational or regulatory compliance needs. The objectives of a data retention policy are to keep important information for future use or reference, to organize information so it can be searched and accessed at a later date, and to dispose of information that is no longer needed. A …

Data review: An activity through which the correctness conditions of the data are verified. It also includes the specification of the type of the error or condition not met, and the qualification of the data and its division into the "error-free" and "erroneous" data. Data review consists of both error detection and data analysis, and can be …

Data sampling: Selection of a statistically representative subset from a large population of data.
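
Simple random sampling without replacement is one way to draw such a subset; the sketch below uses Python's `random` module with a fixed seed so the sample can be reproduced. The population is hypothetical. A random sample is representative only in expectation, so stratified sampling is often used when particular subgroups must be represented.

```python
import random
from typing import Optional


def simple_random_sample(population: list, k: int, seed: Optional[int] = None) -> list:
    """Draw k distinct elements without replacement; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    return rng.sample(population, k)


population = list(range(1, 10_001))  # hypothetical record IDs
sample = simple_random_sample(population, 100, seed=42)
print(len(sample), len(set(sample)))  # 100 100
```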

Data scaling: Techniques used to deal with parameters having different units and scales. SYNONYM. Data rescaling. RELATED TERM. Data standardization.

Data selection: A process that creates a new dataset from an original source. Examples include: creating a subset of the data, querying a database.

Data sharing: The practice of making data available for reuse. This may be done, for example, by depositing the data in a repository or through data publication. SYNONYM. Data dissemination; Data posting. RELATED TERM. Repository; Data publication.

Data splitting: An approach to protecting sensitive data from unauthorized access by encrypting the data and storing different portions of a file on different servers. An unauthorized person would need to know the locations of the servers containing the parts, be able to get access to each server, know what data to combine, and how to decrypt …

Data standardization: In the context of data analysis and data mining: Transformation of data to have zero mean and unit variance. Techniques used include the following, where "V" represents the value of the variable in the original datasets: (a) Data normalization; (b) z-score scaling; (c) Dividing each value by the range: recalculates each variable as V /(max V - …
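
Technique (b), z-score scaling, computes z = (V - mean(V)) / sd(V). A minimal Python sketch, assuming the population standard deviation so that the result has exactly unit variance:

```python
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Rescale values to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("standardization is undefined for a constant variable")
    return [(v - mean) / sd for v in values]


z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Using the sample standard deviation (`statistics.stdev`) instead is also common; the resulting values then have unit sample variance rather than unit population variance.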

Data stewardData stewardship is a shared responsibility between Principal Investigators and data stewards. Principal Investigators are responsible for, and data stewards provide support for: (a) Data collection, data integration, or reuse of existing data; (b) Review of data quality; (c) Description of scientific workflow/process; (d) Provision of standards-compliant metadata; and, (e) Submission of data and data … Continue reading Data steward

Data storeA repository for persistently storing collections of data, such as a database, a file system or a directory. The data stored can be of any type that can be rendered in digital format and placed in electronic media. Examples include text, image, video files and audio files.n/a

Data streamA sequence of digitally encoded, coherent signals used to send or receive a representation of information content as transmitted.n/a

Data structureA specialized format for organizing and storing data. General data structure types include the array, the file, the record, the table, the tree, and so on. Any data structure is designed to organize data to suit a specific purpose so that it can be accessed and worked with in appropriate ways. In computer programming, a … Continue reading Data structure

Data structure continuumThe continuum of data structures that spans unstructured data, semi-structured data, and structured data.

Data table attribute1. A field or column in a database table. It is an abbreviation for 'physical data attribute' which is a single data element related to a data object, such as a table in a database. The database schema associates one or more attributes with each database entity (i.e. table). 2. A term for a logical …

Data tensionHuman tension and/or stress related to the sharing or release of data resulting from concerns about: (a) unknowns about users, uses, and what users will learn from the data before the data producers themselves learn it; (b) what users will learn from the data; (c) data quality; (d) data traceability (or lack thereof); (e) potential …

Data traceabilityData traceability follows the lifecycle of data to track all access and changes to the data. It helps demonstrate transparency, compliance and adherence to regulations. Data traceability, along with data compliance, can be considered part of a data audit process. Data traceability is fundamental to reproducible research.

Data transformationManipulation of raw data to produce a single output. RELATED TERM. Data selection; Data processing; Data pre-processing

Data type registry1. A registry that links data types of all sorts with the executable data processing functions that can be useful for working with a specific data type. Examples include: complex file types in biology (diagnosis), registering categories that appear in PID records to describe data properties. Data types range from complex digital objects to simple …

Data upload databaseA collection of interrelated data often with controlled redundancy, organized according to a scheme to serve one or more applications; the data are stored so that they can be used by several programs without concern for data structures or organization.

Data validationProvides well-defined guarantees for fitness, accuracy, and consistency for any of various kinds of user input into an application or automated system. Data validation checks that data are valid, sensible, reasonable, clean, usable, and secure before they are processed. Failures or omissions in data validation can lead to data corruption or security vulnerabilities. Improperly validated data …
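
As an illustration, a validation routine can check each incoming record before it is processed and report every problem rather than stopping at the first. The field names and the plausible temperature range below are hypothetical assumptions, not part of any standard.

```python
from datetime import date


def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passed."""
    errors = []

    sample_id = record.get("sample_id")
    if not isinstance(sample_id, str) or not sample_id.strip():
        errors.append("sample_id: must be a non-empty string")

    temp = record.get("temperature_c")
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        errors.append("temperature_c: must be a number")
    elif not -90.0 <= temp <= 60.0:  # hypothetical plausibility limits
        errors.append("temperature_c: outside plausible range [-90, 60]")

    collected = record.get("collected_on")
    try:
        date.fromisoformat(collected)  # raises TypeError for None, ValueError for bad text
    except (TypeError, ValueError):
        errors.append("collected_on: must be an ISO 8601 date (YYYY-MM-DD)")

    return errors


good = {"sample_id": "S-001", "temperature_c": 21.5, "collected_on": "2024-03-05"}
bad = {"sample_id": "", "temperature_c": 150, "collected_on": "05/03/2024"}
print(validate_record(good))  # []
print(validate_record(bad))   # three error messages
```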

Data warehouseA central repository for all or significant parts of the data that an organization's various business systems collect. A data warehouse tends to be a strategic but somewhat unfinished concept. Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access, but does not generally start from the point-of-view of the …

Data wranglingThe process of manually or semi-automatically converting or mapping data from one form into another format that allows for more convenient consumption of the data with the help of semi-automated tools. Gathering and organizing disparate data from different sources, often collected by many different investigators. Activities include developing and supporting search tools that utilize standardized …

Data z-score scalingVariables are recalculated as (V - mean of V)/s, where "V" represents the value of the variable in the original dataset, and "s" is the standard deviation. As a result, all variables in the dataset have equal means (0) and standard deviations (1) but different ranges. Also known as z-score scaling.
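
The formula translates directly into code. This Python sketch uses the population standard deviation for "s". The entry does not say which estimator is meant, and using the sample standard deviation instead would give slightly different values.

```python
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Recalculate each value V as (V - mean of V) / s."""
    mean = statistics.fmean(values)
    s = statistics.pstdev(values)  # population standard deviation (an assumption)
    if s == 0:
        raise ValueError("standard deviation is zero; values are constant")
    return [(v - mean) / s for v in values]


scaled = z_scores([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, s = 2
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```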

DatabaseA collection of data that is organised according to a conceptual structure/model describing the characteristics of these data and the relationships among their corresponding entities, supporting one or more application areas. A database allows its contents to be easily accessed, managed and updated. The type of database used depends on the requirements of …

Database administrationThe function of managing the physical aspects of data resources, including database design and integrity, backup and recovery, performance and tuning. REFERENCE. DAMA Dictionary of Data Management

DatasetAny organized collection of data in a computational format, defined by a theme or category that reflects what is being measured/observed/monitored. The presentation of the data in the application is enabled through metadata. REFERENCE. Research Data Alliance; Mapping the Data Landscape 2011 Summit; TBS Standard on Geospatial Data (ISO 19115:2003); Environment Canada data stewardship …

Dataset seriesA collection of datasets sharing the same product specification. A dataset series is a type of aggregation or collection with some "logical grouping" such as by a topic (specification) with the (product) unit being a dataset series. Example: A series of earth observations. Each year, month or week (depending on the volume) might be a …

DatetimeA standard way to express a numeric calendar date that eliminates ambiguity, with acceptable formats defined by ISO 8601. ISO 8601 applies whenever representations of dates in the Gregorian calendar, times in the 24-hour timekeeping system, time intervals, recurring time intervals, or the formats of these representations are included in information interchange. …
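
The ambiguity that ISO 8601 removes is easy to demonstrate: "03/05/2024" means 5 March in month-first usage and 3 May in day-first usage. This Python sketch uses the standard datetime module and assumes, hypothetically, that the source data are known to be month-first.

```python
from datetime import datetime, timezone

# Ambiguous input; we assume the source system writes dates month-first.
ambiguous = "03/05/2024"
parsed = datetime.strptime(ambiguous, "%m/%d/%Y").date()
iso_date = parsed.isoformat()
print(iso_date)  # 2024-03-05 (YYYY-MM-DD, unambiguous)

# A date and time in the 24-hour system with an explicit UTC offset.
stamp = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
iso_stamp = stamp.isoformat()
print(iso_stamp)  # 2024-03-05T14:30:00+00:00

# The ISO 8601 string parses back to the same instant.
assert datetime.fromisoformat(iso_stamp) == stamp
```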

De facto standardA standard that is widely accepted and used, but lacks formal approval by a recognized standards developing organization (e.g., the QWERTY keyboard).

De-anonymizationDe-anonymization is a reverse engineering process in which de-identified data are cross-referenced with other data sources to re-identify the personally identifiable information. This could occur if a de-identification process had not been successfully performed, or had not been undertaken in the first place.

De-identification1. The act of minimally perturbing individual-level data to decrease the probability of discovering an individual's identity. It involves masking direct identifiers (e.g., name, phone number, address) as well as transforming indirect identifiers that could be used alone or in combination to identify an individual (e.g., birth dates, geographic details, dates of key events). If done …
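
A minimal Python sketch of the two operations the entry names: masking direct identifiers and transforming indirect ones. The field names and generalization rules are illustrative assumptions: only the birth year is kept, and only the first three characters of a postal code. On their own, these rules do not guarantee a low risk of re-identification.

```python
DIRECT_IDENTIFIERS = {"name", "phone", "address"}  # hypothetical field names


def deidentify(record: dict[str, str]) -> dict[str, str]:
    """Drop direct identifiers and coarsen selected indirect identifiers."""
    out: dict[str, str] = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # mask direct identifiers by removing them entirely
        if field == "birth_date":
            out["birth_year"] = value[:4]  # generalize YYYY-MM-DD to YYYY
        elif field == "postal_code":
            out["postal_area"] = value[:3]  # coarsen geographic detail
        else:
            out[field] = value
    return out


patient = {
    "name": "Jane Doe", "phone": "555-0100", "address": "12 Elm St",
    "birth_date": "1984-07-19", "postal_code": "K1A 0B1", "diagnosis": "asthma",
}
print(deidentify(patient))  # {'birth_year': '1984', 'postal_area': 'K1A', 'diagnosis': 'asthma'}
```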

Deep archiveA storage location for data that will probably not be accessed again, but must be kept in case of a compliance audit or some other reason.

DefectNon-conformance to requirements.

Demilitarized zoneIn the context of computer networks: A physical or logical sub-network (DMZ) that separates an internal local area network (LAN) from other untrusted networks, usually the Internet. External-facing servers, resources and services are located in the DMZ so they are accessible from the Internet but the rest of the internal LAN remains unreachable. This provides an …

DenormalizationIn a relational database, denormalization is an approach to speeding up read performance (data retrieval) in which the administrator selectively adds back specific instances of redundant data after the data structure has been normalized. A denormalized database should not be confused with a database that has never been normalized. After data has been duplicated, the …
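
A small illustration using Python's built-in sqlite3 module. The table and column names are hypothetical. A normalized schema keeps customer names only in `customers`; the denormalization step copies each name into `orders` so that a common read no longer needs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(customer_id),
                     total REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.5);
""")

# Denormalize: add back a redundant copy of the customer name in orders.
conn.executescript("""
ALTER TABLE orders ADD COLUMN customer_name TEXT;
UPDATE orders SET customer_name =
    (SELECT name FROM customers WHERE customers.customer_id = orders.customer_id);
""")

# The read path now touches a single table, with no join.
rows = conn.execute(
    "SELECT order_id, customer_name, total FROM orders ORDER BY order_id"
).fetchall()
print(rows)  # [(10, 'Ada', 25.0), (11, 'Grace', 40.0), (12, 'Ada', 5.5)]
```

The trade-off is that the redundant copy must now be kept in step: renaming a customer means updating both `customers` and `orders`.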

Derived data productThe results of applying a procedure to transform a data object in order to obtain a desired data product that is stored in a repository along with the provenance and descriptive metadata. RELATED TERM. Data transformation

Descriptive metadataEnables identification, location, and retrieval of information resources by users, often including the use of controlled vocabularies for classification and indexing and links to related resources. REFERENCE. DCC/TC3+

DigitalA record created digitally in the day-to-day business of the organisation and assigned formal status by the organisation. Examples include: word processing documents, emails, databases, or intranet web pages. SYNONYM. Electronic record. REFERENCE. Research Data Alliance; Digital preservation coalition

Digital Object IdentifierA name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks. A DOI is a type of Persistent Identifier (PID) issued by the International DOI Foundation. This permanent identifier is associated with a digital object that permits …
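
A DOI name has two parts separated by the first slash: a prefix beginning with "10." and a suffix chosen by the registrant. It is resolved through the https://doi.org/ proxy. The Python sketch below normalizes a few common written forms into a resolver URL. The accepted input forms and the light syntax check are simplifying assumptions, not the full DOI syntax rules.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Normalize 'doi:10.x/y', 'https://doi.org/10.x/y' or '10.x/y' to a resolver URL."""
    doi = doi.strip()
    for form in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(form):
            doi = doi[len(form):]
            break
    prefix, slash, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not slash or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/:;()")


print(doi_to_url("doi:10.1000/182"))  # https://doi.org/10.1000/182
```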

Digital archiving1. In the context of library and archiving communities: Digital archiving is often used interchangeably with digital preservation. 2. In the context of computing: Digital archiving is the process of backup and ongoing maintenance, as opposed to strategies for long-term digital preservation. RELATED TERM. Archiving. REFERENCE. Digital preservation coalition

Digital dataData in the form of digital materials. RELATED TERM. Digital materials

Digital infrastructureThose layers that sit between base technology (a computer science concern) and discipline-specific science. The focus is on value-added systems and services that can be widely shared across scientific domains, both supporting and enabling large increases in multi- and interdisciplinary science while reducing duplication of effort and resources (e.g., including hardware, software, personnel, services and …

Digital materialsA broad term encompassing: (a) digital surrogates created as a result of converting analogue materials to digital form (digitisation); (b) "born digital" materials, for which there has never been and is never intended to be an analogue equivalent; and, (c) digital records. RELATED TERM. Born digital; Digital objects; Digital records; Digital data; Electronic records. REFERENCE. Digital preservation …

Digital objectA digital object is editable, interactive, accessible and modifiable by means of digital objects other than the one governing its behaviour, and is distributed over information infrastructures. It is a machine-independent data structure consisting of one or more elements in digital form that can be parsed by different information systems; the structure helps to enable …

Digital preservationThe series of managed activities necessary to ensure continued access to digital materials for as long as necessary. Digital preservation is defined very broadly and refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological change. Those materials may be records created during the …

Digital research dataResearch data in digital form. The data may have been created in digital form originally, or converted to a digital representation from paper or another form.

Digital scholarshipIncorporates: (a) building a digital collection of information for further study and analysis; (b) creating appropriate tools for collection-building; (c) creating appropriate tools for the analysis and study of collections; (d) using digital collections and analytical tools to generate new intellectual products; and, (e) creating authoring tools for these new intellectual products, either in …

Digital signalsNon-continuous electronic signals.

DigitisationThe process of creating digital files by scanning or otherwise converting analogue materials. The resulting digital copy, or digital surrogate, would then be classed as digital material and be subject to the same broad challenges involved in preserving access to it as "born digital" materials. REFERENCE. Digital preservation coalition

Dirty dataData that contain errors. Dirty data can be caused by a number of factors including: inaccurate, incomplete or erroneous data such as spelling or punctuation errors, incorrect data or incorrect data type associated with a field, incomplete or outdated data, duplicate data, inconsistent data, incorrectly ordered data, improper parsing of fields from disparate systems, etc. …
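
A first pass at finding dirty data can be automated. This Python sketch flags three of the problems listed above: missing values, values of the wrong type in numeric fields, and duplicate rows. The row layout and the choice of numeric fields are assumptions made for the example.

```python
from collections import Counter


def profile_dirty_rows(rows: list[dict], numeric_fields: set[str]) -> dict:
    """Report missing values, non-numeric values in numeric fields, and duplicate rows."""
    issues = {"missing": [], "bad_type": [], "duplicates": []}
    # Rows are compared as sorted (field, value) tuples so field order does not matter.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    for i, row in enumerate(rows):
        for field, value in row.items():
            if value is None or (isinstance(value, str) and not value.strip()):
                issues["missing"].append((i, field))
            elif field in numeric_fields:
                try:
                    float(value)
                except (TypeError, ValueError):
                    issues["bad_type"].append((i, field))
    issues["duplicates"] = [dict(key) for key, n in counts.items() if n > 1]
    return issues


rows = [
    {"site": "A", "ph": "7.1"},
    {"site": "B", "ph": "seven"},  # wrong type
    {"site": "", "ph": "6.8"},     # missing site
    {"site": "A", "ph": "7.1"},    # duplicate of row 0
]
print(profile_dirty_rows(rows, numeric_fields={"ph"}))
```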

DisambiguationThe act of interpreting an author's intended use of a word that has multiple meanings or spellings.

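Document type definitionA set of markup declarations that defines the building blocks of an XML document: which elements and attributes are allowed and how they may be structured.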

Documented dataData that are delivered with all associated metadata, data dictionary, description of methods and instruments used to collect and process the data, and other supporting data (e.g., duplicate sample results, replicate analyses, percent recovery, etc.).

Dublin CoreAn initiative to create a digital "library card catalog" for the Web. Dublin Core is made up of 15 metadata elements that offer expanded cataloging information and improved document indexing for search engine programs. The 15 metadata elements used by Dublin Core are: title (the name given to the resource), creator (the person or organization responsible …
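
The 15 elements belong to the XML namespace http://purl.org/dc/elements/1.1/. As a sketch, Python's xml.etree.ElementTree can serialize a simple record. The `<metadata>` wrapper element and the sample values are hypothetical; real repositories embed Dublin Core in their own container formats.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
)


def dublin_core_xml(fields: dict[str, str]) -> str:
    """Serialize a record using only the 15 Dublin Core elements."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name in DC_ELEMENTS:  # emit elements in the canonical order
        if name in fields:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = fields[name]
    return ET.tostring(root, encoding="unicode")


record_xml = dublin_core_xml(
    {"title": "Lake temperature series", "creator": "Example Lab", "date": "2024-03-05"}
)
print(record_xml)
```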

Dynamic dataData whose content changes frequently and at asynchronous moments. Examples include: Data streams that are generated by sensors where it is unpredictable when data segments will appear in time (i.e. data streams have gaps); Data streams that are generated by humans in crowdsourcing scenarios where it is not clear when which cell …


E-ResearchComputationally intensive, large-scale, networked and collaborative forms of research and scholarship across all disciplines, including all of the natural and physical sciences, related applied and technological disciplines, biomedicine, social science and the digital humanities.

E-Research infrastructureComprises the ICT assets, facilities and services that support research within institutions and across national innovation systems, and that enable researchers to undertake excellent research and deliver innovation outcomes.

E-ScienceScience supported to a significant degree by digital information-processing and/or computational technologies, or wholly based on these. Note that such a definition is functional, not some intrinsic property of the science. Data-based science, that is science which is based wholly or in part on exploiting existing information, is included within this definition. E-Science includes a …

Extensible Markup LanguageExtensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. SYNONYM. XML

EcosystemThe complex of a community of organisms and its environment functioning as an ecological unit. REFERENCE. Merriam-Webster dictionary

Electronic health record1. A compilation of core electronic health data submitted by various healthcare providers and organizations, accessible by numerous authorized parties from a number of points of care, possibly even from different jurisdictions. 2. An official health record for an individual that is shared among multiple facilities and agencies. 3. Electronic health records typically include: Contact …

Electronic medical recordAn electronic version of the paper record that doctors have traditionally maintained for their patients and which is typically only accessible within the facility or office that controls it. RELATED TERM. Electronic health record. REFERENCE. Canadian Medical Protective Association (2014) Electronic Records Handbook

Encoding schemaMachine-processable specifications that define the structure and syntax of metadata specifications in a formal schema language. REFERENCE. Rhys Francis/TC3+

Engineering and scientific supportA technical service involved in the performance, inspection and leadership of skilled technical activities. Examples include: (a) Planning, design and making of maps, charts, drawings, illustrations and art work; (b) Design of three-dimensional exhibits or displays within a predetermined budget and pre-selected theme; (c) Conduct of analytical, experimental or investigative activities in the natural, physical …

EnhancementA noteworthy improvement to a product as part of a new version of it.

Error1. The difference between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition. 2. An incorrect step, process, or data definition. 3. An incorrect result. 4. A human action that produces an incorrect result.

Error seedingThe process of intentionally adding known faults to those already in a computer program for the purpose of monitoring the rate of detection and removal, and estimating the number of faults remaining in the program.
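
The estimate of remaining faults usually follows a capture-recapture argument, often attributed to Harlan Mills: if testing finds a given fraction of the seeded faults, it is assumed to have found the same fraction of the original faults. Below is a Python sketch of that estimator. It rests on the strong assumption that seeded faults are as hard to detect as real ones.

```python
def estimate_remaining_faults(seeded: int, seeded_found: int, original_found: int) -> float:
    """Estimate faults still in the program after error seeding.

    If S faults were seeded and s of them were found, while n original
    (non-seeded) faults were found, the estimated total of original
    faults is N = n * S / s, so about N - n remain.
    """
    if not 0 < seeded_found <= seeded:
        raise ValueError("need 0 < seeded_found <= seeded for the estimate to be defined")
    estimated_total = original_found * seeded / seeded_found
    return estimated_total - original_found


# 20 faults seeded, 15 of them detected, 30 original faults detected.
print(estimate_remaining_faults(20, 15, 30))  # 10.0
```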

EvaluationEvaluation is a decision about significance, value, or quality of something, based on careful study of its good and bad features.

ExecutiveA position that is located no more than three hierarchical levels below the highest level in an organization and that has significant executive managerial or executive policy roles and responsibilities, or other significant influence on the direction of the organization. Executives are responsible and accountable for exercising executive managerial authority or providing recommendations and advice on the …

Extensible resource identifierA defining scheme used for identification of resources (including people and organizations) and the sharing of data across domains, enterprises, and applications. The XRI Technical Committee (XRI TC) will define a Uniform Resource Identifier (URI) scheme and a corresponding Uniform Resource Name (URN) namespace.

Extract-Transform-LoadETL involves the following steps: (a) Extract data from homogeneous or heterogeneous data sources which are often managed by different people. An intrinsic part of the extraction involves data validation to confirm whether the data pulled from the sources have the correct/expected values; (b) Transform the data for storing it in proper format or structure …
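
The three steps map naturally onto three small functions. This Python sketch uses only the standard csv and sqlite3 modules. It extracts rows from an inline CSV sample, validates them during extraction, transforms Fahrenheit to Celsius, and loads the result into an in-memory database. The data, column names, and table name are invented for the example.

```python
import csv
import io
import sqlite3

RAW_CSV = """station,date,temp_f
ST01,2024-03-05,50.0
ST02,2024-03-05,not_recorded
ST01,2024-03-06,53.6
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read rows from the source (here, an inline CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def is_valid(row: dict[str, str]) -> bool:
    """Validation during extraction: keep only rows with a numeric temperature."""
    try:
        float(row["temp_f"])
    except ValueError:
        return False
    return True


def transform(rows: list[dict[str, str]]) -> list[tuple[str, str, float]]:
    """Transform: convert Fahrenheit to Celsius, rounded to 2 decimals."""
    return [(r["station"], r["date"], round((float(r["temp_f"]) - 32) * 5 / 9, 2)) for r in rows]


def load(records: list[tuple[str, str, float]], conn: sqlite3.Connection) -> None:
    """Load: write the transformed records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (station TEXT, date TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
rows = extract(RAW_CSV)
load(transform([r for r in rows if is_valid(r)]), conn)
print(conn.execute("SELECT * FROM readings ORDER BY date").fetchall())
# [('ST01', '2024-03-05', 10.0), ('ST01', '2024-03-06', 12.0)]
```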


FailureThe inability of a system or component to perform its required functions within specified performance requirements.

Fair useA legal concept that allows the reproduction of copyrighted material for certain purposes without obtaining permission and without paying a fee or royalty. Purposes permitting the application of fair use generally include review, news reporting, teaching, or scholarly research. When in doubt, the quickest and simplest thing may be to request permission from the copyright owner.

Feature extractionSelecting specific data that are significant in some particular context.

FieldA data table column name. SYNONYM. Column. RELATED TERM. Attribute

FirefightingAn emergency allocation of resources, required to deal with an unforeseen problem.

Fixed dataData that are not, under normal circumstances, subject to change. Examples of fixed data include results from concluded research, medical records, and historical data. SYNONYM. Reference data; Archival data; Fixed-content data; Permanent data

Foundational interoperabilityFoundational interoperability allows data exchange from one information technology system to be received by another and does not require the ability for the receiving information technology system to interpret the data. REFERENCE. Healthcare Information and Management Systems Society

FrameworkA real or conceptual structure intended to serve as a support or guide for the building of something that expands the structure into something useful. The ability to make refinements may require that the design is fully known, and this is not necessarily known at the outset. "Framework" is thus sometimes used as a 'fuzzy' …


Golden record1. A single, well-defined version of all the data entities in an organizational ecosystem. In this context, a golden record is sometimes called the "single version of the truth," where "truth" is understood to mean the reference to which data users can turn when they want to ensure that they have the correct version of …

Governance1. Exercising authority to provide direction and to undertake, coordinate, and regulate activities in support of achieving this direction and desired outcomes. 2. Governance can be thought of as the role of an organization's board of directors or its equivalent that is focused on defining that organization's purpose and the development of the strategies, objectives, …

Governance and accountability modelProvides the relationship and process context for working together to ensure outcomes are achieved.

GranularityREFERENCE. DAMA Dictionary of Data Management; Wikipedia

GremlinAn imaginary creature that causes trouble in devices and systems of all kinds. During the Second World War, the term was used by British airmen to refer to ongoing trouble with aircraft in spite of mechanics' best efforts. Gremlins sometimes appear today in computer systems and networks. Although gremlins never do their dirty work in …

GridAny distributed infrastructure that is federated to combine resources from multiple organizations managed by different administrative domains. The Grid aims to coordinate the sharing of resources in a dynamic and multi-institutional setting to provide additional functionality beyond its constituent parts: brokering, workflow coordination, integration of computing and storage. In order for this to happen, interoperability …


Hashing1. The transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. 2. Used …
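
A toy illustration of sense 1 in Python. SHA-256 from hashlib is used only because it is in the standard library and always yields a 64-hex-digit key; database engines typically use faster, non-cryptographic hash functions. The `HashIndex` class and its bucket count are hypothetical. The hashed key picks a small bucket to search, so a lookup does not scan every stored value.

```python
import hashlib


def hash_key(value: str) -> str:
    """Map a string of any length to a fixed-length (64 hex digit) SHA-256 key."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


class HashIndex:
    """Toy hash index: the hash of a key picks the bucket to search."""

    def __init__(self, n_buckets: int = 64) -> None:
        self.buckets: list[list[tuple[str, object]]] = [[] for _ in range(n_buckets)]

    def _bucket(self, key: str) -> list[tuple[str, object]]:
        return self.buckets[int(hash_key(key), 16) % len(self.buckets)]

    def put(self, key: str, item: object) -> None:
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, item)  # replace an existing entry
                return
        bucket.append((key, item))

    def get(self, key: str) -> object:
        for existing, item in self._bucket(key):
            if existing == key:  # compare full keys to rule out collisions
                return item
        raise KeyError(key)


index = HashIndex()
index.put("Smith, Jane; 1984-07-19; Ottawa", {"record_id": 7})
print(index.get("Smith, Jane; 1984-07-19; Ottawa"))  # {'record_id': 7}
print(len(hash_key("a")), len(hash_key("a" * 10_000)))  # 64 64
```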

Health scienceThe application of a comprehensive knowledge of professional specialties in the fields of dentistry, medicine, nursing, nutrition and dietetics, occupational and physical therapy, pharmacy, psychology and social work to the safety and physical and mental well-being of people; and, in the field of veterinary medicine, to the prevention, diagnosis and treatment of animal diseases and …

Heat mapA two-dimensional representation of data in which values are represented by colors. Heat maps communicate relationships between data values that would be much more difficult to understand if presented numerically in a spreadsheet.

High quality dataHigh-quality data are complete, timely, accurate, consistent, relevant, reliable, traceable, cleaned, validated, and well documented.

Human-readable formatData and code that are commented so that humans can understand what they represent, their design, and their purpose. REFERENCE. Wilson G, Aruliah DA, Brown CT, Hong NPC, Davis M, Guy RT, Haddock SHD, Huff K, Mitchell IM, Plumbley MD, Waugh B, White EP, Wilson P (2012). Best practices for scientific computing, arXiv, 29 November, 1-6.

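Hypermedia As The Engine Of application StateA constraint of the REST architectural style in which clients interact with an application entirely through hypermedia (links and forms) that the server provides dynamically in its responses. SYNONYM. HATEOAS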


ISO 17025Specifies the general requirements for the competence to carry out tests and/or calibrations, including sampling. It covers testing and calibration performed using standard methods, non-standard methods, and laboratory-developed methods. Originally known as Guide 25, ISO 17025 was first issued in 1999. The 2005 revision of ISO 17025 introduced greater emphasis on the responsibilities of senior …

ISO 19115 Metadata profileA metadata profile that specifies the elements and syntax to be used when implementing the international geospatial standard (ISO 19115:2003) in North America. SYNONYM. North American Profile for ISO 19115; NAP. REFERENCE. Government of Canada, Environment Canada data stewardship handbook (draft)

ISO 8000ISO/TS 8000-1:2011 contains a statement of the scope as a whole, principles of data quality, the high-level data architecture of ISO 8000, a description of the structure of ISO 8000, and a summary of the content of the other parts of the general data quality series of parts of ISO 8000. It also describes the …

ISO 9000A family of standards and guidelines related to quality management systems, terminology, and tools, such as auditing. It states requirements for what an organization must do to manage processes influencing quality. While ISO 9000 is primarily concerned with processes and not products, the way an organization manages its processes affects the final product and helps …

Identity ecosystemMore formally known as the National Strategy for Trusted Identities in Cyberspace, the identity ecosystem is a proposal from the United States federal government to improve identity authentication on the Internet and make online transactions safer. The proposal has four goals: To develop a comprehensive Identity Ecosystem framework; To build and implement an interoperable identity …

ImpactIn the context of a researcher's activities, impact is the consequence of the research and new knowledge on the advancement of the specialty. Science-based policies, regulations, services and technology transfers are some examples of ways target results can be achieved and impact demonstrated. Impact is one of four valued outcomes. In a 5-level incumbent-based process, …

ImportUse an application (software) to open a file that is in a format different from the format the application creates on its own. Assuming the application knows how to import (reformat) the file, it does so and then opens it for the user to work on. After working with the opened file, the application user …

Incumbent-basedIn the context of a researcher's job, an incumbent-based position means that the researcher's achievements in research contexts determine his/her level for initial appointments and promotion in a job. Incumbents are promoted by appointment to a higher level in their own positions based upon their qualifications. Only valued outcomes are used to assess a …

Indeterminate employmentEmployment of no fixed duration, whether part-time, full-time or seasonal.

InformationThe aggregation of data to make coherent observations about the world, meaningful data, or data arranged or interpreted in a way to provide meaning.

Information governanceREFERENCE. U.K. Government, 2013

Information management advisorA person who has a broad knowledge of information management disciplines and who provides guidance and support to program and staff functions on all aspects of managing the information resource.

Information management specialistA person who is expert in one or more of the information management disciplines that support the effective and efficient management of information.

Information silosHeterogeneous data sources.

Information technology specialistInformation systems and technology infrastructure manager, expert, or technician.

InnovationIn the context of a researcher's activities, innovation is the development of modified or novel approaches, theories, concepts, ideas or solutions. Innovation is one of four valued outcomes. RELATED TERM. Valued outcome; Incumbent-based; Research, development and analysis; Managing research; Representation and client services. In a 5-level incumbent-based process, demonstrated valued outcomes of innovation in research, development and …

InputA variable (whether stored within a component or outside it) that is read by the component.

Instrument1. A tool or device that is used to do a particular task. 2. A device that is used for making measurements of something.

Instrument output dataRaw electronic data generated by an instrument, analyzer, or data logger before any human action on the data and before any processing of the data by automated or semi-automated 3rd-party software or algorithms. RELATED TERM. Raw data

Integrated access managementA combination of business processes, policies and technologies that allows organizations to provide secure access to confidential data. Integrated access management software is used by enterprises to control the flow of sensitive data in and out of the network.

Integration1. The act of bringing together smaller components into a single system that functions as one. 2. In the context of information technology: The end result of a process that aims to stitch together different, often disparate, subsystems so that the data contained in each becomes part of a larger, more comprehensive system that, ideally, …

IntegrityIn the context of data and network security: The assurance that information can only be accessed or modified by those authorized to do so. Measures taken to ensure integrity include controlling the physical environment of networked terminals and servers, restricting access to data, and maintaining rigorous authentication practices. Data integrity can also be threatened by …

Intellectual leadershipThe capacity to influence stakeholders and the direction of research activities; the ability to shape others' understanding in ways that capture interest, inform and gain support; and, the capacity to influence the actions and opinions of others. REFERENCE. Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation; Government of …

Inter-disciplinaryA study undertaken by scholars from two or more distinct scientific disciplines. The research is based upon a conceptual model that links or integrates theoretical frameworks from those disciplines, uses study design and methodology that is not limited to any one field, and requires the use of perspectives and skills of the involved disciplines throughout …

Interface testingTesting conducted to evaluate whether systems or components pass data and control correctly to each other.

International Standards OrganizationA worldwide federation of national standards bodies from 143 countries, officially named the International Organization for Standardization (ISO). ISO is a non-governmental organization that promotes the development of standardization and related activities to facilitate the international exchange of goods and services, and to develop cooperation in intellectual, scientific, technological, and economic activity. The results of ISO technical work are published as international standards. …

International chemical identifierIn the context of chemistry: The International Chemical Identifier (InChI) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations. It was developed under the auspices of the International Union of Pure and Applied Chemistry (IUPAC). REFERENCE. MIT data management and publishing

International standardA standard that is used in multiple nations and whose development process is open to representatives from all countries. Some international standards are promulgated by multinational treaty organizations (e.g., the International Telecommunication Union (ITU); the United Nations Food and Agriculture Organization (FAO)). Some international standards are promulgated by multinational non-treaty organizations (e.g., the International …

InteroperabilityThe capability to communicate, execute programs, or transfer data among various functional units in a useful and meaningful manner that requires the user to have little or no knowledge of the unique characteristics of those units. Foundational, syntactic, and semantic interoperability are the three necessary aspects of interoperability. REFERENCE. Research Data Alliance; The Open …

Investigation1. In the context of research, development and analysis: The quest to find the answer to a question using the scientific method. 2. In a legal or administrative context: An inquiry into concerns or allegations related to wrongdoing or illegal activity. RELATED TERM. Audit


Key stakeholderA subset of stakeholders who, if their support were to be withdrawn, would cause the project to fail. REFERENCE. Cornell Project Management Methodology

KnowledgeThe rules and organizing principles gleaned from aggregated data. The internalized or understood information that can be used to make decisions. REFERENCE. William Hersh (2007); Carol Tenopir (2007). See Zins (2007)


Laboratory managerThe person who is responsible for the overall administration and the scientific and technical operation of a laboratory including the supervision of tests and the reporting of results of tests. The laboratory manager is responsible for assuring that the laboratory complies with all laws and regulations and is in conformance with all applicable standards. General …

Laboratory supervisorA person who under the general supervision of a laboratory manager supervises laboratory personnel and who may perform tests requiring special scientific skills. SYNONYM. Laboratory technical director

Laboratory technicianA person who under direct supervision performs laboratory tests which require limited technical skill and responsibilities.

Laboratory technologistA person who under general supervision performs tests which require the exercise of independent judgment.

LandscapeAn area defined by elements and their interaction (interfaces, protocols) where many specifications are unclear, but where it is nonetheless possible to indicate some essential functions even though not all elements (components, services) are known. A landscape may contain multiple frameworks at different stages of development and sophistication.

Legacy dataData that fall into the category of dark data or at-risk data. RELATED TERM. Dark data; At-risk data

Linked open dataData in which the relationships (connections) between data items are made available, allowing easy data access. A typical example of a large linked dataset is DBpedia, which essentially makes the content of Wikipedia available in RDF. This collection of interrelated datasets is stored on the Web and available in a common format, RDF. REFERENCE. Research Data Alliance …
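
In RDF, each fact is a (subject, predicate, object) triple, and linking happens when an object is itself an identifier that another dataset describes. The sketch below writes a few triples in the N-Triples text format and lists the links a client could follow from one resource. The specific triples (Berlin's country in DBpedia, and an owl:sameAs link to its Wikidata entry) are illustrative choices, and the helper names to_ntriples and links_from are hypothetical; a real project would normally use an RDF library rather than hand-written serialization.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object), all IRIs in this sketch

# Illustrative triples: one link inside DBpedia, one link out to Wikidata.
TRIPLES: list[Triple] = [
    ("http://dbpedia.org/resource/Berlin",
     "http://dbpedia.org/ontology/country",
     "http://dbpedia.org/resource/Germany"),
    ("http://dbpedia.org/resource/Berlin",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://www.wikidata.org/entity/Q64"),
]


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize IRI-only triples as N-Triples lines: <s> <p> <o> ."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def links_from(triples: list[Triple], subject: str) -> list[str]:
    """Objects that a subject points to: the next 'hops' a client could follow."""
    return [o for s, _, o in triples if s == subject]


print(to_ntriples(TRIPLES))
```

Following the owl:sameAs object leads out of DBpedia into a different dataset, which is the "linked" part of linked open data.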

Long-term preservationContinued access to digital materials, or at least to the information contained in them, indefinitely. REFERENCE. Digital Preservation Coalition


Machine readableIn a form that can be used and understood by a computer.

Machine-readable formatA broad term encompassing: (a) digital surrogates created as a result of converting analogue materials to digital form (digitisation); (b) "born digital" materials, for which there has never been and is never intended to be an analogue equivalent; and (c) digital records. RELATED TERM. Born digital; Digital objects; Digital records; Digital data; Electronic records. REFERENCE. Open Data …

Manage datasets in a repositoryImplement the policies that govern the arrangement, naming, descriptive metadata, provenance metadata, representation metadata, administrative metadata, access controls, retention, disposition, integrity, and replication of digital objects.

Manage metadata catalogImplement the policies that govern the choice of metadata schema, reserved vocabularies, metadata organization in tables, and metadata properties (creation date, access control, ownership, etc.).

ManagerProgram delivery managers and support function managers, at all levels in an institution, who are accountable for the direct delivery and support of programs and services within their domain of business responsibility. SYNONYM. Chief. RELATED TERM. Research manager; Project manager; Principal Investigator

Managing researchIn the context of a researcher's activities, managing research comprises the processes related to planning, organizing, setting objectives, controlling, and evaluating RDA activities and their associated human and financial resources. It includes the provision of leadership to, and assessment of, other scientists, engineers, technologists, and/or other staff. Managing research is one of the …

Mandatory standardA standard that requires compliance because of a government statute or regulation, an organization's internal policy, or a contractual requirement. Failure to comply with a mandatory standard usually carries a sanction, such as civil or criminal penalties, or loss of employment. REFERENCE. American National Standards Institute (ANSI) "Standards Management: A Handbook for Profit"

MashupLightweight composite applications that source all of their content from existing systems and data sources; they have no native data store or content repository. To access the resources that they leverage, mashups employ the technologies of the Web, including representational state transfer (REST) APIs, RSS and Atom feeds, and widgets. REFERENCE. Research Data Alliance; …

MaskingThe application of a set of data transformation techniques to de-identify data without any concern for the analytical utility of the data. This is a good approach for fields that are not required to be analyzed. Masking is applied to direct identifiers such as name and phone number. Masking techniques include, among others, removal of …
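
As a rough illustration of masking direct identifiers, the sketch below drops one field entirely and replaces another with a random token, while passing other fields through unchanged. The field names and the choice of which technique applies to which field are assumptions made for the example, not a prescribed rule set. Because each masked value is freshly random, the same phone number masks differently in different records; that is consistent with masking's lack of concern for analytical utility, and it is what distinguishes it from consistent pseudonymization.

```python
import secrets

# Hypothetical field names; a real project lists its own direct identifiers.
REMOVE = {"name"}       # suppressed entirely
RANDOMIZE = {"phone"}   # replaced with a random token


def mask_record(record: dict) -> dict:
    """Mask direct identifiers without trying to keep them analytically useful."""
    masked = {}
    for field, value in record.items():
        if field in REMOVE:
            continue
        if field in RANDOMIZE:
            masked[field] = secrets.token_hex(4)  # 8 random hex characters
        else:
            masked[field] = value  # non-identifying fields pass through unchanged
    return masked


print(mask_record({"name": "A. Person", "phone": "555-0100", "age": 42}))
```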

Meaningful useIn the context of health information technology (HIT): defines minimum government standards for using electronic health records (EHR) and for exchanging patient clinical data between healthcare providers, between healthcare providers and insurers, and between healthcare providers and patients.

Medium-term preservationContinued access to digital materials beyond changes in technology for a defined period of time but not indefinitely. REFERENCE. Digital Preservation Coalition

Meets requirementsIndicates whether a program, system, dataset, or product meets predefined requirements for a stated purpose. This phrasing is preferable to words such as perfect, outstanding, excellent, extremely good, very good, reasonable, acceptable, fine, adequate, all right, satisfactory, ok, or tolerable, which are ambiguous unless defined in terms of specific, predefined requirements. …

Message privacyIn an open network such as the Internet, message privacy, particularly for e-commerce transactions, requires encryption. The most common approach on the Web is through a public key infrastructure (PKI). For e-mail, many people use Pretty Good Privacy (PGP), which lets an individual encrypt a message or simply send a digital signature that can be …

MetadataLiterally, "data about data"; data that defines and describes the characteristics of other data, used to improve both business and technical understanding of data and data-related processes. Business metadata includes the names and business definitions of subject areas, entities and attributes, attribute data types and other attribute properties, range descriptions, valid domain values and their …

Metadata catalogueA catalogue containing metadata records in XML-encoded (machine-readable and human-readable) format that enables services to find data and services.

Metadata datasetThe set of metadata describing a specific dataset.

Metadata recordA collection of data defined by a theme or category, which reflects what is being measured, observed, or monitored at the various sites. A metadata record is an information resource of business value. REFERENCE. Research Data Alliance; DCC/TC3+

MiddlewareComputer software that provides services to software applications beyond those available from the operating system. It can be described as "software glue". Middleware makes it easier for software developers to perform communication and input/output, so they can focus on the specific purpose of their application. REFERENCE. Research Data Canada Infrastructure Committee; Wikipedia

MigrationA means of overcoming technological obsolescence by transferring digital resources from one hardware/software generation to the next. The purpose of migration is to preserve the intellectual content of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. Migration differs from the …

Minimal metadataA description with very little curation that includes at least the name and PID of a data object. Minimal metadata is only marginally targeted at discovery, since other infrastructure is much better suited to that task.

Moof monsterA vague and indefinable source of trouble for users of information technology. The term is used especially by people who frequent Internet Relay Chat (IRC) channels. If you are suddenly disconnected from your channel, it can be attributed to the moof monster. You are said to have been "moofed." The term seems reminiscent of the …

Murphy’s Law


NamespaceUniquely identifies a set of names so that there is no ambiguity when objects having different origins but the same names are mixed together. In the Extensible Markup Language (XML), a namespace is a collection of element type and attribute names. These element types and attribute names are uniquely identified by the name of …
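
To make the disambiguation concrete, the sketch below parses an XML record in which two vocabularies both define an element called "title". Python's standard xml.etree.ElementTree expands each prefixed name to "{namespace URI}localname", so the two elements never collide. The Dublin Core namespace URI is real; the "ex" vocabulary at example.org and the record content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define "title"; the namespace keeps them apart.
DOC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:ex="https://example.org/schema">
  <dc:title>Soil moisture survey</dc:title>
  <ex:title>Dr.</ex:title>
</record>"""

root = ET.fromstring(DOC)
ns = {"dc": "http://purl.org/dc/elements/1.1/", "ex": "https://example.org/schema"}

print(root.find("dc:title", ns).text)  # the dataset title (Dublin Core)
print(root.find("ex:title", ns).text)  # a personal title from the hypothetical schema
print([child.tag for child in root])   # fully qualified "{uri}title" names
```

The prefixes "dc" and "ex" are only local shorthands; what identifies an element is the namespace URI, which is why the lookup dictionary maps prefixes back to URIs.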

National standardFrom an "official" perspective, a national standard is adopted by a national standards body (e.g., Standards Council of Canada, American National Standards Institute, British Standards Institution) and made available to the public. Practically speaking, however, a national standard is any standard that is widely used and recognized within a country. In this context, even government …

Negative testingTesting aimed at demonstrating that something does not work. SYNONYM. Dirty testing

Noisy dataMeaningless data, including any data that cannot be understood and interpreted correctly by machines, such as unstructured text, and any data that has been received, stored, or changed in such a manner that it cannot be read or used by the program that originally created it. RELATED TERM. Corrupt data

Non-identifiable dataData that could not lead to the identification of a specific individual, to distinguishing one person from another, or to personally identifiable information. These may be data that have been de-identified, or that could not lead to personally identifiable information in the first place. RELATED TERM. Non-personally identifiable information

Non-personally identifiable informationData that could not lead to the identification of a specific individual, to distinguishing one person from another, or to personally identifiable information. These may be data that have been de-identified, or that could not lead to personally identifiable information in the first place.

NormalizationThe process of organizing data into tables in such a way that the results of using the database are always unambiguous and as intended. Normalization is typically a refinement process after the initial exercise of identifying the data objects that should be in the database, identifying their relationships, and defining the tables required and the …
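
A small, hypothetical example of the idea: in the flat rows below, each investigator's e-mail address is repeated for every dataset, so changing it means editing several rows and risking inconsistency. The sketch splits the rows into an investigator table and a dataset table that refers to investigators by key. The column names and data are invented for illustration, and real normalization is usually done in a database schema rather than in application code.

```python
# Unnormalized rows (hypothetical): investigator details repeat for every dataset.
FLAT_ROWS = [
    {"dataset": "ds1", "title": "Soil survey", "pi": "Lee", "pi_email": "lee@example.org"},
    {"dataset": "ds2", "title": "Water quality", "pi": "Lee", "pi_email": "lee@example.org"},
    {"dataset": "ds3", "title": "Air samples", "pi": "Okafor", "pi_email": "okafor@example.org"},
]


def normalize(rows: list[dict]) -> tuple[dict, dict]:
    """Split flat rows into an investigator table and a dataset table keyed by investigator."""
    investigators: dict[str, str] = {}
    datasets: dict[str, dict] = {}
    for row in rows:
        email = investigators.setdefault(row["pi"], row["pi_email"])
        if email != row["pi_email"]:
            # The redundancy in the flat rows allowed conflicting values.
            raise ValueError(f"conflicting e-mail addresses for {row['pi']}")
        datasets[row["dataset"]] = {"title": row["title"], "pi": row["pi"]}
    return investigators, datasets


investigators, datasets = normalize(FLAT_ROWS)
print(investigators)
print(datasets)
```

After the split, each e-mail address is stored once, so an update touches exactly one entry and the result of any lookup is unambiguous.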


OAI repositoryA type of repository with a network-accessible server that can process the six OAI-PMH requests in the manner described in the OAI Implementation Guide.

Object attributeIn an object model, the logical attributes or properties associated with a particular object. In a data object, these would be its associated properties.

Object modelA collection of descriptions of classes or interfaces, together with their member data, member functions, and class-static operations.

Object propertyThe characteristics of any digital object can be described by a number of properties which are typically stored in metadata and/or PID records.

Open Archives Initiative Protocol for Metadata HarvestingA low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked over HTTP. SYNONYM. OAI-PMH. REFERENCE. oai.org
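
Each OAI-PMH request is an ordinary HTTP request whose "verb" parameter names one of the six services (Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord), plus verb-specific arguments. The sketch below only builds request URLs and makes no network calls; the endpoint URL and the helper name oai_request_url are hypothetical. It also does not check each verb's required arguments (for example, ListRecords needs metadataPrefix unless a resumptionToken is given), which a real harvester would.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH HTTP GET request URL (no network access)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"


BASE = "https://repository.example.org/oai"  # hypothetical Data Provider endpoint
print(oai_request_url(BASE, "Identify"))
print(oai_request_url(BASE, "ListRecords", metadataPrefix="oai_dc"))
```

A Service Provider would fetch these URLs, parse the XML responses, and follow any resumptionToken to page through large result sets.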

Open dataStructured data that are accessible, machine-readable, usable, intelligible, and freely shared. Open data can be freely used, re-used, built on, and redistributed by anyone, subject only, at most, to the requirement to attribute and share alike. REFERENCE. Research Data Alliance; Science as an Open Enterprise (SOE), as quoted by TC3+; Open Data 101 (Government of …

Open governmentA governing culture that holds that the public has the right to access the documents and proceedings of government to allow for greater openness, accountability, and engagement.

Operational managementOngoing organizational activities associated with supporting functional elements, as opposed to project elements. Operational management also includes support of products that the organization has created through project activity. REFERENCE. Project Management Institute (2006) The Standard for Program Management.

Organizational leadershipThe combination of: (a) the ability to attract, assess, mobilize, and focus energies and talent to work towards a shared purpose aligned with the mandate of the organization; (b) the ability to change culture, processes, and priorities within the organization; and (c) the ability to mentor. REFERENCE. Government of Canada (2006) Model guide for the preparation …

Original repositoryA type of repository where the original copy of data was stored and where a data identifier was probably registered.


PID attributeA single data element related to a PID and part of its record content.

PID domainFor a single identifier, the class of entity it refers to. For a PID system, the typical class of entities it is intended to be used for. Examples include: digital objects, physical objects, bodies, actors.

PID recordA type of record (and organization) that stores an instance of an executable/understandable PID. The content of a PID record distinguishes a registered digital or data object from other digital objects (DOs). A PID record is a type of record that includes property information that characterizes the digital object it is identifying. Important parts of a PID …

PID resolutionThe process of resolving a PID to a useful state of information about a digital object by using a globally available system.

PID serviceA service that provides a connection between a PID and its target object.

PID systemConsists of at least one PID resolver, a name schema and a defined mechanism for issuing PIDs that conform to the name schema. Examples include: DOI, Handle System, URN, ARK, PURL, etc.

Peer reviewA process by which a scholarly work (such as a paper or a research proposal) is checked by a group of experts in the same field to make sure it meets the necessary standards before it is published or accepted. REFERENCE. Merriam-Webster dictionary

Persistent identifierA persistent identifier is a long-lasting reference to a digital object that gives information about that object regardless of what happens to it. Developed to address "link rot," a persistent identifier can be resolved to provide an appropriate representation of an object even if that object changes its online location or goes offline. SYNONYM. PID. REFERENCE. Research Data …
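
The core idea is indirection: citations use the identifier, and only the resolver's record changes when the object moves. Real systems such as DOI resolve through a global service (for DOIs, the https://doi.org/ proxy); the sketch below is an offline, in-memory toy that shows the same behaviour. The class name ToyResolver, the DOI-like identifier, and the URLs are all hypothetical.

```python
class ToyResolver:
    """Hypothetical in-memory PID service: maps stable identifiers to current locations."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        if pid in self._locations:
            raise ValueError(f"PID already registered: {pid}")
        self._locations[pid] = url

    def update_location(self, pid: str, new_url: str) -> None:
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url  # the identifier itself never changes

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


resolver = ToyResolver()
resolver.register("10.1234/example-dataset", "https://old-host.example.org/data/42")
# The dataset moves; anyone citing the PID still reaches it.
resolver.update_location("10.1234/example-dataset", "https://new-host.example.org/archive/42")
print(resolver.resolve("10.1234/example-dataset"))
```

Refusing to re-register an existing PID mirrors the expectation that an identifier, once issued, keeps pointing to the same object.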

Persistent uniform resource locatorA URL that, instead of pointing directly to the location of an Internet resource, points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client. SYNONYM. PURL. REFERENCE. MIT data management and publishing

Personal information privacyThe World Wide Web Consortium's Platform for Privacy Preferences Project (P3P) offers specific recommendations for practices that let users define and share personal information only with the Web sites they choose. The P3P incorporates a number of industry proposals, including the Open Profiling Standard (OPS). Using software that adheres to the …

Personally identifiable information1. Data which relate to a living individual who can be identified (a) from those data, or (b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller, and includes any expression of opinion about the individual and any indication of …

Physical scienceThe application of comprehensive scientific and professional knowledge to the following applied sciences: physics, planetary, and earth sciences.

Pipe-separated valuesA file that contains the values in a table as a series of ASCII text lines organized so that each column value is separated by a pipe (|) character.
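
Python's standard csv module handles pipe-separated files by changing the delimiter, so no custom parser is needed. The sketch below reads a small table and writes it back out unchanged; the column names and values are hypothetical sample data.

```python
import csv
import io

# Hypothetical pipe-separated sample: one header line, two data lines.
PSV_TEXT = "site|date|temp_c\nA1|2021-06-01|14.2\nB7|2021-06-01|15.0\n"

reader = csv.DictReader(io.StringIO(PSV_TEXT), delimiter="|")
rows = list(reader)  # each row is a dict keyed by the header names
print(rows[0]["site"], rows[0]["temp_c"])

# Writing uses the same delimiter; lineterminator="\n" keeps plain newlines.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, delimiter="|", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

All values come back as strings, so numeric columns such as temp_c need explicit conversion. If a value itself contains a pipe, the csv writer quotes it, which plain string splitting would not handle.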

PreprintPreliminary version of an article that has not undergone review but that may be shared for comment. Preprints may be considered as grey literature. RELATED TERM. Working paper

PreservationAn activity within archiving in which specific items of data are maintained over time so that they can still be accessed and understood through changes in technology. SYNONYM. Conservation. REFERENCE. JISC/TC3+; TBS Information Management Glossary (National Archives of Canada Preservation Policy)

Preservation metadataDocuments actions that have been undertaken to preserve a digital resource, such as migrations and checksum calculations. Example: Metadata Encoding and Transmission Standard (METS). REFERENCE. DCC/TC3+
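
A checksum calculation is one of the most common preservation actions to record: the digest is stored as metadata, and recomputing it later shows whether the object has changed. The sketch below uses the standard hashlib module and wraps the result in a simple event record. The field names (event_type, algorithm, value, timestamp) are illustrative assumptions and do not follow METS, PREMIS, or any other schema.

```python
import hashlib
from datetime import datetime, timezone


def fixity_event(data: bytes, algorithm: str = "sha256") -> dict:
    """Compute a checksum and wrap it in a simple, hypothetical preservation event record."""
    return {
        "event_type": "checksum calculation",
        "algorithm": algorithm,
        "value": hashlib.new(algorithm, data).hexdigest(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def verify(data: bytes, event: dict) -> bool:
    """Recompute the checksum later to confirm the object is unchanged."""
    return hashlib.new(event["algorithm"], data).hexdigest() == event["value"]


event = fixity_event(b"survey results, version 1")
print(event["value"])
print(verify(b"survey results, version 1", event))
print(verify(b"survey results, version 2", event))
```

Storing the algorithm name alongside the value lets a future check use the same algorithm, even if the repository's default changes.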

Pretty Good Privacy IDSYNONYM. PGP ID. RELATED TERM. Pretty Good Privacy fingerprint; Message privacy

Pretty Good Privacy fingerprintRELATED TERM. Pretty Good Privacy ID; Message privacy

Principal InvestigatorThe Principal Investigator (P.I.) is a researcher who has a research leadership role and is the point of contact for a project or partnership that applies the scientific method, historical method, or other research methodology for the advancement of knowledge resulting in independent, objective, high quality, traceable, and reproducible results. The P.I. has primary responsibility …

Privacy1. In the context of the Internet: Most Internet users want assurance that the personal information they share will not be shared with anyone else without their permission. Privacy can be divided into the following concerns: (a) What personal information can be shared with whom; (b) Whether messages can be exchanged without anyone else seeing them; (c) …

Privacy governanceMonitoring the risk to privacy posed by data requests from researchers, and the practices of data custodians in providing data (information governance) to ensure that confidentiality is protected. Such governance requires specialized knowledge of technology, law, and statistical methods. RELATED TERM. Information governance; Governance and accountability mode

Privacy-preserving data linkageData linkage where the resulting product has been de-identified. RELATED TERM. De-identification

ProcessA set of interrelated actions and activities performed to achieve a specified set of products, results, or services. REFERENCE. Project Management Institute (2006) The Standard for Program Management.

ProductivityIn the context of a researcher's activities, productivity is the generation of outputs (also called contributions) by a researcher at a rate consistent with the specialty or type of work. Productivity is one of four valued outcomes. In general, outputs may include, for example: peer-reviewed publications, scientific products, science advice, research …

Professional standardEthical or legal duty of a professional to exercise the level of care, diligence, and skill prescribed in the code of practice of his or her profession, or as other professionals in the same discipline would in the same or similar circumstances. SYNONYM. Professional standard of care

ProgramA group of related projects managed in a coordinated way to obtain benefits and control not available from managing them individually. Programs may include elements of related work (e.g., ongoing operations) outside the scope of the discrete projects in a program. REFERENCE. Project Management Institute (2006) The Standard for Program Management; Cornell Project Management Methodology

Program governanceThe process of developing, communicating, implementing, monitoring, and assuring the policies, procedures, organizational structures, and practices associated with a given program. REFERENCE. Project Management Institute (2006) The Standard for Program Management.

Program managerThe person responsible for creating the organizational environment and culture by providing clear direction and circumstances that allow people to be successful. The program manager is judged on time, cost, and scope, cumulatively for all the projects and operations within the program. Program management decisions are both tactical and strategic in nature. The strategy …

Project1. A funded research activity with defined start and end dates. 2. A product or service that can solve a problem or address a need in the organization. The Project Charter describes the business case or project need; describes the proposed solution and/or the product description; identifies the clients of the project and why they …

Project lifecycleDescribes the processes and tasks that must be completed to produce a product or service. Different project lifecycles exist for specific products and services. (For example, the lifecycle followed to build a house is very different from the lifecycle followed to develop a software package.) REFERENCE. Cornell Project Management Methodology

Project management lifecycleDefines how to manage a project. It will always be the same, regardless of the project lifecycle being employed. REFERENCE. Cornell Project Management Methodology

Project managerThe person who is tasked with delivering a project within the boundaries and framework established by the program manager. The project manager is, and should be, focused on delivery and execution, and is judged on the time, cost, and scope of the project. The person responsible for ensuring that the Project Team completes the …

Project quality controlThe Principal Investigator and the project team work together to inspect the accomplished work to ensure its alignment with the project scope, data fitness for use, and data end-user needs.

Project team memberResponsible for executing tasks and producing deliverables as outlined in the Project Plan and directed by the Project Manager, at whatever level of effort or participation has been defined for them. REFERENCE. Cornell Project Management Methodology

Proportionate governanceKeeping the procedural mechanisms that researchers and data custodians must follow when engaged in data sharing and linkage proportional to the degree of risks associated with such practices. Proportionate governance operates in situations that are too variable to be regulated by hard laws (e.g., custom data access requests). It requires that analytical judgments be performed …

ProprietaryRefers to information (or other property) that is owned by an individual or organization and whose use is restricted by that individual or organization.

ProtocolsThe special set of rules that regulate how components within a system interact. Protocols are crucial parts of interface specifications. They specify not only message content but also procedural aspects.

ProvenanceA type of historical information or metadata about the origin, location or the source of something, or the history of the ownership or location of an object or resource including digital objects. For example, information about the Principal Investigator who recorded the data, and the information concerning its storage, handling, and migration.

Provenance metadataInformation concerning the creation, attribution, or version history of managed data. Provenance metadata indicates the relationship between two versions of data objects and is generated whenever a new version of a dataset is created. Examples include: (i) the name of the program that generated the new version, (ii) the commit id of the program …
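
The sketch below shows one possible shape for a provenance record created alongside each new dataset version, capturing the two examples named above: the program that produced the version and that program's commit id. The class and field names are hypothetical, and deriving the version identifier from a hash of the content is a design assumption for the example, not a requirement of any standard.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VersionProvenance:
    """Hypothetical provenance record linking two versions of a dataset."""
    derived_from: str   # identifier of the previous version
    new_version: str    # identifier of the new version
    program: str        # name of the program that generated the new version
    commit_id: str      # version-control commit of that program


def record_new_version(prev_id: str, data: bytes, program: str, commit_id: str) -> VersionProvenance:
    # Assumption: a content-derived identifier, so identical bytes get the same version ID.
    version_id = "sha256:" + hashlib.sha256(data).hexdigest()[:12]
    return VersionProvenance(prev_id, version_id, program, commit_id)


rec = record_new_version("v1", b"cleaned table", "clean_data.py", "abc123")
print(asdict(rec))
```

Chaining records through derived_from lets a reader walk back from any version to the original data and see which code produced each step.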


Quality assuranceThe process or set of processes used to measure and assure the quality of a product. SYNONYM. QA

Quality controlThe process of ensuring that products and services meet consumer expectations. SYNONYM. QC


Raw dataData that have not been processed for meaningful use. Although raw data have the potential to become "information," they require selective extraction, organization, and sometimes analysis and formatting for presentation. As a result of processing, raw data sometimes end up in a database, which enables the data to become accessible for further processing and analysis …

Re-useUse of content beyond its original purpose.

Real-time dataData that are being received, processed and stored at the time of their occurrence with only small delays. Examples include: stock quotes, manufacturing statistics, Web server loads, data warehouse activity and sensor feeds to data collectors. Real-time data are often used for navigation or tracking. Real-time data are data streams that are typically generated by …

RecognitionIn the context of a researcher's activities, recognition is a measure of credibility and stature of a researcher within the scientific community, and with clients and stakeholders, in accordance with the specialty or type of work. Recognition is one of four valued outcomes. RELATED TERM. Valued outcome; Incumbent-based; Research, development and analysis; Managing research; Representation …

Record1. A collection of data items arranged for processing by a program. Multiple records are contained in a file or dataset. Records can be of fixed length or of variable length, with the length information contained within the record. 2. A record (sometimes called a row) is a group of fields (sometimes called columns) …

Record provenance informationInformation for a data object that includes: the person who deposited the data object in the repository; the source of the data object; the date when the object was deposited; and the authenticity information needed to link the data object to its original source.

Record standardizationA process in which files are first parsed (assigned to appropriate fields in a record) and then translated to a common format. For example, if an original record had the client's name and address as "Bob Jones, VP Acme. Co., 15 S. Main St, Brooklyn" the standardized record might read "Bob Jones, Vice President, Acme …
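
The two steps described above, parsing into fields and then translating values to a common form, can be sketched as follows. The field layout, the abbreviation table, and the sample record are hypothetical; production standardization relies on curated reference lists and much more robust parsing than splitting on commas.

```python
# Hypothetical lookup table; real projects use curated reference lists.
EXPANSIONS = {"VP": "Vice President", "St": "Street", "Ave": "Avenue",
              "N.": "North", "S.": "South"}
FIELDS = ("name", "title", "address", "city")


def standardize(raw: str) -> dict:
    """Step 1: parse a comma-separated record into fields. Step 2: expand known abbreviations."""
    values = [part.strip() for part in raw.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    parsed = dict(zip(FIELDS, values))
    return {
        field: " ".join(EXPANSIONS.get(token, token) for token in value.split())
        for field, value in parsed.items()
    }


print(standardize("Ana Diaz, VP, 12 N. Elm Ave, Albany"))
```

Standardized records are easier to compare and deduplicate, because "N. Elm Ave" and "North Elm Avenue" no longer look like different addresses.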

Records retention scheduleA policy that specifies how long data items must be kept, as well as the disposal guidelines for these data items.

Redundancy1. A system design in which a component is duplicated so that, if it fails, a backup is available. 2. Duplication that is unnecessary or that results from poor planning.

Referable dataA type of data (digital or not) that is persistently stored and is referred to by a persistent identifier. Digital data may be accessed by the identifier. Some data object references may access a service on the object.

Reference modelA design covering a class of frameworks with the following characteristics: (1) it can be used to generate more specific models that still belong to the class and (2) it can be used to compare a concrete framework design to identify whether it belongs to the same class.

Reference resolutionThe process of resolving a reference to useful information by using a globally available system. REFERENCE. Digital Preservation Coalition

ReformattingCopying information content from one storage medium to a different storage medium (media reformatting) or converting from one file format to a different file format (file reformatting). REFERENCE. Digital Preservation Coalition

RefreshingCopying information content from one storage medium to another medium of the same type.

Regional standardREFERENCE. American National Standards Institute (ANSI) "Standards Management: A Handbook for Profit"

Registered dataData that have gone through a registration process and have been assigned an identifier and metadata to aid in their search and retrieval.

RegistryA database containing information about trusted repositories that are provided by the repository managers and are useful for human and machine users. It is a registry information system on which a register is maintained. These registries do not contain information about all metadata descriptions of digital objects, nor do they offer a list of PIDs …

Related scientific activitiesActivities that complement and extend R&D by contributing to the generation, dissemination, and application of scientific and technological knowledge. RELATED TERM. Research and Development

Relational databaseA collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive …
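
The sketch below illustrates "reassembled in many different ways" using SQLite through Python's standard sqlite3 module: the same two tables answer two different questions, each with its own SQL query, and neither query requires changing the tables. The table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE dataset (id TEXT PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE file (dataset_id TEXT REFERENCES dataset(id), name TEXT, size_bytes INTEGER);
INSERT INTO dataset VALUES ('ds1', 'Soil survey', 2019), ('ds2', 'Water quality', 2021);
INSERT INTO file VALUES ('ds1', 'plots.csv', 1200), ('ds1', 'notes.txt', 300),
                        ('ds2', 'samples.csv', 5000);
""")

# View 1: file count and total size per dataset.
per_dataset = conn.execute("""
    SELECT d.title, COUNT(f.name), SUM(f.size_bytes)
    FROM dataset d JOIN file f ON f.dataset_id = d.id
    GROUP BY d.id ORDER BY d.id
""").fetchall()

# View 2: files belonging to datasets from 2020 onward.
recent_files = conn.execute("""
    SELECT f.name FROM file f JOIN dataset d ON f.dataset_id = d.id
    WHERE d.year >= 2020 ORDER BY f.name
""").fetchall()

print(per_dataset)
print(recent_files)
```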

RelationsIndicates how the different components within a system are "linked" to fulfill the tasks. "Relations" are thus defined by the services they are making use of and by the interface specifications.

ReliabilityThe probability of a given system performing its mission adequately for a specified period of time under the expected operating conditions.

Remote accessThe ability to get access to a computer or a network from a distance. Access may be through an Internet service provider (ISP) or through a dedicated line between a computer or a remote local area network and the "central" or main corporate local area network. A dedicated line is more expensive and less …

Remote data accessThe ability to access and download data from a repository. RELATED TERM. Remote access

Repeatable processA set of actions that allow for a more efficient use of limited resources and reduce unwanted variation during the development and implementation of various projects. Repeatable processes allow a project team to make efficient use of project components that have proved to be successful in the past and reduce unnecessary variations that can tie …

Replica number: A type of metadata used during replication or when accessing replicas.

Replication: 1. In a data management context: the generation of a copy of a data object that is referenced by the same name but has a different replica number. When changes are made to the data object, the replica can be updated to track the changes. AKA duplication. As part of replication, data may be given a …
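
A minimal in-memory sketch of the naming scheme described above: every copy shares one logical name and is told apart only by its replica number. The ReplicatedStore class and its methods are hypothetical, not the API of any repository system.

```python
from dataclasses import dataclass, field

@dataclass
class ReplicatedStore:
    """Hypothetical store keyed by (logical name, replica number)."""
    replicas: dict = field(default_factory=dict)

    def put(self, name: str, data: bytes) -> None:
        # The original copy is stored as replica number 0.
        self.replicas[(name, 0)] = data

    def replicate(self, name: str) -> int:
        if (name, 0) not in self.replicas:
            raise KeyError(name)
        # New copy: same name, next free replica number.
        existing = [n for (obj, n) in self.replicas if obj == name]
        new_number = max(existing) + 1
        self.replicas[(name, new_number)] = self.replicas[(name, 0)]
        return new_number

    def update(self, name: str, data: bytes) -> None:
        # Propagate the change to every replica so the copies track the original.
        for key in self.replicas:
            if key[0] == name:
                self.replicas[key] = data

    def get(self, name: str, replica_number: int = 0) -> bytes:
        return self.replicas[(name, replica_number)]

store = ReplicatedStore()
store.put("survey.csv", b"v1")
r = store.replicate("survey.csv")
store.update("survey.csv", b"v2")
print(r, store.get("survey.csv", r))  # 1 b'v2'
```

Updating all replicas immediately is only one design choice; real systems may instead mark replicas as stale and synchronize them later.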

Repository: Repositories preserve, manage, and provide access to many types of digital materials in a variety of formats.

Repository access: To access a repository, one uses a client (application) to discover relevant digital objects within the repository and then retrieve a copy of a desired digital object.

Representation: A resource that either conveys the content of a resource (if it is a digital object instance) or provides a digital object that conveys the intention of the resource in a form useful to a user (machine or human).

Representation and client services: In the context of a researcher's activities, representation is the process of representing and speaking at local, national, and international fora. Client service is the process of interaction that facilitates knowledge/information transfer to clients. Areas where the representation and client services context is particularly significant are: technology transfer and industrial liaison; scientific liaison …

Representation object: Provides some context for a data object. It contains provenance, description (e.g., format, encoding scheme, algorithm), structural, and administrative information about the object. This is a form of metadata. REFERENCE. Peng (2011)

Reproducible research: Research whose published results can be replicated using the documented data, code, and methods employed by the author or provider, without the need for any additional information or communication with the author or provider. SYNONYM. Reproducibility. REFERENCE. Buckheit and Donoho 1995; Donoho 2010; Peng 2011; Gandrud 2013; George 2015.

Repurposed data: New datasets obtained by appropriately combining data from a variety of existing files, generating new data products that did not previously exist. Repurposed data result from data wrangling. RELATED TERM. Data wrangling

Requirements: Features of a program, system, dataset, or product that are quantifiable, detailed, and relevant to the specified end use. SYNONYM. Features

Requirements analysis: The process of determining user expectations for a program, system, dataset, or product. Requirements analysis is a team effort that must take into account hardware, software, end use, and human factors engineering expertise. It also requires skills in dealing with people and involves frequent communication with end users to determine specific feature expectations, …

Requirements creep: A tendency for requirements to increase during development beyond those originally foreseen. Requirements creep may be driven by a deeper understanding of the system as the project progresses, leading to a re-evaluation of the requirements analysis. SYNONYM. Feature creep; Scope creep

Requirements stability index: A metric used to organize, control, and track changes to the originally specified requirements for a new system, project, or product.

Research: A systematic investigation to establish facts, including the input data, the code, and the full software environment that produced the research results. REFERENCE. Government of Canada, "Annual Science and Technology Data Publication"

Research and development: Creative work undertaken on a systematic basis to increase the stock of knowledge, including knowledge of humankind, culture and society, and the use of this stock of knowledge to devise new applications. SYNONYM. R&D. RELATED TERM. Related scientific activities; Research, development and analysis

Research context: There are three contexts of research work in which a researcher is expected to conduct his/her activities: (1) Research, development and analysis (RDA); (2) Managing research; and (3) Representation and client services. A researcher's primary area of work is RDA. RELATED TERM. Incumbent-based; Research, development and analysis; Managing research; Representation and client services. REFERENCE. Government of …

Research data: Data that are used as primary sources to support technical or scientific enquiry, research, scholarship, or artistic activity, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results. All other digital and non-digital content have the potential of becoming …

Research data format: Acceptable formats for transmitting and sharing different types of research data include: (a) quantitative tabular data with minimal metadata, i.e., a dataset with or without attribute labels but no other metadata in addition to the data matrix; (b) quantitative tabular data with extensive metadata, i.e., a dataset with attribute labels, code labels, defined missing values, …

Research data management: Data management refers to the storage, access, and preservation of data produced from a given investigation. Data management practices cover the entire lifecycle of the data, from planning the investigation to conducting it, and from backing up data as it is created and used to long-term preservation of data deliverables after the research investigation …

Research data management infrastructure: The configuration of staff, services, and tools assembled to support data management across the research lifecycle, and more specifically to provide comprehensive coverage of the stages making up the data lifecycle. It can be organized locally and/or globally to support research data activities across the research lifecycle.

Research data publication workflow: Activities and processes in a digital environment that lead to the publication of research data, associated metadata, accompanying documentation, and software code on the Web. In contrast to interim or final published products, workflows are the means to curate, document, and review, and thus to ensure and enhance the value of the published product. Workflows …

Research governance: Ensures that the benefits of research to society outweigh any risks, from both an ethical and a legal perspective.

Research manager: The person who manages or coordinates resources, personnel, facilities, and operating fund allocations in an organization conducting research, development and analysis (RDA) in the natural and physical sciences. A research manager determines the nature and priority of objectives and the resources committed to their achievement within and across organizations, and evaluates program outputs in relation to organizational …

Research metadata format: Acceptable formats for transmitting and sharing research metadata include: ISO 19115-2:2009.

Research results: Research results are the journal articles, reports, books, slideshows, or websites that announce the project's findings and try to convince us that the results are correct.

Research scientist: A scientist who conducts activities in (1) Research, development and analysis (RDA); (2) Managing research; and (3) Representation and client services. SYNONYM. RES. RELATED TERM. Scientist; Research context

Research, development and analysis

Researcher level: A researcher's level is incumbent-based. It may be described numerically (e.g., Level 1, Level 2, ..., Level 5) or with a descriptive title (e.g., Lecturer or Adjunct Professor (Professeur associé); Clinical or Research Professor (Professeur clinique); Assistant Professor (Professeur adjoint); Associate Professor (Professeur agrégé); Full Professor (Professeur titulaire)). The levels can …

Researcher promotion documentation: The documentation submitted by a researcher when applying for promotion to a higher level, or for tenure at a university. Depending on the institution, the documentation may or may not be accompanied by a portfolio of complete, full-length research outputs. SYNONYM. Career advancement documentation; Promotion documentation. RELATED TERM. Tenure portfolio; Tenure dossier; Researcher level; …

Resistance management: Resistance management, a component of change management, aims to minimize or eliminate resistance to change. People in an organization may resist change for a number of reasons, including: (a) people comfortable in their current situation may be reluctant to risk that security; (b) people may concentrate on perceived negative outcomes of the change; (c) people …

Resource: A source or supply that can be drawn on to support or fulfill a specific need or to handle a situation. Example: information is a resource that supports and enables delivery, fulfills inquiry requests, and adds value to other products and services. Information is a strategic resource when it is recognized and managed as a …

Resource authorization: The process of deciding whether a subject (person, program, device, group, role, etc.) is allowed to access or take an action against a resource. Authorization relies on a trusted identity (authentication) and the ability to test the privileges held by the subject against the policies or rules governing that resource to determine if …
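
A minimal sketch of the privilege test described above, assuming the subject has already been authenticated. The POLICY and GROUPS tables and the is_authorized function are hypothetical; the check denies by default and grants access if the subject or any of its groups holds the requested action on the resource.

```python
# Hypothetical policy: resource -> principal (user or group) -> allowed actions.
POLICY = {
    "dataset-42": {
        "alice": {"read", "write"},
        "group:analysts": {"read"},
    },
}

# Hypothetical group membership, assumed to come from a trusted identity source.
GROUPS = {"alice": set(), "bob": {"group:analysts"}}

def is_authorized(subject: str, action: str, resource: str) -> bool:
    """Return True if the authenticated subject may perform `action` on `resource`."""
    rules = POLICY.get(resource, {})  # unknown resource: no rules, so deny
    # Privileges held directly by the subject plus those held via its groups.
    principals = {subject} | GROUPS.get(subject, set())
    return any(action in rules.get(p, set()) for p in principals)

print(is_authorized("bob", "read", "dataset-42"))   # True
print(is_authorized("bob", "write", "dataset-42"))  # False
```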

Responsibility: Something that one is required to do as part of a job or role, or as a professional or legal obligation.

Result: The impact or effect of something (e.g., a program).

Retention period: A metadata operation to create state information for a data object that defines the date when retention of the data object should be evaluated. The retention period must have an associated disposition policy for deciding what to do when the retention period expires.
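
A minimal sketch, assuming a retention period expressed in days from ingestion: the first function creates the state information (the evaluation date plus the attached disposition policy), and the second applies that policy once the date is reached. The field names and the Disposition values are hypothetical.

```python
from datetime import date, timedelta
from enum import Enum

class Disposition(Enum):
    # Hypothetical disposition outcomes; real policies vary by institution.
    RETAIN = "retain"
    REVIEW = "review"
    DESTROY = "destroy"

def retention_metadata(ingested: date, retention_days: int, policy: Disposition) -> dict:
    """Create state information recording when retention should be evaluated."""
    return {
        "evaluate_on": ingested + timedelta(days=retention_days),
        "disposition_policy": policy,
    }

def evaluate(meta: dict, today: date) -> Disposition:
    """Before the evaluation date the object is kept; afterwards the policy decides."""
    if today < meta["evaluate_on"]:
        return Disposition.RETAIN
    return meta["disposition_policy"]

meta = retention_metadata(date(2020, 1, 1), 365, Disposition.REVIEW)
print(meta["evaluate_on"])  # 2020-12-31
```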

Revision control system: A software implementation of revision control that automates the storing, retrieval, logging, identification, and merging of revisions (e.g., Git, SVN).

Robustness: The degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions.

Role: A function performed by someone in a particular situation, process, or operation.


SMART: A mnemonic acronym (Specific, Measurable, Attainable, Relevant, Trackable/Time-bound) used in project management, teaching, and performance management that provides criteria to guide the setting of objectives.

SWOT: A mnemonic acronym (Strengths, Weaknesses, Opportunities, and Threats) used in structured planning.

Schema: 1. The organization or structure of a database. The activity of data modeling leads to a schema. (The plural form is schemata.) The term is used in discussing both relational databases and object-oriented databases. The term sometimes seems to refer to a visualization of a structure and sometimes to a formal text-oriented description. Two common …

Science: 1. The intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment. 2. A systematically organized body of knowledge on a particular subject. RELATED TERM. Scientist. REFERENCE. Oxford Dictionary

Science and technology data: Qualitative or quantitative attributes of a variable or set of variables. Data refer to representations of physical, biological, or chemical facts, typically the results of measurements/observations. They also include related socio-economic and cultural representations. Data are normally in a structured, tabular, numeric, character, geo-referenced, and/or computer-readable format. SYNONYM. Scientific data; Technological data. REFERENCE. Government of Canada, …

Scientific data infrastructure: What is required to enable researchers to create, store, and share the data resulting from their experiments, and to find, access, and process the data they need. RELATED TERM. Science and technology data; Scientific data services. REFERENCE. European Commission, Advancing Technologies and Federating Communities/TC3+

Scientific data services: Assist organizations in the capture, storage, curation, long-term preservation, discovery, access, retrieval, aggregation, analysis, and/or visualization of scientific data, as well as in the associated legal frameworks, to support disciplinary and multidisciplinary scientific research. RELATED TERM. Scientific data infrastructure

Scientific method: Ask the research question, review the relevant scientific literature, design the study, collect the data, analyze and interpret the data, and communicate the results.

Scientific workflow: A set of chained operations. The simplest computerized scientific workflows are scripts that can involve several ingredients, such as data, programs, models, and other inputs such as human or sensor observations. Workflows produce outputs that may include, for example, visualizations and analytical results. Preserved workflows are important for reproducible research. They simplify complex sequences of …
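
A minimal sketch of a linear chain of operations, where each step consumes the output of the previous one. The three steps (parse, drop_negative, summarize) and the quality-control rule are hypothetical stand-ins for real programs or models; production workflow systems add branching, provenance capture, and re-execution.

```python
from functools import reduce
from statistics import mean
from typing import Any, Callable

Step = Callable[[Any], Any]

def run_workflow(steps: list[Step], data: Any) -> Any:
    """Apply each step to the output of the previous one."""
    return reduce(lambda acc, step: step(acc), steps, data)

def parse(raw: str) -> list[float]:
    return [float(x) for x in raw.split(",")]

def drop_negative(values: list[float]) -> list[float]:
    # Hypothetical QC rule: negative readings are treated as sensor errors.
    return [v for v in values if v >= 0]

def summarize(values: list[float]) -> dict:
    return {"n": len(values), "mean": mean(values)}

result = run_workflow([parse, drop_negative, summarize], "1,-2,2,3")
print(result)  # {'n': 3, 'mean': 2.0}
```

Keeping the steps as a plain list makes the chain itself a record that can be stored alongside the outputs.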

Scientist: A person who is studying or has expert knowledge of one or more of the natural or physical sciences. RELATED TERM. Science. REFERENCE. Oxford Dictionary

Semantic data: Data that are tagged with particular metadata that can be used to derive relationships between data. REFERENCE. SOE/TC3+

Semantic interoperability: The ability of computer systems to transmit data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data federation between information systems. Semantic interoperability is achieved when the information transferred has, in its communicated form, all of the meaning required for the receiving system to interpret …

Semi-structured data: Data that have not been organized into a specialized repository, such as a database, but that nevertheless have associated information, such as metadata, that makes them more amenable to processing than raw data. Semi-structured data lie somewhere between structured and unstructured data. They are not organized in a complex manner that makes sophisticated access and …

Service object: In the context of reproducible research, a service object is a type of digital object containing executable code, considered as a unit.

Services: A function that is executed on request and delivers certain expected results.

Short-term preservation: Access to digital materials either for a defined period of time while use is predicted but which does not extend beyond the foreseeable future, and/or until it becomes inaccessible because of changes in technology. REFERENCE. Digital Preservation Coalition

Silver bullet: A methodology, practice, or prescription that promises miraculous results if followed (e.g., structured programming will rid you of all bugs, as will human sacrifices to the Atlantean god Fugawe). Named either after the Lone Ranger, whose silver bullets always brought justice, or, alternatively, after the silver bullet as the only known antidote to werewolves.

Software: A set of instructions that direct a computer to do a specific task (Chun, 2004).

Specialty: Refers to specialization, discipline, field, etc. REFERENCE. Government of Canada (2006) Model guide for the preparation of researchers' career advancement (promotion) documentation; Government of Canada (2006) Career progression management framework for federal researchers; Government of Canada (2006) NRCan Model Guide for the Preparation of Researcher's Career Progression Evaluation Documentation (Dossier)

Stakeholder: Individuals, groups, or organizations that have an interest or share in an undertaking or relationship and its outcome; they may be affected by it, impact or influence it, and in some way be accountable for it.

Standard: A document that applies collectively to codes, specifications, recommended practices, classifications, test methods, and guides, which have been prepared by a standards developing organization or group and published in accordance with established procedures.

Standard Operating Procedure: Detailed, written instructions to achieve uniformity of the performance of a specific function. SYNONYM. SOP. REFERENCE. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use

Standard Operating Procedure for the collection of harmonized or integrated data: Written methods, instructions, and tools that, when applied in different data collection contexts, produce data that are ready to be harmonized or integrated without further manipulation. RELATED TERM. Data harmonization; Data integration

Standardization: The process of establishing by common agreement the criteria, terms, principles, practices, materials, items, processes, equipment, parts, sub-assemblies, and assemblies appropriate to achieve the greatest practicable uniformity of products and practices, to ensure the minimum feasible variety of such items and practices, and to effect optimum interchangeability or interoperability of equipment, parts, and components. REFERENCE. American …

Statistical de-identification: The application of a set of data transformation techniques to de-identify data in such a manner that the resulting transformed fields retain a very high analytic value. REFERENCE. El Emam, K. (2013). Privacy Analytics White Paper: Overview of Re-identification Risk Assessment and Anonymization Process. Ottawa (ON): Privacy Analytics, Inc.

Steering committee: The group responsible for ensuring that program goals are achieved and for providing support to address program risks and issues. SYNONYM. Governance Board; Program Board. REFERENCE. Project Management Institute (2006) The Standard for Program Management.

Sticky bits: A user ownership access-right flag that can be assigned to digital objects such as directories. When the sticky bit flag is set, files added to the directory will inherit the access permissions associated with the directory.

Storage location: A physical storage location where a data object will be stored upon ingestion into a data repository. This requires identifying the IP address and the physical path name within the storage location where the data object will be stored. The sequence of these chained activities is conceptualized as a workflow object. For retrieval, the data …

Strategy: A high-level plan of action or policy designed for a long-range or major aim.

Structural metadata: 1. A type of metadata that indicates how compound objects are put together (e.g., how pages are ordered to form chapters; how data are organized in a table; how datasets are organized in a collection). 2. The underlying structural metadata of digital objects that tells computers how to assemble them.

Structured data: Data whose elements have been organized into a consistent format and data structure within a defined data model such that the elements can be easily addressed, organized, and accessed in various combinations to make better use of the information, such as in a relational database. SYNONYM. Structured information. REFERENCE. Kitchin, R. (2014) The Data Revolution: Big …

Support service: A type of service that provides technical support and assistance to help solve problems related to technical products, including data access, data discovery, data integration, and other data management support. Various support services are provided at different phases of the data lifecycle to help manage data and other resources used in research. REFERENCE. Research …

Syntactic interoperability: Syntactic interoperability defines the structure or format of data exchange and is achieved through tools such as XML or SQL standards. REFERENCE. Wikipedia; HIMSS (Healthcare Information and Management Systems Society)

System: A combination of interacting elements organized to achieve one or more stated purposes. The system is the aspect that the scientific researcher will interact with. It must be well defined and directly relevant to the research needs, as is the case for any other scientific instrument. "Systems" may undergo continuous extensions and system elements …

System metadata: Digital entity properties that are generated by the data management system (e.g., creation time; owner; storage location; data retention period, i.e., the length of time a digital entity will be retained).


Technical metadata: Describes the technical processes used to produce, or required to use, a digital object. REFERENCE. DCC/TC3+

Technique: A defined systematic procedure employed by a human resource to perform an activity to produce a product or result or deliver a service, and that may employ one or more tools. REFERENCE. Project Management Institute (2006) The Standard for Program Management.

Technology: 1. The application of scientific knowledge for practical purposes. 2. The branch of knowledge dealing with engineering or applied science.

Temporary version: A copy of a data object, such as a file, made during the course of routine operations.

Text file: A file that contains the values in a table as a series of ASCII text lines, organized so that each column value is separated by a delimiter character (e.g., a comma or a pipe). SYNONYM. Comma separated values; Character separated values; Pipe separated values; TXT
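
A minimal sketch using Python's standard csv module, assuming (beyond the definition) that the first line holds the column names. The same reader handles comma-, pipe-, or other character-separated text by changing only the delimiter; read_delimited is a hypothetical helper name.

```python
import csv
import io

def read_delimited(text: str, delimiter: str = "|") -> list[dict]:
    """Parse character-separated text whose first line holds the column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

rows = read_delimited("site|ph\nnorth|6.8\nsouth|7.4\n")
print(rows[0])  # {'site': 'north', 'ph': '6.8'}
```

Values come back as strings; converting them to numbers is left to the caller. Using the csv module rather than a plain split also handles quoted values that contain the delimiter.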

Tool: Something tangible, such as a template or software program, used in performing an activity to produce a product or result. REFERENCE. Project Management Institute (2006) The Standard for Program Management.

Topical metadata: Describes the topic or "aboutness" of an information/data object, i.e., what the data are about. To make sense to an agent or system, this may include a variety of vocabularies for describing subjects, topics, categories, etc.

Total Quality Management: A comprehensive and structured approach to organizational management that seeks to improve the quality of products and services through ongoing refinements in response to continuous feedback. SYNONYM. TQM

Transdisciplinary: Research efforts conducted by investigators from different disciplines working jointly to create new conceptual, theoretical, methodological, and translational innovations that integrate and move beyond discipline-specific approaches to address a common problem. Transdisciplinary research transcends interdisciplinary research.

Trusted Digital Repository: An infrastructure component that provides reliable, long-term access to managed digital resources. It stores, manages, and curates digital objects and returns their bit streams when a request is issued. Trusted repositories undergo regular assessments according to a set of rules, such as those defined by the Data Seal of Approval (DSA) or TRAC (ISO 16363). It is …


Unified data management platform: A centralized computing system for collecting, integrating, and managing large sets of structured and unstructured data from disparate sources.

Uniform resource identifier: A string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. SYNONYM. URI. REFERENCE. MIT data management and publishing
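
A short illustration using Python's standard urllib.parse.urlsplit, which splits a URI into the generic components (scheme, authority, path, query, fragment). The example.org address and the ISBN are placeholders chosen for illustration. A URN is also a URI, so the same parser accepts it.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://example.org/data/set1.csv?version=2#row=10")
print(parts.scheme)    # https
print(parts.netloc)    # example.org
print(parts.path)      # /data/set1.csv
print(parts.query)     # version=2
print(parts.fragment)  # row=10

# A name-only URI (URN) has a scheme and a path but no network location.
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme, urn.path)  # urn isbn:0451450523
```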

Uniform resource name: An Internet resource with a name that, unlike a URL, has persistent significance; that is, the owner of the URN can expect that someone else (or a program) will always be able to find the resource. A frequent problem in using the Web is that Web content is sometimes moved to a new site …

Universal Numeric Fingerprint: A unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within, and a cryptographic hash of that normalized (or canonicalized) representation is then computed. The signature is thus independent of the storage format. E.g., …
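
The idea of normalize-then-hash can be sketched as below. This is a simplified illustration only, not a conforming implementation of the UNF specification, and it will not reproduce real UNF values. It assumes the values have already been parsed into Python numbers or strings; the rounding rule, terminator bytes, and function names are choices made for the sketch.

```python
import base64
import hashlib

def normalize_value(value, digits: int = 7) -> str:
    # Simplified canonical form (assumption, not the UNF specification):
    # numbers are rounded to `digits` significant digits in exponential
    # notation, so 1, 1.0 and 1.00000001 all normalize to the same string.
    if isinstance(value, (int, float)):
        return format(float(value), f".{digits - 1}e")
    return str(value).strip()

def fingerprint(column: list, digits: int = 7) -> str:
    """Hash the normalized representation instead of the file's raw bytes."""
    h = hashlib.sha256()
    for v in column:
        # Terminate each value so that ["ab", "c"] and ["a", "bc"] differ.
        h.update(normalize_value(v, digits).encode("utf-8") + b"\n\x00")
    # Keep the first 128 bits and encode them compactly.
    return base64.b64encode(h.digest()[:16]).decode("ascii")

print(fingerprint([1, 2.5]) == fingerprint([1.0, 2.50000001]))  # True
```

Because the hash is taken over the normalized values, the same column stored with different numeric formatting yields the same signature, while reordering or changing values does not.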

Universally Unique Identifier: A 128-bit number used to give practically unique identity to different objects on the internet over time (e.g., file system partitions). SYNONYM. UUID
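
A short sketch with Python's standard uuid module showing two common ways to generate one. The helper names and the example.org URL are illustrative only.

```python
import uuid

def new_identifier() -> uuid.UUID:
    # Version 4: 122 random bits plus fixed version/variant bits (128 bits total).
    # Collisions are possible in principle but improbable enough to ignore in practice.
    return uuid.uuid4()

def name_based_identifier(name: str) -> uuid.UUID:
    # Version 5: derived from a namespace and a name with SHA-1,
    # so the same name always yields the same UUID.
    return uuid.uuid5(uuid.NAMESPACE_URL, name)

a, b = new_identifier(), new_identifier()
print(str(a))  # 36-character canonical form, different on every run
print(name_based_identifier("https://example.org/data/set1.csv"))
```

Random (version 4) identifiers suit newly created objects; name-based (version 5) identifiers suit cases where the same object must always map to the same identifier.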

University teaching: The application of a comprehensive knowledge of a discipline or disciplines to the development of expertise and the generation of new knowledge through research, and the planning and presentation of courses of study for undergraduates and graduates in universities. RELATED TERM. Researcher

Unstructured data: Data that have not been organized into a format and identifiable data structure that makes them easy to access and process. These data can often be searched as long as they are digital, but they are difficult to use for computer analyses. SYNONYM. Unstructured information. REFERENCE. Kitchin, R. (2014). The Data Revolution: Big Data, Open Data, …

Usable data: Data that can be understood and used without additional information. Usable data are delivered in a form that meets the needs of different end-user audiences, is ready for the tasks that the end user needs to accomplish, and has been adapted to the end user's needs (not the other way around). Usable data have been cleaned, …

Use case: A methodology used in system analysis to identify, clarify, and organize system requirements. The use case is made up of a set of possible sequences of interactions between systems and users in a particular environment and related to a particular goal. It consists of a group of elements (e.g., classes and interfaces) that can be …

Use metadata: Metadata that manage user access, user tracking, and multi-versioning information.

User acceptance testing: A phase of development in which the product is tested in the "real world" by the intended audience. The experiences of the early users are forwarded to the developers, who make final changes before releasing the product. SYNONYM. UAT


Valued outcome: In the context of a researcher's activities, there are four types of valued outcomes: innovation, productivity, impact, and recognition. These are the driving forces in a researcher's career progression. Even though there are four types of valued outcomes, they are very much linked. For example, the evidence of a scientific researcher's innovation, impact and recognition …

Verify checksum: Generate a unique reduced representation of a data object by applying a procedure, and compare the result to the original reduced representation that has been stored as provenance information. Examples of such reduced representations include a checksum, a hash, and a digital signature.
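
A minimal sketch using Python's standard hashlib, assuming SHA-256 as the procedure and a hex digest as the stored reduced representation. The function names are hypothetical; in practice the stored digest would be read from the object's provenance metadata.

```python
import hashlib

def compute_checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Reduced representation of the data object as a hex digest."""
    return hashlib.new(algorithm, data).hexdigest()

def verify_checksum(data: bytes, stored: str, algorithm: str = "sha256") -> bool:
    """Recompute the digest and compare it with the value kept as provenance."""
    return compute_checksum(data, algorithm) == stored

stored = compute_checksum(b"raw survey data")   # recorded at ingestion
print(verify_checksum(b"raw survey data", stored))   # True
print(verify_checksum(b"raw survey data!", stored))  # False
```

A match shows the bytes are unchanged with very high probability; it says nothing about whether the content was correct in the first place.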

Version control: Control over time of data, computer code, software, and documents that makes it possible to revert to a previous revision, which is critical for data traceability, tracking edits, and correcting mistakes. Version control generates a (changed) copy of a data object that is uniquely labeled with a version number. The intent is to track …

View: A way of portraying information from a database. This can be done by arranging the data items in a specific order, by highlighting certain items, or by showing only certain items. For any database, there are a number of possible views that may be specified. Often thought of as a virtual table, the view doesn't …
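
A minimal sketch with Python's built-in sqlite3 module and a hypothetical samples table. The view stores only a query that shows certain items; its rows are computed from the base table whenever it is read, so later changes to the table appear through the view without redefining it.

```python
import sqlite3

def build_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, site TEXT, ph REAL)")
    conn.executemany(
        "INSERT INTO samples (site, ph) VALUES (?, ?)",
        [("north", 6.8), ("south", 7.4), ("north", 7.1)],
    )
    # The view is a stored query: only the 'north' rows, only two columns.
    conn.execute(
        "CREATE VIEW north_samples AS "
        "SELECT id, ph FROM samples WHERE site = 'north'"
    )
    return conn

def read_view(conn: sqlite3.Connection) -> list:
    # Ordering is applied when the view is queried.
    return conn.execute("SELECT * FROM north_samples ORDER BY ph").fetchall()

conn = build_database()
before = read_view(conn)
print(before)  # [(1, 6.8), (3, 7.1)]
# A later insert into the base table is visible through the view immediately.
conn.execute("INSERT INTO samples (site, ph) VALUES ('north', 6.5)")
after = read_view(conn)
print(after)   # [(4, 6.5), (1, 6.8), (3, 7.1)]
```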

Voluntary standard: Generally established by private-sector bodies and made available for use by any person or organization, private or government. The term includes what are commonly referred to as "industry standards" as well as "consensus standards." A voluntary standard may become mandatory as a result of its use, reference, or adoption by a regulatory authority, or when …


Web resource: 1. Units of information that are addressed through Uniform Resource Identifiers (URIs). 2. The early notion of static addressable documents or files has evolved to a more generic and abstract definition. Every 'thing' or entity that can be identified, named, addressed, or handled in any way whatsoever in the web at large or in …