v2026.1714 entries · CC-BY 4.0
CASRAI
Dictionary term · Track B · Stable · v2026.2

Data lake

A storage repository that holds large volumes of structured, semi-structured, and unstructured data in their native formats, deferring schema-on-write requirements so that data can be ingested cheaply and only structured at the time of read or analysis.

By CASRAI Editorial Board
· Last updated 21 May 2026

Examples

Worked examples

  • Is an instance

    An institutional research-IT data lake holding raw genomics FASTQ files, microscopy images, and instrument logs.

  • Is an instance

    An astronomical observatory's S3-based data lake ingesting raw telescope outputs prior to pipeline processing.

Counter-examples

Looks similar, but isn't

  • Not an instance

    A small structured database is not a data lake.

  • Not an instance

    A curated, schema-on-write data warehouse is the contrasting pattern, not a data lake.

Editorial commentary

The data-lake pattern emerged in the early 2010s (notably promoted by James Dixon at Pentaho) as a counter-position to traditional data warehousing. Modern data lakes are typically built on object storage (S3, Azure Blob, GCS), with optional table layers (Apache Iceberg, Delta Lake, Hudi) and compute engines (Spark, Trino). In research-information contexts, data lakes are used for raw observational data (telemetry, instrument exhaust, log files) and for aggregating large heterogeneous corpora before downstream curation.
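The ingest-first, structure-later behaviour described above can be illustrated with a minimal sketch. It is not a reference implementation: a temporary local directory stands in for an object store, and the key layout `raw/<source>/<day>/<name>`, the source names, and the field names `temp_c` and `temperature` are all assumptions made for illustration.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def ingest(lake_root: Path, source: str, day: str, name: str, payload: bytes) -> Path:
    """Land a payload unchanged under raw/<source>/<day>/<name>; no schema is checked."""
    target = lake_root / "raw" / source / day / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_temperatures(lake_root: Path) -> list[float]:
    """Schema-on-read: interpret heterogeneous raw files only at query time."""
    readings: list[float] = []
    for path in sorted((lake_root / "raw").rglob("*")):
        if not path.is_file():
            continue
        if path.suffix == ".jsonl":
            for line in path.read_text().splitlines():
                record = json.loads(line)
                if "temp_c" in record:  # hypothetical field name
                    readings.append(float(record["temp_c"]))
        elif path.suffix == ".csv":
            for row in csv.DictReader(io.StringIO(path.read_text())):
                readings.append(float(row["temperature"]))  # hypothetical column
        # Other formats (images, binary logs) stay in the lake untouched.
    return readings


def demo() -> list[float]:
    """Ingest three differently shaped files, then query across them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        ingest(root, "sensor-a", "2026-05-21", "log.jsonl",
               b'{"temp_c": 21.5}\n{"status": "ok"}\n')
        ingest(root, "sensor-b", "2026-05-21", "export.csv",
               b"temperature,unit\n19.0,C\n")
        ingest(root, "microscope", "2026-05-21", "frame.bin", b"\x00\x01\x02")
        return read_temperatures(root)


if __name__ == "__main__":
    print(demo())
```

Ingestion never inspects the payload, which is what keeps it cheap. The cost of interpreting each format moves to the reader, which in a production lake is typically a table layer or compute engine rather than hand-written parsing.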

References

  • Dixon J., Pentaho, Hadoop, and Data Lakes (Pentaho blog, 2010).
  • Russom P., Data Lakes: Purposes, Practices, Patterns, and Platforms (TDWI Best Practices Report, 2017).

Machine-readable encodings

Use in your systems

JATS XML <role> element
xml
<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Data lake"
      vocab-term-identifier="https://casrai.org/dictionary/term/data-lake" />
Schema.org DefinedTerm (JSON-LD)
json
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Data lake",
  "identifier": "https://casrai.org/dictionary/term/data-lake",
  "description": "A storage repository that holds large volumes of structured, semi-structured, and unstructured data in their native formats, deferring schema-on-write requirements so that data can be ingested cheaply and only structured at the time of read or analysis.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/research-data-infrastructure/",
  "url": "https://casrai.org/dictionary/term/data-lake",
  "sameAs": [],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
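A consuming system might load the JSON-LD record above and check a few fields before storing it. The sketch below uses only the Python standard library. The set of required keys and the function name `load_defined_term` are assumptions for illustration, not part of any CASRAI or Schema.org API, and the `description` field is omitted for brevity.

```python
import json

# The record shown above, minus "description" to keep the example short.
TERM_JSONLD = """
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Data lake",
  "identifier": "https://casrai.org/dictionary/term/data-lake",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/research-data-infrastructure/",
  "url": "https://casrai.org/dictionary/term/data-lake",
  "sameAs": [],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
"""

# Assumption: the keys this hypothetical consumer insists on.
REQUIRED_KEYS = ("@type", "name", "identifier", "inDefinedTermSet")


def load_defined_term(text: str) -> dict:
    """Parse a JSON-LD string and verify it looks like a Schema.org DefinedTerm."""
    record = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if record["@type"] != "DefinedTerm":
        raise ValueError(f"unexpected @type: {record['@type']}")
    return record


term = load_defined_term(TERM_JSONLD)
label = f'{term["name"]} <{term["identifier"]}>'
print(label)
```

This treats the record as plain JSON. A system that needs full JSON-LD semantics, such as context expansion, would use a dedicated JSON-LD processor instead.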