v2026.1714 entries · CC-BY 4.0
Dictionary term · Track B · Stable · v2026.2

Data lake

A storage repository that holds large volumes of structured, semi-structured, and unstructured data in their native formats, deferring schema-on-write requirements so that data can be ingested cheaply and only structured at the time of read or analysis.

By CASRAI Editorial Board · Last updated 21 May 2026

Examples

Worked examples

  • Is an instance

    An institutional research-IT data lake holding raw genomics FASTQ files, microscopy images, and instrument logs.

  • Is an instance

    An astronomical observatory's S3-based data lake ingesting raw telescope outputs prior to pipeline processing.
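Both worked examples store files byte-for-byte in whatever format the instrument produced, and defer any parsing. The following is a minimal Python sketch of that ingestion step. It assumes a plain `dict` as a stand-in for an object store such as S3 and a hypothetical `raw/<source>/<date>/<filename>` key layout. `ingest_raw` and the sidecar fields are illustrative names, not part of any standard.

```python
import hashlib
import json
from datetime import date

# Hypothetical in-memory stand-in for an object store such as S3:
# maps object keys to raw bytes.
object_store: dict[str, bytes] = {}

def ingest_raw(source: str, filename: str, payload: bytes, day: date) -> str:
    """Store payload unchanged under an assumed raw-zone key and write a
    small JSON metadata sidecar next to it. No schema is applied."""
    key = f"raw/{source}/{day.isoformat()}/{filename}"
    object_store[key] = payload  # native format preserved byte-for-byte
    sidecar = {
        "source": source,
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),  # fixity check for later reads
    }
    object_store[key + ".meta.json"] = json.dumps(sidecar).encode("utf-8")
    return key

# Heterogeneous inputs land side by side, each in its own format.
fastq_key = ingest_raw("sequencer", "run42.fastq",
                       b"@read1\nACGT\n+\nIIII\n", date(2026, 5, 21))
log_key = ingest_raw("microscope", "session.log",
                     b"2026-05-21T10:00Z stage=ready\n", date(2026, 5, 21))
```

The checksum in the sidecar lets downstream curation verify that the raw object has not changed since ingestion, without the lake itself needing to understand FASTQ or log syntax.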

Counter-examples

Looks similar, but isn't

  • Not an instance

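    A small structured database with a fixed, predefined schema is not a data lake: it enforces structure on write and holds only data that already fits that schema.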

  • Not an instance

    A curated, schema-on-write data warehouse is the contrasting pattern, not a data lake.
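The contrast between the two patterns comes down to when a schema is applied. The Python sketch below uses hypothetical instrument records and an illustrative schema (`instrument` plus a numeric `temp_c`). It runs the same raw lines through a warehouse-style validator that rejects misfits at ingest, and through a lake-style reader that keeps everything raw and imposes an interpretation only when queried.

```python
import json

# Hypothetical raw records from different instruments: fields vary,
# and one temperature arrives as a string.
raw_lines = [
    '{"instrument": "scope-1", "temp_c": 21.5}',
    '{"instrument": "scope-2", "temp_c": "22.0", "operator": "ak"}',
    '{"instrument": "scope-3"}',
]

def write_with_schema(lines):
    """Schema-on-write (warehouse style): validate at ingest and reject
    any record whose temp_c is not already a float."""
    accepted, rejected = [], []
    for line in lines:
        rec = json.loads(line)
        if isinstance(rec.get("temp_c"), float):
            accepted.append({"instrument": rec["instrument"], "temp_c": rec["temp_c"]})
        else:
            rejected.append(line)
    return accepted, rejected

def read_with_schema(lines):
    """Schema-on-read (lake style): everything was stored raw; the reader
    projects instrument and temp_c (coerced to float, or None if absent)."""
    for line in lines:
        rec = json.loads(line)
        temp = rec.get("temp_c")
        yield {"instrument": rec["instrument"],
               "temp_c": float(temp) if temp is not None else None}

accepted, rejected = write_with_schema(raw_lines)
rows = list(read_with_schema(raw_lines))
```

With these inputs, schema-on-write keeps one record and rejects two, while schema-on-read returns all three. The reader, not the ingest pipeline, decides how to handle the string value and the missing field.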

Editorial commentary

The data-lake pattern emerged in the early 2010s as a counter-position to traditional data warehousing; the term is usually credited to James Dixon of Pentaho. Modern data lakes are typically built on object storage (Amazon S3, Azure Blob Storage, Google Cloud Storage), optionally with a table-format layer (Apache Iceberg, Delta Lake, Apache Hudi) that adds table semantics over the raw files, and with compute engines (Apache Spark, Trino) that query the data in place. In research-information contexts, data lakes hold raw observational data (telemetry, raw instrument output, log files) and aggregate large heterogeneous corpora before downstream curation.
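Because objects in a lake are addressed by key, the key layout determines how cheaply a query can skip irrelevant data. The Python sketch below assumes the `key=value` directory convention popularized by Apache Hive and understood by engines such as Spark. The `lake/raw/` prefix, the partition names, and the helper functions are hypothetical. Table formats such as Apache Iceberg track partitions in their own metadata instead, so this path parsing only approximates what those layers do.

```python
from datetime import date

def partition_key(dataset: str, source: str, day: date, filename: str) -> str:
    """Build a Hive-style object key (key=value path segments).
    Prefix and segment names are illustrative assumptions."""
    return f"lake/raw/{dataset}/source={source}/date={day.isoformat()}/{filename}"

def prune(keys, **wanted: str):
    """Yield only keys whose partition segments match every requested value,
    mimicking how an engine skips prefixes without opening any files."""
    for key in keys:
        parts = dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)
        if all(parts.get(k) == v for k, v in wanted.items()):
            yield key

keys = [
    partition_key("observatory", "telescope", date(2026, 5, 20), "frame_001.fits"),
    partition_key("observatory", "telescope", date(2026, 5, 21), "frame_002.fits"),
    partition_key("observatory", "weather", date(2026, 5, 21), "station.csv"),
]
todays_frames = list(prune(keys, source="telescope", date="2026-05-21"))
```

Here only the second key survives pruning: the filter inspects path segments, never the FITS or CSV contents.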

References

  • Dixon J., "Pentaho, Hadoop, and Data Lakes" (Pentaho blog, 2010).
  • Russom P., "Data Lakes: Purposes, Practices, Patterns, and Platforms" (TDWI Best Practices Report, 2017).

Machine-readable encodings

Use in your systems

JATS XML <role> element
```xml
<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Data lake"
      vocab-term-identifier="https://casrai.org/dictionary/term/data-lake" />
```
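A consuming system can read the term identifier directly from the JATS attributes. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It assumes the `<role>` element arrives as a standalone string; in a real JATS article it would typically sit inside a `<contrib>` element. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The <role> element shown above, as a standalone XML string.
jats_role = """<role vocab="credit"
      vocab-identifier="https://casrai.org/dictionary/"
      vocab-term="Data lake"
      vocab-term-identifier="https://casrai.org/dictionary/term/data-lake" />"""

role = ET.fromstring(jats_role)
# Attribute names contain hyphens, so they are read with .get().
term = role.get("vocab-term")
term_id = role.get("vocab-term-identifier")
```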
Schema.org DefinedTerm (JSON-LD)
```json
{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Data lake",
  "identifier": "https://casrai.org/dictionary/term/data-lake",
  "description": "A storage repository that holds large volumes of structured, semi-structured, and unstructured data in their native formats, deferring schema-on-write requirements so that data can be ingested cheaply and only structured at the time of read or analysis.",
  "inDefinedTermSet": "https://casrai.org/dictionary/domain/research-data-infrastructure/",
  "url": "https://casrai.org/dictionary/term/data-lake",
  "sameAs": [],
  "license": "https://creativecommons.org/licenses/by/4.0/"
}
```
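When a system needs only a few fields from the JSON-LD record, it can treat the record as plain JSON. The sketch below illustrates this approach; it is not a CASRAI-provided tool. It checks for the fields it relies on and builds a lookup table from term identifier to name. `index_terms` and `REQUIRED` are hypothetical names, and expanding `@context` properly would require a full JSON-LD processor.

```python
import json

# The JSON-LD record, abbreviated to the fields used here.
doc = json.loads("""{
  "@context": "https://schema.org",
  "@type": "DefinedTerm",
  "name": "Data lake",
  "identifier": "https://casrai.org/dictionary/term/data-lake"
}""")

# Illustrative choice of required fields, not a schema.org rule.
REQUIRED = ("@type", "name", "identifier")

def index_terms(docs):
    """Build a lookup table: term identifier -> display name."""
    index = {}
    for d in docs:
        missing = [f for f in REQUIRED if f not in d]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        if d["@type"] != "DefinedTerm":
            raise ValueError(f"unexpected @type: {d['@type']}")
        index[d["identifier"]] = d["name"]
    return index

terms = index_terms([doc])
```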
