What is a genome?
A genome is the complete set of genetic material in an organism — all of its DNA, including every gene and the non-coding sequences in between — that carries the full instructions needed to build and maintain that organism.
What the genome contains
The genome is the complete library of an organism’s DNA. It includes protein-coding genes, the regulatory sequences that control them, genes for functional RNAs, and large amounts of non-coding DNA. In organisms with nucleated cells, most of the genome resides in the nucleus on chromosomes, with a small separate genome in the mitochondria (and, in plants, the chloroplasts). The size of a genome varies enormously across species and does not map simply onto an organism’s complexity.
The Human Genome Project
The Human Genome Project was an international research effort, launched in 1990, to determine the sequence of the human genome. A working draft was announced in 2000, and the project was declared essentially complete in 2003.
It produced a reference sequence that transformed biological and medical research, enabling the study of genetic variation, disease-associated genes, and human evolution. The first gap-free sequence of a human genome, covering the 22 autosomes and the X chromosome, was reported by the Telomere-to-Telomere (T2T) consortium in 2022; a complete Y chromosome sequence followed in 2023.
Genomes and research
Comparing genomes within and between species reveals how organisms are related, how genes function, and how variation arises. Genomics — the study of whole genomes — relies on sequencing technologies and on bioinformatics to assemble, annotate, and interpret the data. Reference genomes serve as shared standards against which new sequences are compared, making consistent identifiers and metadata essential for reproducible work.
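As a toy illustration of comparing a new sequence against a reference, the following Python sketch lists single-base differences between two short DNA fragments. It assumes both fragments are already aligned and of equal length, so it detects only substitutions and ignores insertions and deletions; real genomics pipelines rely on dedicated aligners and variant callers instead. The function name `find_substitutions` and the example fragments are hypothetical.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Toy assumption: the two sequences are already aligned and have the same
    length, so only single-base substitutions are reported (no indels).
    Positions are 1-based, as is common for genome coordinates.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")

    differences = []
    # Walk both sequences in lockstep, comparing bases case-insensitively.
    for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1):
        if ref_base.upper() != sample_base.upper():
            differences.append((position, ref_base.upper(), sample_base.upper()))
    return differences


# Hypothetical fragments: the sample differs from the reference at one position.
reference_fragment = "ACGTACGTTA"
sample_fragment = "ACGTACCTTA"

print(find_substitutions(reference_fragment, sample_fragment))  # [(7, 'G', 'C')]
```

Reporting positions relative to the reference is what makes results from different samples comparable, which is why the reference's identifiers and coordinates need to be shared consistently.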
Standards and data sharing
Because genomic datasets are large and widely reused, the community curates them in public repositories such as the European Nucleotide Archive, GenBank, and Ensembl, under agreed formats and metadata standards. Aligning genome data with the FAIR principles — Findable, Accessible, Interoperable, Reusable — helps ensure that sequences generated by one group can be reliably used by others.
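To give a sense of scale, here is a back-of-the-envelope Python estimate for a single copy of the ~3.2 billion base pairs of the human genome. It assumes each base (A, C, G or T) is packed into 2 bits and ignores headers, ambiguity codes, quality scores, and compression; under those assumptions the sequence needs about 800 MB, versus about 3.2 GB stored as plain text at one byte per base. The helper `packed_size_bytes` is hypothetical.

```python
BASE_PAIRS = 3_200_000_000  # approximate human genome size quoted in this article
BITS_PER_BASE = 2           # four possible bases -> log2(4) = 2 bits each


def packed_size_bytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store `base_pairs` bases at `bits_per_base` bits each.

    Assumption: a simple fixed-width encoding with no headers, quality
    scores or compression. Partial bytes are rounded up.
    """
    total_bits = base_pairs * bits_per_base
    return (total_bits + 7) // 8  # ceiling division: 8 bits per byte


packed = packed_size_bytes(BASE_PAIRS)        # 2 bits per base
as_text = packed_size_bytes(BASE_PAIRS, 8)    # one ASCII character per base

print(f"2-bit packed: {packed:,} bytes (~{packed / 1e6:.0f} MB)")
print(f"plain text:   {as_text:,} bytes (~{as_text / 1e9:.1f} GB)")
```

Raw sequencing data with per-base quality scores, and collections of many genomes, are far larger than this single-sequence figure, which is part of why shared repositories and common formats matter.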
Key facts
At a glance
- Definition: the complete set of an organism’s genetic material
- Human genome size: ~3.2 billion base pairs
- Human protein-coding genes: ~20,000
- Human Genome Project: 1990–2003 (essentially complete 2003)
- Complete gap-free human sequence: T2T consortium, 2022
- Includes: genes plus non-coding and regulatory DNA
Common questions
FAQ
What is the difference between a gene and a genome?
A gene is a single segment of DNA that codes for a functional product. A genome is the entire collection of an organism’s DNA — all of its genes together with the non-coding sequences between them.
How big is the human genome?
The human genome contains roughly 3.2 billion base pairs of DNA and around 20,000 protein-coding genes, arranged across 23 pairs of chromosomes plus a small mitochondrial genome.