What Is Content Analysis? Qualitative & Quantitative Uses

Content analysis is a research method for systematically examining and categorising the content of texts, images, audio, video or other communications. It can be used quantitatively — counting frequencies — or qualitatively — interpreting themes and meanings.

CASRAI plain-language explainers — clear answers to recurring research-administration questions

Krippendorff’s definition and the core procedure

Klaus Krippendorff’s Content Analysis: An Introduction to Its Methodology (1st ed. 1980; 2nd ed. 2004; 4th ed. 2018) is the canonical methodological reference. Krippendorff defines content analysis as "a research technique for making replicable and valid inferences from texts (or other meaningful matter) to the contexts of their use." The core procedure involves: (1) defining the unit of analysis (word, sentence, theme, image); (2) developing a coding scheme with clear, mutually exclusive and exhaustive categories; (3) training coders on the scheme; (4) having coders independently code a subsample; (5) calculating inter-rater reliability; and (6) coding the full corpus. The process is iterative: steps 2 to 5 are usually repeated, with category definitions refined after each pilot round, until reliability is acceptable. Only then is the full corpus coded.
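Steps 2 and 4 can be made concrete with a small Python sketch. The codebook, its codes (ECON, HEALTH, OTHER) and the helper names are hypothetical illustrations, not taken from Krippendorff: the sketch draws a reproducible random subsample for independent coding and checks that each coder assigned exactly one valid code per unit, which is how mutually exclusive and exhaustive categories show up in the recorded data.

```python
import random

# Hypothetical codebook (step 2). "OTHER" is a residual category that keeps
# the scheme exhaustive: every unit can receive some code.
CODEBOOK = {
    "ECON": "Unit is mainly about the economy or jobs",
    "HEALTH": "Unit is mainly about health or health care",
    "OTHER": "Unit fits none of the categories above",
}


def draw_subsample(unit_ids, fraction, seed=0):
    """Step 4: draw a reproducible random subsample of units for reliability coding."""
    rng = random.Random(seed)  # fixed seed so the subsample can be reported and redrawn
    k = max(1, round(len(unit_ids) * fraction))
    return sorted(rng.sample(unit_ids, k))


def check_coding(coding, units, codebook):
    """List rule violations in one coder's work.

    `coding` maps unit id -> list of codes assigned. Mutually exclusive,
    exhaustive categories mean each unit should carry exactly one code,
    and every code must come from the codebook.
    """
    problems = []
    for unit in units:
        codes = coding.get(unit, [])
        if len(codes) != 1:
            problems.append(f"{unit}: expected exactly one code, got {len(codes)}")
        for code in codes:
            if code not in codebook:
                problems.append(f"{unit}: unknown code {code!r}")
    return problems


# Usage: ten sentence-level units, 30% drawn for independent double coding.
units = [f"s{i}" for i in range(1, 11)]
sample = draw_subsample(units, 0.3)
coder_a = {u: ["ECON"] for u in sample}
print(check_coding(coder_a, sample, CODEBOOK))  # []
```

Running the same check on every coder's file before computing reliability catches double-coded, uncoded or mistyped units early, so disagreements reflect genuine differences in judgement rather than data-entry errors.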

Quantitative (manifest) vs qualitative (latent) content analysis

Quantitative content analysis, sometimes called manifest content analysis, counts the frequency of predefined codes: words, phrases, topics or categories. Results can be expressed numerically, allowing statistical comparison across texts or time periods. This approach is associated with Bernard Berelson, from his early work on wartime propaganda to Content Analysis in Communication Research (1952), which defined the method as "a research technique for the objective, systematic, and quantitative description of the manifest content of communication." Qualitative content analysis, by contrast, examines latent content (underlying meanings and themes rather than surface frequencies); it is closer to thematic analysis and relies on interpretation. Hsieh and Shannon (2005) identify three approaches to qualitative content analysis: conventional (categories emerge from the data), directed (coding starts from an existing theory or prior research), and summative (keywords are counted first, then their context and use are interpreted).
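A minimal sketch of manifest counting, assuming a hypothetical keyword dictionary in which each code is defined by a fixed set of terms. Real dictionaries are built and tested during scheme development; normalising to a rate per 1,000 words is one common way to compare texts of different lengths, not a step the method prescribes.

```python
import re
from collections import Counter

# Hypothetical dictionary: each code is operationalised as a set of manifest terms.
CODE_TERMS = {
    "ECON": {"economy", "jobs", "inflation"},
    "HEALTH": {"hospital", "vaccine", "nurses"},
}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def code_frequencies(text, code_terms):
    """Count how often the terms belonging to each code occur in one text."""
    counts = Counter()
    for token in tokenize(text):
        for code, terms in code_terms.items():
            if token in terms:
                counts[code] += 1
    return counts


def rate_per_1000(counts, text):
    """Convert raw counts to occurrences per 1,000 tokens for cross-text comparison."""
    n = len(tokenize(text))
    return {code: 1000 * c / n for code, c in counts.items()} if n else {}


speech = "Jobs and the economy matter, but nurses and hospital funding matter too. Jobs first."
counts = code_frequencies(speech, CODE_TERMS)
print(counts["ECON"], counts["HEALTH"])  # 3 2
print(rate_per_1000(counts, speech))
```

Counting like this captures only manifest content: it cannot tell whether "jobs" is praised or criticised, which is exactly the gap the summative approach fills by returning to each keyword's context.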

Inter-rater reliability

Because content analysis often involves multiple coders applying a scheme to the same material, inter-rater reliability (IRR) is a critical quality indicator. The most commonly reported statistics are Cohen’s kappa (κ), which corrects for chance agreement between two coders, and Krippendorff’s alpha (α), which handles any number of coders, nominal, ordinal, interval or ratio data, and missing values, making it more versatile. Krippendorff recommends relying on data with α ≥ 0.80 and using values of 0.67–0.79 only for tentative conclusions; these cut-offs are also widely applied to κ in practice. Disagreements between coders should be resolved through discussion and, where necessary, revision of the coding scheme.
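To show what the two statistics actually compute, here is a minimal from-scratch sketch for nominal codes. The function names are illustrative, and for published work an established, tested implementation is preferable. Kappa compares observed agreement p_o with chance agreement p_e derived from each coder's own code distribution, κ = (p_o − p_e) / (1 − p_e). Alpha builds a coincidence matrix from every pairable value (ignoring missing ones) and computes α = 1 − D_o / D_e.

```python
from collections import Counter
from itertools import permutations


def cohen_kappa(a, b):
    """Cohen's kappa for two coders who coded the same units with nominal codes."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty code lists")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2  # chance agreement
    if p_e == 1:
        return float("nan")  # both coders used one identical code throughout: undefined
    return (p_o - p_e) / (1 - p_e)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists: the codes each coder gave one unit,
    with None where a coder did not code that unit.
    """
    o = Counter()  # coincidence matrix, keyed by (code_c, code_k)
    for values in units:
        vals = [v for v in values if v is not None]
        m = len(vals)
        if m < 2:
            continue  # a unit coded by fewer than two coders is not pairable
        for c, k in permutations(vals, 2):  # ordered pairs of values from different coders
            o[(c, k)] += 1 / (m - 1)
    n_c = Counter()
    for (c, _k), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())
    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        return float("nan")  # only one category in the data: alpha is undefined
    return 1 - (n - 1) * observed / expected


coder_a = ["a", "a", "b", "b", "a"]
coder_b = ["a", "b", "b", "b", "a"]
print(round(cohen_kappa(coder_a, coder_b), 3))  # 0.615
print(round(krippendorff_alpha_nominal([list(p) for p in zip(coder_a, coder_b)]), 3))  # 0.64
```

In this toy example the coders agree on four of five units (80% raw agreement), yet both κ ≈ 0.62 and α = 0.64 fall below the 0.67 tentative threshold once chance agreement is removed, which is why raw percentage agreement alone is not reported as a reliability measure.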

Computer-assisted and corpus approaches

Large-scale content analysis increasingly uses software. NVivo, ATLAS.ti and MAXQDA support the organisation and coding of qualitative data; although they now offer some auto-coding features, they do not replace the analyst, who still applies and checks the coding scheme. AntConc is a freely available corpus-analysis tool that calculates word frequencies, concordances (keyword-in-context displays) and collocations. More recently, topic modelling (for example, Latent Dirichlet Allocation) has been used to surface recurring themes without predefined categories, and transformer-based classifiers to assign codes automatically at very large scale; both require validation against human coding. Content analysis should also be distinguished from discourse analysis: where content analysis counts and categorises, discourse analysis interrogates how language constructs reality.
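AntConc is a graphical tool, so the following is not its implementation; it is a minimal Python sketch of two ideas such tools expose, keyword-in-context (KWIC) concordance lines and window-based collocates. The function names are hypothetical. The collocate function counts raw co-occurrences only; corpus tools normally rank collocates with an association measure such as mutual information or log-likelihood, which is omitted here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return one concordance line per hit: `window` tokens of context either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left:>30} [{tok}] {right}")  # right-align left context
    return lines


def collocates(tokens, keyword, window=3):
    """Count words appearing within `window` tokens of each hit (raw frequency only)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


tokens = tokenize("The vaccine rollout was slow. Critics said the vaccine supply failed.")
for line in kwic(tokens, "vaccine", window=2):
    print(line)
print(collocates(tokens, "vaccine", window=2).most_common(1))  # [('the', 2)]
```

Reading concordance lines is also a quick way to validate automated output: a sample of hits can be checked by hand to see whether the keyword is used in the sense the coding scheme assumes.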

Key facts

At a glance

Definition: Systematic categorisation of text/media content to draw valid inferences
Key text: Krippendorff (1st ed. 1980; 4th ed. 2018), the canonical methodological reference
Approaches: Quantitative (manifest, frequency) vs qualitative (latent, interpretive)
Reliability: Cohen's kappa (two coders) or Krippendorff's alpha (any number of coders; handles missing data)
Threshold: α ≥ 0.80 acceptable, 0.67–0.79 tentative only (Krippendorff); often applied to κ as well
Software: NVivo, ATLAS.ti, MAXQDA (qualitative); AntConc (corpus analysis)
Distinguished from: Discourse analysis (DA interprets how language constructs reality)

Common misconceptions

What people often get wrong

Often heard: Content analysis is only about counting words.

Actually: No — qualitative content analysis interprets latent meaning and themes rather than counting surface frequencies. Both manifest (quantitative) and latent (qualitative) forms are well-established and distinct approaches.

Often heard: Content analysis software automatically generates the findings.

Actually: No — NVivo, ATLAS.ti, and MAXQDA are primarily organisational tools; the analyst still applies the coding scheme. Automated approaches (topic modelling, AI classifiers) require validation against human coding before results can be reported.

Often heard: Content analysis and discourse analysis are the same thing.

Actually: No — content analysis categorises and quantifies textual features; discourse analysis examines how language constructs meanings, identities and power relations. They can be complementary but are methodologically distinct.

Going deeper