Natural Language Processing (NLP) in Research: A Plain Guide

Natural language processing makes human language machine-processable, from tokenisation and embeddings to transformer models. This guide explains the core building blocks, common tasks such as classification and translation, and what researchers should watch for when using NLP.

By CASRAI Editorial Board

Published 18 Jun 2026 · 4 minute read

Natural language processing (NLP) is the field of artificial intelligence concerned with making human language machine-processable, so computers can read, interpret, generate and respond to text and speech. It combines linguistics, statistics and machine learning to turn unstructured language into structured signals a model can work with. NLP now underpins search engines, translation tools, literature-screening systems and the large language models behind modern research assistants.

From raw text to numbers

Computers operate on numbers, not words, so the first job of any NLP pipeline is to convert language into a numerical form. Two steps dominate this process.

Tokenisation splits text into smaller units called tokens, which may be words, sub-words or characters. Modern systems favour sub-word tokenisation because it represents rare or unfamiliar words as combinations of known pieces, which handles morphology gracefully while keeping the vocabulary to a manageable size.
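As a minimal sketch of the idea, the following splits words by greedy longest-match against a small vocabulary. The vocabulary, the "##" continuation marker and the function names are assumptions made for illustration; production tokenisers learn their vocabularies from data and differ in detail.

```python
# Toy vocabulary (hypothetical). "##" marks a piece that continues a word.
VOCAB = {"token", "##isation", "##s", "un", "##manage", "##able", "[UNK]"}


def subword_tokenise(word, vocab=VOCAB):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits, so the whole word becomes the unknown token.
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


def tokenise(text, vocab=VOCAB):
    """Lower-case, split on whitespace, then split each word into sub-words."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(subword_tokenise(word, vocab))
    return tokens


print(tokenise("Tokenisation tokens unmanageable granite"))
```

With this toy vocabulary, "unmanageable" is covered by three known pieces even though the full word is not listed, while "granite" has no matching pieces and falls back to the unknown token.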

Embeddings then map each token to a dense vector of numbers, positioning tokens with similar meanings near one another in a high-dimensional space. Because embeddings capture semantic relationships learned from large text corpora, “clinician” and “physician” sit close together while “clinician” and “granite” do not. This numerical representation is what downstream models actually learn from. The reliance on learned representations connects NLP to the wider field of machine learning, which we introduce in our guide, “What is machine learning”.
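Closeness between embeddings is commonly measured with cosine similarity. The sketch below uses hand-made three-dimensional vectors, which are invented values chosen only to illustrate the comparison; real embeddings are learned from text and typically have hundreds of dimensions.

```python
import math

# Hand-made 3-dimensional vectors (hypothetical values for illustration).
EMBEDDINGS = {
    "clinician": [0.90, 0.80, 0.10],
    "physician": [0.85, 0.75, 0.20],
    "granite": [0.10, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


close = cosine_similarity(EMBEDDINGS["clinician"], EMBEDDINGS["physician"])
far = cosine_similarity(EMBEDDINGS["clinician"], EMBEDDINGS["granite"])
print(f"clinician vs physician: {close:.2f}")
print(f"clinician vs granite:   {far:.2f}")
```

The first score comes out close to 1 and the second much lower, which is the numerical counterpart of "sitting close together" in the embedding space.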

Transformers: the architecture that changed NLP

The transformer, introduced in 2017, is the architecture behind most current NLP systems. Its key innovation is the attention mechanism, which lets the model weigh the relevance of every token to every other token in a sequence, regardless of distance. This captures long-range context that earlier sequential models, such as recurrent neural networks, struggled with. Because attention processes the whole sequence at once rather than step by step, it also parallelises well, enabling training on vast corpora. Large language models are transformers scaled to billions of parameters and trained on enormous text collections.
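The attention computation can be sketched in a few lines. This is a simplified illustration under stated assumptions: each token vector serves as its own query, key and value, whereas real transformers first apply learned weight matrices and use many attention heads. The input vectors and function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token in the sequence, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted average of all token vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three hypothetical 2-dimensional token vectors; the first and third are alike.
sequence = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
outputs, weights = self_attention(sequence)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row holds one token's attention weights over the whole sequence. The first token gives more weight to the similar third token than to the dissimilar second one, even though the third is further away in the sequence.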

Common NLP tasks

NLP is best understood through the tasks it performs. The table below lists those most relevant to research.

Task	What it does	Research example
Text classification	Assigns a category to a document	Screening abstracts for a systematic review
Named entity recognition	Identifies entities such as genes, drugs or places	Extracting chemical names from papers
Machine translation	Converts text between languages	Reading non-English literature
Summarisation	Condenses long text into key points	Digesting large document collections
Question answering	Returns answers from a body of text	Querying a corpus of protocols

How researchers use NLP

Across disciplines, NLP accelerates work that would be impractical by hand. Systematic reviewers use classification to triage thousands of abstracts. Biomedical teams use named entity recognition to mine entities from the literature at scale. Social scientists apply topic modelling and sentiment analysis to large text archives. Curators and metadata specialists increasingly use NLP to normalise terminology against controlled vocabularies such as the CASRAI dictionary, improving the consistency of research records.

Caveats and reproducibility concerns

NLP systems inherit the limitations of their training data. Models can encode and amplify bias present in source corpora; they can produce fluent but factually wrong output, often called hallucination; and their behaviour can shift when an underlying model is updated. For research use, these issues raise real reproducibility questions: a result obtained from one model version may not replicate on the next. Documenting the exact model, version, prompt and preprocessing is therefore essential, a theme we explore in our coverage of the reproducibility of machine learning research and in our broader AI and ML research outputs hub. Treating NLP as a tool whose outputs require human verification, rather than as an oracle, keeps its use in research trustworthy.
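One lightweight way to document a run is to store these details as a structured record alongside the results. The sketch below is an assumption about how such a record might look, not an established standard; the field names, the fingerprint step and every example value are hypothetical.

```python
import hashlib
import json


def provenance_record(model, version, prompt, preprocessing, settings):
    """Bundle the details needed to rerun an NLP step, plus a fingerprint."""
    record = {
        "model": model,
        "version": version,
        "prompt": prompt,
        "preprocessing": preprocessing,
        "settings": settings,
    }
    # Serialise with sorted keys so identical records always hash the same.
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    record["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


# All values below are hypothetical placeholders, not a real model or release.
record = provenance_record(
    model="example-classifier",
    version="1.2.0",
    prompt="Does this abstract report a randomised trial? Answer yes or no.",
    preprocessing=["lowercase", "strip HTML"],
    settings={"temperature": 0.0, "seed": 42},
)
print(json.dumps(record, indent=2))
```

Because the fingerprint changes whenever any recorded detail changes, two results can be compared at a glance to see whether they came from the same configuration.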

Frequently asked questions

What is the difference between NLP and machine learning?

Machine learning is the general study of systems that learn patterns from data. NLP is the application of those techniques, among others, specifically to human language. Most modern NLP is built on machine learning, but they are not the same thing.

What are embeddings in simple terms?

Embeddings are lists of numbers that represent the meaning of a word or piece of text, arranged so that similar meanings have similar numbers. They let a model treat “begin” and “start” as related while keeping unrelated words apart.

Why are transformers so important in NLP?

Transformers use an attention mechanism to weigh the relevance of all words in a sequence at once, capturing long-range context and training efficiently at scale. They are the foundation of nearly all current large language models.

Can I trust NLP output in research?

Only with verification. NLP models can be biased, can fabricate plausible-sounding content, and can change between versions. Record the model, version and settings, and check outputs against authoritative sources, as set out in our guidance for authors.
