The pharmaceutical research and development pipeline is the structured, multi-stage process through which a candidate medicine progresses from initial discovery to an approved, monitored product. It moves through discovery, preclinical evaluation, phased clinical trials, regulatory review and post-market surveillance, with rigorous standards and accumulating data governing the decision to advance or halt at every step.
The pipeline is best understood not as a guaranteed route but as a sequence of evidentiary gates. Most candidates that enter discovery never reach patients, and attrition is a designed feature rather than a failure: each stage is intended to identify safety or efficacy problems before more participants and resources are committed.
Discovery and target identification
Discovery begins with understanding the biology of a disease and identifying a molecular target — a protein, receptor or pathway whose modulation might produce a therapeutic effect. Researchers then screen large libraries of compounds to find “hits”, refine them into “leads” through medicinal chemistry, and characterise how they behave. This stage is heavily data-driven, relying on reproducible assays and well-documented methods. Clear, standardised reporting of these early findings — the kind of metadata discipline catalogued in the CASRAI dictionary — makes downstream reuse and verification possible.
Preclinical evaluation
Before any human exposure, candidates undergo preclinical testing in laboratory and animal models to assess pharmacology, toxicology and how the body absorbs, distributes, metabolises and excretes the compound. The aim is to establish a plausible safety margin and a rationale for a starting human dose. Good Laboratory Practice frameworks govern how these studies are conducted and recorded, and the resulting data package supports the application a sponsor must file before clinical testing may begin.
Clinical phases
Human testing proceeds through sequential phases, each answering a different question. The structure and oversight of these phases are explored in detail in our guide to clinical trial phases I to IV.
| Phase | Primary question | Typical focus |
|---|---|---|
| Phase I | Is it safe in humans? | Safety, tolerability, dose range, pharmacokinetics |
| Phase II | Does it work, and at what dose? | Preliminary efficacy, dose-finding, further safety |
| Phase III | Is it effective and safe at scale? | Confirmatory efficacy versus a comparator, broader safety |
| Phase IV | How does it perform in routine use? | Post-approval surveillance, rare effects, long-term outcomes |
Confirmatory phases typically rely on the randomised controlled trial design, which provides the most robust basis for causal claims about benefit and harm. Comparator arms frequently use a placebo, whose role and ethics are discussed in our piece on the placebo effect.
Regulatory review
Once clinical data are assembled, the sponsor submits a marketing authorisation application to a regulator, which assesses quality, safety and efficacy. Reviewers scrutinise the trial designs, statistical analyses and manufacturing controls. Approval depends on the totality of evidence supporting a favourable benefit–risk balance for a defined indication and population, not on any single trial in isolation. Regulators may also attach post-approval commitments, such as further studies or restricted use, where uncertainty remains. Because the review weighs an entire evidence package, the credibility of each underlying study (its design, its pre-specified outcomes and its transparent reporting) directly shapes the decision. Weak or selectively reported evidence at any earlier stage can undermine an otherwise promising candidate at this gate.
Post-market surveillance
Approval is not the end of the pipeline. Pharmacovigilance systems continuously monitor real-world safety once a medicine reaches large, diverse populations, capturing rare or delayed effects that pre-approval trials are usually too small or too short to detect. Findings can lead to label changes, restrictions or, occasionally, withdrawal. This continuous-evidence model reflects the wider research lifecycle, in which knowledge is provisional and updated as data accumulate. Phase IV studies, spontaneous adverse-event reporting and large observational databases all feed this stage, and the same standards of structured data and transparent methods that governed the clinical phases continue to determine how reliably real-world signals can be interpreted and acted upon.
Why the pipeline takes so long and stays uncertain
The pipeline is long and uncertain because biology is difficult to predict and because each stage deliberately raises the evidential bar. A candidate that looks promising in a laboratory model may behave differently in human physiology; one that is safe at a low dose may show toxicity at a therapeutic one; and an effect seen in a small early study may evaporate in a large confirmatory trial. These surprises are not simply setbacks: the staged design exists precisely to surface them while exposure is still limited.
Uncertainty also compounds across stages. Few discovery candidates survive to preclinical work, and only a fraction of those entering human testing reach approval, so the pipeline is best modelled as a funnel of conditional probabilities: the chance of eventual approval is the product of the chances of clearing each gate, given that every earlier gate was cleared. This is why sponsors run programmes as portfolios rather than single bets, and why transparent reporting of failures, not only successes, is so valuable to the wider field. We avoid quoting specific cost or duration figures here because they vary enormously by therapeutic area and are frequently misreported; the structural point stands regardless of the numbers.
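The funnel idea can be made concrete with a short Python sketch. Everything numeric here is a placeholder: the stage names and the uniform 0.5 pass probabilities are hypothetical values chosen only to keep the arithmetic easy to follow, not estimates of real attrition. The portfolio function also assumes that candidates succeed or fail independently, which is a simplification.

```python
# Hypothetical placeholder values only: NOT real attrition estimates.
# Each entry is P(clear this gate | every earlier gate was cleared).
STAGE_PASS_PROBABILITY = {
    "discovery_to_preclinical": 0.5,
    "preclinical_to_phase_1": 0.5,
    "phase_1_to_phase_2": 0.5,
    "phase_2_to_phase_3": 0.5,
    "phase_3_to_approval": 0.5,
}


def cumulative_survival(stage_probs):
    """Return, for each gate, the probability of having cleared it and all earlier gates."""
    surviving = 1.0
    result = {}
    for stage, p in stage_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{stage}: probability must be in [0, 1], got {p}")
        surviving *= p  # multiply conditional probabilities along the funnel
        result[stage] = surviving
    return result


def portfolio_success_probability(p_single, n_candidates):
    """P(at least one approval) for n candidates, ASSUMING independent outcomes."""
    return 1.0 - (1.0 - p_single) ** n_candidates


funnel = cumulative_survival(STAGE_PASS_PROBABILITY)
p_approval = funnel["phase_3_to_approval"]

for stage, p in funnel.items():
    print(f"{stage:28s} cumulative survival = {p:.5f}")
print(f"portfolio of 10 candidates   = {portfolio_success_probability(p_approval, 10):.5f}")
```

Substituting estimates for a particular therapeutic area would change the numbers but not the structure: the cumulative figure shrinks at every gate, and a portfolio raises the chance that at least one candidate clears them all.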
Data and standards at each stage
Each stage produces a distinct evidence package, and the value of that package depends on how well it is structured and documented. Discovery generates assay data and compound characterisation; preclinical work produces toxicology and pharmacokinetic datasets; clinical phases yield protocol-bound outcome data; and post-market surveillance accumulates real-world safety signals. When these are recorded with consistent terminology, persistent identifiers and version-controlled protocols, evidence can be audited, pooled across studies and reused — strengthening regulatory decisions and reproducibility alike.
| Stage | Key data produced | Standards focus |
|---|---|---|
| Discovery | Screening hits, assay and structure data | Reproducible methods, metadata |
| Preclinical | Toxicology, pharmacokinetics | Good Laboratory Practice records |
| Clinical | Protocol-bound outcome data | Preregistration, trial governance |
| Post-market | Real-world safety signals | Pharmacovigilance reporting |
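As a rough illustration of what consistent terminology, persistent identifiers and version-controlled protocols can look like in practice, the sketch below defines a minimal metadata record for one dataset in an evidence package. The class name, field names, identifier format and the small controlled vocabulary are all assumptions made for this example (the vocabulary is loosely derived from the table above); they do not correspond to any regulatory or CASRAI schema.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Stage(Enum):
    DISCOVERY = "discovery"
    PRECLINICAL = "preclinical"
    CLINICAL = "clinical"
    POST_MARKET = "post-market"


# Hypothetical controlled vocabulary: allowed data kinds per stage.
ALLOWED_DATA_KINDS = {
    Stage.DISCOVERY: {"screening_hit", "assay", "structure"},
    Stage.PRECLINICAL: {"toxicology", "pharmacokinetics"},
    Stage.CLINICAL: {"outcome"},
    Stage.POST_MARKET: {"safety_signal"},
}


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical metadata envelope for one dataset in an evidence package."""
    record_id: str         # persistent identifier (format is an assumption)
    stage: Stage
    data_kind: str         # must come from the controlled vocabulary above
    protocol_id: str
    protocol_version: str  # ties the data to one specific protocol version
    contributors: tuple = ()

    def validate(self):
        """Return a list of problems; an empty list means the record is usable."""
        problems = []
        for name in ("record_id", "protocol_id", "protocol_version"):
            if not getattr(self, name).strip():
                problems.append(f"{name} is empty")
        if self.data_kind not in ALLOWED_DATA_KINDS[self.stage]:
            problems.append(
                f"data_kind {self.data_kind!r} is not in the vocabulary for {self.stage.value}"
            )
        return problems

    def to_json(self):
        """Serialise with sorted keys so identical records give identical text."""
        payload = asdict(self)
        payload["stage"] = self.stage.value
        return json.dumps(payload, sort_keys=True)


example = EvidenceRecord(
    record_id="example:0001",
    stage=Stage.PRECLINICAL,
    data_kind="toxicology",
    protocol_id="PROT-EXAMPLE",
    protocol_version="1.2.0",
)
print(example.validate())
print(example.to_json())
```

Rejecting a record whose terminology does not match its stage is a deliberate choice in this sketch. Pooling and auditing are only as reliable as the labels attached to each dataset, so it is cheaper to catch a mismatch at the point of recording than at review.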
The role of standards and data discipline
At every stage, structured data, consistent terminology and transparent methods determine whether results can be trusted and reused. Persistent identifiers, version-controlled protocols and clear documentation of contributions allow regulators, replicators and downstream researchers to interpret findings correctly. The discipline of specifying analyses in advance — explored in our guide to preregistration and Registered Reports — is increasingly applied to clinical work to keep confirmatory claims honest. Guidance for documenting one’s own contributions to such work is set out in our resources for authors.
Frequently asked questions
Why do so few candidates reach approval?
Attrition is intentional. Each gate is designed to stop unsafe or ineffective candidates early, before larger populations are exposed. The high failure rate reflects the difficulty of predicting human biology from early models, not a flaw in the process.
What distinguishes preclinical from clinical work?
Preclinical work occurs in laboratory and animal models to establish a plausible safety case, whereas clinical work involves human participants under regulatory oversight and ethical review.
Does approval mean a medicine is fully understood?
No. Approval reflects a favourable benefit–risk judgement on the evidence available at the time. Post-market surveillance continues to refine that picture, sometimes for many years.
How do standards improve the pipeline?
Consistent terminology, structured metadata and transparent protocols make data verifiable, reusable and comparable across studies, strengthening regulatory decisions and reproducibility throughout the lifecycle.