Editorial · CASRAI · AI and ML research outputs

Neural Networks and Deep Learning Explained

A neural network is a machine-learning model of interconnected units that transform inputs through weighted layers. This explainer covers neurons, weights and activations, training by backpropagation and gradient descent, and what makes a network “deep”.

By CASRAI Editorial Board

Published 19 Jun 2026 · 4-minute read

An artificial neural network is a machine-learning model composed of many simple interconnected units, loosely inspired by biological neurons, that transform input data through successive layers of weighted connections. Deep learning is the use of neural networks with many such layers to learn rich, hierarchical representations directly from data. Together they underpin most of the recent advances in artificial intelligence, from image recognition to the large language models behind generative systems.

Neurons, weights and activations

The basic unit, often called a neuron or node, computes a weighted sum of its inputs, adds a bias term, and passes the result through a non-linear activation function. The weights are the model’s learnable parameters; they determine how strongly each input influences the unit’s output. The activation function, such as the rectified linear unit (ReLU) or the sigmoid, introduces non-linearity, which is essential: without it, stacking layers would collapse into a single linear transformation incapable of modelling complex patterns.
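A single neuron with either of the two activations named above can be sketched in plain Python as follows. No deep-learning library is assumed. The function names `relu`, `sigmoid` and `neuron`, along with the example inputs and weights, are illustrative choices and not a standard API.

```python
import math


def relu(z: float) -> float:
    """Rectified linear unit: keeps positive values and maps negatives to 0."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=relu):
    """One artificial neuron: weighted sum of the inputs, plus a bias, then an activation."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # pre-activation value
    return activation(z)


# Illustrative values only: the weighted sum is 0.5 - 2.0 + 0.75 + 0.1 = -0.65
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
print(neuron(x, w, bias=0.1))                      # 0.0, since ReLU zeroes the negative sum
print(neuron(x, w, bias=0.1, activation=sigmoid))  # a value between 0 and 1 (about 0.34)
```

The only part that changes between the two calls is the activation. That is why the activation can be treated as a swappable design choice, separate from the weights.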

Neurons are organised into layers: an input layer that receives the data, one or more hidden layers that transform it, and an output layer that produces the prediction. Information flows forward through these layers in a process called the forward pass. This architecture is one realisation of the machine-learning ideas described in our explainer on machine learning concepts and methods.
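The forward pass works by applying fully connected layers one after another. Each layer is a list of neurons that all read the same inputs. The 2-3-1 network below is hypothetical, and so are its weights and the helper names `dense_layer` and `forward`. They were chosen only so the arithmetic is easy to follow by hand.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    """No activation: used here for the output unit of a regression-style network."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """A fully connected layer: each row of `weights` (with its bias) defines one neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Forward pass: the output of each layer becomes the input of the next."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


# Hypothetical 2-3-1 network: 2 inputs, 3 hidden ReLU units, 1 linear output unit
hidden = ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, -1.0], relu)
output = ([[1.0, 2.0, -1.0]], [0.5], identity)
print(forward([2.0, 1.0], [hidden, output]))  # [4.5]
```

Here the hidden layer produces [1.0, 1.5, 0.0], because ReLU zeroes the third unit's negative sum. The output unit then combines these values into 4.5. Practical libraries perform the same computation as matrix multiplications over whole batches of inputs.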

What “deep” means

The word deep refers simply to the number of layers. A network with many hidden layers is “deep”, and depth allows the model to build representations in stages: early layers may detect simple features such as edges in an image, while later layers combine these into increasingly abstract concepts such as shapes and objects. This automatic, layered feature learning is what distinguishes deep learning from earlier methods that relied on hand-engineered features. The historical shift to deep networks is traced in our overview of the definition and history of artificial intelligence.
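Depth only pays off because of the non-linear activations described earlier. The sketch below uses arbitrary illustrative 2×2 matrices. It checks numerically that two stacked linear layers with no activation produce the same output as one linear layer with weights W2·W1 and bias W2·b1 + b2. It also shows that putting a ReLU between the same two layers gives a different result at the same input.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Matrix product A @ B for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def vecadd(a, b):
    return [p + q for p, q in zip(a, b)]


def relu_vec(v):
    return [max(0.0, z) for z in v]


# Arbitrary illustrative parameters for two 2x2 layers
W1, b1 = [[1.0, 2.0], [3.0, -1.0]], [0.5, -0.5]
W2, b2 = [[2.0, 0.0], [-1.0, 1.0]], [1.0, 0.0]


def two_linear_layers(x):
    """Layer 2 applied to layer 1 with no activation in between."""
    return vecadd(matvec(W2, vecadd(matvec(W1, x), b1)), b2)


# The equivalent single layer: W = W2 W1 and b = W2 b1 + b2
W = matmul(W2, W1)
b = vecadd(matvec(W2, b1), b2)


def one_linear_layer(x):
    return vecadd(matvec(W, x), b)


def with_relu(x):
    """The same two layers, but with a ReLU between them."""
    return vecadd(matvec(W2, relu_vec(vecadd(matvec(W1, x), b1))), b2)


print(two_linear_layers([1.0, 1.0]), one_linear_layer([1.0, 1.0]))  # [8.0, -2.0] [8.0, -2.0]
print(with_relu([-1.0, 0.0]), one_linear_layer([-1.0, 0.0]))        # [1.0, 0.0] [0.0, -3.0]
```

However many purely linear layers are stacked, the same algebra folds them into one. Only the non-linearity lets extra layers represent something a single layer cannot.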

Component	Role
Neuron (node)	Computes a weighted sum plus bias, then an activation
Weight	Learnable parameter scaling each input
Activation function	Adds non-linearity (e.g. ReLU, sigmoid)
Layer	A group of neurons; depth is the number of layers
Loss function	Measures error between prediction and target
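The last row of the table can be made concrete with two commonly used loss functions. Mean squared error is typical when a network predicts numbers. Binary cross-entropy is typical for yes/no classification with predicted probabilities. These are plain-Python illustrations. The clipping constant `eps` is an assumed safeguard against taking log(0), not a fixed convention.

```python
import math


def mse(predictions, targets):
    """Mean squared error: the average squared gap between prediction and target."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("need equal-length, non-empty sequences")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))    # 0.5 / 3, about 0.167
print(binary_cross_entropy([0.9, 0.2], [1, 0]))  # small: both predictions lean the right way
```

Both functions return a single number that grows as predictions drift from their targets. Training tries to drive that number down.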

Training: backpropagation and gradient descent

A neural network learns by adjusting its weights to reduce a loss function that measures how wrong its predictions are. Training proceeds in two coupled steps. First, the forward pass produces predictions and computes the loss. Second, backpropagation uses the chain rule of calculus to compute the gradient of the loss with respect to every weight, efficiently propagating error signals backward from the output layer to the input layer.
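The chain rule can be made concrete with a hand-written backward pass for a deliberately tiny, hypothetical network. It has one input, one sigmoid hidden unit, one linear output, and a loss of 0.5 * (y_hat - y)**2. The analytic gradients are then compared with central finite differences, a common way to sanity-check a backpropagation implementation. The parameter names and values are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x, y):
    """Forward pass of the hypothetical 1-1-1 network; returns the loss and cached values."""
    z1 = params["w1"] * x + params["b1"]
    h = sigmoid(z1)                        # hidden activation
    y_hat = params["w2"] * h + params["b2"]
    loss = 0.5 * (y_hat - y) ** 2
    return loss, (h, y_hat)


def backward(params, x, y):
    """Backpropagation: apply the chain rule from the loss back to every parameter."""
    _, (h, y_hat) = forward(params, x, y)
    d_yhat = y_hat - y                     # dL/dy_hat for L = 0.5 * (y_hat - y)^2
    d_w2 = d_yhat * h                      # y_hat = w2 * h + b2
    d_b2 = d_yhat
    d_h = d_yhat * params["w2"]            # error signal passed back to the hidden unit
    d_z1 = d_h * h * (1.0 - h)             # sigmoid'(z1) = h * (1 - h)
    d_w1 = d_z1 * x                        # z1 = w1 * x + b1
    d_b1 = d_z1
    return {"w1": d_w1, "b1": d_b1, "w2": d_w2, "b2": d_b2}


def numerical_grad(params, x, y, eps=1e-6):
    """Central finite differences, used here only to check the analytic gradients."""
    grads = {}
    for name in params:
        plus = dict(params, **{name: params[name] + eps})
        minus = dict(params, **{name: params[name] - eps})
        grads[name] = (forward(plus, x, y)[0] - forward(minus, x, y)[0]) / (2 * eps)
    return grads


params = {"w1": 0.8, "b1": -0.2, "w2": 1.5, "b2": 0.1}
analytic = backward(params, x=0.7, y=1.0)
numeric = numerical_grad(params, x=0.7, y=1.0)
for name in params:
    print(name, round(analytic[name], 6), round(numeric[name], 6))
```

The two columns should agree to several decimal places. Frameworks automate this bookkeeping with automatic differentiation, but they follow the same logic: error signals pass backward one layer at a time.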

These gradients tell an optimiser how to change each weight to reduce the loss. Gradient descent, usually in its stochastic mini-batch form, then moves each weight a small step against its gradient, the direction in which the loss decreases most steeply locally. A learning rate sets the size of that step. Repeating this over many passes through the data (epochs) gradually improves the model. Because the outcome depends on random initialisation, data ordering and these hyperparameters, careful reporting is essential, as discussed in our guide to reproducibility of machine learning research.
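A minimal mini-batch stochastic gradient descent loop might look like this. To keep it short, the "network" is a single linear neuron fitted to noise-free synthetic data from y = 2x + 1. The learning rate, batch size, epoch count and seed are illustrative defaults, not recommendations. A dedicated random generator with a fixed seed makes both the initialisation and the per-epoch shuffling repeatable, which is one small part of the reporting discipline described above.

```python
import random


def train_sgd(data, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Mini-batch SGD for a single linear neuron y = w * x + b under mean squared error."""
    rng = random.Random(seed)          # fixed seed: same initialisation and data order every run
    w, b = rng.uniform(-1.0, 1.0), 0.0  # random initialisation of the weight
    data = list(data)                  # shuffle a copy, not the caller's list
    for _ in range(epochs):            # one epoch = one full pass through the data
        rng.shuffle(data)              # new data ordering each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the mini-batch mean squared error with respect to w and b
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w           # step against the gradient, scaled by the learning rate
            b -= lr * grad_b
    return w, b


# Synthetic, noise-free data from y = 2x + 1 (illustrative only)
data = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]
print(train_sgd(data))  # close to (2.0, 1.0)
```

Calling `train_sgd` twice with the same seed returns identical parameters. Changing `seed` changes the initial weight and the batch order. With noisy data or a real network, that would generally change the final result too.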

Why documentation matters for neural networks

Because a trained network is defined by millions of learned weights rather than human-readable rules, transparency depends on documentation: what data trained it, how it was evaluated, and what its limits are. Structured artefacts such as model cards, covered in our piece on AI model documentation, address exactly this need, and the controlled terminology in the casrai.org research dictionary helps keep descriptions consistent across the literature.

Frequently asked questions

What makes a neural network “deep”?

Depth refers to the number of layers. A deep network has many hidden layers, which lets it learn features in stages, from simple patterns in early layers to abstract concepts in later ones.

What is backpropagation?

Backpropagation is the algorithm that computes the gradient of the loss with respect to each weight by applying the chain rule backward through the network. These gradients tell the optimiser how to adjust the weights.

What is the role of an activation function?

An activation function adds non-linearity to each neuron. Without it, stacking layers would be equivalent to a single linear transformation, and the network could not model complex relationships.

How does gradient descent train a network?

Gradient descent repeatedly adjusts the weights by a small step in the direction that reduces the loss, using the gradients from backpropagation and a learning rate to control the step size.
