Measures of central tendency are summary statistics that describe the centre, or typical value, of a dataset using a single number. The three most common are the mean, the median and the mode. Each captures the centre in a different way, and choosing the right one depends on the shape of the data and the presence of outliers.
The mean
The mean, or arithmetic average, is the sum of all values divided by the number of values. Because it uses every data point, it draws on all the information in the sample, but this also makes it sensitive to extreme values. The mean is the natural choice for roughly symmetric data and underlies many statistical methods, including variance and the t-test.
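The definition above can be sketched in a few lines of Python. The function name `arithmetic_mean` and the height figures are made up for illustration; they are not part of the original text.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical heights in centimetres, chosen only for illustration.
heights_cm = [160, 165, 170, 175, 180]
mean_height = arithmetic_mean(heights_cm)  # 850 / 5

# Replace the largest value with an extreme one: the mean shifts,
# because every data point contributes to the sum.
with_outlier = heights_cm[:-1] + [250]
mean_with_outlier = arithmetic_mean(with_outlier)  # 920 / 5

print(mean_height, mean_with_outlier)
```

In practice, Python's standard `statistics.mean` does the same job; the hand-written version is shown only to make the arithmetic visible.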
The median
The median is the middle value when the data are arranged in order, splitting the dataset into two equal halves. If there is an even number of values, the median is the average of the two central ones. Because it depends only on rank, the median is resistant to outliers and is the preferred measure of centre for skewed distributions such as incomes or house prices.
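The odd and even cases described above can be sketched as follows. The function name `median_of` and the sample lists are illustrative assumptions, not taken from the original text.

```python
def median_of(values):
    """Return the middle value of the sorted data.

    For an even number of values, return the average of the two central ones.
    """
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)      # the median depends on rank, so sort first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two


odd_example = median_of([7, 1, 5])       # sorted: 1, 5, 7
even_example = median_of([8, 2, 6, 4])   # sorted: 2, 4, 6, 8

print(odd_example, even_example)
```

The standard library's `statistics.median` follows the same rule and is the more sensible choice outside a teaching context.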
The mode
The mode is the value that occurs most frequently. A dataset can have one mode, several modes or none at all. Of the three measures, only the mode can be used with nominal (unordered categorical) data, such as the most common blood type or eye colour, where calculating a mean or median would be meaningless.
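A minimal sketch of finding the mode, covering the three cases of one mode, several modes and none. The helper `modes` and the sample data are hypothetical. It follows the convention used in this article that a dataset with no repeated value has no mode; other conventions exist.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count.

    Returns an empty list when no value repeats (the 'no mode' case).
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # nothing repeats, so treat the data as having no mode
    return [value for value, count in counts.items() if count == highest]


# Categorical data: no mean or median is possible, but the mode is.
blood_types = ["O", "A", "O", "B", "A", "O", "AB"]
one_mode = modes(blood_types)          # "O" appears three times

two_modes = modes([1, 1, 2, 2, 3])     # 1 and 2 are tied
no_mode = modes([22, 24, 25])          # every value appears once

print(one_mode, two_modes, no_mode)
```

Note that the standard library's `statistics.multimode` uses a different convention: when no value repeats it returns all the values rather than an empty list.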
When to use each measure
| Measure | Best for | Sensitive to outliers? |
|---|---|---|
| Mean | Symmetric, continuous data | Yes |
| Median | Skewed data or data with outliers | No |
| Mode | Categorical or multimodal data | No |
The effect of skew and outliers
In a perfectly symmetric distribution with a single peak, such as the normal distribution, the mean, median and mode coincide. When data are skewed, they separate. In a right-skewed distribution, a long tail of high values typically pulls the mean above the median, while in a left-skewed distribution the mean is typically dragged below it. The gap between mean and median is therefore a useful, quick indicator of skew, though not a formal test. Because the mean is pulled towards extreme values, reporting the median alongside it for skewed data gives a more honest picture of the centre.
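The relationship between skew and the mean-median gap can be illustrated with three small datasets. The data and the helper name `mean_median_gap` are made up for this sketch, and the sign of the gap is only a rule of thumb, not a formal skewness statistic.

```python
from statistics import mean, median


def mean_median_gap(values):
    """Return mean minus median: positive suggests right skew, negative left skew."""
    return mean(values) - median(values)


# Hypothetical datasets chosen to show each case.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]          # long tail of high values
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]   # mirror image: long low tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed),
                   ("left", left_skewed),
                   ("symmetric", symmetric)]:
    print(name, mean(data), median(data), mean_median_gap(data))
```

For the right-skewed data the mean sits above the median, for the mirrored data it sits below, and for the symmetric data the two agree.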
A worked example
Consider seven salaries, in thousands of pounds: 22, 24, 25, 26, 28, 30 and 95. The mean is the sum, 250, divided by 7, which is about 35.7. The median is the fourth value, 26, since the data are already in order. There is no repeated value, so there is no mode. The single high salary of 95 inflates the mean to nearly 36, well above what most people earn in this group, whereas the median of 26 represents the typical salary far better. This illustrates why the median is usually reported for income data. Choosing and stating the appropriate measure supports reproducible reporting, in line with the CASRAI dictionary and our guidance for authors.
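The arithmetic in this example can be checked with Python's standard `statistics` module. The variable names are illustrative; the salary figures are the seven given above.

```python
from statistics import mean, median

# The seven salaries from the worked example, in thousands of pounds.
salaries = [22, 24, 25, 26, 28, 30, 95]

total = sum(salaries)                # 250
salary_mean = mean(salaries)         # 250 / 7, about 35.7
salary_median = median(salaries)     # the fourth of seven ordered values

# No value repeats, so by the convention used here there is no mode.
has_repeated_value = len(set(salaries)) != len(salaries)

# How many of the seven earn less than the mean?
below_mean = sum(1 for s in salaries if s < salary_mean)

print(total, round(salary_mean, 1), salary_median, has_repeated_value, below_mean)
```

Six of the seven salaries fall below the mean, which is why the median describes this group better.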
Frequently asked questions
Which measure of central tendency is best?
There is no single best measure. The mean suits symmetric data, the median suits skewed data or data with outliers, and the mode suits categorical data. The right choice depends on the distribution and the question.
Why does the mean differ from the median in skewed data?
The mean is influenced by every value, including extremes in the tail, so it is pulled in the direction of the skew. The median depends only on the middle rank and so stays closer to the bulk of the data.
Can a dataset have more than one mode?
Yes. A dataset with two equally common peaks is bimodal, and one with several is multimodal. This can signal that the data come from distinct subgroups worth investigating separately.