Tag: descriptive statistics

  • Mean, Median and Mode: Measures of Central Tendency

    Measures of central tendency are summary statistics that describe the centre, or typical value, of a dataset using a single number. The three most common are the mean, the median and the mode. Each captures the centre in a different way, and choosing the right one depends on the shape of the data and the presence of outliers.

    The mean

    The mean, or arithmetic average, is the sum of all values divided by the number of values. It uses every data point, which makes it efficient, but also sensitive to extreme values. The mean is the natural choice for roughly symmetric data and underlies many statistical methods, including variance and the t-test.

    The median

    The median is the middle value when the data are arranged in order, splitting the dataset into two equal halves. If there is an even number of values, the median is the average of the two central ones. Because it depends only on rank, the median is resistant to outliers and is the preferred measure of centre for skewed distributions such as incomes or house prices.
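    The odd and even cases described above can be sketched in Python with the standard library's `statistics.median`. The data values and variable names here are made up for illustration and are not from the article.

```python
from statistics import median

# Illustrative values only (not from the article).
odd_values = [7, 3, 9, 1, 5]            # five values: one middle value
even_values = [7, 3, 9, 1, 5, 11]       # six values: two central values
with_outlier = [7, 3, 9, 1, 5, 1100]    # same as above, but 11 replaced by 1100

# median() sorts a copy of the data, so input order does not matter.
median_odd = median(odd_values)          # sorted: 1, 3, 5, 7, 9 -> 5
median_even = median(even_values)        # central pair 5 and 7 -> 6.0
median_outlier = median(with_outlier)    # central pair is still 5 and 7 -> 6.0

print(median_odd, median_even, median_outlier)
```

    Replacing the largest value with an extreme one leaves the median unchanged, because only the ranks of the central values matter.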

    The mode

    The mode is the value that occurs most frequently. A dataset can have one mode, several modes or none at all. The mode is the only measure of central tendency that can be used with nominal (unordered categorical) data, such as the most common blood type or eye colour, where calculating a mean or median would be meaningless. For ordered categories, such as survey ratings, a median can also be reported.
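    As a minimal sketch of finding the mode of categorical data, the snippet below counts labels with `collections.Counter`. The list of blood types is invented for illustration.

```python
from collections import Counter

# Hypothetical sample of blood types (illustrative data, not from the article).
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A"]

# Counter tallies how often each category appears.
counts = Counter(blood_types)

# most_common(1) returns a list with the single most frequent (label, count) pair.
most_common_type, frequency = counts.most_common(1)[0]

print(most_common_type, frequency)
```

    No arithmetic is performed on the labels themselves, which is why this works where a mean would not.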

    When to use each measure

    Measure | Best for                          | Sensitive to outliers?
    Mean    | Symmetric, continuous data        | Yes
    Median  | Skewed data or data with outliers | No
    Mode    | Categorical or multimodal data    | No

    The effect of skew and outliers

    In a perfectly symmetric, unimodal distribution, such as the normal distribution, the mean, median and mode coincide. When data are skewed, they separate. In a right-skewed distribution, a long tail of high values typically pulls the mean above the median, while in a left-skewed distribution the mean is typically dragged below it. The gap between mean and median is therefore a useful, quick indicator of skew, though not an infallible one. Because the mean is pulled towards extreme values, reporting the median alongside it for skewed data gives a more honest picture of the centre.
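    The mean-median gap can be checked directly. The sketch below uses three small invented datasets, and the helper name `mean_minus_median` is hypothetical, chosen only for this illustration.

```python
from statistics import mean, median

# Illustrative datasets (not from the article).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]            # long tail of high values
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]     # mirror image of the above

def mean_minus_median(values):
    """Quick skew indicator: positive suggests right skew, negative left skew."""
    return mean(values) - median(values)

gap_symmetric = mean_minus_median(symmetric)
gap_right = mean_minus_median(right_skewed)
gap_left = mean_minus_median(left_skewed)

print(gap_symmetric, gap_right, gap_left)
```

    A gap near zero is consistent with symmetry, but it does not prove it, so this is best treated as a first check before looking at a histogram.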

    A worked example

    Consider seven salaries, in thousands of pounds: 22, 24, 25, 26, 28, 30 and 95. The mean is the sum, 250, divided by 7, which is about 35.7. The median is the fourth value, 26, since the data are already in order. There is no repeated value, so there is no mode. The single high salary of 95 inflates the mean to nearly 36, well above what most people earn in this group, whereas the median of 26 represents the typical salary far better. This illustrates why the median is usually reported for income data. Choosing and stating the appropriate measure supports reproducible reporting, in line with the CASRAI dictionary and our guidance for authors.
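    The salary example can be reproduced in a few lines of Python. The helper `modes` is a hypothetical function written for this sketch; it follows the convention used here that a dataset with no repeated value has no mode, which differs from the standard library's `statistics.multimode`.

```python
from collections import Counter
from statistics import mean, median

# The seven salaries from the worked example, in thousands of pounds.
salaries = [22, 24, 25, 26, 28, 30, 95]

def modes(values):
    """Return the most frequent values, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treat as "no mode"
    return [value for value, count in counts.items() if count == top]

salary_mean = mean(salaries)      # 250 / 7
salary_median = median(salaries)  # fourth of seven ordered values
salary_modes = modes(salaries)

print(round(salary_mean, 1), salary_median, salary_modes)  # 35.7 26 []
```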

    Frequently asked questions

    Which measure of central tendency is best?

    There is no single best measure. The mean suits symmetric data, the median suits skewed data or data with outliers, and the mode suits categorical data. The right choice depends on the distribution and the question.

    Why does the mean differ from the median in skewed data?

    The mean is influenced by every value, including extremes in the tail, so it is pulled in the direction of the skew. The median depends only on the middle rank and so stays closer to the bulk of the data.

    Can a dataset have more than one mode?

    Yes. A dataset with two equally common peaks is bimodal, and one with several is multimodal. This can signal that the data come from distinct subgroups worth investigating separately.
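    As a small illustration, Python's `statistics.multimode` (available from Python 3.8) returns every value tied for the highest frequency. The scores below are invented for this sketch.

```python
from statistics import multimode

# Illustrative scores with two equally common values (not from the article).
scores = [2, 3, 3, 3, 4, 5, 8, 8, 8, 9]

# multimode() returns all values tied for the highest count,
# in the order they first appear in the data.
score_modes = multimode(scores)
is_bimodal = len(score_modes) == 2

print(score_modes, is_bimodal)  # [3, 8] True
```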

  • What Is Statistics? The Discipline and Its Role in Research

    Statistics is the discipline concerned with collecting, organising, analysing, interpreting and presenting data. At its core it is the science of reasoning under uncertainty: it provides methods for drawing conclusions about a whole population from a limited sample, and for quantifying how much confidence those conclusions deserve. Statistics underpins quantitative research across every field, from medicine and economics to ecology and the social sciences.

    Descriptive versus inferential statistics

    The discipline divides into two broad branches. Descriptive statistics summarise and describe the features of a dataset without claiming anything beyond it. Measures of central tendency such as the mean, median and mode, measures of spread such as the range and standard deviation, and visual summaries such as histograms all belong here. Descriptive statistics tell you what the data at hand look like.

    Inferential statistics go further: they use a sample to make estimates or test claims about a larger population that has not been fully observed. Estimation, hypothesis testing, confidence intervals and regression modelling are all inferential tools. The defining feature of inference is that it carries uncertainty, and statistics provides the machinery to measure that uncertainty rather than ignore it.

    Branch      | Purpose                             | Typical tools
    Descriptive | Summarise observed data             | Mean, median, standard deviation, charts
    Inferential | Draw conclusions about a population | Confidence intervals, hypothesis tests, regression

    Populations and samples

    The distinction between a population and a sample is fundamental. A population is the entire set of units a researcher wishes to understand: all adults in a country, every transaction in a year, all stars in a galaxy. A sample is a subset of that population actually measured. Because studying an entire population is usually impractical, researchers work from samples and infer to the whole. A numerical fact about a population is a parameter; the corresponding figure calculated from a sample is a statistic, and statistics as a discipline is largely the study of how well sample statistics estimate population parameters.
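    The parameter and statistic distinction can be made concrete with a simulation. This is a sketch under stated assumptions: the "population" is 10,000 simulated heights with an assumed mean of 170 and standard deviation of 10, and the seed, sizes and variable names are all chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in cm (assumed values).
population = [rng.gauss(170, 10) for _ in range(10_000)]
parameter = mean(population)          # population mean: a parameter

# In practice only a sample is measured; here, 100 units drawn without replacement.
sample = rng.sample(population, 100)
statistic = mean(sample)              # sample mean: a statistic

# The statistic estimates the parameter but rarely equals it exactly.
estimation_error = statistic - parameter

print(round(parameter, 2), round(statistic, 2), round(estimation_error, 2))
```

    In real research the parameter is unknown, so the estimation error cannot be computed directly; inferential methods quantify how large it is likely to be.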

    Estimation and hypothesis testing

    Two complementary tasks dominate inferential work. Estimation asks how large a quantity is and how precisely we know it, producing point estimates and interval estimates such as confidence intervals. Hypothesis testing asks whether the data are compatible with a specific claim, typically a null hypothesis of no effect, and summarises that compatibility with measures such as p-values. Both rest on the idea that random sampling produces variation, and that this variation can be modelled probabilistically.
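    Both tasks can be sketched on one small dataset. The example below is a simplified illustration, not a recommended analysis: the measurements and the null value of 5.0 are invented, and it uses a normal approximation via `statistics.NormalDist`, whereas a t-distribution would be the more usual choice for a sample this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical measurements (illustrative data, not from the article).
measurements = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.0, 5.4, 5.3, 5.7,
                5.2, 5.6, 5.1, 5.5, 5.4, 5.3, 5.8, 5.0, 5.6, 5.2,
                5.4, 5.5, 5.3, 5.1, 5.7, 5.2, 5.6, 5.4, 5.3, 5.5]
null_mean = 5.0  # assumed null hypothesis: the population mean is 5.0

n = len(measurements)
estimate = mean(measurements)                 # point estimate
std_error = stdev(measurements) / sqrt(n)     # standard error of the mean

# Estimation: approximate 95% confidence interval (normal approximation).
z_crit = NormalDist().inv_cdf(0.975)
ci_low = estimate - z_crit * std_error
ci_high = estimate + z_crit * std_error

# Hypothesis testing: two-sided p-value for the null mean (normal approximation).
z = (estimate - null_mean) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"estimate={estimate:.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f}), p={p_value:.4g}")
```

    The interval and the p-value come from the same standard error, which is why the two tasks are complementary: here the interval excludes the null value and the p-value is correspondingly small.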

    Variability and probability

    Underlying all of statistics is the recognition that data vary. Two samples from the same population will rarely give identical results, and statistics describes this sampling variation using probability. Measures such as the standard deviation quantify spread within data, while probability distributions describe how estimates would behave across repeated sampling. This probabilistic foundation is what allows statisticians to attach honest measures of uncertainty to their conclusions.
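    Sampling variation can be seen by drawing many samples from one population and comparing their means. The following is a minimal simulation under assumed settings: a uniform population on 0 to 100, samples of size 50 and 1,000 repetitions, all chosen only for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20,000 values spread uniformly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(20_000)]

sample_size = 50
repetitions = 1_000

# Draw many samples and record the mean of each one.
sample_means = [mean(rng.sample(population, sample_size))
                for _ in range(repetitions)]

spread_within_population = stdev(population)   # spread of individual values
spread_of_sample_means = stdev(sample_means)   # spread of the estimates

print(round(spread_within_population, 2), round(spread_of_sample_means, 2))
```

    The sample means vary far less than the individual values do, and their spread shrinks as the sample size grows (roughly in proportion to one over the square root of the sample size).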

    Why statistics is central to research

    Statistics is not an optional add-on to research; it shapes how studies are designed, how large samples need to be, how data are analysed and how findings are reported. Sound statistical practice is essential for reproducibility, because it disciplines researchers against over-interpreting noise and helps others judge whether a result is robust. Poor statistical practice, by contrast, is a recognised driver of irreproducible findings. CASRAI’s work on standardised reporting and the CASRAI dictionary supports clearer, more comparable statistical reporting across the scholarly record, and the reproducibility category tracks developments in this area.

    Frequently asked questions

    Is statistics a branch of mathematics?

    Statistics uses mathematics, particularly probability theory, but it is usually regarded as a distinct discipline. Its focus is on data, inference and the practical business of learning from observation under uncertainty, not on abstract mathematical structure alone.

    What is the difference between a parameter and a statistic?

    A parameter is a fixed numerical characteristic of a population, such as the population mean. A statistic is the corresponding figure computed from a sample, such as the sample mean. Statistics as a discipline studies how to estimate parameters from statistics.

    Why does statistics matter for reproducibility?

    Reproducibility depends on whether a reported result reflects a genuine effect or random variation. Statistical methods quantify that uncertainty and guard against over-claiming, so transparent statistical reporting is one foundation of a trustworthy scholarly record. See the CASRAI author guidance for reporting practices.