  • The MBTI: A Measurement-Science Critique of the Myers-Briggs Type Indicator

    The Myers-Briggs Type Indicator (MBTI) is a self-report personality questionnaire that classifies respondents into one of 16 “types” using four dichotomies. Developed by Katharine Cook Briggs and Isabel Briggs Myers from Carl Jung’s theory of psychological types, it remains popular in workplaces and coaching. From a measurement-science perspective, however, the instrument has well-documented weaknesses in reliability and validity that explain why academic personality psychology rarely uses it.

    The four dichotomies and 16 types

    The MBTI scores respondents on four opposing pairs and combines the results into a four-letter code:

    Dichotomy           | Poles                               | Question it addresses
    Attitude            | Extraversion (E) – Introversion (I) | Where attention is directed
    Perceiving function | Sensing (S) – Intuition (N)         | How information is taken in
    Judging function    | Thinking (T) – Feeling (F)          | How decisions are made
    Orientation         | Judging (J) – Perceiving (P)        | Preferred way of engaging the world

    The four binary outcomes multiply to 16 type codes such as INTJ or ESFP. Each is presented as a qualitatively distinct category rather than a position on a scale.
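The count of 16 follows directly from choosing one pole on each of four dichotomies. A minimal sketch that enumerates the codes (the names `DICHOTOMIES` and `all_type_codes` are illustrative, not part of any official scoring tool):

```python
from itertools import product

# Poles of the four dichotomies, in the order used in the four-letter code.
DICHOTOMIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def all_type_codes():
    """Return every four-letter code formed by picking one pole per dichotomy."""
    return ["".join(combo) for combo in product(*DICHOTOMIES)]


codes = all_type_codes()
print(len(codes))  # 2 * 2 * 2 * 2 = 16
print(codes)
```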

    The dichotomisation problem

    The central measurement objection is that the MBTI treats continuous traits as categories. Empirical trait distributions are typically unimodal and roughly bell-shaped, not bimodal: most people cluster near the middle rather than at one pole. Imposing a cut-point splits a continuum into two boxes and discards information. Someone scoring just over the boundary is grouped with people far more extreme, while two near-identical respondents either side of the line receive different letters. This is why a small shift on retest can flip a whole type.
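The boundary effect described above can be made concrete with a toy example. This is only a sketch: the scores are invented standardised values with the cut-point at 0, and `classify` is a hypothetical helper, not the MBTI's actual scoring rule.

```python
def classify(score, cut=0.0, high="E", low="I"):
    """Assign a pole letter by comparing a continuous score with a cut-point."""
    return high if score >= cut else low


# Hypothetical standardised scores (0 = cut-point); not real MBTI data.
respondents = {"A": -0.05, "B": 0.05, "C": 2.50}
letters = {name: classify(score) for name, score in respondents.items()}

# A and B differ by 0.10 yet receive different letters;
# B and C differ by 2.45 yet receive the same letter.
for name, score in respondents.items():
    print(name, score, letters[name])

# The same small upward shift of 0.10 on retest flips A but leaves C unchanged.
SHIFT = 0.10
retest_letters = {name: classify(score + SHIFT) for name, score in respondents.items()}
flipped = [name for name in respondents if letters[name] != retest_letters[name]]
print("Flipped on retest:", flipped)
```

A dimensional report would instead keep the three scores as they are, so the closeness of A and B and the distance of C remain visible.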

    Reliability concerns

    Reliability is the consistency of a measure. Test-retest reliability asks whether the same person obtains the same result on a later occasion. Studies have reported that a substantial proportion of respondents receive a different four-letter type when retested weeks later. Because the type is the headline output, even modest instability at each dichotomy compounds across four binary decisions, undermining the categorical claim that people “are” a fixed type.
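The compounding argument is simple arithmetic. As a rough sketch, assume each dichotomy independently returns the same letter on retest with probability p. Independence is a simplifying assumption and the values of p below are hypothetical, not published MBTI figures. The chance of keeping the same four-letter type is then p to the fourth power.

```python
def same_type_probability(per_scale_agreement, n_scales=4):
    """Probability that all scales agree on retest, assuming independence."""
    return per_scale_agreement ** n_scales


# Hypothetical per-dichotomy agreement rates, chosen only for illustration.
for p in (0.95, 0.90, 0.85, 0.80):
    print(f"per-scale agreement {p:.2f} -> same type {same_type_probability(p):.3f}")
```

Under these assumptions, even 90% agreement on each dichotomy leaves only about a two-in-three chance of receiving the same overall type.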

    Validity concerns

    Validity asks whether an instrument measures what it claims and predicts what it should. The MBTI’s construct validity is questioned because its Thinking–Feeling and Judging–Perceiving axes do not map cleanly onto the trait structure repeatedly recovered in factor-analytic research. Criterion validity is also limited: type codes are weak predictors of job performance, and the instrument was not designed to rank or select candidates. Using it for hiring or promotion is an inappropriate application that conflicts with responsible-assessment principles.

    Why personality psychology prefers the Big Five

    The dominant model in academic personality research is the Big Five, or Five-Factor Model: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism. Unlike the MBTI, it is dimensional rather than typological, so each person receives a continuous score on every factor. The five factors emerged from decades of factor analysis across languages and cultures, show stronger reliability and better criterion validity, and avoid the artefacts introduced by dichotomising. The MBTI’s Extraversion–Introversion axis broadly aligns with the Big Five’s Extraversion dimension, but the framework as a whole captures gradation that a 16-box scheme cannot. A further contrast is that the Big Five includes Neuroticism—a well-replicated dimension of emotional stability with substantial predictive value—which the MBTI omits entirely, leaving a meaningful part of personality unmeasured.
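One artefact of dichotomising can be illustrated with a small simulation. The sketch below assumes a normally distributed trait that correlates 0.5 with some outcome (both the distribution and the 0.5 are illustrative assumptions, not estimates from MBTI or Big Five data), then compares the predictive correlation of the continuous score with that of a two-category split at the midpoint. For a normal trait split at its median, the expected correlation shrinks by a factor of sqrt(2/pi), roughly 0.80.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation computed from first principles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
RHO = 0.5               # assumed trait-outcome correlation (illustrative)
N = 20_000              # simulated respondents

trait = [rng.gauss(0, 1) for _ in range(N)]
outcome = [RHO * t + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1) for t in trait]
letter = [1.0 if t >= 0 else 0.0 for t in trait]  # two-box version of the trait

r_continuous = pearson(trait, outcome)
r_dichotomised = pearson(letter, outcome)

print(f"continuous score:   r = {r_continuous:.3f}")
print(f"dichotomised score: r = {r_dichotomised:.3f}")
print(f"ratio = {r_dichotomised / r_continuous:.3f}, "
      f"theory = {math.sqrt(2 / math.pi):.3f}")
```

The dichotomised score predicts the outcome less well than the continuous one even though both come from exactly the same underlying measurement.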

    The Jungian foundations and where the model departs

    The MBTI’s intellectual lineage runs back to Carl Jung’s 1921 work on psychological types, which proposed attitudes (introversion and extraversion) and functions (sensing, intuition, thinking, feeling). Briggs and Myers, who were not academic psychologists, formalised these ideas into a scored questionnaire and added the Judging–Perceiving axis to identify which function a person leads with. The difficulty is that Jung’s typology was a clinical and theoretical scheme, never validated as a measurement instrument. Building a forced-choice questionnaire on top of it inherited the typological assumption—that people fall into discrete kinds—without testing whether the data support discreteness. Modern psychometric research generally finds they do not: trait scores vary smoothly, so the categories are imposed rather than discovered.

    What the evidence base actually looks like

    Much of the supportive literature for the MBTI has appeared in outlets associated with the instrument’s publishers rather than in independent, peer-reviewed personality journals. Independent reviews have repeatedly raised the same points: limited test-retest stability for the overall type, factor structures that do not cleanly reproduce the four advertised dimensions as fully independent, and weak incremental prediction of real-world outcomes once general traits are accounted for. By contrast, the Big Five literature spans thousands of independent studies, multiple languages and decades of replication. This asymmetry in the evidence base is itself a measurement-science signal: an instrument with strong properties tends to accumulate convergent, independent support.

    How to read a type result responsibly

    If an organisation already uses the MBTI, the responsible stance is to treat the four-letter code as a conversation starter, not a verdict. A type should never be recorded on a personnel file, used to allocate roles, or invoked to explain away a colleague’s behaviour. Because the result can change between sittings, any decision that would differ depending on which side of a cut-point someone landed is, by construction, unsafe. Where genuine measurement is needed—research, selection, or development tracking—a dimensional inventory with published reliability and validity is the defensible choice. Documenting which instrument was used and why, much as researchers record terms in a controlled research dictionary, lets others judge the evidence behind a claim.

    A balanced reading

    None of this makes the MBTI useless as a conversational vocabulary or a self-reflection prompt; many people find the language engaging. The measurement-science point is narrower and evidence-based: a tool valued for facilitation should not be repurposed as a precise, predictive instrument for high-stakes decisions. Practitioners who need defensible measurement should consult validated dimensional inventories and document their psychometric properties. The wider lesson connects to reproducibility reform: popularity is not evidence, and instruments deserve the same scrutiny as the findings they generate.

    Frequently asked questions

    Is the MBTI scientifically valid?

    The MBTI has well-documented limitations in reliability and validity. Critics highlight unstable retest results and weak prediction of outcomes such as job performance, which is why it is uncommon in peer-reviewed personality research.

    Why do MBTI results sometimes change between tests?

    Because the instrument places hard cut-points on continuous traits, a small change in score can move someone who sits near a boundary to the opposite letter. A flip on any one of the four dichotomies is enough to change the overall type.

    What is the difference between the MBTI and the Big Five?

    The MBTI sorts people into 16 categorical types, whereas the Big Five gives continuous scores on five dimensions. The Big Five generally shows stronger reliability and validity and is the standard in academic work. Authors reporting personality measures should describe the model and its psychometrics.

    Should the MBTI be used for hiring?

    No. The instrument was not designed for selection and its criterion validity for job performance is weak. Using categorical type codes to screen candidates conflicts with responsible-assessment practice.