Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)




Bayes theorem

Put very briefly Bayes theorem interrelates:

  • the likelihood (a posteriori, after the event) that X is true given Y was observed, P(X|Y),
  • the likelihood that X is true (a priori, prior to the event), P(X), and
  • the likelihood that Y occurs given X is true, P(Y|X).


P(X|Y) = P(X) P(Y|X) / P(Y)

where P(Y) is the overall probability of observing Y, whatever its cause.

None of which is very intelligible, nor explains how Bayes theorem is useful, what it assumes, or why its application might be controversial.


Formulae for predictive values

Bayes theorem is a formula to give the probability that a given cause was responsible for an observed outcome - assuming that the probability of observing that outcome for every possible cause is known, and that the possible causes are mutually exclusive and, between them, account for every outcome.

However, the positive and negative predictive values can also be obtained by simple algebraic rearrangement of the terms in the table below.

                        True situation (cause)
Test result             Present     Absent      Total
(observed outcome)
Positive                a           b           a + b
Negative                c           d           c + d
Total                   a + c       b + d       n

n = overall total = a + b + c + d
P = true Prevalence = [a + c] / n
Se = Sensitivity = a / [a + c]
Sp = Specificity = d / [b + d]
PPV = Positive Predictive Value = a / [a + b]
NPV = Negative Predictive Value = d / [c + d]

Although these terms are calculated from the table cell contents, a, b, c, d and n, the table contents can also be calculated from the first four terms above - as follows:

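a = P × Se × n = ([a + c] / n) × (a / [a + c]) × n
b = (1 − P) × (1 − Sp) × n = ([b + d] / n) × (b / [b + d]) × n
c = P × (1 − Se) × n = ([a + c] / n) × (c / [a + c]) × n
d = (1 − P) × Sp × n = ([b + d] / n) × (d / [b + d]) × n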
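The round trip described above can be checked numerically. The following is a minimal sketch in Python; the function name cells_from_rates and the counts 40, 30, 10 and 120 are hypothetical, chosen only for illustration and not taken from any real test.

```python
def cells_from_rates(P, Se, Sp, n):
    """Rebuild the table cells (a, b, c, d) from prevalence (P),
    sensitivity (Se), specificity (Sp) and the overall total (n)."""
    a = P * Se * n                # cause present, test positive
    b = (1 - P) * (1 - Sp) * n    # cause absent, test positive
    c = P * (1 - Se) * n          # cause present, test negative
    d = (1 - P) * Sp * n          # cause absent, test negative
    return a, b, c, d

# Hypothetical counts, for illustration only.
a, b, c, d = 40, 30, 10, 120
n = a + b + c + d
P = (a + c) / n      # prevalence
Se = a / (a + c)     # sensitivity
Sp = d / (b + d)     # specificity

# The rebuilt cells should match the original counts,
# apart from floating-point rounding.
rebuilt = cells_from_rates(P, Se, Sp, n)
print([round(x, 6) for x in rebuilt])
```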

The positive and negative predictive values can be obtained in the same way - except we do not need the overall total (n) because it cancels out.

PPV = a / [a + b] = (a/n) / ([a + b]/n) = [P × Se] / ([P × Se] + [(1 − P) × (1 − Sp)])
NPV = d / [c + d] = (d/n) / ([c + d]/n) = [(1 − P) × Sp] / ([P × (1 − Se)] + [(1 − P) × Sp])
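As a rough check that n really does cancel out, the two formulae can be written as a short Python function and compared with the predictive values obtained directly from the cell counts. The function name predictive_values and the counts used here are hypothetical, for illustration only.

```python
def predictive_values(P, Se, Sp):
    """Return (PPV, NPV) from prevalence, sensitivity and specificity."""
    ppv = (P * Se) / (P * Se + (1 - P) * (1 - Sp))
    npv = ((1 - P) * Sp) / (P * (1 - Se) + (1 - P) * Sp)
    return ppv, npv

# Hypothetical counts, for illustration only.
a, b, c, d = 40, 30, 10, 120
n = a + b + c + d
P, Se, Sp = (a + c) / n, a / (a + c), d / (b + d)

ppv, npv = predictive_values(P, Se, Sp)

# Each pair should agree: formula value first, direct count ratio second.
print(ppv, a / (a + b))
print(npv, d / (c + d))
```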


Bayes theorem and inference

Given the simple relationship between the predictive values and the frequencies a, b, c and d, the formulae above may seem a waste of time. Let us try to explain why they are not.

Imagine you have just been tested for a fatal infectious disease whose treatment is extremely unpleasant and/or risky.
  • If you assume this test is 100% reliable then, whatever the outcome, you will infer that you know whether or not you are OK.
  • Of course, since absolutely no test is ever 100% reliable, whatever that test's outcome, your next question should be how likely is it this result is wrong?
In other words:
Given I have tested positive, what is the probability I was really uninfected?
Given I have tested negative, what is the probability I was really infected?

Note that you will only need the answer to one of those questions.

In other words, the questions are conditional upon the outcome of your test, as are their answers - assuming you cannot be simultaneously infected and uninfected, and a single test cannot find you are both positive and negative.

These issues are important because the answers to those two questions may be very different.

For example, we have dimensioned the table below to show the true infection status and observed test outcomes of n results where 28.42% were infected, but 28.42% of those infected were (wrongly) found negative, and 24.69% of those uninfected were (wrongly) found positive.

Answering your question is easy if you happen to be a randomly-selected member of those n test-results:

  • If you tested positive, the probability you are uninfected is b/[a+b] or 1 − PPV.
  • If you tested negative, the probability you are infected is c/[c+d] or 1 − NPV.

This simple arrangement rarely works in practice because, whilst the infection rate ([A+C]/N) is estimated from the population-at-large, the test's sensitivity (a/[a+c]) and specificity (d/[b+d]) are estimated using a subset of that population.

Bayes theorem enables you to combine this information, as proportions or probabilities, to find the probability a specified cause was responsible for a specified outcome - whether or not n and N are the same.
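To illustrate how different the two answers can be, here is a minimal Python sketch using the percentages quoted for the example above, expressed as proportions. It assumes, purely for illustration, that the quoted infection rate, false-negative rate and false-positive rate all apply to the person tested; the variable names are hypothetical.

```python
# Proportions quoted in the example above.
P = 0.2842        # proportion infected (prevalence)
Se = 1 - 0.2842   # sensitivity: 28.42% of those infected were found negative
Sp = 1 - 0.2469   # specificity: 24.69% of those uninfected were found positive

# Predictive values from prevalence, sensitivity and specificity.
ppv = (P * Se) / (P * Se + (1 - P) * (1 - Sp))
npv = ((1 - P) * Sp) / (P * (1 - Se) + (1 - P) * Sp)

p_uninfected_given_positive = 1 - ppv
p_infected_given_negative = 1 - npv

print(f"P(uninfected | tested positive) = {p_uninfected_given_positive:.4f}")
print(f"P(infected | tested negative)   = {p_infected_given_negative:.4f}")
```

Under those assumptions a positive result is wrong a little under half the time (about 0.46), whereas a negative result is wrong about 0.13 of the time.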


Bayesian inference is rather different from most of the other forms of inference in this course - such as 'standard errors', 'confidence intervals' or 'hypothesis tests' - indeed, this huge branch of statistics is often considered to be a completely alternative approach to inference. Since this course concentrates upon classical frequentist statistics we cannot reasonably hope to provide anything other than the briefest outline here.

At its heart, Bayesian inference is effectively an application of Bayes theorem. In many ways it is about what you do with a test's result, rather than something you use instead of a test. Bayesian inference enables you to choose between a set of mutually-exclusive explanations, or to quantify belief. Assuming you are not interested in an algebraic 'proof' of this theorem, let us try to explain some of its reasoning - and introduce a few useful terms.

Imagine for a moment that you have three bundles of money which you have just spent some time counting.
  • Bundle 1 contains 199 notes, of which 76 are 100$ notes.
  • Bundle 2 contains 469 notes, of which 4 are 100$ notes.
  • Bundle 3 contains 396 notes, of which 44 are 100$ notes.

Having put them carefully away, you notice a 100$ note has slipped out of one of your counted bundles - unfortunately, you have no idea which.

    Assuming you do not wish to recount them, which bundle is most likely to be missing such a note, and can we quantify that assessment?

In order to quantify that situation, let us make three assumptions:

  1. These 3 bundles are the only possible source of that 100$ note.
    This assumption is crucial and central to what follows - it defines the 'sample space' of all possible outcomes.
  2. Prior to this event, each bundle was equally likely to have lost a note.
    This assumption enables us to divide up the sample space in the same way as the overall proportion infected in our example above.
  3. Any type of note was equally liable to slip out of a bundle.
    This assumption enables us to obtain the likelihood of a specified outcome given each possible explanation.

Given those assumptions are correct:

  • If P(A|B) is the probability of observing A given B, we can readily estimate the likelihood of losing a 100$ note from a given bundle. Let us say that P(A=1) is the probability of losing a 100$ note, and P(A=2) is the probability of losing any other sort of note.
    • Since 76/199 of bundle 1 were 100$ notes, the conditional probability of this event, P(A=1|B=1), was 0.38
    • 4/469 of bundle 2 were 100$ notes, so the likelihood, P(A=1|B=2), was 0.01
    • 44/396 of bundle 3 were 100$ notes, so the probability that A=1 given B=3 is P(A=1|B=3), or 0.11
  • If P(B) is the probability, prior to this event, that a lost note came from bundle B, and P(B=1) is the prior probability that it came from bundle 1,
    • then, provided that note cannot have come from anywhere else, P(B=1) + P(B=2) + P(B=3) must equal 1, and for these 3 bundles, the prior probabilities (being equal) are therefore P(B=1) = P(B=2) = P(B=3) = 1/3.

Since the probability this note arose from a particular bundle depends upon P(A|B) and P(B) we might reasonably assume their combined probability is P(A|B) P(B)

In which case the total probability that a 100$ note arose from any of those bundles, P(A=1), is the sum of those combined probabilities:
Σ[P(A=1|B) P(B)] or [P(A=1|B=1)P(B=1)] + [P(A=1|B=2)P(B=2)] + [P(A=1|B=3)P(B=3)]

Therefore, the proportion of events where a 100$ note arises from bundle 1, P(B=1|A=1), is:

[P(A=1|B=1)P(B=1)] / Σ[P(A=1|B) P(B)]

You may wish to compare this formula with those for positive and negative predictive value (PPV & NPV). Applying this formula we obtain the following probabilities:

  • P(B=1 given A=1) = 0.7615
  • P(B=2 given A=1) = 0.0170
  • P(B=3 given A=1) = 0.2215
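These posterior probabilities can be reproduced with a few lines of Python. This is only a sketch of the calculation described above, using the bundle contents and equal priors from the text; the function name posterior is hypothetical.

```python
def posterior(likelihoods, priors):
    """Bayes theorem for a finite set of mutually exclusive causes:
    each P(A|B) P(B) divided by the sum of P(A|B) P(B) over all causes."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    total = sum(joint)   # total probability of the observed outcome
    return [j / total for j in joint]

# P(A=1|B=i): proportion of 100$ notes in each bundle.
likelihoods = [76 / 199, 4 / 469, 44 / 396]
# P(B=i): each bundle equally likely to have lost a note.
priors = [1 / 3, 1 / 3, 1 / 3]

post = posterior(likelihoods, priors)
for i, p in enumerate(post, start=1):
    print(f"P(B={i} given A=1) = {p:.4f}")
```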

Given the various assumptions are correct, our 100$ note probably came from bundle 1 - at least that explanation is (0.7615/0.2215=) 3.4 times as likely as it arising from bundle 3.

Fairly obviously, we could extend this reasoning to any number of bundles - provided these assumptions are met, and we know the contents of every bundle.

In real life, a number of things can modify this simple state of affairs.

  1. For example you might decide that the probability of a note slipping out of a bundle is directly related to the number of notes in that bundle.

  2. Or that, because you do not remember seeing a stray note when you put bundle 1 away, you feel (p=) 0.98 or 98% certain that it came from bundle 2 or 3.

In principle, at least, it is easy enough to allow for such alterations.

  1. In the first case, if n_i is the number of notes in the ith bundle, and Σn is the total number of notes, then the prior likelihood that note originated from the ith bundle, P(B=i), is n_i/Σn.

    Applying Bayes theorem we obtain the following posterior probabilities:

    • P(B=1 given A=1) = 0.6129
    • P(B=2 given A=1) = 0.0323
    • P(B=3 given A=1) = 0.3548

    Again, given the various assumptions are correct, our 100$ note probably came from bundle 1 - but that explanation is only (0.6129/0.3548=) 1.7 times as likely as it arising from bundle 3.

  2. Whereas, in the second case, the prior likelihood that 100$ note originated from bundle 1 is 0.02 (=1 − 0.98). Provided we can assume our 100$ note is equally likely to have come from bundles 2 or 3 (so each has a prior likelihood of 0.49), Bayes theorem gives these posterior likelihoods:
    • P(B=1 given A=1) = 0.1153
    • P(B=2 given A=1) = 0.0631
    • P(B=3 given A=1) = 0.8217

    Now, our 100$ note is (0.8217/0.1153=) 7.1 times as likely to have come from bundle 3 as from bundle 1.
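Both sets of alternative priors can be tried with the same calculation. The sketch below, in Python, assumes only the bundle contents and priors described above; the names posterior, size_priors and recall_priors are hypothetical.

```python
def posterior(likelihoods, priors):
    """Bayes theorem for a finite set of mutually exclusive causes."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

sizes = [199, 469, 396]    # notes in each bundle
hundreds = [76, 4, 44]     # 100$ notes in each bundle
likelihoods = [h / s for h, s in zip(hundreds, sizes)]

# Case 1: prior proportional to the number of notes in each bundle.
size_priors = [s / sum(sizes) for s in sizes]
# Case 2: 0.02 for bundle 1, the remaining 0.98 shared equally by bundles 2 and 3.
recall_priors = [0.02, 0.49, 0.49]

by_size = posterior(likelihoods, size_priors)
by_recall = posterior(likelihoods, recall_priors)

print("size-based priors:  ", [round(p, 4) for p in by_size])
print("recollection priors:", [round(p, 4) for p in by_recall])
```

Only the priors change between the two cases; the likelihoods, which depend on the contents of the bundles, stay the same.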

Notice that, even though we labelled those bundles of notes 1 2 & 3, their identity is actually a nominal variable - but the same reasoning would apply if our possible causes were a discrete variable, such as odds-ratios (divided into class-intervals) representing the effect of chemotherapy on pancreatic carcinoma. In that situation we could propose various prior likelihood distributions, either arbitrary, or summarising previous studies, or representing a hypothesis - whether pessimistic or optimistic.


Although it is very seductive, using Bayesian inference to combine subjective and objective likelihoods has clear risks, and makes some statisticians understandably nervous. When a potential cause has a zero prior, its posterior is also zero - assuming obviously impossible results are discarded.

"The new always happens against the overwhelming odds of statistical laws and their probability, which for all practical, everyday purposes amounts to certainty;
the new therefore always appears in the guise of a miracle."
Hannah Arendt (1906-75)
German-born U.S. political philosopher. The Human Condition, (1958) pt. 5, ch. 24