Lecture 11

Introduction to probability

Reasoning about probability and common pitfalls

Dr Lincoln Colling

05 Dec 2022

Psychology as a Science


What do we mean by “probability”?

It might seem like there’s an easy answer to this question, but there are at least three senses of probability.

These different senses tend to be used in different contexts, because each makes more sense in some contexts than in others.

The three I’ll cover are:

  • The classical view of probability

  • The frequency view of probability

  • The subjective view of probability

The classical view

The classical view is often used in the context of games of chance like roulette and lotteries

We can sum it up as follows:

If we have an (exhaustive) list of events that can be produced by some (exhaustive) list of equiprobable outcomes (the number of events and outcomes need not be the same), the probability of a particular event occurring is just the proportion of outcomes that produce that event.

To make it concrete, we’ll think about flipping coins. If we flip two coins, the possible outcomes are HH, HT, TH, and TT (H = Heads, T = Tails).


The classical view

If we’re interested in a particular event—for example, the event of “obtaining at least one head from two flips”—then we just count the number of outcomes that produce that event.


Three out of four outcomes would produce the event of “at least one head”, so the probability is \(\frac{3}{4}\) or 0.75
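Because the classical view is just counting, the calculation can be sketched in a few lines of Python. This minimal sketch lists the equiprobable outcomes of two coin flips and counts the ones that produce the event "at least one head":

```python
from fractions import Fraction
from itertools import product

# Every equiprobable outcome of flipping two coins: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# The event of interest: at least one head
favourable = [o for o in outcomes if "H" in o]

# Classical probability = favourable outcomes / all outcomes
p_at_least_one_head = Fraction(len(favourable), len(outcomes))

print(outcomes)
print(p_at_least_one_head, float(p_at_least_one_head))  # 3/4 0.75
```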

If you’re viewing probability like this, it’s very important to be clear about what counts as a possible outcome.

For example, when playing the lottery, how many outcomes are there?

  • Two outcomes? You pick the correct numbers or you don’t? So the probability of winning is \(\frac{1}{2}\)?

  • Of course not! There are 45,057,474 possible outcomes: 1 leads to you winning and 45,057,473 lead to you not winning!
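The lecture doesn’t say which lottery it has in mind. As a hedged illustration, assuming a draw of 6 distinct numbers from 59 where order doesn’t matter (one common format, which happens to give exactly the figure above), the count of equiprobable outcomes and the classical probability of winning can be sketched as:

```python
import math
from fractions import Fraction

# Assumption: the lottery draws 6 distinct numbers from 1 to 59, order ignored.
n_outcomes = math.comb(59, 6)

# Exactly one combination wins; every other combination loses
p_win = Fraction(1, n_outcomes)
p_lose = 1 - p_win

print(n_outcomes)    # 45057474
print(float(p_win))  # nowhere near 1/2
```

Treating "win" and "lose" as the two equiprobable outcomes would give 1/2; counting the real equiprobable outcomes gives about 0.000000022.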

The frequency view

When you take a frequency view of probability, you’re making a claim about how often some event occurs over some long period of time.

  • The frequency view is often the view that we take in science. If we wanted to assign a probability to the claim “Drug X lowers depression”, we couldn’t just list each possible outcome that could occur when people take Drug X and then count how many lead to lower depression and how many do not.

  • No way to make an exhaustive list of every possible outcome!

  • But we can run an experiment where we give Drug X and see whether it lowers depression. And we can repeat this many times. Then we count up the proportion of experiments in which depression was lowered.

  • That is then the probability that Drug X lowers depression.
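The frequency idea can be illustrated with a small simulation. This is a sketch under an invented assumption: we pretend each experiment lowers depression with a fixed chance of 0.7 (a made-up number, not a claim about any real drug). As the number of repeated experiments grows, the observed proportion settles close to that value, which is what the frequency view means by the probability:

```python
import random


def long_run_frequency(p_true, n_experiments, seed=0):
    """Simulate repeated experiments and return the proportion that 'succeed'.

    p_true is a hypothetical stand-in for the unknown chance that a single
    experiment shows Drug X lowering depression.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    successes = sum(rng.random() < p_true for _ in range(n_experiments))
    return successes / n_experiments


for n in (10, 100, 10_000, 100_000):
    print(n, long_run_frequency(0.7, n))
```

With only 10 experiments the proportion can be well away from 0.7; with 100,000 it is very close.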

The subjective view (credences)

Consider the following statement:

The Australian cricket team will lose the upcoming test series against South Africa

There is a sense in which you can assign a probability to this

  • But it isn’t the classical kind—we can’t just enumerate all the possible outcomes that lead to this event

  • Nor is it the frequency kind—we can’t repeat the 2022/2023 cricket tour over and over and see how often Australia lose.

When we talk about probability in this context we mean something like degree of belief, credence, or subjective probability.

Probability in this context is the answer to the question “how sure are you that the Australian cricket team will lose the upcoming test series against South Africa?”

Calculating with probability

The different views of probability are about what the numbers mean, but once we have the numbers there is no real disagreement about how we do calculations with them

Some properties of probabilities will help us to do calculations

When we attach numbers to probabilities those numbers must range from 0 to 1

  • If an event has probability 0 then it is impossible

  • If an event has probability 1 then it is guaranteed

These two simple rules can help us to check our calculations with probabilities. If we get a value more than 1 or a value less than 0, then something has gone wrong!

The addition law

Whenever two events are mutually exclusive:

The probability that at least one of them occurs is the sum of their individual probabilities

If we flip a coin, one of two things can happen. It can land Heads, or it can land Tails. It can’t land both Heads and Tails (mutually exclusive), and one of those things must happen (it’s a list of all possible events)

  • What’s the probability that at least one of those events happens? Since one of those events must happen the probability must be 1

  • But we can work it out from the individual probabilities

  1. One of the two possible outcomes (\(\frac{1}{2}\)) produces Heads: P(Heads) = 0.50

  2. One of the two possible outcomes (\(\frac{1}{2}\)) produces Tails: P(Tails) = 0.50

The probability of at least one of Heads or Tails occurring is 0.5 + 0.5 = 1

Mutually exclusive and non-mutually exclusive events

Consider a deck of cards:

  1. What is the probability of pulling out a Spade or a Club?

  2. What is the probability of pulling out a Spade or an Ace?

In situation (1) the events are mutually exclusive or disjoint. A card can’t be a Spade AND a Club. It will either be a Spade, a Club, or something else.

The addition rule applies:

  • P(Spade) + P(Club) = Probability of selecting a spade or a club.

In situation (2) the events are not mutually exclusive. A card can be both a Spade and an Ace.

  • So we need different rules
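Enumerating a standard 52-card deck makes the difference between the two questions visible. In this sketch (the card representation is just an illustrative choice), the simple addition law matches direct counting for Spade or Club, but simply adding P(Spade) and P(Ace) overshoots because the Ace of Spades is counted twice:

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["Spades", "Clubs", "Hearts", "Diamonds"]
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs, all equally likely


def prob(event):
    """Classical probability: the proportion of cards for which event(card) is True."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_spade(card):
    return card[1] == "Spades"


def is_club(card):
    return card[1] == "Clubs"


def is_ace(card):
    return card[0] == "A"


# (1) Spade or Club: mutually exclusive, so simply adding works
p_spade_or_club = prob(lambda c: is_spade(c) or is_club(c))
print(p_spade_or_club, prob(is_spade) + prob(is_club))  # 1/2 1/2

# (2) Spade or Ace: NOT mutually exclusive; adding counts the Ace of Spades twice
p_spade_or_ace = prob(lambda c: is_spade(c) or is_ace(c))
naive_sum = prob(is_spade) + prob(is_ace)
print(p_spade_or_ace, naive_sum)  # 4/13 17/52
```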

To make this clear, we’ll take a look at an example

First, suppose we have a set of circles, each coloured either Red or Blue.

Then suppose that some of the circles of each colour have white dots.

We can now ask a question like: What is the probability of selecting a circle that is Red or has a white dot?

Not double counting

If all that maths feels too difficult, we can just work out the probability by counting! All we need to do is count up all the circles that are either Red or have a dot (counting each circle only once) and divide that by the total number of circles.
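The "not double counting" idea corresponds to the general addition rule, P(A or B) = P(A) + P(B) − P(A and B). The sketch below checks that rule against plain counting; the circle counts are hypothetical, chosen only for illustration:

```python
from fractions import Fraction

# Hypothetical circles: (colour, has_white_dot). Counts are made up for illustration.
circles = (
    [("Red", True)] * 3 + [("Red", False)] * 5
    + [("Blue", True)] * 2 + [("Blue", False)] * 10
)
total = len(circles)  # 20 circles


def prob(event):
    """Proportion of circles for which event(circle) is True."""
    return Fraction(sum(1 for c in circles if event(c)), total)


p_red = prob(lambda c: c[0] == "Red")                    # 8 of 20 circles
p_dot = prob(lambda c: c[1])                             # 5 of 20 circles
p_red_and_dot = prob(lambda c: c[0] == "Red" and c[1])   # 3 of 20 circles

# Counting: each circle that is Red OR dotted is counted exactly once
p_by_counting = prob(lambda c: c[0] == "Red" or c[1])

# Formula: add, then subtract the overlap so Red dotted circles aren't counted twice
p_by_formula = p_red + p_dot - p_red_and_dot

print(p_by_counting, p_by_formula)  # 1/2 1/2
```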

Two or more events

In the last example we asked about the probability of selecting a red circle or a circle with a white dot

In that example we’re dealing with a single event where we could, for example, select:

  • A red circle
  • A circle with a white dot
  • A blue circle

But sometimes we have to deal with multiple events.

We’ve already seen an example with coin flipping

Two or more events

Let’s say we flip a coin three times. We might want to work out the probability of getting, for example, Heads, then Tails, and then Heads again

  • We can’t just add up the probabilities, because we’d get \(\frac{1}{2}\) + \(\frac{1}{2}\) + \(\frac{1}{2}\) = \(\frac{3}{2}\).

  • Before we work it out mathematically, we’ll work it out by counting

Figure 1: Possible sequences after coin flips

If you don’t want to count, and you just want to work it out mathematically, then you can do this by multiplying together the probability for each of the events. Doing this gives us the following:

\(\frac{1}{2}\) × \(\frac{1}{2}\) × \(\frac{1}{2}\) = \(\frac{1}{8}\)
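Both routes, counting sequences and multiplying probabilities, can be checked side by side. This minimal sketch assumes fair, independent flips:

```python
from fractions import Fraction
from itertools import product

# All 2 x 2 x 2 = 8 equally likely sequences of three flips
sequences = list(product("HT", repeat=3))

# Counting: exactly one sequence is Heads, Tails, Heads
p_counting = Fraction(sequences.count(("H", "T", "H")), len(sequences))

# Multiplication rule for independent events
p_multiplying = Fraction(1, 2) * Fraction(1, 2) * Fraction(1, 2)

print(len(sequences), p_counting, p_multiplying)  # 8 1/8 1/8
```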

Independence and non-independence

In the previous example, the flips were independent

  • This means that knowing whether you got Heads/Tails on the first flip didn’t impact how you calculated the probability of getting Heads/Tails on the second flip

  • We can calculate the probability of each event without considering anything about the other event

But sometimes this isn’t the case… sometimes knowing what happened on the first event changes how to calculate the probability of the second event

Let us look at a simple example…

Conditional probability

Let’s say we’re going to roll a die…

But instead of just rolling a die, we’ll first select one of two dice. The setup is as follows:

  1. First, pick either a 20-sided die (D-20) or a 6-sided die (D-6)

  2. Second, roll the die.

If I ask you what the probability of rolling a 20 is, the answer you give will change if I tell you which die you picked

  • For the coin flip, being told about the first flip doesn’t change your calculation for the second

  • For the dice roll, being told you picked the D-20 or D-6 does change your calculation

    • If you picked a D-20 then the probability that you rolled a 20 is \(\frac{1}{20}\)

    • If you picked a D-6 then the probability that you rolled a 20 is 0
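One way to make this concrete is to write down the joint probability of every (die, face) pair and then restrict attention to the die we’re told about. This sketch assumes each die is picked with probability 1/2 (matching the P(D20) = 0.5 used in this lecture) and that both dice are fair:

```python
from fractions import Fraction

# Joint probability of (die, face). Assumptions: each die is picked with
# probability 1/2, and every face of the chosen die is equally likely.
joint = {}
for die, sides in (("D20", 20), ("D6", 6)):
    for face in range(1, sides + 1):
        joint[(die, face)] = Fraction(1, 2) * Fraction(1, sides)


def p_face_given_die(face, die):
    """P(face | die): keep only outcomes with that die, then renormalise."""
    p_die = sum(p for (d, _), p in joint.items() if d == die)
    return joint.get((die, face), Fraction(0)) / p_die


print(p_face_given_die(20, "D20"))  # 1/20
print(p_face_given_die(20, "D6"))   # 0

# Without being told the die, the chance of a 20 averages over both dice
p_roll_20 = sum(p for (_, f), p in joint.items() if f == 20)
print(p_roll_20)  # 1/40
```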

Working with conditional probabilities

  • We often encounter conditional probabilities in everyday life where we use some bit of information to help us work out the probability of something.

  • However, reasoning about conditional probabilities can be difficult and as a result people make a lot of mistakes when dealing with them.

The most common mistake that you’ll encounter is confusing P(A|B) and P(B|A).

Or as in the dice example:

  • P(Roll 20 | D20), which is 1/20

  • P(D20 | Roll 20), which is 1
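The two directions come from the same joint table but divide by different things: P(A|B) = P(A and B) / P(B). A sketch under the same assumed setup (either die picked with probability 1/2, both dice fair):

```python
from fractions import Fraction

# Assumed setup: pick D20 or D6 with probability 1/2 each, then roll it fairly.
joint = {
    (die, face): Fraction(1, 2) * Fraction(1, sides)
    for die, sides in (("D20", 20), ("D6", 6))
    for face in range(1, sides + 1)
}


def prob(event):
    """Total probability of all (die, face) outcomes where event is True."""
    return sum((p for outcome, p in joint.items() if event(outcome)), Fraction(0))


def cond(a, b):
    """P(a | b) = P(a and b) / P(b)."""
    return prob(lambda o: a(o) and b(o)) / prob(b)


def is_d20(outcome):
    return outcome[0] == "D20"


def rolled_20(outcome):
    return outcome[1] == 20


print(cond(rolled_20, is_d20))  # P(Roll 20 | D20) = 1/20
print(cond(is_d20, rolled_20))  # P(D20 | Roll 20) = 1
print(prob(is_d20))             # P(D20) = 1/2
```

The numerator is the same in both directions; only the denominator changes, which is why the two conditional probabilities can be so different.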

Working with conditional probabilities

The other common mistake is confusing the conditional probabilities for the unconditional probabilities

That is, confusing, for example, P(A|B) and P(A)

Or in the dice example:

  • P(D20 | Roll 20), which is 1

  • P(D20) which is 0.5

There is a mathematical formula that relates all these quantities together.

This is known as Bayes theorem

  • Bayes theorem allows us to update our probability calculations when we find out new information

For example, we can update our calculation for rolling a 20 when we find out that we selected a D-20

Bayes theorem

Bayes theorem can help us think through conditional probabilities, because sometimes conditional probabilities can be very unintuitive

Consider the following example:

Does a positive test mean somebody is sick?

There is a test for an illness. The test has the following properties:

  1. About 80% of people that have the illness will test positive

  2. Only ~5% of people that don’t have the illness will test positive

Somebody, who may be sick or healthy, takes the test and tests positive…

Is that person sick?

Does a positive test mean somebody is sick?

Let’s say we take a sample of 100 people
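The counting approach can be sketched with expected counts. The text doesn’t state how common the illness is, so this sketch assumes (purely for illustration) that 10% of people have it, and uses 1,000 people rather than 100 so that every expected count is a whole number:

```python
from fractions import Fraction

population = 1000
prevalence = Fraction(1, 10)            # ASSUMPTION: 10% have the illness
sensitivity = Fraction(80, 100)         # ~80% of sick people test positive
false_positive_rate = Fraction(5, 100)  # ~5% of healthy people test positive

sick = population * prevalence                         # 100 people
healthy = population - sick                             # 900 people
sick_and_positive = sick * sensitivity                  # 80 people
healthy_and_positive = healthy * false_positive_rate    # 45 people

# Of everyone who tests positive, what fraction is actually sick?
all_positive = sick_and_positive + healthy_and_positive  # 125 people
p_sick_given_positive = sick_and_positive / all_positive

print(float(p_sick_given_positive))  # 0.64
```

Under this assumption, a positive test means roughly a 64% chance of being sick, not 80%.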

Bayes theorem

We can work out the answer to the previous question just by counting the dots, but we can also use Bayes theorem.

Bayes theorem is given as:

\[P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\]


\[P(🤮\ |\ ✅) = \frac{P(✅\ |\ 🤮) \times P(🤮)}{P(✅)}\]

Note that the crucial values here are P(🤮) and P(✅). These are unconditional probabilities, and P(🤮), the probability of being sick before we know the test result, is also referred to as the prior probability.

If you change the value of P(🤮), then you’re changing how rare or common the disease is

And P(✅) is a function of P(✅|🤮), P(✅|😁), and P(🤮)
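Concretely, P(✅) = P(✅|🤮) × P(🤮) + P(✅|😁) × (1 − P(🤮)), since everyone is either sick or healthy. The sketch below plugs that into Bayes theorem using the test’s 80% and 5% figures. Because the text doesn’t give P(🤮), it tries a few hypothetical prevalences to show how strongly the answer depends on it:

```python
from fractions import Fraction


def p_positive(p_pos_given_sick, p_pos_given_healthy, p_sick):
    """P(✅) by the law of total probability over sick and healthy people."""
    return p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)


def p_sick_given_positive(p_pos_given_sick, p_pos_given_healthy, p_sick):
    """Bayes theorem: P(🤮 | ✅) = P(✅ | 🤮) × P(🤮) / P(✅)."""
    evidence = p_positive(p_pos_given_sick, p_pos_given_healthy, p_sick)
    return p_pos_given_sick * p_sick / evidence


sens = Fraction(80, 100)  # P(✅ | 🤮) from the slide
fpr = Fraction(5, 100)    # P(✅ | 😁) from the slide

# P(🤮) is not given in the text; these prevalences are hypothetical
for prior in (Fraction(1, 100), Fraction(1, 10), Fraction(1, 2)):
    print(prior, float(p_sick_given_positive(sens, fpr, prior)))
```

With a rare illness (1%), only about 14% of positive tests come from sick people, even though the test catches 80% of sick people.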

Reasoning with Bayes theorem and conditional probabilities

  • Reasoning about conditional probabilities like the testing example can be difficult because people often forget about the P(🤮) and P(✅ ) bits.

  • But if we ignore P(🤮) and P(✅), it’s easy to make mistakes!

  • Another common error is to confuse P(🤮|✅) and P(✅|🤮), or to think that P(🤮|✅) = P(✅|🤮)

  • But we saw from our earlier example that this isn’t the case

    • You can think of the following example to help remind you of this: P(Lives in London | Is Boris Johnson) = 1, but P(Is Boris Johnson | Lives in London) is about 1 in 9 million

The media (and the scientific literature) are unfortunately littered with examples of people getting muddled with conditional probabilities

And some of these confusions can be dangerous!

I’ll just pick out two more examples to finish on…

Does the Covid vaccine work?

You might have heard the following statistic in the media/online

50% of the people that die from Covid have been vaccinated

I’ve seen this stat on social media along with the claim that it shows that the Covid vaccine doesn’t work

Let’s assume the stat is accurate. Does this mean that the vaccine doesn’t work?

Does the Covid vaccine work?

  • Green circles are the vaccinated and the orange circles are the unvaccinated

  • A red dot means the person died of Covid

In this example, we’re keeping the vaccine efficacy constant (50% chance of dying in the unvaccinated and 10% chance of dying in the vaccinated) and we’re only changing the vaccination rate
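The same logic can be written out directly. This sketch uses the slide’s illustrative death rates (50% unvaccinated, 10% vaccinated), holds them fixed, and varies only the vaccination rate to see what share of deaths are vaccinated people:

```python
from fractions import Fraction

P_DIE_UNVACCINATED = Fraction(1, 2)  # 50%, the slide's illustrative value
P_DIE_VACCINATED = Fraction(1, 10)   # 10%, the slide's illustrative value


def share_of_deaths_vaccinated(vaccination_rate):
    """P(💉 | 💀): the fraction of Covid deaths who were vaccinated."""
    deaths_vax = P_DIE_VACCINATED * vaccination_rate
    deaths_unvax = P_DIE_UNVACCINATED * (1 - vaccination_rate)
    return deaths_vax / (deaths_vax + deaths_unvax)


for rate in (Fraction(1, 10), Fraction(1, 2), Fraction(5, 6), Fraction(9, 10)):
    share = share_of_deaths_vaccinated(rate)
    print(f"vaccinated: {float(rate):.0%}   P(💉|💀) = {float(share):.2f}")
```

At a vaccination rate of 5/6 (about 83%), exactly half of the deaths are vaccinated people, even though a vaccinated person’s chance of dying, P(💀|💉), stays at 10% throughout.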

Does the Covid vaccine work?

  • The Covid vaccine example is an example of people confusing P(A|B) for P(B|A)

  • What you want to know is \(P(💀\ | 💉)\)

  • But on social media people were talking about \(P(💉\ | 💀)\)

A high value for \(P(💉\ |💀)\) is consistent with a low value for \(P(💀\ |💉)\) if \(P(💉)\) is high!

This is an example from social media, but some scientists make this error too!

Racial bias in police shootings

A few years ago Johnson et al published a study (in a very prestigious journal) about racial bias in police shooting.

Their finding can be summed up as follows:

There is no racial bias in police shootings because people shot by police are more likely to be White than Black

This was picked up by the conservative media (e.g., Fox News) to argue that movements like BLM were fighting against a problem that didn’t exist!

But is the reasoning correct, and do the data show what Johnson et al claim?


Racial bias in police shootings

Johnson et al, the journal reviewers, the journal editors and the media were looking at

  • Whether P(Black|Shot) was larger than P(White|Shot)

  • Should’ve been looking at whether P(Shot|Black) was larger than P(Shot|White)

This paper has now been retracted from the journal after a campaign that started on Twitter, but the damage may already be done

But let’s step through it to find the errors…

Racial bias in police shootings

Let’s first look at the data Johnson et al present

Figure 2: Sample of police shootings. Pink circles correspond to White victims and Black circles to Black victims

Figure 2 shows the people shot by the police.

The probability that a person who was shot is White (P(White|Shot)) is \(\frac{20}{30}\) or 66.67%

The probability that a person who was shot is Black (P(Black|Shot)) is \(\frac{10}{30}\) or 33.33%

These are the two probabilities that Johnson et al looked at

Racial bias in police shootings

  • But let’s add some additional data. These are the people that have had encounters with police that didn’t end in a shooting

  • Johnson et al didn’t report this data, so I’ve made this up for illustration

  • We need this data because instead of looking at P(Black|Shot)/P(White|Shot) we need to look at P(Shot|Black) and P(Shot|White)

Figure 3: Sample of all people encountered by the police without getting shot. Pink circles correspond to White people and Black circles to Black people

Racial bias in police shootings

Putting it all together we see this:

Figure 4: Sample of all people encountered by the police. Pink circles correspond to White people and Black circles to Black people. Red dots correspond to shooting victims.

These are all the encounters that occurred between the police and civilians including those that ended in the police shooting a civilian and those that did not.

  • If we focus just on the people who are Black then P(Shot|Black) = 50%

  • If we focus just on the people who are White then P(Shot|White) = 25%

So the data presented by Johnson et al are consistent with racial bias in police shootings, assuming my made-up data are correct

My assumptions could, however, be incorrect, but Johnson et al didn’t collect this data because their faulty logic meant they didn’t realise it was important…

Racial bias in police shootings

How important? Both these images are consistent with the data reported by Johnson et al

  • Both figures above are consistent with P(Black|Shot) = 0.33 and P(White|Shot) = 0.67

  • But one gives P(Shot|Black) = 0.5 and P(Shot|White) = 0.25

  • And the other gives P(Shot|Black) = 0.14 and P(Shot|White) = 0.67
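The two scenarios can be reproduced from counts. The encounter totals here are back-calculated from the percentages on these slides (10 Black and 20 White people shot in both scenarios), so they are illustrative rather than real data. The sketch shows that P(group|Shot) is identical in both scenarios while P(Shot|group) is not:

```python
from fractions import Fraction

# Illustrative counts back-calculated from the slide percentages (not real data).
# Each scenario maps group -> (number shot, total encounters with police).
scenarios = {
    "Scenario 1": {"Black": (10, 20), "White": (20, 80)},
    "Scenario 2": {"Black": (10, 70), "White": (20, 30)},
}


def summarise(groups):
    total_shot = sum(shot for shot, _ in groups.values())
    # P(group | Shot): the quantity Johnson et al looked at
    p_group_given_shot = {g: Fraction(shot, total_shot) for g, (shot, _) in groups.items()}
    # P(Shot | group): the quantity needed to assess bias
    p_shot_given_group = {g: Fraction(shot, total) for g, (shot, total) in groups.items()}
    return p_group_given_shot, p_shot_given_group


for name, groups in scenarios.items():
    given_shot, shot_given = summarise(groups)
    print(name)
    for g in groups:
        print(f"  P({g}|Shot) = {float(given_shot[g]):.2f}   "
              f"P(Shot|{g}) = {float(shot_given[g]):.2f}")
```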

Either one of these could be the case, but Johnson et al’s data can’t tell us which, and therefore they have no basis for their claim

Final thoughts

I hope this serves as a sobering message about just how important research methods (including probability theory) are in your training

  • You might one day be in the position to make policies for governments, so I hope you don’t fall victim to faulty reasoning when you do!

  • And that you know how to assess and interpret research correctly

