It might seem like there’s an easy answer to this question, but there are at least three senses of probability.
These different senses are often employed in different contexts, because some make more sense in some contexts than in others.
The three I’ll cover are:
The classical view of probability
The frequency view of probability
The subjective view of probability
The classical view
The classical view is often used in the context of games of chance like roulette and lotteries
We can sum it up as follows:
If we have an (exhaustive) list of events that can be produced by some (exhaustive) list of equiprobable outcomes (the number of events and outcomes need not be the same), the probability of a particular event occurring is just the proportion of outcomes that produce that event.
To make it concrete we’ll think about flipping coins. If we flip two coins the possible outcomes that can occur are:
HH, HT, TH, TT
If we’re interested in a particular event—for example, the event of “obtaining at least one head from two flips”—then we just count the number of outcomes that produce that event.
HH, HT, TH, TT
Three out of four outcomes (HH, HT, and TH) would produce the event of “at least one head”, so the probability is \(\frac{3}{4}\) or 0.75.
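The counting in the coin example can be sketched in a few lines of Python. This is only an illustration of the classical recipe (list the equiprobable outcomes, count the ones that produce the event); the variable names are my own.

```python
from fractions import Fraction
from itertools import product

# Every equiprobable outcome of flipping two coins: HH, HT, TH, TT.
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]

# The event of interest: at least one head.
favourable = [o for o in outcomes if "H" in o]

# Classical probability: the proportion of outcomes that produce the event.
p_at_least_one_head = Fraction(len(favourable), len(outcomes))

print("Outcomes:  ", outcomes)
print("Favourable:", favourable)
print("P(at least one head) =", p_at_least_one_head)
```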
If you’re viewing probability like this, it’s very important to be clear about what counts as a possible outcome.
E.g., When playing the lottery, how many outcomes are there?
Two outcomes? You pick the correct numbers or you don’t? So the probability of winning is \(\frac{1}{2}\)?
Of course not! There are 45,057,474 possible outcomes: 1 leads to you winning and 45,057,473 lead to you not winning!
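As a rough check on that number, here is a small sketch. It assumes a lottery in which 6 numbers are drawn from 59 and order doesn’t matter. That format is my assumption (the slide only gives the total), but it reproduces the figure above.

```python
from fractions import Fraction
from math import comb

# Assumption: 6 numbers are drawn from 59, and order does not matter.
# The number of equiprobable outcomes is "59 choose 6".
n_outcomes = comb(59, 6)

# Exactly one of those outcomes matches your ticket.
p_win = Fraction(1, n_outcomes)

print(f"Possible outcomes: {n_outcomes:,}")
print(f"Losing outcomes:   {n_outcomes - 1:,}")
print(f"P(win) = {float(p_win):.10f}")
```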
The frequency view
When you take a frequency view of probability, you’re making a claim about how often some event occurs over some long period of time.
The frequency view is often the view that we take in science. If we wanted to assign a probability to the claim “drug X lowers depression”, we can’t just think of each possible outcome that could occur when people take Drug X and then count up how many lead to lower depression and how many do not.
There’s no way to make an exhaustive list of every possible outcome!
But we can run an experiment where we give Drug X and see whether it lowers depression. And we can repeat this many times. Then we count up the proportion of experiments in which depression was lowered.
That is then the probability that Drug X lowers depression.
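To see what “the proportion over many repetitions” looks like, here is a minimal simulation sketch. The “true” rate of 0.3 is a made-up number so that we have something to compare against; in a real experiment we wouldn’t know it. The function name is also just for illustration.

```python
import random

def long_run_frequency(p_event, n_trials, seed=0):
    """Proportion of n_trials simulated repetitions in which the event occurs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = sum(rng.random() < p_event for _ in range(n_trials))
    return hits / n_trials

# Hypothetical rate at which the event occurs on any single repetition.
TRUE_RATE = 0.3

for n in (10, 100, 1_000, 100_000):
    print(f"{n:>7} repetitions: observed proportion = "
          f"{long_run_frequency(TRUE_RATE, n):.3f}")
```

With only a few repetitions the observed proportion can be some way off; as the number of repetitions grows it settles close to the underlying rate.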
The subjective view (credences)
Consider the following statement:
The Australian cricket team will lose the upcoming test series against South Africa
There is a sense in which you can assign a probability to this
But it isn’t the classical kind—we can’t just enumerate all the possible outcomes that lead to this event
Nor is it the frequency kind—we can’t repeat the 2022/2023 cricket tour over and over and see how often Australia lose.
When we talk about probability in this context we mean something like degree of belief, credence, or subjective probability.
Probability in this context is the answer to the question “how sure are you that the Australian cricket team will lose the upcoming test series against South Africa?”
Calculating with probability
The different views of probability concern what the numbers mean, but once we have the numbers there are no real disagreements about how we do calculations with them
Some properties of probabilities will help us to do calculations
When we attach numbers to probabilities those numbers must range from 0 to 1
If an event has probability 0 then it is impossible
If an event has probability 1 then it is guaranteed
These two simple rules can help us to check our calculations with probabilities. If we get a value more than 1 or a value less than 0, then something has gone wrong!
The addition law
Whenever two events are mutually exclusive:
The probability that at least one of them occurs is the sum of their individual probabilities
If we flip a coin, one of two things can happen. It can land Heads, or it can land Tails. It can’t land heads and tails (mutually exclusive), and one of those things must happen (it’s a list of all possible events)
What’s the probability that at least one of those events happens? Since one of those events must happen the probability must be 1
But we can work it out from the individual probabilities
1 of the 2 possible outcomes produces Heads, so P(Heads) = 0.50
1 of the 2 possible outcomes produces Tails, so P(Tails) = 0.50
The probability of at least one of Heads or Tails occurring is 0.5 + 0.5 = 1
Mutually exclusive and non-mutually exclusive events
Consider a deck of cards:
(1) What is the probability of pulling out a Spade or a Club?
(2) What is the probability of pulling out a Spade or an Ace?
In situation (1) the events are mutually exclusive or disjoint. A card can’t be a Spade AND a Club. It will either be a Spade, a Club, or something else.
The addition rule applies:
P(Spade) + P(Club) = Probability of selecting a spade or a club.
In situation (2) the events are not mutually exclusive. A card can be both a Spade and an Ace.
So we need different rules
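The difference between the two situations can be checked by counting cards directly. This is a minimal sketch for a standard 52-card deck; the helper names are my own.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["Spade", "Club", "Heart", "Diamond"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Proportion of cards in the deck for which event(card) is true."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_spade(card):
    return card[1] == "Spade"

def is_club(card):
    return card[1] == "Club"

def is_ace(card):
    return card[0] == "A"

# Situation (1): mutually exclusive, so simple addition matches the count.
p_spade_or_club = prob(lambda c: is_spade(c) or is_club(c))
print(prob(is_spade) + prob(is_club), "vs counted", p_spade_or_club)

# Situation (2): not mutually exclusive, so simple addition overshoots,
# because the Ace of Spades is counted twice.
p_spade_or_ace = prob(lambda c: is_spade(c) or is_ace(c))
print(prob(is_spade) + prob(is_ace), "vs counted", p_spade_or_ace)
```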
To make this clear, we’ll take a look at an example
First, we can set how many of each colour we have.
Then we can set how many of each colour will have white dots.
We can now ask a question like: What is the probability of selecting a circle that is Red or has a white dot?
\(P(\mathrm{Red})\) is the number of red circles divided by the total number of circles, and \(P(\mathrm{Dot})\) is the number of circles with a white dot (red or blue) divided by the total number of circles. We can’t just add these two numbers, because we’ll double count some of the circles. So after we add up the two numbers, we’ll need to subtract the number of double counted circles.
Not double counting
The red circles with white dots get counted twice, so we need to subtract this amount.
First we work out \(P(\mathrm{Red}) + P(\mathrm{Dot})\): the number of red circles plus the number of circles with dots, divided by the total number of circles.
Then we subtract the proportion of circles that are red with a dot: the number of red circles with dots, divided by the total number of circles.
This gives us \(P(\mathrm{Red} \cup \mathrm{Dot})\): the number of red circles plus the number of blue circles with dots, divided by the total number of circles.
But if all that maths is too difficult, then we can just work out the probability by counting! All we need to do is to count up all the circles that are either Red or have a dot. And we just divide that by the total number of circles.
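Here is a small sketch showing that the two routes agree. The counts (6 red, 4 blue, 2 red with dots, 1 blue with a dot) are made-up values for illustration only.

```python
from fractions import Fraction

# Hypothetical counts, chosen only for illustration.
red, blue = 6, 4
red_with_dot, blue_with_dot = 2, 1

# Each circle is a (colour, has_dot) pair.
circles = (
    [("red", True)] * red_with_dot
    + [("red", False)] * (red - red_with_dot)
    + [("blue", True)] * blue_with_dot
    + [("blue", False)] * (blue - blue_with_dot)
)
total = len(circles)

p_red = Fraction(sum(colour == "red" for colour, _ in circles), total)
p_dot = Fraction(sum(dot for _, dot in circles), total)
p_red_and_dot = Fraction(sum(colour == "red" and dot for colour, dot in circles), total)

# Route 1: add, then subtract the double-counted red circles with dots.
by_formula = p_red + p_dot - p_red_and_dot

# Route 2: just count every circle that is red or has a dot.
by_counting = Fraction(sum(colour == "red" or dot for colour, dot in circles), total)

print("P(Red) + P(Dot) - P(Red and Dot) =", by_formula)
print("Counting Red or Dot directly     =", by_counting)
```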
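If you don’t want to count, and you just want to work out mathematically the probability that two events both occur, then you can do this by multiplying together the probability of each of the events. For example, for two coin flips, P(Heads on the first flip and Heads on the second flip) = 0.5 × 0.5 = 0.25.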
In the previous example, the two choices were independent
This means that knowing whether you got Heads/Tails on the first flip didn’t impact how you calculated the probability of getting Heads/Tails on the second flip
We can calculate the probability of each event without considering anything about the other event
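For two coin flips, independence can be checked by counting: the probability of Heads on both flips should equal the product of the two separate probabilities. A minimal sketch (the helper `prob` is my own naming):

```python
from fractions import Fraction
from itertools import product

# All equiprobable outcomes of two flips, as (first, second) pairs.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Proportion of outcomes for which event(outcome) is true."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_first_heads = prob(lambda o: o[0] == "H")
p_second_heads = prob(lambda o: o[1] == "H")
p_both_heads = prob(lambda o: o == ("H", "H"))

print("P(first H) * P(second H) =", p_first_heads * p_second_heads)
print("P(both H) by counting    =", p_both_heads)
```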
But sometimes this isn’t the case… sometimes knowing what happened on the first event changes how to calculate the probability of the second event
Let us look at a simple example…
Conditional probability
Let’s say we’re going to roll a die…
But instead of just rolling a die, we’ll first select one of two dice. The setup is as follows:
First, pick either a 20-sided die (D-20) or a 6-sided die (D-6)
Second, roll the die.
If I ask you what the probability of rolling a 20 is, the answer you give will change if I tell you which die you picked.
For the coin flip, being told about the first flip doesn’t change your calculation for the second
For the dice roll, being told you picked the D-20 or D-6 does change your calculation
If you picked a D-20 then the probability that you rolled a 20 is \(\frac{1}{20}\)
If you picked a D-6 then the probability that you rolled a 20 is 0
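The dice setup can be written down as a tiny function. This sketch assumes fair dice; `p_roll` is a name I’ve made up for illustration.

```python
from fractions import Fraction

def p_roll(value, sides):
    """P(rolling `value` | we picked a fair die with `sides` faces)."""
    return Fraction(1, sides) if 1 <= value <= sides else Fraction(0)

p_20_given_d20 = p_roll(20, 20)
p_20_given_d6 = p_roll(20, 6)

# If we are NOT told which die was picked, and each die is equally likely
# to be picked, we average over the two possibilities.
p_20 = Fraction(1, 2) * p_20_given_d20 + Fraction(1, 2) * p_20_given_d6

print("P(Roll 20 | D20) =", p_20_given_d20)
print("P(Roll 20 | D6)  =", p_20_given_d6)
print("P(Roll 20)       =", p_20)
```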
Working with conditional probabilities
We often encounter conditional probabilities in everyday life where we use some bit of information to help us work out the probability of something.
However, reasoning about conditional probabilities can be difficult and as a result people make a lot of mistakes when dealing with them.
The most common mistake that you’ll encounter is the confusion between P(A|B) and P(B|A).
Or as in the dice example:
P(Roll 20 | D20), which is 1/20
P(D20 | Roll 20), which is 1
The other common mistake is confusing conditional probabilities with unconditional probabilities
That is, confusing, for example, P(A|B) and P(A)
Or in the dice example:
P(D20 | Roll 20), which is 1
P(D20) which is 0.5
There is a mathematical formula that relates all these quantities together.
This is known as Bayes theorem
Bayes theorem allows us to update our probability calculations when we find out new information
For example, we can update our calculation for rolling a 20 when we find out that we selected a D-20
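As a sketch of how that updating works for the dice, the function below applies Bayes theorem to get the probability that the D-20 was picked, given the number rolled. It assumes fair dice and that each die is equally likely to be picked; the function names are illustrative only.

```python
from fractions import Fraction

def p_roll(value, sides):
    """P(rolling `value` | we picked a fair die with `sides` faces)."""
    return Fraction(1, sides) if 1 <= value <= sides else Fraction(0)

def p_d20_given_roll(value, prior_d20=Fraction(1, 2)):
    """P(D20 | rolled `value`), for a value between 1 and 20."""
    joint_d20 = p_roll(value, 20) * prior_d20        # P(value and D20)
    joint_d6 = p_roll(value, 6) * (1 - prior_d20)    # P(value and D6)
    return joint_d20 / (joint_d20 + joint_d6)        # divide by P(value)

print("P(D20)           =", Fraction(1, 2))
print("P(D20 | Roll 20) =", p_d20_given_roll(20))
print("P(D20 | Roll 3)  =", p_d20_given_roll(3))
```

Rolling a 20 makes the D-20 certain, while rolling a 3 (which either die can produce, but the D-6 produces more often) makes the D-20 less likely than it was before the roll.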
Bayes theorem
Bayes theorem can help us think through conditional probabilities, because sometimes conditional probabilities can be very unintuitive
Consider the following example:
Does a positive test mean somebody is sick?
There is a test for an illness. The test has the following properties:
About 80% of people that have the illness will test positive
Only ~5% of people that don’t have the illness will test positive
Somebody, who may be sick or healthy, takes the test and tests positive…
[Interactive figure: a population of sick and healthy people. It shows the number of sick people and the fraction of them who test positive, the number of healthy people and the fraction of them who test positive, and, among everyone who tests positive, the percentage who are sick and the percentage who are healthy. The figure then reports whether a person testing positive is more likely to be sick or healthy.]
We can work out the answer to the previous question just by counting the dots, but we can also use Bayes theorem.
Note that the crucial values here are P(🤮 ) and P(✅ ). These are sometimes referred to as the prior probabilities or unconditional probabilities.
If you change the value of P(🤮 ) then you’re changing how rare or common the disease is
And P(✅ ) is a function of P(✅| 🤮 ), P(✅| 😁 ), and P(🤮 )
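Writing sick for 🤮, healthy for 😁 and positive for ✅, Bayes theorem says \(P(\text{sick} \mid \text{positive}) = \frac{P(\text{positive} \mid \text{sick}) \, P(\text{sick})}{P(\text{positive})}\), where \(P(\text{positive}) = P(\text{positive} \mid \text{sick}) \, P(\text{sick}) + P(\text{positive} \mid \text{healthy}) \, (1 - P(\text{sick}))\). The sketch below plugs in the test properties from the example (80% and 5%). The values tried for P(sick) are made up for illustration, since how common the illness is was left as something you could change.

```python
def p_sick_given_positive(prior_sick, p_pos_given_sick=0.80, p_pos_given_healthy=0.05):
    """Bayes theorem: P(sick | positive) from the prior and the test properties."""
    # P(positive): positives come from sick people and from healthy people.
    p_positive = (p_pos_given_sick * prior_sick
                  + p_pos_given_healthy * (1 - prior_sick))
    return p_pos_given_sick * prior_sick / p_positive

# Hypothetical values for how common the illness is.
for prior in (0.01, 0.05, 0.20, 0.50):
    posterior = p_sick_given_positive(prior)
    verdict = "sick" if posterior > 0.5 else "healthy"
    print(f"P(sick) = {prior:.2f} -> P(sick | positive) = {posterior:.3f} "
          f"(more likely {verdict})")
```

With these test properties, a person who tests positive is more likely to be sick than healthy only when P(sick) is above 1 in 17 (roughly 6%).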
Reasoning with Bayes theorem and conditional probabilities
Reasoning about conditional probabilities like the testing example can be difficult because people often forget about the P(🤮) and P(✅ ) bits.
But if we ignore P(🤮) and P(✅ ), it’s easy to make mistakes!
Another common error is to confuse P(🤮|✅) and P(✅|🤮) or to think
P(🤮|✅) = P(✅|🤮)
But we saw from our earlier example that this isn’t the case
You can think of the following example to help remind you of this. P(Lives in London | Is Boris Johnson) = 1, but P(Is Boris Johnson | Lives in London) = 1 in 9 Million
The media (and the scientific literature) are unfortunately littered with examples of people getting muddled with conditional probabilities
And some of these confusions can be dangerous!
I’ll just pick out two more examples to finish on…
Does the Covid vaccine work?
You might have heard the following statistic in the media/online
50% of the people that die from Covid have been vaccinated
I’ve seen this stat on social media along with the claim that it shows that the Covid vaccine doesn’t work
Let’s assume the stat is accurate. Does this mean that the vaccine doesn’t work?
Green circles are the vaccinated and the orange circles are the unvaccinated
A red dot means the person died of covid
In this example, we’re keeping the vaccine efficacy constant (50% chance of dying in the unvaccinated and 10% chance of dying in the vaccinated) and we’re only changing the vaccination rate
The Covid vaccine example is an example of people confusing P(A|B) for P(B|A)
What you want to know is \(P(💀\ | 💉)\)
But on social media people were talking about \(P(💉\ | 💀)\)
A high value for \(P(💉\ |💀)\) is consistent with a low value for \(P(💀\ |💉)\) if \(P(💉)\) is high!
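A quick sketch makes this concrete. It uses the same illustrative numbers as the figure (10% chance of dying if vaccinated, 50% if unvaccinated) and only changes the vaccination rate; the function name is my own.

```python
def p_vaccinated_given_died(vax_rate, p_die_vax=0.10, p_die_unvax=0.50):
    """P(vaccinated | died) for a given vaccination rate P(vaccinated)."""
    died_and_vax = p_die_vax * vax_rate              # P(died and vaccinated)
    died_and_unvax = p_die_unvax * (1 - vax_rate)    # P(died and unvaccinated)
    return died_and_vax / (died_and_vax + died_and_unvax)

for vax_rate in (0.25, 0.50, 5 / 6, 0.95):
    print(f"P(vaccinated) = {vax_rate:.2f} -> "
          f"P(vaccinated | died) = {p_vaccinated_given_died(vax_rate):.2f}")
```

Under these numbers, once five in six people are vaccinated, half of the people who die are vaccinated, even though an unvaccinated person is five times as likely to die as a vaccinated person.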
This is an example from social media, but some scientists make this error too!
Racial bias in police shootings
A few years ago Johnson et al published a study (in a very prestigious journal) about racial bias in police shooting.
Their finding can be summed up as follows:
There is no racial bias in police shootings because people shot by police are more likely to be White than Black
This was picked up by the conservative media (e.g., Fox News) to show that movements like BLM were fighting against a problem that didn’t exist!
But is the reasoning correct, and do the data show what Johnson et al claim?
NO!
Johnson et al, the journal reviewers, the journal editors and the media were looking at whether P(Black|Shot) was larger than P(White|Shot)
They should’ve been looking at whether P(Shot|Black) was larger than P(Shot|White)
This paper has now been retracted from the journal after a campaign that started on Twitter, but the damage may already be done
But let’s step through it to find the errors…
Let’s first look at the data Johnson et al present
The probability that a person who was shot is White (P(White|Shot)) is \(\frac{20}{30}\) or 66.67%
The probability that a person who was shot is Black (P(Black|Shot)) is \(\frac{10}{30}\) or 33.33%
These are the two probabilities that Johnson et al looked at
But let’s add some additional data. These are the people that have had encounters with police that didn’t end in a shooting
Johnson et al didn’t report this data, so I’ve made this up for illustration
We need this data because instead of looking at P(Black|Shot) and P(White|Shot) we need to look at P(Shot|Black) and P(Shot|White)
Putting it all together we see this:
These are all the encounters that occurred between the police and civilians including those that ended in the police shooting a civilian and those that did not.
If we focus just on the people who are Black then P(Shot|Black) = 50%
If we focus just on the people who are White then P(Shot|White) = 25%
So the data presented by Johnson et al are consistent with a racial bias in police shooting, if my assumptions are correct
My assumptions could, however, be incorrect, but Johnson et al didn’t collect this data because their faulty logic meant they didn’t realise it was important…
How important? Both these images are consistent with the data reported by Johnson et al
Both figures above are consistent with P(Black|Shot) = 0.33 and P(White|Shot) = 0.67
But one gives P(Shot|Black) = 0.5 and P(Shot|White) = 0.25
And the other gives P(Shot|Black) = 0.14 and P(Shot|White) = 0.67
Either one of these could be the case, but Johnson et al’s data can’t tell us which, and therefore they have no basis to support their claim
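The two scenarios can be reproduced with a few lines of code. The numbers shot (10 Black, 20 White) are the ones used above; the numbers of encounters that did not end in a shooting are made up, chosen only so that they reproduce the two sets of probabilities quoted.

```python
# Counts of people shot, as used in the example above.
shot = {"Black": 10, "White": 20}

# Hypothetical counts of encounters that did NOT end in a shooting.
scenarios = {
    "Scenario 1": {"Black": 10, "White": 60},
    "Scenario 2": {"Black": 60, "White": 10},
}

def p_group_given_shot(group):
    """P(group | Shot): depends only on the people who were shot."""
    return shot[group] / sum(shot.values())

def p_shot_given_group(group, not_shot):
    """P(Shot | group): needs the encounters that did not end in a shooting."""
    return shot[group] / (shot[group] + not_shot[group])

for name, not_shot in scenarios.items():
    print(name)
    for group in ("Black", "White"):
        print(f"  P({group}|Shot) = {p_group_given_shot(group):.2f}   "
              f"P(Shot|{group}) = {p_shot_given_group(group, not_shot):.2f}")
```

P(Black|Shot) and P(White|Shot) are identical in both scenarios, because they never use the not-shot counts, while P(Shot|Black) and P(Shot|White) point in opposite directions.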
Final thoughts
I hope this serves as a sobering message for just how important research methods (including probability theory) are in your training
You might one day be in the position to make policies for governments, so I hope you don’t fall victim to faulty reasoning when you do, and that you know how to assess and interpret research correctly
The exam
The final assessment of this course is the final exam
The final exam is worth 50% of your grade and it covers the material from the 11 weekly lectures
Doesn’t include material on R and RStudio
Doesn’t include material from the ethics lecture
The exam will be online (This is specific to PAAS. Your other exams are probably in person)
Format
Mostly multiple-choice, with a few questions where you have to enter numbers
Some of the numeric questions will just involve finding the correct number in a table, but some will involve calculating a number
A sample exam will be made available on Canvas in the next couple of days