Psychology as a Science
The conclusions that we can draw from research depend on how the knowledge was generated
For any piece of research we plan (or any research we read), we must be able to answer:
How do we actually test hypotheses appropriately?
How do we generalise our findings?
How do we quantify seemingly unquantifiable things?
The answer to these questions lies in research design
Research designs can vary on lots of different dimensions:
Some designs have some kind of manipulation and others don’t
Some designs involve multiple measurements from the same people and some designs compare groups
Some designs take all their measurements at one point in time and others follow participants across time
The design we choose depends on:
An example: Ice cream and murder
We might decide to conduct some research into this relationship between ice cream and murder to see whether there’s actually a causal relationship
This is the question we hope our research will answer
We might have something like the following:
Does eating ice cream make you more prone to murderous tendencies?
In our hypothesis we specify the outcome we expect
We might have something like the following:
Eating ice cream increases the desire to commit murder
To test this hypothesis we’ll design an experiment…
Our experiment might be something like the following:
Invite a group of people into the lab
Give half the people some ice cream to eat, and don’t give any ice cream to the other half (our manipulation)
We then get all participants to look at pictures of people (the stimuli) and rate how much they want to eliminate them on a scale from 0 (no desire) to 9 (all the desire possible)
After the experiment we might thank the participants, and debrief them by describing the aims of the study in more depth…
In our study we’re manipulating one thing and we’re measuring one thing
This means our study has one independent variable (IV) and one dependent variable (DV)
You’ll encounter the terms independent variable (IV) and dependent variable (DV) a lot, so let’s define what they mean:
The dependent variable is the variable that you analyse. Its value depends on the value of other variables. It’s the thing we’re measuring, and it’s sometimes also called the outcome.
An independent variable is a variable that influences the values of your dependent variable. It’s the thing we’re manipulating, and it’s sometimes also called the predictor.
In a well designed experiment, we can be confident in saying our manipulation caused a change in our outcome
But this isn’t the case with our study, because we’re missing a lot of things (or at least we haven’t specified them yet). Including:
Controls
Randomisation
Blinding
A theoretical framework
Our imaginary study didn’t use any controls (we touched on controls last week too)
We recruited all kinds of people without giving consideration to how different characteristics might affect our results:
We might have children as well as adults in our sample
We might have people with lactose intolerance in our sample who would’ve experienced discomfort eating ice cream
We didn’t have standardised instructions for participants who enrolled in the study
We didn’t control our IV appropriately: we might have changed the brand, the flavour, or the amount of ice cream from session to session. Maybe one day we gave frozen yoghurt instead of ice cream
Now we don’t know exactly what caused any changes in our outcome
It could be that only strawberry mini milks cause murderous tendencies
We didn’t control the lab environment the study was conducted in. On some days the heating was up super high and on others we had the windows wide open
Another feature that might have been missing from our study is randomisation
We didn’t randomly assign people to the groups
Maybe we recruited all our participants for the ice cream condition first, and we did this outside of a dentist’s office
It might be that most of these participants had sensitive teeth and so eating cold food made them angry
A well-designed experiment should randomise both participant allocation and stimulus presentation order (which we touched on last week in the memory example)
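The allocation step above can be sketched in a few lines. This is a hypothetical illustration (not part of the original study), assuming Python and made-up participant IDs:

```python
import random

def randomise_allocation(participants, seed=None):
    """Randomly split a list of participants into two equal-sized groups."""
    rng = random.Random(seed)      # seeded so the allocation is reproducible
    shuffled = list(participants)  # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs: the first group gets ice cream, the second doesn't
ice_cream_group, control_group = randomise_allocation(range(20), seed=1)
```

The same `rng.shuffle` idea could be reused per participant to randomise stimulus presentation order.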
Another feature that might have been missing from our study is blinding
Maybe we told participants that we were interested in the effects of ice cream on murderous tendencies
Maybe we also gave all the participants the ice cream ourselves
If neither the participants nor the researcher know which condition the participants are put in, the study design is known as double-blind
The choice of predictor (IV) and outcome (DV) variables does not happen in a theoretical vacuum
These choices should be based on theory, but in our experiment they weren’t
It could be that murder causes people to eat ice cream, in which case we should probably swap the IV and DV
Or it might be that they’re completely unrelated and any effect we find is just a coincidence
We’ve already talked a bit about experimental designs, but experiments actually come in different types
True experiments
Quasi-experiments
And natural experiments
Sometimes it’s not logistically or ethically possible to do a true experiment, so that’s where quasi-experiments and natural experiments come in handy
True experiments usually have tight controls
They can be somewhat artificial because they abstract away from the real world
This means they lack something called ecological validity
Ecological validity refers to the ability to generalise the results from an experiment to the real world
But experiments provide the most rigorous methodology for investigating causal relationships.
Experiments can be difficult to perform from a logistical point of view, because randomisation can be difficult, and sometimes manipulating IVs directly can be difficult or impossible
Quasi-experiments are similar to true experiments except for participant randomisation
This makes them useful in situations where randomisation isn’t possible
In situations like this, we should still try to match the participants so that the groups don’t differ on any relevant characteristics, except for the ones we’re investigating
Natural experiments are studies where randomisation and manipulation occur through natural or socio-political processes
One example might be twin studies
Identical twins share essentially 100% of their genes
Fraternal twins share on average 50% of their genes
Both kinds of twins tend to share the same home environment (raised together)
Comparing similarities between identical twins and similarities between fraternal twins, we can estimate the role of genes and environment in all sorts of things (physical/mental health, personality, cognitive ability, etc.)
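The twin comparison above is often summarised with Falconer’s formula, which estimates heritability as twice the difference between the identical-twin and fraternal-twin correlations. A minimal sketch with made-up correlation values:

```python
def falconer_estimates(r_mz, r_dz):
    """Classic Falconer decomposition from twin correlations.

    r_mz: correlation between identical (monozygotic) twins
    r_dz: correlation between fraternal (dizygotic) twins
    Returns (h2, c2, e2): heritability, shared environment, unique environment.
    """
    h2 = 2 * (r_mz - r_dz)  # heritability
    c2 = r_mz - h2          # shared environment (= 2*r_dz - r_mz)
    e2 = 1 - r_mz           # unique environment (plus measurement error)
    return h2, c2, e2

# Hypothetical correlations: identical twins r = 0.8, fraternal twins r = 0.5
h2, c2, e2 = falconer_estimates(0.8, 0.5)  # h2 ≈ 0.6, c2 ≈ 0.2, e2 ≈ 0.2
```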
Other kinds of natural experiments might be a result of policy changes (like smoking bans, or changes in the length of compulsory education) or natural events
Aspects of study design
Studies can vary on whether the manipulation or measurement occurs between groups or within the same participants
In between-subjects or independent designs we compare different groups of participants
In within-subjects or repeated-measures designs we take repeated measurements from the same participants
Mixed designs have both within-subject and between-subject manipulations
Within-subject designs have some disadvantages like order effects (people might perform differently in the second condition because they get better at the task, or worse because they get tired)
But with within-subject designs it can sometimes be easier to detect differences between conditions
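One common way to mitigate the order effects mentioned above is counterbalancing: varying the order of conditions across participants so that order and condition aren’t confounded. A hypothetical sketch:

```python
from itertools import permutations

# Both possible orderings of our two conditions
conditions = ("ice cream", "no ice cream")
orders = list(permutations(conditions))

def assign_order(participant_index):
    """Counterbalance: cycle through the possible condition orders across participants."""
    return orders[participant_index % len(orders)]
```

With more conditions, a full counterbalance grows factorially, so designs often use a Latin square instead.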
Studies can also vary in terms of whether participants are measured at one point in time or whether they’re followed over time
Cross-sectional designs
Take a cross-section of the sample at a single point in time
Logistically easier than other types of studies
Not very useful for telling us how things change over time
Longitudinal designs
Involve repeated measurements of the same characteristics from the same participants at multiple different points in time
Logistically very difficult to do and can be expensive. Some can run for years or even decades
Very useful for seeing how things change over time. Particularly useful for studying e.g., developmental processes
Because they can run for so long there can be issues with missing data
Missing data can be complex to deal with because sometimes data is missing at random, but other times it can be tracking something you’re interested in
E.g., A study on whether dating apps help you find love might show that nobody finds love on the apps, but that might just be because those who do find love drop out of the study
Whenever we’re trying to measure something there are some issues that we need to be aware of
In psychology we measure lots of things that are difficult to observe directly
We try to measure these things using a range of tools including questionnaires, and experimental tasks
We design these tools using the theoretical underpinnings behind the constructs we’re trying to measure
Construct validity is the extent to which a tool can be justifiably trusted to actually measure the construct it is supposed to measure.
We want to be able to generalise the findings from our studies beyond the particular people that took part in our study
And we want to be able to generalise the findings from our studies beyond the exact experimental tasks and setup used in our study
A study has external validity if it can be generalised to the population of people with relevant characteristics
Ecological validity is a type of external validity that is particularly relevant to experimental designs
Researchers have questioned whether the results from typical psych studies are generalisable
Most psychology studies are conducted in a small handful of countries in the Global North (e.g., in North America, Europe, Australia/New Zealand)
Many of these studies also make use of undergraduate psychology students for their participants
More generally, typical psychology studies are conducted in societies that are WEIRD: Western, Educated, Industrialised, Rich, and Democratic
Understanding exactly whether and how these characteristics impact the generalisability of psychology findings means running more studies with samples that aren’t WEIRD
Reliability is about the consistency of a measure
A measure is reliable if it produces the same results each time it’s used on the same participant
E.g., If we’re measuring maths anxiety with a questionnaire then our questionnaire is reliable if we get similar scores each time we test a particular participant
This kind of stability over time is known as test-retest reliability
The last couple of things we’ll cover in this lecture will be about the jargon we use to talk about the nature of the measurements we’re taking
The first set of terms describe the kind of information we’re working with
We call this the level of measurement
There are four levels of measurement
Nominal/categorical
Ordinal
Interval
Ratio
Sometimes a construct can be measured at more than one of these levels, and it’s on the researcher to decide which measurement level is the most appropriate to use.
The nominal (categorical) level refers to names, categories, labels, or group membership.
Some examples include:
We can’t compare the different groups in any quantifiable way
At the ordinal level, individual observations can be ordered in a meaningful way
For example:
We could rank marathon runners by who came 1st, 2nd, or 3rd.
However, ordinal data doesn’t give information about the differences between individual points
E.g., We don’t know how much faster the winner is compared to the runner-up
The distance between 1st and 2nd doesn’t have to be the same as the distance between 2nd and 3rd
Ordinal data is common in psychology because of Likert scales
At the interval level of measurement, the differences (intervals) between pairs of adjacent values are the same
But there is no absolute zero point
The ratio level is similar to the interval level, but there is a meaningful 0 point.
Some examples of the ratio level of measurement that you might encounter in psychology are:
When we represent variables with numbers, the variables can be of different types depending on the data
Continuous variables can contain any numerical value within a certain range
Discrete variables can only take certain values (e.g., whole numbers)
Binary variables can only take one of two possible values (a special case of discrete variables)
Our IVs and DVs can be any type (continuous, discrete, binary) or any level of measurement (nominal, ordinal, interval, ratio). It all depends on the study!
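The three variable types above can be illustrated with some hypothetical measurements:

```python
# Hypothetical measurements illustrating the three variable types
reaction_time = 0.427   # continuous: any numerical value within a range (seconds)
errors_made = 3         # discrete: only certain values possible (here, whole counts)
gave_ice_cream = True   # binary: one of two values (a special case of discrete)
```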
End