class: center, middle, inverse, title-slide

# Organizing data
## Practical 7

---

<style type="text/css">
kbd {
  padding: 2px 4px;
  font-size: 90%;
  color: rgb(var(--font-col));
  background-color: #efefef;
  border-radius: 3px;
  box-shadow: none;
  border: solid 1px;
}
</style>

## Plan for today

- Questions about last week's practical
- Attendance pin
- Recapping some previous material
- Introduction to this week's topic
- This week's worksheet

---

# Attendance pin

![:attend]

---

## R Studio projects

I've noticed that lots of people are *creating a new folder* for each week, but they haven't been creating a **new project**.

Each week you should be creating a new project for your work.

**How to create a new project**

Go to **File** > **New Project** and enter the name of your project in the box labelled **Directory name**.

Revisit [Practical 2](https://paas.netlify.app/practicals/02_project_files/worksheets/) if you need a refresher.

---

### The create project dialogue box

<img src="./assets/create_new_project.png" />

---

### R Studio with an open project

<img src="./assets/new_project_with_sub_folders.png" width="85%" />

---

### Finding your project folder in Files

When we start working with files, **always** make sure you know what is in your **project folder** (your HERE).

.pull-left[<img src="./assets/pwd.png" width="85%" />]

.pull-right[To find your **HERE**, go to the menu bar on the **Files** pane:

- Click the button labelled **More**
- Select **Go To Working Directory**

This will show your **project folder** in the **Files** pane.]

<br />

Before you start giving directions to a file, make sure your **HERE** is where you think it is.

---

### Opening your project folder in Finder/File Explorer

If you want to move files into your project folder with Finder/File Explorer, you can use the **Files** pane to open a new Finder/File Explorer window *at your project folder*.

.pull-left[<img src="./assets/finder.png" width="85%" />]

.pull-right[To open your **HERE** in Finder/File Explorer, go to the menu bar
on the **Files** pane:

- Click the button labelled **More**
- Select **Show Folder in New Window**

This will open a new Finder/File Explorer window at your project folder.]

When you're moving files, make sure that you're moving them to the correct place.

---

## Objects and data

I've noticed a few people struggling with when to put things in quotes (`""`).

The rule is:

- If it's a character string, then it goes in `""`
- Nothing else goes in `""`

Just think of this simple example:

```r
my_object <- "a letter string"
```

The **object name** `my_object` isn't in `""`. The character string being **assigned** to that object **is** in `""`.

---

### Using objects and strings as function input

Let's take the `here::here()` function as an example.

If you wrote:

```r
here::here(images/my_dog.png)
```

then the input would be **an object** called `images` divided by an object called `my_dog.png`.

But you won't have an object called `images` or one called `my_dog.png`.

If you typed `images/my_dog.png` into the console, you'd get an error:

```r
> images/my_dog.png
Error: object 'images' not found
```

Compare this with the following:

```r
> "images/my_dog.png"
[1] "images/my_dog.png"
```

If you type your **input** to a **function** into the console on its own, it shouldn't produce an **error**!

---

### Using an object name to get its content

Contrast this with the situation where you have an **object** that has been assigned a value.

First, we assign a value to the object:

```r
> fav_food <- "Pizza"
```

Now the object contains that value:

```r
> fav_food
[1] "Pizza"
```

And we can use that **object name** anywhere we might want to use its content.

---

### What about numbers?

Numbers can be **numbers**... that is, things that we can do maths on...

Or they can be character strings. Think about things like your phone number, or the names of roads like the A14. The numbers in these aren't the kind of thing you do maths with.

Remember the rule: character strings go in `""`, and nothing else does!

What happens if we put numbers in quotes?
Then we can't do maths on them!

```r
> "10" + "10"
Error in "10" + "10" : non-numeric argument to binary operator
```

Compare this to:

```r
> 10 + 10
[1] 20
```

---

## Organizing data

In today's practical we're going to learn how to organize data in a **tidy** way.

<img src="./assets/tidydata_2.jpg" />

---

We organize data in a **tidy** way because it makes it easier to work with. Next week, we'll be learning how to work with **tidy** data.

<img src="./assets/tidydata_3.jpg" />

---

### What is **tidy** data?

<img src="./assets/tidydata_1.jpg" />

---

### Generating some data

In today's practical we're going to run a short experiment to generate some data. We'll then enter that data into a spreadsheet in a **tidy** way.

**The Experiment**

We'll be running a short Stroop task. In the Stroop task you'll see colour words printed in different colours. The words are presented under two conditions: in the **congruent condition** the colour will match the word, and in the **incongruent condition** there will be a mismatch between the colour and the word.

<br />

Your task is to identify the **colour the word is printed in**.

At the end of the task, you'll be presented with **two** numbers: how quickly you could identify the colour in the **congruent** condition, and how quickly you could identify the colour in the **incongruent** condition.

---

### Organizing the data

Once everyone has done the experiment, we'll need to enter the data into the computer and organize it somehow.

**Identifiers**

When we collect data we will have **identifiers** that might tell us something about:

1. Who the data was collected from (subject IDs)
2. Which group the person was part of (group ID)<sup>1</sup>
3. Which condition the data was collected in (condition ID)<sup>2</sup>

You'll always have subject IDs, but whether you'll also have group IDs and condition IDs will depend on whether you're employing a repeated-measures, between-subjects, or mixed design.
.footnote[<sup>1</sup>We'll only need this if we have a *between-subjects* or *mixed* design.<br />
<sup>2</sup>We'll only need this if we have a *repeated-measures* or *mixed* design.]

---

#### Rules for identifiers

There are some general rules that should be followed when *creating* identifiers. Many of these rules are the same as those that should be followed when naming *files* and *folders*.

1. Identifiers should **not** contain spaces (use underscores, `_`, instead)
1. Identifiers should all be the same number of characters long. This means that if you're using sequential numbers, you should pad the numbers with extra zeros, e.g., 001, 002, 003, 011, 100 and not 1, 2, 3, 11, 100
1. Identifiers must never start with a number (e.g., use *cond01* instead of *01*)
1. No special characters, e.g., ü, é, ø, ā, æ, å, !, #, *, ~ etc.
1. Try to keep the names as short as practically possible
1. Very important: **Identifiers must be unique**

---

#### Some example identifiers

.pull-left[
**Bad Identifiers**

- Subject_1
- Subject_10
- 01
- 02
- bill smith
- roger bannister]

.pull-right[
**Better Identifiers**

- subject_001
- subject_002
- p10292
- p10293]

You'll be able to use the worksheet to generate a unique participant ID and a group ID for yourself.

---

#### Entering the data

At the end of the experiment, and once you've generated your participant ID and group ID, you'll have four bits of data:

1. A number (a reaction time) for the **congruent condition**
1. A number (a reaction time) for the **incongruent condition**
1. Your **participant ID**
1. Your **group ID**

We'll also need **condition identifiers**. For these we'll use:

- `con`
- `inc`

for the **congruent** and **incongruent** conditions, respectively.

---

### **Tidy** data vs **messy** data

When we enter the data into the spreadsheet, we're going to use the **tidy** format.
.pull-left[
**Wide format**

| id   | con  | inc  | group |
| ---- | ---- | ---- | ----- |
| p001 | 3042 | 4234 | G1    |
| p002 | 4674 | 6244 | G2    |
| p003 | 3346 | 6048 | G1    |
| p004 | 3467 | 4055 | G1    |

Each **row** represents one **person**]

.pull-right[
**Tidy format**

| id   | condition | rt   | group |
| ---- | --------- | ---- | ----- |
| p001 | con       | 3042 | G1    |
| p001 | inc       | 4234 | G1    |
| p002 | con       | 4674 | G2    |
| p002 | inc       | 6244 | G2    |

Each **row** represents one **measurement**]

In the **tidy** format, each **row** holds one **measurement**, and the **columns** hold information **about that measurement**.

---

<div style='text-align:center;padding-top:2em;font-size:2em;'><p>I'll open up the breakout rooms now for you to do the Stroop task and collect our data.</p>
<p>After this, I'll bring you back into the main room...</p></div>

---

#### Getting the data into a file

We're going to put our data into a Google Sheet.

.pull-left[
- Put your participant ID in the **id** column
- Put the appropriate condition ID (**con** or **inc**) in the **condition** column
- Put the measurement in the **rt** column
- Put your group ID in the **group** column
]

.pull-right[
<img src="./assets/sheets.png" />]

---

#### Downloading the data

Now that the data has been entered into the spreadsheet, we can download it!

1. Go to the **File** menu
2. Select **Download**
3. Select **Comma-separated values (.csv, current sheet)**
4. Save the file as `stroop.csv`
5. Save (or move) this file to the `data` sub-folder of your project folder

---

## Working through the worksheet

To work through the worksheet we'll be using a few **packages**. They should all already be installed on your computer, so you'll just need to load them:

1. **here** for working with file paths
2. **readr** for reading in the data
3. **tibble**, because when we read the data in, we'll read it into a **tibble**

---

<div style='text-align:center;padding-top:5em;font-size:2em;'><p>I'll put you back into breakout rooms to work on the remaining tasks</p></div>
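
---

### Reading in the data: a sketch

Here's a minimal sketch of what reading in tidy data with **readr** and **tibble** might look like. It isn't the worksheet's own code. To keep it self-contained, it builds the example rows from the **tidy format** table with `tibble::tribble()` and writes them to a temporary file. The name `stroop_example` is just for illustration. In your own project you'd read `stroop.csv` from your `data` folder instead. This assumes readr 2.0 or later, which provides the `show_col_types` argument.

```r
library(readr)
library(tibble)

# Example rows copied from the tidy-format table (one row per measurement)
stroop_example <- tribble(
  ~id,    ~condition, ~rt,  ~group,
  "p001", "con",      3042, "G1",
  "p001", "inc",      4234, "G1",
  "p002", "con",      4674, "G2",
  "p002", "inc",      6244, "G2"
)

# Write to a temporary CSV so this sketch runs outside a project
csv_path <- tempfile(fileext = ".csv")
write_csv(stroop_example, csv_path)

# In your project you would use: read_csv(here::here("data", "stroop.csv"))
stroop <- read_csv(csv_path, show_col_types = FALSE)
stroop
```

Notice that the file path is a character string, so it goes in `""`. By contrast, `stroop` is an object name, so it doesn't.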