Gemstracker and HandWristStudyGroup data

R-course: RYouReady

General introduction to R and GitHub

Applying R to HandWristStudyGroup data

RYouReadyCleaning2: the rows/subjects

In CleanItUp 1 you renamed and organized the variables. Now it is time to inspect the rows, in our case the patients, and filter or group patients where necessary. In the video below (10:20 min) you will get to know how to sort variables.

Assignment: sorting variables

  • Inspect all variables in the Example_LongFormat data frame with the 'summary' command

  • Sort the data in Example_LongFormat based on the average pain (variable 'vasPijnGemiddeld_1') from high to low. What is the maximum reported pain? And the minimum reported pain?

  • Now do the same, but first filter for all females and then for all males. Save each result in a new data frame, using the select command to keep only the variable 'vasPijnGemiddeld_1'. What is the maximum pain of all females? And of all males? From looking at the data frames, how many males and females are there?

#Load the tidyverse (if not already loaded) for %>%, filter, arrange, select and view
library(tidyverse)

#Inspect variables
summary(Example_LongFormat)

#Sort data based on average pain, from high to low
data_desc <- Example_LongFormat %>%
  arrange(desc(vasPijnGemiddeld_1)) %>%
  select(vasPijnGemiddeld_1)

#Filter for females, then sort and keep only the pain variable
data_desc_female <- Example_LongFormat %>%
  filter(Geslacht == "F") %>%
  arrange(desc(vasPijnGemiddeld_1)) %>%
  select(vasPijnGemiddeld_1)

#Filter for males, then sort and keep only the pain variable
data_desc_male <- Example_LongFormat %>%
  filter(Geslacht == "M") %>%
  arrange(desc(vasPijnGemiddeld_1)) %>%
  select(vasPijnGemiddeld_1)
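
The last questions of the assignment (the maximum per sex and the number of males and females) can also be computed instead of read off the sorted data frame. The sketch below is one possible way to do that; it assumes a small made-up data frame called toy that only borrows the column names Geslacht and vasPijnGemiddeld_1 from the course data, so the values are not real patient data.

```r
library(dplyr)

# Hypothetical toy data: column names mirror the course data, values are invented
toy <- data.frame(
  Geslacht = c("F", "M", "F", "F", "M"),
  vasPijnGemiddeld_1 = c(70, 35, NA, 20, 55)
)

# Count the rows (patients) per sex
n_per_sex <- toy %>%
  count(Geslacht)

# Highest pain score among females, ignoring missing values
max_female <- toy %>%
  filter(Geslacht == "F") %>%
  pull(vasPijnGemiddeld_1) %>%
  max(na.rm = TRUE)

# Highest pain score among males, ignoring missing values
max_male <- toy %>%
  filter(Geslacht == "M") %>%
  pull(vasPijnGemiddeld_1) %>%
  max(na.rm = TRUE)

n_per_sex
max_female
max_male
```

Here pull extracts a single column as a plain vector, so that max can be applied to it directly.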
 

Answers

It is important to understand how these three nearly identical two-line snippets differ in R:

Example_LongFormat %>%
  mutate(vasPijnGemiddeld_50 = vasPijnGemiddeld_1 > 50)

Example_LongFormat <- Example_LongFormat %>%
  mutate(vasPijnGemiddeld_50 = vasPijnGemiddeld_1 > 50)

new_frame <- Example_LongFormat %>%
  mutate(vasPijnGemiddeld_50 = vasPijnGemiddeld_1 > 50)


The first two lines just print the result in the console but do not store the data.

 

The second two lines add a variable to the data frame Example_LongFormat. You can check this with the command view(Example_LongFormat); the last variable should now be vasPijnGemiddeld_50. Or you can use names(Example_LongFormat) to list all variables in the data frame.

 

The third two lines of code leave the data frame Example_LongFormat unchanged, but create a new data frame called new_frame that has all variables from Example_LongFormat plus the added variable vasPijnGemiddeld_50. Again, you can use view(new_frame) or names(new_frame) to check this.
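
The difference between printing, overwriting and creating a new data frame can be tried out on a tiny example first. The sketch below assumes a made-up data frame called toy with a single pain column; only the column name is taken from the course data. The snippets are run in the order print, new data frame, overwrite, so that toy stays unchanged for as long as possible.

```r
library(dplyr)

# Hypothetical toy data with invented values
toy <- data.frame(vasPijnGemiddeld_1 = c(20, 65, 80))

# Printing only: the result appears in the console, toy itself is not changed
toy %>%
  mutate(vasPijnGemiddeld_50 = vasPijnGemiddeld_1 > 50)
cols_after_print <- names(toy)

# New data frame: new_frame gets the extra variable, toy is still unchanged
new_frame <- toy %>%
  mutate(vasPijnGemiddeld_50 = vasPijnGemiddeld_1 > 50)
cols_after_new_frame <- names(toy)

# Overwriting: toy itself now contains the extra variable
toy <- toy %>%
  mutate(vasPijnGemiddeld_50 = vasPijnGemiddeld_1 > 50)
cols_after_overwrite <- names(toy)

cols_after_print
cols_after_new_frame
cols_after_overwrite
```

Comparing the three stored name vectors shows that only the assignment back to toy changes the original data frame.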

 

Assignment: add all three snippets to your script and see what each of them does in the console and to the objects in your global environment.

In the video below (12:10 min), we take a few extra steps in combining filters and groups to summarize data.

Assignment Group_By

  • Now again calculate the maximum pain reported by males and females, but use the group_by function. Name the maximum max_VASpain in the summarize function and also remove any missing values by adding na.rm=TRUE to your code. 

  • Use the summarize construct shown in the video to calculate the maximum, minimum, mean and median of the average pain for both males and females. Also name these variables in the summarize function (e.g. min_VASpain, mean_VASpain, etc.)

#Calculate the maximum pain reported by males and females with group_by

data_summarize <- Example_LongFormat %>%
  group_by(Geslacht) %>%
  summarize(max_VASpain = max(vasPijnGemiddeld_1, na.rm = TRUE))

view(data_summarize)

#Use the summarize construct to calculate maximum, minimum, average and median average pain for both males and females.

data_summarize <- Example_LongFormat %>%
  group_by(Geslacht) %>%
  summarize(max_VASpain = max(vasPijnGemiddeld_1, na.rm = TRUE),
            min_VASpain = min(vasPijnGemiddeld_1, na.rm = TRUE),
            mean_VASpain = mean(vasPijnGemiddeld_1, na.rm = TRUE),
            median_VASpain = median(vasPijnGemiddeld_1, na.rm = TRUE),
            sd_VASpain = sd(vasPijnGemiddeld_1, na.rm = TRUE))

view(data_summarize)

Answers

If you want to reuse the data you calculated later, you will have to save it. This is explained in the video below (6:12 min):

Assignment: save summary data frame

  • Now again calculate the maximum pain reported by males and females, but use the group_by function, and analyse it separately for patients operated on their left and on their right side.

  • Again, use the summarize construct shown in the video to calculate the maximum, minimum, mean and median of the average pain for both males and females operated on either the left or the right hand. Save the results in a new data frame called PainMalesFemalesLeftRight.

#Analyse maximum reported pain separately for males and females, and for patients operated on their left and on their right side

data_summarize <- Example_LongFormat %>%
  group_by(Geslacht, zijde) %>%
  summarize(max_VASpain = max(vasPijnGemiddeld_1, na.rm = TRUE))

#Use the summarize construct shown in the video to calculate maximum, minimum, average and median average pain for both males and females operated on either the left or the right hand. Save the results in a new data frame called PainMalesFemalesLeftRight.

PainMalesFemalesLeftRight <- Example_LongFormat %>%
  group_by(Geslacht, zijde) %>%
  summarize(max_VASpain = max(vasPijnGemiddeld_1, na.rm = TRUE),
            min_VASpain = min(vasPijnGemiddeld_1, na.rm = TRUE),
            mean_VASpain = mean(vasPijnGemiddeld_1, na.rm = TRUE),
            median_VASpain = median(vasPijnGemiddeld_1, na.rm = TRUE),
            sd_VASpain = sd(vasPijnGemiddeld_1, na.rm = TRUE))

Answers

How to mutate

Sometimes it is necessary to calculate new variables based on the variables you already have. Examples are calculating BMI from height and weight, or someone's age from the date of completing a questionnaire and the date of birth. R uses the somewhat confusing term mutate for this. This is explained in the two videos below: video 1 (10:05 min) and video 2 (8:49 min).
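
As a minimal sketch of the BMI and age examples mentioned above: the data frame patients and all of its column names (height_cm, weight_kg, date_of_birth, date_completed) are hypothetical and do not come from the HandWristStudyGroup data. Age is approximated here by dividing the number of days by 365.25, which is an assumption that is usually good enough for descriptive purposes.

```r
library(dplyr)

# Hypothetical example data: names and values are invented for illustration
patients <- data.frame(
  height_cm      = c(180, 165),
  weight_kg      = c(81, 60),
  date_of_birth  = as.Date(c("1980-05-01", "1995-11-15")),
  date_completed = as.Date(c("2020-05-01", "2020-11-14"))
)

patients <- patients %>%
  mutate(
    # BMI = weight in kg divided by the square of height in metres
    bmi = weight_kg / (height_cm / 100)^2,
    # Approximate age in years at the moment the questionnaire was completed
    age_years = as.numeric(difftime(date_completed, date_of_birth, units = "days")) / 365.25
  )

patients
```

Both new variables are created in a single mutate call, and the result is assigned back to patients so that it is stored rather than only printed.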

Assignment: Mutate 

  • Use the mutate function to add a new variable to the data frame data_long with the name pain_average, by averaging the three separate items scored by the patients ("vasPijnGemiddeld_1", "vasPijnRust_1", "vasPijnBelasten_1"). Note: you not only have to use the mutate command, but also have to assign the result to a data frame, using Example_LongFormat <- Example_LongFormat %>% mutate….

  • Add to the same pipe a variable that defines if the vasPijnGemiddeld_1 is above or below 50.

#Create variable pain_average by averaging the three separate items scored by the patients ("vasPijnGemiddeld_1", "vasPijnRust_1", "vasPijnBelasten_1")
#Note: na.rm is not an argument of mutate (it would be added as a new column named na.rm); rowMeans with na.rm = TRUE averages the items that are available

data_long <- Example_LongFormat %>%
  mutate(pain_average = rowMeans(cbind(vasPijnGemiddeld_1, vasPijnRust_1, vasPijnBelasten_1), na.rm = TRUE))

#Add to the same pipe a variable that indicates whether vasPijnGemiddeld_1 is higher than 50

data_long <- Example_LongFormat %>%
  mutate(pain_average = rowMeans(cbind(vasPijnGemiddeld_1, vasPijnRust_1, vasPijnBelasten_1), na.rm = TRUE)) %>%
  mutate(vasPijnGemiddeld_50 = vasPijnGemiddeld_1 > 50)
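
Missing values deserve some attention when averaging items per patient. The sketch below compares two ways of computing a row-wise average on a made-up data frame called toy (the column names are borrowed from the course data, the values are invented). It assumes that averaging the available items is acceptable when one item is missing; whether that is appropriate depends on the scoring rules of the questionnaire.

```r
library(dplyr)

# Hypothetical toy data: the second patient did not score pain at rest
toy <- data.frame(
  vasPijnGemiddeld_1 = c(60, 40),
  vasPijnRust_1      = c(30, NA),
  vasPijnBelasten_1  = c(90, 80)
)

toy <- toy %>%
  mutate(
    # Adding and dividing by 3 returns NA as soon as one item is missing
    pain_sum_div3 = (vasPijnGemiddeld_1 + vasPijnRust_1 + vasPijnBelasten_1) / 3,
    # rowMeans with na.rm = TRUE averages only the items that are present
    pain_rowmean = rowMeans(cbind(vasPijnGemiddeld_1, vasPijnRust_1, vasPijnBelasten_1),
                            na.rm = TRUE)
  )

toy
```

For the first patient both approaches give the same value; for the second patient only the rowMeans version returns a number.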

      

Answers

Now that we did some basic cleaning we can go to the next step: graphically displaying your data.

VizWhiz1
