Gemstracker and HandWristStudyGroup data
R-course: RYouReady
General introduction to GitHub
Applying R to HandWristStudyGroup data
4.2 Visualization
Below, we will present several examples of typical plots that you can make and present examples for you to practice with.
Development of scores over time
In this screencast, Ruud Selles will give you insight into how to create the following plot:
In this plot we see the EQ5D-index score (a measure for quality of life) of patients after the lockdown during the COVID-19 pandemic in the Netherlands on March 23 2020. In addition, we can compare the EQ5D-index scores of these patients with patients from the same time period in 2018 and 2019. Looking at the graphs, we can see that there are no obvious differences between these time periods in EQ5D-index scores.
This plot can be found in the following article:
In the following screencast, Ruud Selles will show you how the graph is made.
The R-code for this graph can be found here:
#load packages --------------------
library(tidyverse)
library(ggplot2)
#manually load the following data file: ------------------
load("Insert the location of your data file")
#select only the two variables that we need for this analysis
data_analyses <- data_analyses %>%
select(Invuldatum_EQ5D, EQ5D_index)
#dataset for lockdown in 2020
data_analyses_2020 <- data_analyses %>%
filter(Invuldatum_EQ5D >= "2020-03-23"& Invuldatum_EQ5D <= "2020-05-04") %>%
mutate(MeasurementYear = as.factor(2020)) %>%
mutate(DaysFromLockdown = Invuldatum_EQ5D-as.Date(c("2020-03-23")))
#dataset for lockdown in 2018
data_analyses_2018 <- data_analyses %>%
filter(Invuldatum_EQ5D >= "2018-03-23" & Invuldatum_EQ5D <= "2018-05-04") %>%
mutate(MeasurementYear = as.factor(2018)) %>%
mutate(DaysFromLockdown = Invuldatum_EQ5D-as.Date(c("2018-03-23")))
#dataset for lockdown in 2019
data_analyses_2019 <- data_analyses %>%
filter(data_analyses$Invuldatum_EQ5D >= "2019-03-23" & data_analyses$Invuldatum_EQ5D <= "2019-05-04") %>%
mutate(MeasurementYear = as.factor(2019)) %>%
mutate(DaysFromLockdown = Invuldatum_EQ5D-as.Date(c("2019-03-23")))
#combine the three datasets that we just created
data_analyses_all <- rbind(data_analyses_2020, data_analyses_2018,data_analyses_2019)
#plot Q5D with facets ---------
EQ5D_facets <- ggplot(data_analyses_all, aes(x = DaysFromLockdown, y= EQ5D_index)) +
facet_wrap(data_analyses_all$MeasurementYear) +
labs(x="Days since March 23, 2019", y="EQ5D index score") +
geom_jitter(color = "black") +
geom_smooth(method = 'lm') +
scale_y_continuous(limits = c(-0.05,1.05), breaks = seq(0,1.0,0.1)) +
scale_x_continuous(limits = c(-0.5,60), breaks = seq(0,10,60)) +
theme_minimal() +
theme(axis.text.x = element_text(size = 9), axis.text.y = element_text(size = 8), axis.title.x = element_text(size = 12, face = "bold"), axis.title.y = element_text(size = 12, face = "bold"), legend.title = element_text(size = 12, face = "bold"), legend.text =element_text(size= 10))
EQ5D_facets
#Save the plot
ggsave("EQ5DTimefacets.png")
Assignment
After watching the screencast and the code for this plot, try to apply this code to the example dataset:
-
Using the example dataset, create a similar plot to see if the pain scores during a summer differ from a winter
-
Play around with geom_smooth to see the best visualization of the change over time in each winter and summer
#Create a similar plot to see if the pain scores during 2 summers versus 2 winters in the dataset differ
-----
Plots with multiple timepoints
In this next screencast, Mark van der Oest will tell more about plots with different timepoints visualized.
Mark uses for his work functions that he has written himself. If you want a very short introduction of what functions are in R, take, for example, a look at this video.
You can find the R-script for this kind of plots here:
#make sure you have a dataset in long format, with a uniform name for rounddescription for all timepoints you want to plot
#see example below
# data_alles_long_status <- data_alles_long %>% # wide data on alle timepoints
# inner_join(data_alles_status_wide %>%
# select(Patient.traject.ID)) %>% # only with patients you included
# mutate(rounddescription = rounddescription.x) # with a uniform rounddesriptoin column
#load packages and functions from Mark
source("......../PulseR/Personal stuff/Mark O/Functions_mark.R")
#used the agg_data_graph function
#the inputs are:
# outcome: name of your outcome mesure, will be on the y axis
# time_points: select the timpoints you want, will be on de x axis
# labels: assigns labels to the x axis
#make aggregated datasets for different timepoints
data_totaal <- agg_data_graph(outcome = "totaalscore",
time_points = c("Intake", "3 maanden", "12 maanden"),
labels = c("pre-operative", "3 months post-operative", "12 months post-operative"), data_long = data_alles_long_status )
data_pijn <- agg_data_graph(outcome = "pijnscore",
time_points = c("Intake", "3 maanden", "12 maanden"),
labels = c("pre-operative", "3 months post-operative", "12 months post-operative"), data_long = data_alles_long_status )
data_functie <- agg_data_graph(outcome = "functiescore",
time_points = c("Intake", "3 maanden", "12 maanden"),
labels = c("pre-operative", "3 months post-operative", "12 months post-operative"), data_long = data_alles_long_status )
# combine into 1 dataframe
data_graph <- data_totaal %>%
rbind(data_functie) %>%
rbind(data_pijn) %>%
mutate(scale = c(rep("Total score", 3),
rep("Function score", 3),
rep("Pain score", 3)))
data_graph$scale <- factor(data_graph$scale, levels = c("Total score", "Pain score", "Function score"))
ggplot(data_graph, aes(x = rounddescription, y = mean)) +
geom_line(aes(group = scale, colour = scale), size = 1.5) +
geom_errorbar(width=.1, aes(ymin=min, ymax=max, colour = scale)) +
geom_point(aes(colour = scale), shape=21, size=1.7) +
geom_label(aes(label = paste("N = ", N, sep ="")), nudge_y = -10, data = data_functie) +
ylab("mean PRWE score") +
ylim(0,75) +
xlab("") +
scale_color_grey() +
theme_classic() +
theme(axis.text = element_text(size = 12),
text = element_text(size = 12))
Survival analysis plots
In this next screencast, Mark van der Oest will tell more about plots for survival analysis. These plots are essential when for example, presenting the return to work after treatment or the duration of cancer-free survival.
You can find the R-script for this kind of plots here:
source("........./PulseR/Personal stuff/Mark O/Functions_mark.R")
load("Your datafile")
library(survival)
# location <- choose.dir() # save plot
RTW <- Pols_lang$RTW %>%
RTW_bewerking()
InV <- Pols_lang$InV_Pols_lang %>%
Process_intake_vervolg() %>%
filter(behandeling == "TFCC reïnsertie")
PRWE_0 <- Pols_lang$PRWHE %>%
filter(rounddescription == "Intake")
Behandeldatum <- Pols_lang$Behandeldatum
data_wide <- InV %>%
inner_join(PRWE_0, by = c("Respondent.ID","Patient.traject.ID")) %>%
inner_join(RTW, by = c("Patient.traject.ID")) %>%
inner_join(Behandeldatum, by = c("Respondent.ID", "Patient.traject.ID"))
KM <- survfit(Surv(EventTime, Event) ~ zwaarteBeroep, data = data_wide)
setwd(location)
png("survival per beroep.png", height = 5, width = 8, units = "in", res = 900)
plot(KM, lty = c(1,2,9), fun = "event",
ylim = c(0,1),
ylab = "% Return to work",
xlab = "Time (Weeks)", col = c(1,2,3))
segments(-5,0.5,16,0.5, lty = 2)
segments(7,-1,7,0.5, lty = 2)
segments(16,-1,16,0.5, lty = 2)
segments(15,-1,15,0.5, lty = 2)
legend("bottomright",
c("Light physical labor",
"Moderate physical labor",
"Heavy physical labor"),
lty = c(1,2,9), col = c(1,2,3))
dev.off()
quantile(KM, probs = c(0.05,0.25,0.5,0.75,0.95))
quantile(KM)
print(KM, print.rmean=TRUE)
Creating advanced boxplots