Applying R to HandWristStudyGroup data
4.1 Data manipulation
Working with dates
In this part, we show how R handles dates and how you can work with them in your own research project using HandWristStudyGroup data.
By default, R works with dates in the ISO 8601 format (YYYY-MM-DD), which orders the parts of a date from the largest unit (year) to the smallest (day). For example, 28 December 2020 is written as 2020-12-28 in ISO 8601 format.
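As a small illustration of the ISO 8601 ordering, the sketch below creates a Date in base R and uses format() to pull out its parts from the largest unit to the smallest. The variable name `d` is just an illustrative choice.

```r
# Create a Date from an ISO 8601 string (year-month-day)
d <- as.Date("2020-12-28")

class(d)               # "Date"

# format() extracts the individual parts, from largest to smallest unit
format(d, "%Y")        # "2020"
format(d, "%m")        # "12"
format(d, "%d")        # "28"
format(d, "%Y-%m-%d")  # "2020-12-28"
```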
To make things easy, dates in the HandWristStudyGroup data are already recognized as dates when you import a data set into R. You can still check whether a variable is stored as a date with the str() function.
If R treats a date as a string (character) instead of a date, you can convert it with the as.Date() function. Note that as.Date() only recognizes your date without further arguments if it is in ISO 8601 format, with the parts separated by "-" or "/".
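A minimal sketch of such a check, using a hypothetical toy data frame `toy` (not HandWristStudyGroup data) with one column stored as a Date and one stored as text. str() reports the first as `Date` and the second as `chr`; class() and inherits() give the same answer for a single column.

```r
# Hypothetical toy data frame: one real Date column, one text column
toy <- data.frame(
  treatment_date      = as.Date(c("2018-03-14", "2019-07-02")),
  treatment_date_text = c("2018-03-14", "2019-07-02"),
  stringsAsFactors    = FALSE
)

# str() lists every column with its type ("Date" vs "chr")
str(toy)

# Check a single column directly
class(toy$treatment_date)             # "Date"
class(toy$treatment_date_text)        # "character"
inherits(toy$treatment_date, "Date")  # TRUE
```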
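The sketch below shows this behaviour in base R. When given a character string, as.Date() tries "%Y-%m-%d" and "%Y/%m/%d" by default (its tryFormats argument); for any other layout you can pass an explicit format string. Spelling out the format matters because a day-first string such as "28/12/2020" can be misread silently rather than rejected. The variable names are illustrative only.

```r
# ISO 8601 strings with "-" or "/" separators need no extra arguments
iso_dash  <- as.Date("2020-12-28")
iso_slash <- as.Date("2020/12/28")

# Other layouts need an explicit format string (%d = day, %m = month, %Y = year)
day_first   <- as.Date("28/12/2020", format = "%d/%m/%Y")
month_first <- as.Date("12/28/2020", format = "%m/%d/%Y")

# All four represent the same day
c(iso_dash, iso_slash, day_first, month_first)
```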
If your dates are stored in a different format, you can use the lubridate package, which can parse many different date formats. For example, after loading the package with library(lubridate), you can use the following code to convert a string to a date:
ymd("2013-12-08") tells R that 2013 is the year, 12 is the month, and 08 is the day of the month.
dmy("27/02/2015") tells R that 27 is the day of the month, 02 is the month, and 2015 is the year.
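A minimal sketch of these parsers, assuming the lubridate package is installed. The letters in each function name give the order of the parts (y = year, m = month, d = day), and the separator between them does not need to be specified. mdy() is included as a further example of the same naming scheme.

```r
library(lubridate)

# The function name encodes the order of the parts
d1 <- ymd("2013-12-08")   # 8 December 2013
d2 <- dmy("27/02/2015")   # 27 February 2015
d3 <- mdy("02/27/2015")   # also 27 February 2015

# Compact strings and different separators are handled as well
d4 <- ymd("20131208")
d5 <- ymd(c("2013-12-08", "2013/12/09"))
```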
For more information on the lubridate package, you can check the times and dates cheatsheet:
Dates can be used in many different applications. For example, you could follow the value of a variable over time. We will return to this in the data visualization part.
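One building block for looking at data over time is extracting parts of a date to group on. The sketch below, assuming lubridate is installed, uses a hypothetical vector of treatment dates (illustrative values, not study data) and counts treatments per year with year() and per calendar month with floor_date().

```r
library(lubridate)

# Hypothetical treatment dates (illustrative values only)
treatment_dates <- ymd(c("2018-03-14", "2018-11-02", "2019-01-20",
                         "2019-07-02", "2019-07-30"))

# Extract parts of each date
year(treatment_dates)    # 2018 2018 2019 2019 2019
month(treatment_dates)   # 3 11 1 7 7

# Count treatments per year
per_year <- table(year(treatment_dates))
per_year

# Round each date down to the first day of its month to group by month
per_month <- table(floor_date(treatment_dates, unit = "month"))
per_month
```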
You can calculate with dates too. Check how many days there are between June 21 and December 28 in the year 2020.
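As a sketch of date arithmetic in base R (using different dates, so the exercise is left for you): subtracting two Dates returns a difftime object, difftime() lets you pick the unit, and adding a number to a Date shifts it by that many days. The variable names are illustrative.

```r
# Subtracting two Dates gives a "difftime" object
gap <- as.Date("2021-03-01") - as.Date("2021-02-01")
gap   # Time difference of 28 days

# difftime() lets you choose the unit; as.numeric() drops the label
weeks <- as.numeric(difftime(as.Date("2021-03-01"), as.Date("2021-02-01"),
                             units = "weeks"))
weeks   # 4

# Adding a number to a Date shifts it by that many days
shifted <- as.Date("2021-02-01") + 28
shifted   # "2021-03-01"

# seq() builds regular sequences of dates
monthly <- seq(as.Date("2021-01-01"), by = "month", length.out = 3)
monthly   # "2021-01-01" "2021-02-01" "2021-03-01"
```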
Check whether the date of treatment in the example dataset is correctly stored as a date.
We see that the date of treatment is already stored as a date, but let's pretend it is a string variable that we would like to convert to a date using the lubridate package. First, check the format of the date of treatment.
Use the lubridate package to convert the date of treatment to a date.
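When converting a column in a data frame, two details are easy to miss: the result has to be assigned back to the column, and entries that cannot be parsed become NA (lubridate prints a warning when this happens). A minimal sketch, assuming lubridate is installed and using a hypothetical data frame `toy` whose last entry is deliberately unparseable:

```r
library(lubridate)

# Hypothetical data frame where the treatment date was imported as text
toy <- data.frame(
  treatment_date   = c("2018-03-14", "2019-07-02", "unknown"),
  stringsAsFactors = FALSE
)

# Assign the result back; otherwise the column stays a character vector
toy$treatment_date <- ymd(toy$treatment_date)

class(toy$treatment_date)         # "Date"

# Entries that could not be parsed are now NA; check them before analysing
sum(is.na(toy$treatment_date))    # 1
```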
We can use the dates in plots as well. Create a simple histogram (geom_histogram) to see the number of treatments conducted over time in the dataset. Set the color of the histogram to blue and the fill to grey.
Now make the same plot, but zoomed in on the period from 2018-01-01 to 2019-01-01.
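As a sketch of how ggplot2 treats a Date column on the x-axis, assuming ggplot2 is installed and using a hypothetical data frame `toy_treatments` with 200 randomly generated dates in 2017 to 2019 (not study data): on a Date axis the binwidth is measured in days, scale_x_date() controls the axis labels, and coord_cartesian() zooms in on a period.

```r
library(ggplot2)

# Hypothetical toy data: 200 random treatment dates in 2017-2019
set.seed(1)
toy_treatments <- data.frame(
  treatment_date = as.Date("2017-01-01") + sample(0:1094, 200, replace = TRUE)
)

# On a Date axis, binwidth is in days (here about one month per bar)
p <- ggplot(toy_treatments, aes(x = treatment_date)) +
  geom_histogram(binwidth = 30, color = "blue", fill = "grey") +
  scale_x_date(date_labels = "%Y-%m")

# Zoom in on 2018 by changing only the visible range
p_zoom <- p + coord_cartesian(xlim = as.Date(c("2018-01-01", "2019-01-01")))

print(p_zoom)
```

The choice of coord_cartesian() is deliberate: it only changes the visible range, whereas setting limits on the scale (for example with xlim() or scale_x_date(limits = ...)) removes observations outside the range before the bars are computed.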
#Calculate the number of days between June 21 and December 28th in the year 2020
as.Date("2020-12-28") - as.Date("2020-06-21")
#Check if the date of treatment is a date---
data_long$behandelingDatum %>% str()
#Check the format of the date of treatment---
data_long$behandelingDatum %>% view()
#Use the lubridate package to change the date of treatment to a date---
library(lubridate)
data_long$behandelingDatum <- data_long$behandelingDatum %>% ymd()
#Create a simple histogram to see the number of treatments conducted over time---
library(ggplot2)
ggplot(data_long, aes(x = behandelingDatum)) +
  geom_histogram(color = "blue", fill = "grey")
#Now make the same histogram, zoomed in on the period from 2018-01-01 to 2019-01-01---
ggplot(data_long, aes(x = behandelingDatum)) +
geom_histogram(color="blue", fill = "grey") +