RYouReadyBasics

In BasicBasics, you will learn the basic principles of R and RStudio. By the end of the lesson, you should have written the first lines of an R script that imports the treatment outcome data of patients undergoing a trapeziectomy with LRTI ('Weilby') surgery.

 

First, download and install R and RStudio as explained in Part 2 of this course. Then, watch the video below (14:42 min) for an introduction to the environment of RStudio.

Assignment

  • After watching the video, make a new R project on your own PC with the name RYouReady. Within this project, create subfolders named "Data" and "Scripts".

  • Make a new script with the name RYouReadyBasics and save this script in the folder Scripts. 

  • Note that this is an example; if you also work with the GitHub repository for this course, you now have two projects. With File -> Open Project, you can now go back to the course project. 

  • For this course, we suggest that you continue working in the course project, in combination with GitHub.
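The folders from the assignment can also be created from code instead of by clicking. The lines below are a minimal sketch, assuming a temporary stand-in location (`project_root` is a hypothetical name) so that nothing in your real project is touched; inside your own project you would point it at the project folder instead.

```r
# Stand-in for the project folder (assumption: a temporary directory is used
# here so the example does not touch your real files)
project_root <- file.path(tempdir(), "RYouReady")

# Create the two subfolders named in the assignment
for (folder in c("Data", "Scripts")) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE,       # also create RYouReady itself if needed
             showWarnings = FALSE)   # stay quiet if the folder already exists
}

# Check the result
list.files(project_root)
```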

Interlude: efficient documentation of files and folders

 

When you are programming, it is important to give clear and useful names to your files. Avoid file names such as ‘document’, ‘script’, or ‘report’. Choosing a good name takes more time and thought at the beginning, but it will be very helpful for your future self. Furthermore, it will be a lot easier for somebody else to understand your files.

In addition, it is advisable to organize your files into a folder system, in which the folders also have informative names and are well-categorized.

Lastly, any computer can crash, so always save your files in a folder with some form of backup or cloud storage (OneDrive, Google Drive, Dropbox, etc.).

In the image above, you see a screenshot of the four regions in RStudio with a brief explanation. The RYouWithMe course advises you to change two settings, and we recommend that you do so. The image below shows you how.

Packages


R is built around a core module, to which you add specific packages whenever you need them for a particular calculation. A package, often loosely called a library, is a set of R functions that all serve a specific goal, such as designing beautiful plots or performing regression analysis. Like everything in R, packages are open source; anyone who wants to can develop one.

 

In this course, we use a limited number of commonly used packages. A key package that we use is called Tidyverse; this is a combination (a whole uniVERSE) of packages that all analyze data that are organized in a specific way. The video below (8:33 min) explains this in more detail.

Video Loading Packages

Assignment

  • Install the packages mentioned in the video (tidyverse and here) and add a section to your new script with the title 'load packages'. In this section, load both packages.

#Load packages----
library(tidyverse)
library(here)
 

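Installing a package is done once per computer, while loading it with library() is done in every session. The sketch below illustrates that distinction by checking which packages still need to be installed; `missing_packages` is a hypothetical helper written for this example, not a function from the course or from any package.

```r
# Hypothetical helper (not part of the course material): report which of the
# requested packages are not yet installed on this computer.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE if not
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "here"))

if (length(to_install) > 0) {
  message("Install once with install.packages(): ",
          paste(to_install, collapse = ", "))
} else {
  message("All packages are installed; library() can load them.")
}
```

If the message lists a package, running install.packages() for that name once is enough; the library() calls in your script then work in every new session.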

Developing repeatable scripts

Repeatability of your analysis is crucial in coding in general, and even more so in the health sciences. You will frequently export newly collected clinical data and then run your code again on the latest export. Therefore, it is very useful to always start your script by clearing all variables from your environment, followed by (re-)importing your latest dataset. This avoids situations where you keep working with leftover data in your workspace and later cannot exactly recreate the dataset and redo the analysis.

Assignment

  • Start your new script with the remark: "#clear the workspace", and on the next line write "rm(list = ls())". Once you have data in your global environment and you run this line, you will notice that everything in your workspace is removed. 

#clear the workspace----

rm(list = ls())

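To see what this line does, you can try it on a few throwaway objects. The following is a small sketch with made-up object names; it also shows that rm() only removes objects and leaves loaded packages alone.

```r
# Two example objects (hypothetical names) to fill the workspace
patient_count <- 42
pain_scores <- c(40, 25, 10)

ls()               # lists the objects created above

# Remove every object that ls() reports from the current environment
rm(list = ls())

ls()               # now empty: character(0)

# Packages are not affected by rm(); in a default session the base package
# 'stats' is still attached after clearing the workspace
"package:stats" %in% search()
```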

Now watch the video below (11:16 min). It shows how to import CSV data and inspect the imported data.
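As a rough illustration of importing and inspecting CSV data, the sketch below writes a tiny made-up CSV file to a temporary folder and reads it back. It uses base R's read.csv(), which may differ from the function used in the video, and the file and column names are assumptions chosen for the example.

```r
# Toy data written to a temporary CSV file; column names are hypothetical
csv_path <- file.path(tempdir(), "toy_outcomes.csv")
writeLines(
  c("patient_id,pain_score",
    "a1,40",
    "b2,25",
    "c3,10"),
  csv_path
)

# Import with base R's read.csv() (the video may use a different function)
toy_outcomes <- read.csv(csv_path)

# Common ways to inspect a freshly imported data frame
dim(toy_outcomes)                 # rows and columns: 3 2
names(toy_outcomes)               # column names
str(toy_outcomes)                 # type of each column
head(toy_outcomes, 2)             # first two rows
summary(toy_outcomes$pain_score)  # basic statistics for one column
```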

Loading and understanding the example dataset for this study

In the video by RLadies, the data that are imported are in CSV format. For this part of the course, you will use data stored in the RData format. The video below (9:11 min) explains the example data used in this course and shows how to obtain them via GitHub. If you have not yet worked with GitHub, you can just go to this link, right-click the 2 files, and download them to your local computer with the 'save link as' option. If you already want to learn more about the GitHub link for this course, go here. For those new to R and GitHub, we recommend doing this later.

Assignment

 

  • Import the example data that you downloaded to your computer into your global environment. There are 2 ways to import the data. The first way is to go to your global environment section (Q3), manually click the open file icon, and load the file. If you do so, you should see 2 things:

    • There should be two data frames in your global environment with the names Example_LongFormat and Example_WideFormat

    • In the console (Q2), you should see the code that is needed to load the file in a script.  

  • Copy the line of code to load the data from your console to the script, in a new section called "load data".

  • Run the script, which will now first empty your workspace and then load the data. You have now loaded the data in the second way.

#Clear the workspace and open the data----

rm(list = ls())

load("~/R/RCourseHWStudyGroup/data/Example_LongFormatHashed.RData")
load("~/R/RCourseHWStudyGroup/data/Example_WideFormatHashed.RData")

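If you want to see how the RData format behaves without the course files, the sketch below saves a small made-up data frame to a temporary .RData file and loads it again. The object name `Example_Toy` and its columns are hypothetical stand-ins for the real example data.

```r
# Toy stand-in for the course data (hypothetical name and columns)
Example_Toy <- data.frame(
  patient_id = c("a1", "b2", "c3"),
  pain_score = c(40, 25, 10)
)

# Save the object to a temporary .RData file, then remove it from the workspace
rdata_path <- file.path(tempdir(), "Example_Toy.RData")
save(Example_Toy, file = rdata_path)
rm(Example_Toy)

# load() restores the object under its original name and returns that name
loaded_names <- load(rdata_path)
loaded_names       # "Example_Toy"

str(Example_Toy)   # the data frame is back in the workspace
```

Unlike a CSV import, load() does not ask you to choose a name: the data frame reappears under the name it had when it was saved. The paths in the script above start with "~", which depends on where the files sit on your own computer; with the here package loaded, a path such as here("data", "Example_LongFormatHashed.RData") would instead be built relative to the project folder, assuming the files are stored in a folder called data inside the project.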

Good to know


When you close R, you can choose to save your global environment. If you do so, your data are loaded again the next time you open R. You do, however, have to load all packages again (installing them again is not needed). If you forget to load a package, R will not recognize the functions it provides and will return an error when you run the code.
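The error for a forgotten package typically looks like the one reproduced in this sketch. The function name `undefined_course_function` is made up for the example and stands in for any function whose package has not been loaded; tryCatch() is only used here to capture the message instead of stopping the script.

```r
# 'undefined_course_function' is a made-up name standing in for a function
# whose package has not been loaded with library()
error_text <- tryCatch(
  undefined_course_function(1),
  error = function(e) conditionMessage(e)
)
error_text   # the message names the function R could not find

# Alternative: call a function through its package name, without library()
stats::median(c(10, 25, 40))   # 25
```

When you see this kind of message for a function you know exists, the first thing to check is whether the 'load packages' section of your script has been run in the current session.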

If you really script every step you take in R and save your script, you should not need to save your global environment; you can just open R, run the script and recreate all data in the global environment. Therefore, your raw data and your scripts are the most important files you have. Everything else you should be able to recreate in minutes or even seconds!

This ends the Basics; you should now have a script that loads libraries and the example dataset in R as a basis for further analysis. 

Next step: RYouReadyCleaning1