4.7 Propensity score matching

What is the propensity score?

By Maud ten Heggeler.

The propensity score is the probability of receiving a treatment or intervention, conditional on observed covariates. By using the propensity score, you can mimic some of the characteristics of a randomized controlled trial in observational data. There are multiple propensity score methods: stratification on the propensity score, using it as a covariate in regression models, and matching on the propensity score. The last of these, propensity score matching (PSM), is discussed in this section. The goal of PSM is to form matched sets of treatment and control (or untreated) patients who share a similar value of the propensity score. See the image below for a simplified representation of propensity score matching.

Figure: simplified representation of propensity score matching (propensity score matching.png)
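To make the definition concrete, here is a minimal sketch of how a propensity score can be estimated with a logistic regression in base R. The data are simulated and the names (patients, age, bmi, treatment) are hypothetical; this illustrates the idea and is not the course dataset.

```r
# Simulated example data (hypothetical): 200 patients with two covariates
set.seed(42)
n <- 200
patients <- data.frame(
  age = rnorm(n, mean = 55, sd = 10),
  bmi = rnorm(n, mean = 26, sd = 4)
)

# In this simulation, older patients are more likely to receive the treatment
patients$treatment <- rbinom(n, size = 1, prob = plogis(-5.5 + 0.1 * patients$age))

# Logistic regression of treatment status on the observed covariates
ps_model <- glm(treatment ~ age + bmi, data = patients, family = binomial())

# The propensity score is the fitted probability of receiving the treatment
patients$propensity_score <- predict(ps_model, type = "response")

summary(patients$propensity_score)
```

Each patient now has a score between 0 and 1; matching then pairs treated and untreated patients whose scores are close.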

Before matching

Before you can start matching your patients, your dataset must be ready for the match. First, in addition to cleaning and structuring the dataset, you need a binary variable that reflects, for example, the treatment status (treatment 1 or treatment 2). Second, you must decide on which variables you want to match the patients, taking clinical relevance into account. Finally, we will use the MatchIt package in this section. Its matchit() function cannot handle missing values in the variables used for matching; therefore, the dataset must no longer contain any missing values in those variables.
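The preparation described above could look like the sketch below. The data frame and column names (raw_data, treatment_type, age, bmi) are hypothetical. Dropping incomplete rows is only one way to deal with missing values, shown here because it is the simplest.

```r
# Hypothetical example data with a two-level treatment column and some missing values
raw_data <- data.frame(
  treatment_type = c("surgery", "splint", "surgery", "splint", "surgery", "splint"),
  age = c(54, 61, NA, 47, 58, 66),
  bmi = c(24.1, 27.8, 30.2, NA, 22.5, 26.0)
)

# Recode the treatment into a binary 0/1 variable
raw_data$treatment <- ifelse(raw_data$treatment_type == "surgery", 1, 0)

# Count the missing values per column
colSums(is.na(raw_data))

# Keep only patients with complete data on the variables used in the match
match_vars <- c("treatment", "age", "bmi")
complete_data <- raw_data[complete.cases(raw_data[, match_vars]), ]

nrow(complete_data)
```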

Below, we walk through a propensity score match in 3 steps using the MatchIt package.

3 steps to a propensity score match using the MatchIt package

Step 1. Installing the packages
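A minimal sketch of this step, assuming you install from CRAN. The tableone package is included because CreateTableOne() is used in the assignments later in this section; the CRAN mirror URL is just one possible choice.

```r
# Packages used in this section
pkgs <- c("MatchIt", "tableone")

# Install only the packages that are not yet available on this machine
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Load the packages for the current R session
library(MatchIt)
library(tableone)
```

Installing is needed only once per machine; loading with library() is needed in every new R session.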




Step 2. The PSM itself

The propensity score match is just one call away with the matchit() function from the MatchIt package you just installed, but there are a few arguments you need to specify inside this function.


The matchit() function looks like this:

matchit(formula, data, method = "nearest", caliper = NULL)


What do the arguments mean?

The formula is specified as treat ~ x1 + x2 + ..., where treat is the binary treatment variable and x1, x2, ... are the covariates you want to use in the matching.

The data argument is the dataset you are using.

The method argument is set to "nearest" (nearest neighbor matching) by default, but there are other matching methods (e.g. "optimal", "exact") available which you can choose from.

Finally, you can specify a caliper width with the caliper argument. The caliper is the maximum allowed distance between matched patients, expressed by default in standard deviations of the propensity score. In other words, if there is no “good” control patient within the caliper to match to a treatment patient, that patient will be excluded from the match. Austin (1) compared matching algorithms and caliper widths for PSM, if you want to delve into it fully.


1. Austin PC. A comparison of 12 algorithms for matching on the propensity score. Stat Med. 2014;33(6):1057-1069. doi:10.1002/sim.6004
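As an illustration of how the arguments fit together, here is a sketch that uses lalonde, an example dataset shipped with MatchIt (not the course data). The choice of covariates is arbitrary and only meant to show the syntax.

```r
library(MatchIt)

# lalonde is an example dataset that comes with the MatchIt package
data("lalonde", package = "MatchIt")

# treat is the binary treatment variable; age, educ, married and re74 are covariates
example_match <- matchit(
  treat ~ age + educ + married + re74,
  data = lalonde,
  method = "nearest",  # nearest neighbor matching
  caliper = 0.2        # in standard deviations of the propensity score
)

# Printing the object gives a short description of the matching that was done
example_match
```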


Step 3. Checking the match: balance

After you have matched patients based on the propensity score, it is important to see what this match looks like and whether the covariates you included in the match are equally distributed, i.e. “balanced”. There are several ways in R to represent the balance of the match both visually and numerically. A simple jitter plot will show you the distribution of the propensity scores of all patients and gives a rough view of the propensity scores of patients that are matched and patients that are excluded from the match.

Figure: jitter plot of the propensity scores of matched and unmatched patients (propensity score.png)

However, the standardized mean difference (SMD) is usually used to check whether the variables are equally distributed over both groups. An SMD below 0.1 is generally considered to indicate an acceptable difference between the two groups for that variable.
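Both checks can be sketched as follows, again on the lalonde example dataset from MatchIt rather than the course data. The sketch assumes a recent MatchIt version (4.0 or later), in which summary() returns the balance table of the matched sample as sum.matched, with a column named "Std. Mean Diff.".

```r
library(MatchIt)

# Example match on the lalonde dataset that ships with MatchIt
data("lalonde", package = "MatchIt")
example_match <- matchit(treat ~ age + educ + married + re74,
                         data = lalonde, method = "nearest", caliper = 0.2)

# Visual check: jitter plot of the propensity scores
plot(example_match, type = "jitter", interactive = FALSE)

# Numerical check: standardized mean differences in the matched sample
balance <- summary(example_match)
smd_matched <- balance$sum.matched[, "Std. Mean Diff."]
smd_matched

# Variables whose absolute SMD is still above the 0.1 threshold
names(smd_matched)[abs(smd_matched) > 0.1]
```

If the last line returns no names, all variables in the match meet the 0.1 threshold.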

In the assignments below you will create a jitter plot and determine the SMD of the covariates in your match.


Time to get started with the Gemstracker data!



1. Install and load the MatchIt package

2. Match your patients based on treatment using the matchit() function from the MatchIt package. Match patients from both treatment groups on the following characteristics: Age, Sex, BMI, Smoking status, Alcohol intake, Duration of complaints and baseline VAS-pain. Use the nearest neighbor method and set the caliper to 0.2.

3. Inspect the results of your match and check how many patients could be matched by using this algorithm.


4. Visualize your match with a simple jitter plot

5. Get your data from the match with the function

6. Create a TableOne using the dataset from assignment 5 to inspect your match in more detail. Are the variables equally distributed between both groups?



#1. Installing and loading the MatchIt package
install.packages("MatchIt")  # only needed once
library(MatchIt)

#2. Matching patients
# ratio = 2 matches up to two control patients to each treated patient
Match_Weilby <- matchit(treatment ~ Age + Sex + BMI + Roken + Alcohol + Hoe_lang_klachten + VAS_pijn, data = data_set, method = "nearest", ratio = 2, caliper = 0.2)

#3. Inspect the result
summary(Match_Weilby)  # the sample sizes table shows how many patients were matched

#4. Visualize your match with a simple plot
plot(Match_Weilby, type = "jitter")

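#5. Get your data from the match
# Assumption: match.data() from MatchIt is the function meant here; matched_data is an illustrative name
matched_data <- match.data(Match_Weilby)

#6. Create a table one
library(tableone)  # provides CreateTableOne()

# Column names must match those in your dataset (the formula in #2 uses Sex, this list uses Geslacht)
Var_One <- c("Age", "BMI", "Geslacht", "Roken", "Alcohol", "Hoe_lang_klachten", "VAS_pijn")

CatVar_One <- c("Geslacht", "Roken", "Alcohol")


TableOne_Weilby_match <- CreateTableOne(vars = Var_One, strata = "treatment", data = matched_data, test = TRUE, factorVars = CatVar_One)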


#print TableOne, make sure to set smd to TRUE

print(TableOne_Weilby_match, smd = TRUE)

Congratulations, you just did your first propensity score match!