STATISTICS with R
A comprehensive guide to statistical analysis in R
Kendall Tau Correlation in R
The Kendall tau-b correlation is a nonparametric statistical method that is used to measure the association between two ordinal random variables. The coefficient is based on comparing pairs of observations. If the observation that ranks higher on the first variable also ranks higher on the second variable, the pair is counted as concordant (in agreement). If it ranks lower on the second variable, the pair is counted as discordant. Pairs that have the same rank on either variable are called ties, and tau-b takes them into account in the calculation.
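To make the pair counting concrete, the following sketch computes tau-b by hand for a small made-up data set and compares the result with R's built-in cor() function. The vectors x and y and the helper kendall_tau_b() are hypothetical and are not part of the life coach's study. The sketch assumes the usual tau-b definition: the number of concordant pairs minus the number of discordant pairs, divided by the square root of the product of the number of pairs not tied on x and the number of pairs not tied on y.

```r
# Hypothetical ordinal scores for six observations (illustration only)
x <- c(0, 0, 1, 1, 2, 2)
y <- c(0, 1, 0, 2, 1, 2)

# Hand-rolled tau-b (hypothetical helper, written for this sketch)
kendall_tau_b <- function(x, y) {
  stopifnot(length(x) == length(y))
  pairs <- combn(length(x), 2)                 # all index pairs (i, j) with i < j
  dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])    # ordering of each pair on x
  dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])    # ordering of each pair on y
  concordant <- sum(dx * dy > 0)               # same ordering on both variables
  discordant <- sum(dx * dy < 0)               # opposite ordering
  not_tied_x <- sum(dx != 0)                   # pairs not tied on x
  not_tied_y <- sum(dy != 0)                   # pairs not tied on y
  (concordant - discordant) / sqrt(not_tied_x * not_tied_y)
}

tau_manual  <- kendall_tau_b(x, y)
tau_builtin <- cor(x, y, method = "kendall")
print(c(manual = tau_manual, builtin = tau_builtin))
```

Both values should agree, which suggests that cor() with method = "kendall" applies the same tie adjustment as the hand calculation.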
Introduction to Kendall Tau Correlation
When there are several random variables in a study or a data set, some of those variables may be related to each other. For example, imagine a life coach collects data on the frequency of physical exercise (none, sometimes, often) and how optimistic the participants are about their lives (low optimism, medium optimism, high optimism). In her anecdotal experience, the life coach believes that there might be a positive relationship between exercise frequency and optimism. Therefore, she tries to quantify this relationship using a correlation test. But what is the appropriate correlation for two ordinal variables?
In this example, the values of each variable have a natural order. For the variable Exercise frequency, the order is none < sometimes < often. If we assign numerical values to these levels, we can order them by number: none (0) < sometimes (1) < often (2). With the same logic, Optimism level can be expressed in ordinal numbers: low (0) < medium (1) < high (2).
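One way to express this ordering in R is with ordered factors. The sketch below is only an illustration: the response vectors are made up, and the variable names are assumptions rather than names from the study's data set.

```r
# Hypothetical responses, used only to illustrate the encoding
exercise_labels <- c("none", "often", "sometimes", "none", "often")
optimism_labels <- c("low", "high", "medium", "medium", "high")

# Ordered factors record that none < sometimes < often and low < medium < high
exercise <- factor(exercise_labels,
                   levels = c("none", "sometimes", "often"),
                   ordered = TRUE)
optimism <- factor(optimism_labels,
                   levels = c("low", "medium", "high"),
                   ordered = TRUE)

# as.integer() returns 1-based level positions; subtract 1 to get 0, 1, 2
exercise_score <- as.integer(exercise) - 1L
optimism_score <- as.integer(optimism) - 1L

print(data.frame(exercise, exercise_score, optimism, optimism_score))
```

The numeric scores keep only the order of the levels, which is all that a rank-based coefficient such as Kendall tau-b uses.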
When the data are ordinal (i.e., they are arranged in an order of levels), we can use the Kendall tau-b correlation coefficient to measure the relationship between two ordinal variables. There are also tau-a and tau-c versions of the Kendall correlation: tau-a makes no adjustment for ties, and tau-c is intended for contingency tables in which the two variables have different numbers of levels.
Like other correlation coefficients, the Kendall tau-b correlation shows the direction and strength of a relationship and ranges between -1 and +1. A correlation of 0 means that there is no monotonic association between the two variables, that is, no overall tendency for one variable to increase or decrease as the other increases. A positive correlation means that the values of both variables change together: either increasing together or decreasing together. However, if the values of one variable increase while the values of the other variable decrease, the correlation value will have a negative sign.
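The sign and the range of the coefficient can be checked with a few small vectors. The data in this sketch are made up for illustration. The last case shows that a tau of 0 only indicates the absence of a monotonic trend: y depends entirely on x there, but it first decreases and then increases.

```r
x <- 1:5

# Same ordering in both variables: every pair is concordant
tau_increasing <- cor(x, c(10, 20, 30, 40, 50), method = "kendall")

# Reversed ordering: every pair is discordant
tau_decreasing <- cor(x, c(50, 40, 30, 20, 10), method = "kendall")

# Mostly the same ordering, with two swapped neighbours
tau_mixed <- cor(x, c(2, 1, 3, 5, 4), method = "kendall")

# U-shaped relationship: concordant and discordant pairs cancel out
u <- -2:2
tau_u_shape <- cor(u, u^2, method = "kendall")

print(c(increasing = tau_increasing, decreasing = tau_decreasing,
        mixed = tau_mixed, u_shape = tau_u_shape))
```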
Kendall Tau Example
Is there a relationship between the frequency of exercise (none, sometimes, often) and optimistic outlook (low, medium, high)?

A life coach is interested in knowing whether there is a relationship between how often individuals exercise (none, sometimes, often) and how optimistic they are about their lives (low, medium, high). The life coach randomly recruits 15 participants, who report on a questionnaire how often they exercise each week. In addition, the life coach asks them to rate their optimistic outlook on a scale of 0 to 2 (0: low, 1: medium, 2: high level of optimism). Table 1 shows the frequency of exercise and the level of optimism for the first five participants in the study; the lowest exercise category is labeled Never in the table.
| Participant | Exercise Frequency | Optimism Level |
|---|---|---|
| Participant 1 | Never | Low |
| Participant 2 | Never | Low |
| Participant 3 | Never | Low |
| Participant 4 | Never | Medium |
| Participant 5 | Never | Medium |
| … | … | … |
The life coach enters the data into a spreadsheet program and saves them in CSV format. The complete data set for this example can be downloaded from here.
Analysis: Kendall Tau Correlation in R
In the first step, the life coach reads the data into R (working in RStudio) using the read.csv() function. After reading in the data, the life coach reviews the data set and decides to create two variables, one for the frequency of exercise (exerciseFrequency) and one for the optimism level (optimismLevel).
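Reviewing the data set after reading it in might look like the sketch below. To keep the example self-contained, it first writes a tiny stand-in CSV file with made-up values; the column names Exercise and Optimism match the ones used later in Listing 1, while dfDemo and csv_path are hypothetical names.

```r
# Write a tiny stand-in CSV (hypothetical values) so the sketch runs on its own
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Exercise,Optimism",
             "0,0", "0,1", "1,1", "2,1", "2,2"), csv_path)

dfDemo <- read.csv(csv_path)

str(dfDemo)                     # column types: both columns should be numeric
print(colSums(is.na(dfDemo)))   # number of missing values per column

# Cross-tabulation of the two ordinal variables
print(table(dfDemo$Exercise, dfDemo$Optimism,
            dnn = c("Exercise", "Optimism")))
```

A check like this is useful because cor.test() expects numeric vectors; if the CSV file stores text labels instead of scores, they would need to be converted to ordered numeric codes first.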
To calculate the Kendall tau-b correlation coefficient between the frequency of exercise and the optimism level, we can use the cor.test() function in R. The cor.test() function is part of the stats package, which ships with R and is loaded by default, so we do not need to install or load an additional package to use it. The following code in Listing 1 shows how to perform a Kendall tau-b correlation between two ordinal random variables.
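```r
# Read the data set (the CSV file must be in the working directory)
dfOptimism <- read.csv("dsExerciseOptimism.csv")

# Extract the two ordinal variables
exerciseFrequency <- dfOptimism$Exercise
optimismLevel <- dfOptimism$Optimism

# Kendall tau-b correlation with a two-sided test
corrExerciseOptimism <- cor.test(exerciseFrequency, optimismLevel,
                                 method = "kendall",
                                 alternative = "two.sided")
print(corrExerciseOptimism)
```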
```
Kendall's rank correlation tau
data: exerciseFrequency and optimismLevel
z = 2.575, p-value = 0.01002
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.6166698
```

In this code, the arguments of the cor.test() function are the two variables (exerciseFrequency and optimismLevel), the statistical method (method = "kendall", because we have two ordinal variables), and the alternative hypothesis (alternative = "two.sided"). The two-sided alternative states that tau is not equal to 0. The null hypothesis is that there is no relationship between the two variables and hence tau = 0.
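The alternative argument also accepts "greater" and "less" for one-sided tests. The sketch below uses made-up scores (not the study data) to compare a two-sided test with a one-sided test for a positive association. It sets exact = FALSE to request the normal approximation directly, since an exact p-value cannot be computed when the data contain ties.

```r
# Hypothetical ordinal scores with ties (illustration only)
x <- c(0, 0, 0, 1, 1, 1, 2, 2, 2, 2)
y <- c(0, 0, 1, 0, 1, 2, 1, 2, 2, 2)

# Two-sided alternative: tau is not equal to 0
p_two_sided <- cor.test(x, y, method = "kendall",
                        alternative = "two.sided", exact = FALSE)$p.value

# One-sided alternative: tau is greater than 0
p_greater <- cor.test(x, y, method = "kendall",
                      alternative = "greater", exact = FALSE)$p.value

print(c(two_sided = p_two_sided, greater = p_greater))
```

With the normal approximation and a positive test statistic, the one-sided p-value is half of the two-sided one. A one-sided test is only appropriate when the direction of the relationship is specified before looking at the data.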
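The results of the Kendall tau-b correlation in Listing 1 include the z value (the standardized test statistic), the approximate p-value, and the Kendall tau-b correlation coefficient tau (shown as sample estimates: tau). R reports a z statistic and an approximate p-value here because the data contain ties, so an exact p-value cannot be computed. In this example, the Kendall tau-b correlation is 0.62 (rounded), is positive, and is statistically significant at the 0.05 level (p = 0.010). This implies that there is a moderate positive relationship between the frequency of exercise and optimism level: participants who exercise more often tend to report higher optimism.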
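The individual values in the printed output can also be extracted from the object that cor.test() returns, which is convenient when reporting results. The sketch below again uses made-up scores rather than the study data, and the names result, tau_hat, z_value, and p_value are hypothetical.

```r
# Hypothetical ordinal scores with ties (illustration only)
x <- c(0, 0, 0, 1, 1, 1, 2, 2, 2, 2)
y <- c(0, 0, 1, 0, 1, 2, 1, 2, 2, 2)

# exact = FALSE requests the normal approximation used when ties are present
result <- cor.test(x, y, method = "kendall",
                   alternative = "two.sided", exact = FALSE)

tau_hat <- unname(result$estimate)    # Kendall tau-b coefficient
z_value <- unname(result$statistic)   # standardized test statistic (z)
p_value <- result$p.value             # approximate p-value

cat(sprintf("tau = %.2f, z = %.2f, p = %.3f\n", tau_hat, z_value, p_value))
```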