Relation between Discrete Variables (Rank Correlation)
From MM*Stat International
Spearman’s rank correlation coefficient
The starting point for measuring the relationship between two discrete, or ordinal, variables $X$ and $Y$ are the ranks $R(x_i)$ and $R(y_i)$, which are assigned to the observations $x_i$ and $y_i$ according to their position in the ordered sample. The ranks are defined so that $R(x_i)$ is equal to 1 for the $x_i$ that takes on the largest value we have observed, equal to 2 for the $x_i$ that takes on the second largest value we have observed, and so on.
Spearman’s rank correlation coefficient is computed from the pairs of ranks $(R(x_i), R(y_i))$, $i = 1, \dots, n$, as follows:
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = R(x_i) - R(y_i)$$
Spearman’s rank correlation coefficient amounts to applying the Bravais-Pearson correlation coefficient to the ranks (rather than the observations themselves).
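For ranks without ties it is true that:
$$\sum_{i=1}^{n} R(x_i) = \sum_{i=1}^{n} R(y_i) = \frac{n(n+1)}{2}, \qquad \sum_{i=1}^{n} R(x_i)^2 = \sum_{i=1}^{n} R(y_i)^2 = \frac{n(n+1)(2n+1)}{6}$$
The Bravais-Pearson correlation coefficient is calculated as:
$$r_{yx} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
If we use the corresponding ranks $R(x_i)$ and $R(y_i)$ instead of the observations $x_i$ and $y_i$ themselves, then we have derived Spearman’s rank correlation coefficient:
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} \left(R(x_i) - R(y_i)\right)^2}{n(n^2 - 1)}$$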
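The equivalence stated above can be checked numerically. The following is a minimal sketch, assuming ranks without ties; the function names and the five illustrative rank pairs are made up for this example and are not part of the original text.

```python
from math import sqrt

def pearson(x, y):
    """Bravais-Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_from_ranks(rank_x, rank_y):
    """Spearman's r_s from the squared rank differences (assumes no ties)."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Illustrative ranks, invented for this sketch (each is a permutation of 1..5).
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

r_pearson = pearson(rank_x, rank_y)               # Pearson applied to the ranks
r_spearman = spearman_from_ranks(rank_x, rank_y)  # squared-difference formula
print(round(r_pearson, 4), round(r_spearman, 4))
```

Both routes should give the same value here. With tied ranks the squared-difference formula is no longer exact, which is why the sketch restricts itself to permutations.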
Properties of Spearman’s rank correlation coefficient:
- Spearman’s rank correlation coefficient can only take on values between -1 and +1: $-1 \le r_s \le +1$.
- The rank correlation coefficient takes on the value +1 if the ranks behave exactly the same way, i.e. $R(x_i) = R(y_i)$ for all $i$.
- Spearman’s rank correlation coefficient takes on the value -1 if the ranks are perfectly opposed to each other, i.e. $R(x_i) = n + 1 - R(y_i)$ for all $i$.
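The two boundary cases can be verified with a short worked step. This assumes ranks without ties and the squared-difference form of the coefficient, with $d_i = R(x_i) - R(y_i)$. If the two rankings are identical, every $d_i$ is zero and $r_s = 1$. If they are exactly reversed, so that $R(y_i) = n + 1 - R(x_i)$, then:

$$d_i = 2R(x_i) - (n+1), \qquad \sum_{i=1}^{n} d_i^2 = \sum_{k=1}^{n} (2k - n - 1)^2 = \frac{n(n^2-1)}{3}$$

$$r_s = 1 - \frac{6}{n(n^2-1)} \cdot \frac{n(n^2-1)}{3} = 1 - 2 = -1$$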
Example:
- $X$: ranking of an athlete in downhill skiing
- $Y$: ranking of an athlete in slalom
Does there exist a relationship between the ranking in both disciplines?
athlete | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
downhill $R(x_i)$ | 2 | 1 | 3 | 4 | 5 | 6 |
slalom $R(y_i)$ | 2 | 3 | 1 | 5 | 4 | 6 |
$d_i^2$ | 0 | 4 | 4 | 1 | 1 | 0 |
The coefficient $r_s = 1 - \frac{6 \cdot 10}{6 \cdot (6^2 - 1)} = 0.7143$ points to a strong relationship between the rankings in both disciplines.
Kendall’s rank correlation coefficient
Kendall’s rank correlation coefficient is based on comparing the order relation for all possible pairs of observations of two variables. A pair of observations is concordant if the order relation is the same for both variables, i.e. the observation with the higher value of one variable also has the higher value of the other. A pair is discordant if the order relations differ, i.e. one observation has the higher value of one variable but the lower value of the other. Moreover, two observations may be equal in one or both values. Such pairs are called ties.
The number of concordant pairs $P$ and discordant pairs $Q$ can be calculated as follows:
- The pairs of ranks $(R(x_i), R(y_i))$ are sorted in increasing order of $R(x_i)$.
- We call $p_i$ the number of ranks $R(y_j)$ subsequent to $R(y_i)$ which are larger than $R(y_i)$.
- We call $q_i$ the number of ranks $R(y_j)$ subsequent to $R(y_i)$ which are smaller than $R(y_i)$.
Using the number of discordant and concordant pairs, we can calculate Kendall’s rank correlation coefficient:
$$\tau = \frac{P - Q}{P + Q},$$
with $P = \sum_{i} p_i$ and $Q = \sum_{i} q_i$.
In the absence of ties, the total number of pairs to be compared is given by $P + Q = \frac{n(n-1)}{2}$. The correlation coefficient can only take on values between -1 and +1: $-1 \le \tau \le +1$.
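An alternative way of calculating Kendall’s rank correlation coefficient, where $P$ and $Q$ denote the numbers of concordant and discordant pairs and $P + Q = n(n-1)/2$ because there are no ties, is given by:
$$\tau = 1 - \frac{4Q}{n(n-1)} = \frac{4P}{n(n-1)} - 1$$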
Example:
Ten employees have been ranked according to their managerial abilities ($X$) and their work ethic ($Y$). In order to make a statement about the relationship between both variables, we calculate both Spearman’s and Kendall’s rank correlation coefficients.
employee | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
$R(x_i)$ | 7 | 3 | 9 | 10 | 1 | 5 | 4 | 6 | 2 | 8 |
$R(y_i)$ | 3 | 9 | 10 | 8 | 7 | 1 | 5 | 4 | 2 | 6 |
$d_i^2$ | 16 | 36 | 1 | 4 | 36 | 16 | 1 | 4 | 0 | 4 |
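Spearman’s rank correlation coefficient:
$$r_s = 1 - \frac{6 \cdot 118}{10 \cdot (10^2 - 1)} = 0.2848$$
Kendall’s rank correlation coefficient, with the employees sorted in increasing order of $R(x_i)$:
employee | 5 | 9 | 2 | 7 | 6 | 8 | 1 | 10 | 3 | 4 |
---|---|---|---|---|---|---|---|---|---|---|
$R(x_i)$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
$R(y_i)$ | 7 | 2 | 9 | 5 | 1 | 4 | 3 | 6 | 10 | 8 |
$q_i$ (subsequent ranks that are smaller) | 6 | 1 | 6 | 3 | 0 | 1 | 0 | 0 | 1 | 0 |
$p_i$ (subsequent ranks that are larger) | 3 | 7 | 1 | 3 | 5 | 3 | 3 | 2 | 0 | 0 |
With $P = \sum p_i = 27$ concordant pairs and $Q = \sum q_i = 18$ discordant pairs:
$$\tau = \frac{27 - 18}{27 + 18} = 0.2$$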
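The calculation for the ten employees can be retraced in a few lines of Python. This is only a sketch of the sort-and-count procedure described earlier, assuming ranks without ties; the helper name `count_pairs` and the variable names are chosen here for illustration.

```python
def count_pairs(rank_x, rank_y):
    """Return the per-observation counts (p, q) after sorting by rank_x.

    p[i]: number of later y-ranks that are larger than the i-th y-rank.
    q[i]: number of later y-ranks that are smaller than the i-th y-rank.
    """
    # Order the y-ranks by increasing x-rank.
    ordered_y = [ry for _, ry in sorted(zip(rank_x, rank_y))]
    p, q = [], []
    for i, current in enumerate(ordered_y):
        later = ordered_y[i + 1:]
        p.append(sum(1 for r in later if r > current))
        q.append(sum(1 for r in later if r < current))
    return p, q

# Ranks of the ten employees as given in the table (employee 1 to 10).
rank_x = [7, 3, 9, 10, 1, 5, 4, 6, 2, 8]   # managerial abilities
rank_y = [3, 9, 10, 8, 7, 1, 5, 4, 2, 6]   # work ethic
n = len(rank_x)

# Spearman: squared-difference formula.
d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_s = 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Kendall: sums of the per-observation counts.
p, q = count_pairs(rank_x, rank_y)
P, Q = sum(p), sum(q)
tau = (P - Q) / (P + Q)

print(d_squared, round(r_s, 4))
print(p, q, P, Q, tau)
```

The counting loop is quadratic in the number of observations, which is unproblematic for a hand-sized example like this one.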
This example allows us to calculate Spearman’s and Kendall’s rank correlation coefficients for two series of ranks entered by the user. After starting the example, the number of elements in the list of ranks has to be specified. Then the series of ranks themselves have to be provided. To test it, the following data set can be entered when prompted:
student | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
grade in mathematics | 1 | 4 | 5 | 1 | 3 | 2 |
grade in physics | 2 | 5 | 3 | 2 | 2 | 3 |
For these series of ranks, the program will deliver the following output
The standings of 20 athletes in the 100 Meter dash and 200 Meter dash are given in the following table:
athlete(i) | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100 meters | 5 | 7 | 3 | 13 | 2 | 15 | 19 | 14 | 12 | 1 | 6 | 20 | 17 | 4 | 18 | 11 | 10 | 16 | 9 | 8 |
200 meters | 3 | 9 | 1 | 10 | 7 | 5 | 13 | 14 | 17 | 4 | 11 | 16 | 18 | 12 | 20 | 2 | 15 | 19 | 6 | 8 |
In what follows, the statistical relationship between the standings of the athletes in the two disciplines will be determined. Since the variables are ordinally scaled (discrete) we will use Spearman’s and Kendall’s rank correlation coefficients. Calculating both coefficients gives the following results:
Spearman’s coefficient is calculated as:
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
The information necessary to apply the formula can be obtained from the table: $d_i$ is the difference between the ranks of athlete $i$ in the 100 meter dash and the 200 meter dash, and $n$ is the number of athletes (= 20). The calculations produce $\sum d_i^2 = 450$ and a coefficient of 0.6617, which implies a positive relationship between the standings in the two disciplines: athletes doing well in the 100 meter dash also tend to do well in the 200 meters.
To calculate Kendall’s rank correlation coefficient, one needs to determine the concordant and discordant pairs of athletes. A pair of observations (= athletes) is called concordant if the same order relation applies to both variables, and discordant if the order relations do not agree. For instance, athletes 1 and 2 are concordant: athlete 1 has a better standing than athlete 2 in both the 100 meter dash and the 200 meter dash. Athletes 1 and 5, however, are discordant: athlete 1 is behind in the 100 meters but is ahead of athlete 5 in the standings of the 200 meter dash. Overall, there are $\frac{n(n-1)}{2} = \frac{20 \cdot 19}{2} = 190$ different pairs in this example, 138 of which are concordant while 52 are discordant. Using these numbers Kendall’s rank correlation coefficient can be calculated:
$$\tau = \frac{P - Q}{P + Q} = \frac{138 - 52}{138 + 52},$$
where $P = 138$ and $Q = 52$.
Here, $P$ is the number of concordant pairs and $Q$ the number of discordant pairs. Kendall’s rank correlation coefficient turns out to be 0.4526 in this example, which is evidence for a positive relationship between the standings.
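The figures for the 20 athletes can be reproduced by comparing every pair of athletes directly, following the definition of concordant and discordant pairs. The sketch below assumes the ranks exactly as listed in the table and no ties; the variable names are illustrative.

```python
from itertools import combinations

# Standings of athletes 01 to 20, copied from the table.
rank_100m = [5, 7, 3, 13, 2, 15, 19, 14, 12, 1,
             6, 20, 17, 4, 18, 11, 10, 16, 9, 8]
rank_200m = [3, 9, 1, 10, 7, 5, 13, 14, 17, 4,
             11, 16, 18, 12, 20, 2, 15, 19, 6, 8]
n = len(rank_100m)

# Spearman: squared-difference formula.
d_squared = sum((a - b) ** 2 for a, b in zip(rank_100m, rank_200m))
r_s = 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Kendall: classify each of the n(n-1)/2 pairs of athletes.
concordant = 0
discordant = 0
for i, j in combinations(range(n), 2):
    # Positive product: same order in both disciplines; negative: opposite order.
    product = (rank_100m[i] - rank_100m[j]) * (rank_200m[i] - rank_200m[j])
    if product > 0:
        concordant += 1
    elif product < 0:
        discordant += 1
tau = (concordant - discordant) / (concordant + discordant)

print(d_squared, round(r_s, 4))
print(concordant, discordant, round(tau, 4))
```

Comparing all pairs mirrors the verbal definition most closely. It visits 190 pairs here, so efficiency is not a concern at this sample size.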