Relation between Continuous Variables (Correlation, Correlation Coefficients)
From MM*Stat International
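The common variation (covariation) of two continuous variables $X$ and $Y$ determines the strength of the relation between them. Variation is measured by the deviations of the realizations from their mean. In the first step, we center the observations:

$$x_i - \bar{x}, \qquad y_i - \bar{y}, \qquad i = 1, \dots, n$$

The common variation of both variables is the sum of the products of these deviations from the respective means (see the calculation of the covariance):

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

The scale on which each variable is measured and the number of observations can have a large impact on the magnitude of the common variation. Suppose the mean of one variable is 8 and the observed value is 10, while the mean of the other variable is 1,008 and the observed value is 1,260. Although the deviation of the first value is 2 and the deviation of the second is 252, the relative deviation from the mean is 25% in both cases. This fact would not be visible if we simply calculated the common variation for this observation, $2 \cdot 252 = 504$.

Therefore, in order to make the deviations of the two variables comparable, we standardize them:

$$\frac{x_i - \bar{x}}{s_x} \qquad \text{and} \qquad \frac{y_i - \bar{y}}{s_y}$$

where $s_x$ and $s_y$ are the standard deviations of $X$ and $Y$. The sum of products above then becomes:

$$\sum_{i=1}^{n} \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y}$$

We subsequently divide this sum of products by the number of observations $n$ in order to eliminate its influence. This gives the Bravais-Pearson (sample) correlation coefficient, which measures the strength of the linear relation between the two continuous variables:

$$r_{xy} = \frac{1}{n} \sum_{i=1}^{n} \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{s_x \, s_y} = \frac{s_{xy}}{s_x \, s_y}$$

The final part of this equation shows that the Bravais-Pearson correlation coefficient is equal to the variation common to both variables $X$ and $Y$ (the covariance $s_{xy}$), standardized by the product of the standard deviations of each variable. The covariance and the standard deviations must use the same divisor (here $n$); because the divisor cancels, using $n - 1$ throughout gives the same value of $r_{xy}$.

The Bravais-Pearson correlation coefficient can also be written as follows:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right] \left[ n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2 \right]}}$$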
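The construction described above (center, standardize, average the products) can be sketched in Python as follows. This is only an illustrative sketch: the function names `pearson_r`, `pearson_r_raw_sums` and `covariation` and the data values are invented for this example and do not come from the original text. The raw-sums variant is an algebraically equivalent form of the coefficient, and it is an assumption that it matches the alternative form the text refers to. Standard deviations use the divisor n, which cancels in the ratio.

```python
import math

def covariation(x, y):
    """Sum of products of deviations from the means (not standardized)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

def pearson_r(x, y):
    """Bravais-Pearson correlation via centering and standardization."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Standard deviations with divisor n (the divisor cancels in r).
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    # Average product of the standardized deviations.
    return sum(((xi - mean_x) / s_x) * ((yi - mean_y) / s_y)
               for xi, yi in zip(x, y)) / n

def pearson_r_raw_sums(x, y):
    """Equivalent form that uses only raw sums of x, y, x*y, x^2, y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    sum_yy = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, invented for illustration only.
x = [6.0, 7.0, 8.0, 9.0, 10.0]
y = [3.0, 5.0, 4.0, 7.0, 6.0]
y_rescaled = [1000.0 * yi for yi in y]  # same data expressed in smaller units

r = pearson_r(x, y)
r_rescaled = pearson_r(x, y_rescaled)
r_raw = pearson_r_raw_sums(x, y)

print("covariation:", covariation(x, y), "after rescaling:", covariation(x, y_rescaled))
print("r:", r, "after rescaling:", r_rescaled, "raw-sums form:", r_raw)
```

Changing the unit of one variable multiplies the covariation by the same factor, while the correlation coefficient stays the same. This is the effect the standardization step is meant to achieve.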
Properties of the correlation coefficient:
The correlation coefficient only takes on values between -1 and +1: $-1 \le r_{xy} \le +1$
The sign of the correlation coefficient tells us the direction of the linear relation:
“+” corresponds to a positive correlation (the variables tend to vary in the same direction)
“-” corresponds to a negative correlation (the variables tend to vary in opposite directions)
If all observations lie exactly on a straight line, the correlation coefficient is equal to $\pm 1$ ($+1$ for a rising line, $-1$ for a falling line).
The closer the correlation coefficient is to $\pm 1$, the more pronounced is the linear relation between the variables $X$ and $Y$; the closer it is to 0, the weaker the linear relation.
If the variables $X$ and $Y$ are independent, then the correlation coefficient is equal to 0 (in a sample, approximately 0 apart from random fluctuation).
On the other hand, a correlation coefficient of 0 only means that there is no linear relation between the variables $X$ and $Y$ (linear independence). A pronounced non-linear relation between the two variables may still exist.
The correlation coefficient is symmetric: $r_{xy} = r_{yx}$
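The properties listed above can be checked numerically on small examples. The following Python sketch uses invented data and a hypothetical helper `pearson_r` (neither is part of the original text). It assumes that neither variable is constant, since the coefficient is undefined in that case.

```python
import math

def pearson_r(x, y):
    """Sample correlation from sums of deviations (assumes non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical x values, symmetric around 0.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]

r_up = pearson_r(x, [3.0 + 2.0 * v for v in x])    # points on a rising line
r_down = pearson_r(x, [3.0 - 2.0 * v for v in x])  # points on a falling line
r_parabola = pearson_r(x, [v ** 2 for v in x])     # exact but non-linear relation

# Symmetry: swapping the roles of the variables does not change r.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

print(r_up, r_down, r_parabola, r_xy, r_yx)
```

The parabola case illustrates the warning about non-linear relations: here $y$ is completely determined by $x$, yet the coefficient is 0 because the relation has no linear component over this symmetric range.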
Relation between the correlation and the scatterplot of the $X$ and $Y$ observations
Perfect correlation (correlation coefficient = $\pm 1$)
Strong correlation (correlation coefficient close to $\pm 1$)
Weak correlation (correlation coefficient close to 0)
A correlation of 0 corresponds "in general" to a roughly circular point cloud in the scatterplot.
In a sample of enterprises, we observed the variables $X$ (annual profit, in millions of DM) and $Y$ (annual rent for the computer facilities, in thousands of DM). Their values are given in the following table and illustrated graphically in the following scatterplot.
|Company||annual profit in Mill. DM||annual rent in 1,000 DM|
From the observations, the following results can be obtained:
In this example, the sample correlation coefficient is $r_{xy} = 0.8763$. This points to a strong positive linear relation.
In 1985, the following variables describing criminal activity were recorded for each of the 50 states of the U.S.A.:
|-||US states region number|
|-||US states division number|
The region and division variables can take on the following values:
|Region number||Region||Division number||Division|
|3||South||3||E N Central|
|4||West||4||W N Central|
| || ||6||E S Central|
| || ||7||W S Central|
In 1985, rates of criminal activity were recorded for the 50 states of the U.S.A., among them the murder rate. For any two of the recorded variables, a scatterplot can be drawn and the Bravais-Pearson correlation coefficient calculated. The relationship between the murder rate and the size of the population can be visualized by a scatterplot:
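The different sums of squared deviations (SSD) are calculated in the following way, where $x_i$ denotes the value of "population" and $y_i$ the value of "murder" for state $i$ ($n = 50$).

Sum of the products of deviations of "population" and "murder":

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Sum of squared deviations for "population":

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Sum of squared deviations for "murder":

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

The sample correlation coefficient is equal to

$$r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} \, S_{yy}}} \approx 0.27$$

The sample correlation coefficient of 0.27 points to a weak positive linear relationship.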
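The calculation route described above (three sums of deviations, then their ratio) can be sketched in Python as follows. The values below are invented placeholders and are not the 1985 U.S. crime data, so the result differs from the 0.27 reported in the text. The helper name `deviation_sums` is hypothetical as well.

```python
import math

def deviation_sums(x, y):
    """Return (S_xy, S_xx, S_yy): sums of products and squares of deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy, s_xx, s_yy

# Invented placeholder values, NOT the 1985 U.S. crime data.
population = [1.0, 2.0, 3.0, 4.0, 5.0]
murder = [4.0, 2.0, 5.0, 3.0, 6.0]

s_xy, s_xx, s_yy = deviation_sums(population, murder)
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"S_xy={s_xy}, S_xx={s_xx}, S_yy={s_yy}, r={r:.2f}")
```

With these placeholder values the three sums are 5, 10 and 10, which gives a coefficient of 0.5. Keeping the intermediate sums visible mirrors the step-by-step presentation in the text and makes each quantity easy to check by hand.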