Distribution of the Sample Variance

From MM*Stat International



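Consider a population variable $X$ with $E(X) = \mu$ and $\operatorname{Var}(X) = \sigma^2$. From this population a random sample of size $n$ is drawn. The sample variance is based on the sum of squared deviations of the random variables $X_i$ $(i = 1, \ldots, n)$ from the mean. We have proposed two estimators for the variance, $S^2$ and $S'^2$. Since $\mu$ is usually unknown and estimated by the sample mean $\bar{X}$, the sample variance is calculated as

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

Alternatively, the sample variance may also be calculated as

$$S'^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

See the entry under "More information" for more on this version of the sample variance.

The derivation of the distribution of the sample variance will be given for the case of a normally distributed population, i.e. $X \sim N(\mu, \sigma^2)$. Under these assumptions, the $n$ random variables $X_i$ are independently and identically normally distributed with $E(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$:

$$X_i \sim N(\mu, \sigma^2), \qquad i = 1, \ldots, n.$$

Moreover, the sample mean $\bar{X}$ is also normally distributed, with $E(\bar{X}) = \mu$ and $\operatorname{Var}(\bar{X}) = \sigma^2 / n$:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$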

Distribution of the sample variance

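Consider for the moment the random variable

$$\sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2.$$

It is the sum of squares of $n$ independent standard normals, and hence has a chi-square distribution with $n$ degrees of freedom, i.e., $\chi^2(n)$. Now consider

$$\frac{(n-1) S^2}{\sigma^2} = \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sigma} \right)^2$$

and note the similarity. When $\bar{X}$ is used as an estimator of $\mu$, it can be shown that this is a sum of squares of $n-1$ independent standard normals, in which case $(n-1) S^2 / \sigma^2$ is chi-square with $n-1$ degrees of freedom:

$$\frac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1).$$

The distribution of $S^2$ is a simple rescaling of $\chi^2(n-1)$. Thus we may make probability statements about $S^2$. Using the properties of the chi-square distribution (mean equal to the degrees of freedom, variance equal to twice the degrees of freedom), the expected value and variance of $S^2$ are:

$$E(S^2) = \sigma^2, \qquad \operatorname{Var}(S^2) = \frac{2\sigma^4}{n-1}.$$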
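The chi-square result can be checked numerically. The following is a minimal simulation sketch, not part of the original derivation: the values of `n`, `mu`, `sigma`, the number of replications and the seed are arbitrary illustrative choices. It draws many normal samples, computes $(n-1)S^2/\sigma^2$ for each, and compares the empirical mean and variance with the theoretical values $n-1$ and $2(n-1)$.

```python
import random


def scaled_sample_variances(n, mu, sigma, reps, seed=0):
    """Return (n-1)*S^2/sigma^2 for `reps` normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Sample variance with divisor n-1
        s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
        values.append((n - 1) * s2 / sigma ** 2)
    return values


def mean_and_variance(values):
    """Empirical mean and (divisor N-1) variance of a list of numbers."""
    m = sum(values) / len(values)
    v = sum((y - m) ** 2 for y in values) / (len(values) - 1)
    return m, v


# Hypothetical settings chosen only for illustration
n, mu, sigma = 10, 5.0, 2.0
scaled = scaled_sample_variances(n, mu, sigma, reps=20000)
emp_mean, emp_var = mean_and_variance(scaled)

print("empirical mean:", emp_mean, "theory:", n - 1)
print("empirical variance:", emp_var, "theory:", 2 * (n - 1))
```

The empirical values fluctuate around the theoretical ones from run to run (or seed to seed); agreement improves as the number of replications grows.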

Probability statements about $S^2$:

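For known variance $\sigma^2$ and a normally distributed population, one can calculate the probability that the sample variance $S^2$ will take on values in a central interval with pre-specified probability $1 - \alpha$:

$$P(v_1 \le S^2 \le v_2) = 1 - \alpha.$$

Furthermore, if we want to put equal probability mass in the tails, we impose:

$$P(S^2 < v_1) = \frac{\alpha}{2}, \qquad P(S^2 > v_2) = \frac{\alpha}{2}.$$

With $n-1$ degrees of freedom, the interval boundaries can be obtained from tables of the chi-square distribution as the quantiles $\chi^2_{\alpha/2;\,n-1}$ and $\chi^2_{1-\alpha/2;\,n-1}$. Thus,

$$P\left( \chi^2_{\alpha/2;\,n-1} \le \frac{(n-1) S^2}{\sigma^2} \le \chi^2_{1-\alpha/2;\,n-1} \right) = 1 - \alpha.$$

Rearranging yields the probability statement:

$$P\left( \frac{\sigma^2 \, \chi^2_{\alpha/2;\,n-1}}{n-1} \le S^2 \le \frac{\sigma^2 \, \chi^2_{1-\alpha/2;\,n-1}}{n-1} \right) = 1 - \alpha.$$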
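Instead of reading the quantiles from a table, they can be computed. The sketch below is one possible way to do this with only the Python standard library; it is an illustration, not the method of the original page. The chi-square distribution function is evaluated through a series for the regularized incomplete gamma function, and quantiles are found by bisection. The function names (`chi2_cdf`, `chi2_quantile`, `central_interval`) and the example values `n = 15`, `sigma2 = 4.0` are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """Chi-square distribution function via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    term, total, k = 1.0, 1.0, 1
    while term > 1e-15 * total:
        term *= h / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))


def chi2_quantile(p, df):
    """Quantile of the chi-square distribution by bisection (0 < p < 1)."""
    lo, hi = 0.0, df + 10.0
    while chi2_cdf(hi, df) < p:  # widen the bracket if needed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def central_interval(n, sigma2, alpha):
    """Boundaries v1, v2 with P(v1 <= S^2 <= v2) = 1 - alpha,
    assuming a normal population with known variance sigma2."""
    df = n - 1
    v1 = sigma2 * chi2_quantile(alpha / 2.0, df) / df
    v2 = sigma2 * chi2_quantile(1.0 - alpha / 2.0, df) / df
    return v1, v2


# Hypothetical example values
v1, v2 = central_interval(n=15, sigma2=4.0, alpha=0.05)
print("central 95% interval for S^2:", v1, v2)
```

The series converges for every positive argument but needs more terms as the argument grows, so this sketch is meant for the moderate degrees of freedom typical of table look-ups.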

Example for the distribution of the sample variance

To measure the variation in the time needed for a certain task, the variance is often utilized. Let the time a worker needs to complete a certain task be the random variable $X$. Suppose $X$ is normally distributed with $E(X) = \mu$ and $\operatorname{Var}(X) = \sigma^2$. A random sample of size $n$ is drawn with replacement. The random variables $X_i$ $(i = 1, \ldots, n)$ are therefore independent and identically normally distributed.

Problem 1:

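A random sample of size $n = 15$ is drawn. What is the probability that the sample variance $S^2$ will take on values in the interval $[0.5\,\sigma^2,\ 1.5\,\sigma^2]$? That is, the probability to be calculated is $P(0.5\,\sigma^2 \le S^2 \le 1.5\,\sigma^2)$. To solve the problem, each side is multiplied by $(n-1)/\sigma^2$:

$$P\left( 0.5\,(n-1) \le \frac{(n-1) S^2}{\sigma^2} \le 1.5\,(n-1) \right).$$

Since $n - 1 = 14$, it follows that:

$$P\left( 7 \le \frac{14\, S^2}{\sigma^2} \le 21 \right).$$

The probability that $S^2$ will take on values between $0.5\,\sigma^2$ and $1.5\,\sigma^2$ is identical to the probability that the transformed random variable $14\, S^2 / \sigma^2$ will take values between 7 and 21. The random variable $14\, S^2 / \sigma^2$ is chi-square distributed with $n - 1 = 14$ degrees of freedom. The probability can be found by using a table of the chi-square distribution:

$$P\left( 7 \le \frac{14\, S^2}{\sigma^2} \le 21 \right) = 0.8331.$$

The probability that $S^2$ will lie between $0.5\,\sigma^2$ and $1.5\,\sigma^2$ is therefore equal to 0.8331.

[Figure: density function of the chi-square distribution with 14 degrees of freedom, with the transformed variable $14\, S^2 / \sigma^2$ on the horizontal axis.]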
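The table look-up in problem 1 can be reproduced directly. For an even number of degrees of freedom $2k$, the chi-square upper tail probability has the closed form $P(\chi^2 > x) = e^{-x/2} \sum_{j=0}^{k-1} (x/2)^j / j!$. The sketch below uses this formula; the function name `chi2_sf_even` is a hypothetical helper, not something defined on the original page.

```python
import math


def chi2_sf_even(x, df):
    """Upper tail probability P(chi2 > x) for an even number of
    degrees of freedom, using the closed-form finite sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! step by step
        total += term
    return math.exp(-half) * total


n = 15
lower = 0.5 * (n - 1)  # 7
upper = 1.5 * (n - 1)  # 21

# P(7 <= chi2(14) <= 21) = P(chi2 > 7) - P(chi2 > 21)
prob = chi2_sf_even(lower, n - 1) - chi2_sf_even(upper, n - 1)
print(round(prob, 4))  # prints 0.8331
```

The closed form only applies because $n - 1 = 14$ is even; for odd degrees of freedom a table or a numerical routine is needed.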

Problem 2:

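The goal is to determine a central interval of variation for the sample variance $S^2$ with pre-specified probability $1 - \alpha = 0.95$. We assume the same population as in problem 1 and use a random sample of size $n$. Since $\alpha = 0.05$ and we again put equal probability mass in the tails:

$$P(S^2 < v_1) = 0.025, \qquad P(S^2 > v_2) = 0.025.$$

Using tables for the chi-square distribution with $n - 1$ degrees of freedom we obtain the quantiles $\chi^2_{0.025;\,n-1}$ and $\chi^2_{0.975;\,n-1}$. Thus,

$$P\left( \chi^2_{0.025;\,n-1} \le \frac{(n-1) S^2}{\sigma^2} \le \chi^2_{0.975;\,n-1} \right) = 0.95.$$

With probability 0.95, the transformed random variable $(n-1) S^2 / \sigma^2$ takes values in the interval $[\chi^2_{0.025;\,n-1},\ \chi^2_{0.975;\,n-1}]$. Rearranging gives the interval:

$$P\left( \frac{\chi^2_{0.025;\,n-1}}{n-1}\, \sigma^2 \le S^2 \le \frac{\chi^2_{0.975;\,n-1}}{n-1}\, \sigma^2 \right) = 0.95.$$

With probability 0.95, the sample variance takes values in the interval $\left[ \chi^2_{0.025;\,n-1}\, \sigma^2 / (n-1),\ \chi^2_{0.975;\,n-1}\, \sigma^2 / (n-1) \right]$. The exact numerical boundaries of the interval can be determined only if the population variance $\sigma^2$ of the variable $X$ is known.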
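A simulation can illustrate what the central interval means in practice. The sketch below assumes, purely for illustration, a sample size of 15 (so 14 degrees of freedom) and uses the standard table values 5.629 and 26.119 for the 0.025 and 0.975 quantiles of the chi-square distribution with 14 degrees of freedom. The population parameters, number of replications and seed are likewise arbitrary choices, not values from the example above.

```python
import random

N = 15                # assumed sample size (illustration only)
CHI2_LOWER = 5.629    # 0.025 quantile of chi-square with 14 df (table value)
CHI2_UPPER = 26.119   # 0.975 quantile of chi-square with 14 df (table value)


def coverage(sigma, reps, mu=0.0, seed=1):
    """Fraction of simulated sample variances that fall inside the
    central 95% interval implied by the chi-square quantiles."""
    rng = random.Random(seed)
    v1 = sigma ** 2 * CHI2_LOWER / (N - 1)
    v2 = sigma ** 2 * CHI2_UPPER / (N - 1)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(N)]
        mean = sum(sample) / N
        s2 = sum((x - mean) ** 2 for x in sample) / (N - 1)
        if v1 <= s2 <= v2:
            hits += 1
    return hits / reps


# Hypothetical population standard deviation
cov = coverage(sigma=3.0, reps=20000)
print("share of sample variances inside the interval:", cov)
```

The printed share should be close to 0.95. Note that the interval boundaries scale with $\sigma^2$, which is why they can only be stated numerically once the population variance is known.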

More information

(1) $\mu$ is known

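Consider the simplifying assumption that $\mu$ is known and let us modify $S^2$ as follows:

$$S^{*2} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2.$$

Its expectation is

$$E(S^{*2}) = \frac{1}{n} \sum_{i=1}^{n} E\left[ (X_i - \mu)^2 \right] = \frac{1}{n}\, n \sigma^2 = \sigma^2.$$

Note that the above argument does not assume a distribution for the $X_i$. It is only assumed that they are i.i.d. with common variance $\sigma^2$.

In this case we assume that the $X_i$ are i.i.d. $N(\mu, \sigma^2)$. Recall that a chi-square random variable with $k$ degrees of freedom has mean $k$ and variance $2k$. Since $n S^{*2} / \sigma^2$ has a chi-square distribution with $n$ degrees of freedom, it follows that:

$$\operatorname{Var}\left( \frac{n S^{*2}}{\sigma^2} \right) = 2n$$

and therefore

$$\operatorname{Var}(S^{*2}) = \frac{2\sigma^4}{n}.$$

Note also that we can derive the mean of $S^{*2}$ using:

$$E\left( \frac{n S^{*2}}{\sigma^2} \right) = n$$

and therefore

$$E(S^{*2}) = \sigma^2.$$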

(2) $\mu$ is unknown

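Since $\mu$ is typically unknown, the usual estimator of the variance is given by

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

Recall that the variance of a random variable $Y$ can be written as:

$$\operatorname{Var}(Y) = E(Y^2) - [E(Y)]^2.$$

This implies that

$$E(Y^2) = \operatorname{Var}(Y) + [E(Y)]^2.$$

Applying this result to the $X_i$ and to $\bar{X}$ we have:

$$E(X_i^2) = \sigma^2 + \mu^2, \qquad E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2.$$

Furthermore

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n \bar{X}^2.$$

Therefore, the expectation of the sample variance is given by

$$E(S^2) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} E(X_i^2) - n E(\bar{X}^2) \right] = \frac{1}{n-1} \left[ n(\sigma^2 + \mu^2) - n\left( \frac{\sigma^2}{n} + \mu^2 \right) \right] = \frac{(n-1)\sigma^2}{n-1} = \sigma^2.$$

Once again, this argument does not require the assumption of normality, only that the $X_i$ are i.i.d. with common variance $\sigma^2$.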
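Because the unbiasedness argument does not rely on normality, it can be illustrated with a clearly non-normal population. The sketch below is an illustration under assumed settings: it uses an exponential distribution with rate 1 (whose variance is 1), a small sample size, and an arbitrary seed, and averages the sample variance over many replications.

```python
import random


def average_sample_variance(n, reps, seed=2):
    """Average of S^2 (divisor n-1) over `reps` samples of size n
    drawn from an exponential distribution with rate 1 (variance 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        mean = sum(sample) / n
        total += sum((x - mean) ** 2 for x in sample) / (n - 1)
    return total / reps


# Hypothetical settings: small samples from a skewed population
avg_s2 = average_sample_variance(n=5, reps=40000)
print("average S^2:", avg_s2, "population variance:", 1.0)
```

The average should be close to the population variance of 1 even though the population is strongly skewed. The chi-square distribution of $(n-1)S^2/\sigma^2$, by contrast, does depend on normality.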

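In this case we assume that the $X_i$ are i.i.d. $N(\mu, \sigma^2)$. Since $(n-1) S^2 / \sigma^2$ has a chi-square distribution with $n-1$ degrees of freedom, it follows that

$$\operatorname{Var}\left( \frac{(n-1) S^2}{\sigma^2} \right) = 2(n-1)$$

and therefore

$$\operatorname{Var}(S^2) = \frac{2\sigma^4}{n-1}.$$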

(3) $\mu$ is unknown

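In this case we use the mean squared deviation $S'^2$ to estimate the variance:

$$S'^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

Note that

$$S'^2 = \frac{n-1}{n}\, S^2.$$

Hence

$$E(S'^2) = \frac{n-1}{n}\, \sigma^2$$

and, for a normally distributed population,

$$\operatorname{Var}(S'^2) = \frac{(n-1)^2}{n^2} \cdot \frac{2\sigma^4}{n-1} = \frac{2(n-1)\,\sigma^4}{n^2}.$$

Note that the expectation of the mean squared deviation is not exactly equal to the population variance, which is the reason that the sample variance $S^2$ is usually used in practical applications. Nevertheless, even for moderately sized samples, the two estimates will be similar.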
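To see how quickly the two estimators approach each other, the formulas for their expectations and variances can be tabulated for a few sample sizes. The sketch below evaluates $E(S^2) = \sigma^2$, $E(S'^2) = \frac{n-1}{n}\sigma^2$, $\operatorname{Var}(S^2) = \frac{2\sigma^4}{n-1}$ and $\operatorname{Var}(S'^2) = \frac{2(n-1)\sigma^4}{n^2}$; the variance formulas assume a normal population. The function name `compare_estimators`, the dictionary keys, and the chosen sample sizes are hypothetical.

```python
def compare_estimators(n, sigma2=1.0):
    """Theoretical mean and variance of S^2 (divisor n-1) and of the
    mean squared deviation S'^2 (divisor n), assuming normality."""
    var_s2 = 2.0 * sigma2 ** 2 / (n - 1)
    shrink = (n - 1) / n  # S'^2 = shrink * S^2
    return {
        "E_S2": sigma2,
        "E_MSD": shrink * sigma2,
        "Var_S2": var_s2,
        "Var_MSD": shrink ** 2 * var_s2,
    }


# Illustrative sample sizes with sigma^2 = 1
for n in (5, 10, 30, 100):
    r = compare_estimators(n)
    print(n, r["E_S2"], r["E_MSD"], r["Var_S2"], r["Var_MSD"])
```

The ratio of the two expectations is $(n-1)/n$, so the downward bias of the divisor-$n$ version is already below 4% at $n = 30$.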