Sampling Distribution of the Mean


The distribution of a statistic (which is itself a function of the sample) is called a sampling distribution. Statistics are used for estimating unknown population characteristics or parameters and for testing hypotheses. These tasks involve probability statements which can only be made if the sampling distributions of the statistics are known (or can be approximated). For the most important statistics, we now present in each case the sampling distribution together with its expected value and variance.

Distribution of the sample mean

Consider sampling from a population with distribution function ${\displaystyle F(x)}$, expected value ${\displaystyle E(X)=\mu }$ and variance ${\displaystyle Var(X)=\sigma ^{2}.}$ One of the most important statistics is the sample mean. The sample mean (or sample average) is given by: ${\displaystyle {\bar {x}}={\frac {1}{n}}\sum \limits _{i=1}^{n}x_{i}}$

Expected value, variance and standard deviation of the sample mean

Expected value, variance and standard deviation of the sample mean are given by:

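${\displaystyle E({\bar {x}})=\mu }$

${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})={\frac {\sigma ^{2}}{n}}\cdot {\frac {N-n}{N-1}}}$

${\displaystyle \sigma ({\bar {x}})={\frac {\sigma }{\sqrt {n}}}\cdot {\sqrt {\frac {N-n}{N-1}}}}$

The factor ${\displaystyle {\frac {N-n}{N-1}}}$ is called the finite sample correction. It applies to random sampling without replacement from a finite population of size ${\displaystyle N}$. For random sampling with replacement the factor is omitted, so that ${\displaystyle Var({\bar {x}})=\sigma ^{2}/n}$ and ${\displaystyle \sigma ({\bar {x}})=\sigma /{\sqrt {n}}}$.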

If the population variance ${\displaystyle Var(X)=\sigma ^{2}}$ is unknown, it has to be estimated by the statistic ${\displaystyle s^{2}.}$ In the above formulas ${\displaystyle \sigma ^{2}}$ is replaced by ${\displaystyle s^{2}}$, which leads to an estimator of the variance of the sample mean given by:

• for a simple random sample: ${\displaystyle {\widehat {\sigma ^{2}}}({\bar {x}})={\frac {s^{2}}{n}}}$
• for a random sample without replacement: ${\displaystyle {\widehat {\sigma ^{2}}}({\bar {x}})={\frac {s^{2}}{n}}\cdot {\frac {N-n}{N-1}}}$

These results for the expectation and variance of the sample mean hold regardless of the specific form of its sampling distribution.
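The two variance formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `var_sample_mean` and `sd_sample_mean` and the numbers in the usage lines are assumptions made here, not part of the original text.

```python
import math


def var_sample_mean(sigma2, n, N=None):
    """Variance of the sample mean.

    sigma2 : population variance (or its estimate s^2)
    n      : sample size
    N      : population size; None means sampling with replacement,
             so no finite sample correction is applied.
    """
    if N is None:
        return sigma2 / n
    # Sampling without replacement: apply the correction (N - n) / (N - 1).
    return sigma2 / n * (N - n) / (N - 1)


def sd_sample_mean(sigma2, n, N=None):
    """Standard deviation of the sample mean (square root of the variance)."""
    return math.sqrt(var_sample_mean(sigma2, n, N))


# Hypothetical numbers, chosen only for illustration.
print(var_sample_mean(4.0, 25))         # with replacement
print(var_sample_mean(4.0, 25, N=100))  # without replacement, N = 100
```

Passing `N=None` for sampling with replacement is a design choice of this sketch; it keeps both cases in one function while making the correction explicit.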

Distribution of the sample mean

The sampling distribution ${\displaystyle F({\bar {x}})}$ of the sample mean is determined by the distribution of the variable ${\displaystyle X}$ in the population. In each case below we assume a random sample with replacement.

1. It is assumed that ${\displaystyle X}$ is normally distributed with expected value ${\displaystyle \mu }$ and variance ${\displaystyle \sigma ^{2}}$, that is, ${\displaystyle X\sim N(\mu ,\,\sigma ^{2})}$.

1. The population variance ${\displaystyle \sigma ^{2}}$ is known. In this case ${\displaystyle {\bar {x}}}$ has the following normal distribution: ${\displaystyle {\bar {x}}\sim N(\mu ,\,\sigma ^{2}({\bar {x}}))=N(\mu ,\,{\frac {\sigma ^{2}}{n}})}$ and the standardized random variable ${\displaystyle z={\frac {{\bar {x}}-\mu }{\sigma ({\bar {x}})}}={\sqrt {n}}{\frac {{\bar {x}}-\mu }{\sigma }}}$ has the standard normal distribution ${\displaystyle z\sim N(0;1)}$.

2. The population variance ${\displaystyle \sigma ^{2}}$ is unknown. In this case, it may be estimated by ${\displaystyle s^{2}}$. The transformed random variable ${\displaystyle t={\sqrt {n}}{\frac {{\bar {x}}-\mu }{s}}}$ has a tabulated distribution whose parameter, the 'degrees of freedom', equals ${\displaystyle n-1}$. This distribution is called the t-distribution and it is usually denoted by ${\displaystyle t_{n-1}}$.

As ${\displaystyle n}$ increases, the t-distribution converges to a standard normal.  Indeed the latter provides a good approximation when ${\displaystyle n>30.}$

2. It is assumed that ${\displaystyle X}$ has an arbitrary (not necessarily normal) distribution. This is the most relevant case for applications in business and economics since the distribution of many interesting variables may not be well approximated by the normal or its specific form is simply unknown.

Consider ${\displaystyle n}$ i.i.d. random variables ${\displaystyle X_{1},\dots ,X_{n}}$ with unknown distribution. The random variables have expectation ${\displaystyle E(X_{i})=\mu }$ and variance ${\displaystyle Var(X_{i})=\sigma ^{2}.}$ According to the central limit theorem the following propositions hold:

• If ${\displaystyle \sigma ^{2}}$ is known, then the random variable ${\displaystyle z={\sqrt {n}}{\frac {{\bar {x}}-\mu }{\sigma }}}$ is approximately standard normal for sufficiently large ${\displaystyle n}$.

• If ${\displaystyle \sigma ^{2}}$ is unknown, then the random variable ${\displaystyle t={\sqrt {n}}{\frac {{\bar {x}}-\mu }{s}}}$ is also approximately standard normal for sufficiently large ${\displaystyle n}$.

As a rule of thumb, the normal distribution can be used for ${\displaystyle n>30}$. If ${\displaystyle X}$ is normally distributed with known ${\displaystyle \mu }$ and ${\displaystyle \sigma ^{2}}$, so that ${\displaystyle {\bar {x}}}$ also follows the normal distribution, then the calculation of probabilities may be done as in Chapter VI. These calculations hold approximately if ${\displaystyle X}$ is arbitrarily distributed and ${\displaystyle n}$ is sufficiently large. More generally, if the distribution of ${\displaystyle X}$ is not normal, but is known, then it is in principle possible to calculate the sampling distribution of ${\displaystyle {\bar {x}}}$ and the probability that ${\displaystyle {\bar {x}}}$ falls in a given interval (though the results may be quite complicated).
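The approximate normality of ${\displaystyle z}$ and ${\displaystyle t}$ for a non-normal population can be explored with a small simulation. The sketch below is an illustration under assumptions that are not in the original text: the population is taken to be exponential with ${\displaystyle \mu =1}$ and ${\displaystyle \sigma =1}$, and the function names, the seed, the number of repetitions and the cutoff 1.96 are all choices made here.

```python
import math
import random


def z_and_t(sample, mu, sigma):
    """Return z = sqrt(n)(xbar - mu)/sigma and t = sqrt(n)(xbar - mu)/s."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    z = math.sqrt(n) * (xbar - mu) / sigma
    t = math.sqrt(n) * (xbar - mu) / s
    return z, t


def central_fractions(n, reps=5000, cutoff=1.96, seed=1):
    """Fraction of simulated samples with |z| <= cutoff and with |t| <= cutoff.

    Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
    """
    rng = random.Random(seed)
    z_in = t_in = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        z, t = z_and_t(sample, mu=1.0, sigma=1.0)
        z_in += abs(z) <= cutoff
        t_in += abs(t) <= cutoff
    return z_in / reps, t_in / reps


# Under an exact standard normal the fraction would be about 0.95.
for n in (5, 50, 200):
    print(n, central_fractions(n))
```

If the normal approximation works well, both fractions should move towards 0.95 as ${\displaystyle n}$ grows. For a skewed population such as this one, the statistic based on ${\displaystyle s}$ typically needs a larger ${\displaystyle n}$ than the one based on the known ${\displaystyle \sigma }$.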

Weak law of large numbers

Suppose ${\displaystyle X_{1},\dots ,X_{n}}$ are ${\displaystyle n}$ independent and identically distributed random variables with expectation ${\displaystyle E(X_{i})=\mu }$ and variance ${\displaystyle Var(X_{i})=\sigma ^{2}}$. Then, for each ${\displaystyle \epsilon >0}$ it holds that: ${\displaystyle \lim _{n\rightarrow \infty }P(|{\bar {x}}_{n}-\mu |<\epsilon )=1\,.}$

This can be shown as follows. According to Chebyshev's inequality it holds that ${\displaystyle P(|{\bar {x}}_{n}-\mu |<\epsilon )\geq 1-{\frac {\sigma ^{2}({\bar {x}})}{\epsilon ^{2}}}.}$ After inserting ${\displaystyle \sigma ^{2}({\bar {x}})=\sigma ^{2}/n}$: ${\displaystyle P(|{\bar {x}}_{n}-\mu |<\epsilon )\geq 1-{\frac {\sigma ^{2}}{n\epsilon ^{2}}}}$ If ${\displaystyle n}$ approaches infinity, the second term on the right hand side goes to zero, so the lower bound, and with it the probability, converges to one.

Implication of this law: With increasing ${\displaystyle n}$, the probability that the sample mean ${\displaystyle {\bar {x}}}$ will deviate from its expectation ${\displaystyle \mu }$ by less than ${\displaystyle \epsilon >0}$ converges to one. If the sample size is large enough, the sample mean will take on values within a pre-specified interval ${\displaystyle [\mu -\epsilon ;\mu +\epsilon ]}$ with high probability, regardless of the distribution of ${\displaystyle X}$.
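The lower bound used in this argument is easy to evaluate numerically. The following sketch is an illustration only: the function name `chebyshev_lower_bound` and the values ${\displaystyle \sigma ^{2}=34.81}$ and ${\displaystyle \epsilon =1}$ are assumptions made here. Truncating the bound at zero is also an addition of this sketch, since a negative lower bound for a probability carries no information.

```python
def chebyshev_lower_bound(sigma2, n, eps):
    """Lower bound 1 - sigma^2 / (n * eps^2) for P(|xbar_n - mu| < eps).

    The result is truncated at 0 because a probability cannot be negative.
    """
    return max(0.0, 1.0 - sigma2 / (n * eps ** 2))


# Illustrative values: the bound rises towards 1 as n grows.
for n in (10, 50, 200, 5000):
    print(n, chebyshev_lower_bound(34.81, n, 1.0))
```

The bound is usually far from sharp. It only guarantees that the probability is at least this large, which is all the proof of the weak law requires.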

Enhanced example for sampling distributions

This example is devoted to formally explaining the sampling distribution of the sample mean, its expectation and variance. To this end, certain assumptions must be made about the population. In particular, it is assumed that the mean hourly gross earnings of all 5000 workers of a company equals $27.30 with a standard deviation of $5.90 and a variance of 34.81.

Problem 1:

Suppose that the variable ${\displaystyle X}$ = “Gross hourly earnings of a (randomly selected) worker in this company” is normally distributed, that is, ${\displaystyle X\sim N(27.3;34.81)}$. From the population of all workers of this company, a random sample (with replacement) of ${\displaystyle n}$ workers is selected. The sample mean gives the average gross hourly earnings of the ${\displaystyle n}$ workers in the sample. Calculate the expected value, variance, standard deviation and find the specific form of the distribution of ${\displaystyle {\bar {x}}}$ for the following sample sizes:

a) ${\displaystyle n=10}$
b) ${\displaystyle n=50}$
c) ${\displaystyle n=200}$

Regardless of ${\displaystyle n}$, the expected value of ${\displaystyle {\bar {x}}}$ is ${\displaystyle E({\bar {x}})=\mu =\$27.30}$.

The variance of the sample mean is equal to ${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})=\sigma ^{2}/n\,.}$ Thus,

a) for ${\displaystyle n=10}$: ${\displaystyle \sigma ^{2}({\bar {x}})=5.9^{2}/10=34.81/10=3.481}$ and ${\displaystyle \sigma ({\bar {x}})=\$1.8657}$
b) for ${\displaystyle n=50}$: ${\displaystyle \sigma ^{2}({\bar {x}})=5.9^{2}/50=34.81/50=0.6962}$ and ${\displaystyle \sigma ({\bar {x}})=\$0.8344}$
c) for ${\displaystyle n=200}$: ${\displaystyle \sigma ^{2}({\bar {x}})=5.9^{2}/200=34.81/200=0.17405}$ and ${\displaystyle \sigma ({\bar {x}})=\$0.4172}$

Obviously, the standard deviation of ${\displaystyle {\bar {x}}}$ is smaller than the standard deviation of ${\displaystyle X}$ in the population. Moreover, the standard deviation of ${\displaystyle {\bar {x}}}$ decreases from 1.8657 to 0.8344 and to 0.4172, as the sample size is increased from 10 to 50 and eventually to 200. Increasing the sample size by a factor of five cuts the standard deviation by slightly more than half (by the factor ${\displaystyle 1/{\sqrt {5}}\approx 0.447}$). Increasing the sample size twentyfold reduces the standard deviation by more than 3/4 (by the factor ${\displaystyle 1/{\sqrt {20}}\approx 0.224}$).

Since ${\displaystyle X}$ is assumed to be normally distributed it follows that the sample mean ${\displaystyle {\bar {x}}}$ is also normally distributed under random sampling with replacement, regardless of the sample size.

Thus:

a) for random samples of size ${\displaystyle n=10}$: ${\displaystyle {\bar {x}}\sim N(27.3;3.481)}$. The red curve in the graph corresponds to the distribution of ${\displaystyle X}$ in the population while the blue curve depicts the distribution of the sample mean ${\displaystyle {\bar {x}}}$.

b) for random samples of size ${\displaystyle n=50}$: ${\displaystyle {\bar {x}}\sim N(27.3;0.6962)}$

c) for random samples of size ${\displaystyle n=200}$: ${\displaystyle {\bar {x}}\sim N(27.3;0.17405)}$
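The figures of Problem 1 can be reproduced with a short calculation. This is a minimal sketch under the assumptions of the problem (sampling with replacement, ${\displaystyle \mu =27.30}$, ${\displaystyle \sigma =5.90}$); the names `MU`, `SIGMA` and `sample_mean_moments` are introduced here for illustration.

```python
import math

MU = 27.30    # population mean of hourly earnings (from the example)
SIGMA = 5.90  # population standard deviation (from the example)


def sample_mean_moments(n, mu=MU, sigma=SIGMA):
    """Expected value, variance and standard deviation of the sample mean
    under random sampling with replacement."""
    var = sigma ** 2 / n
    return mu, var, math.sqrt(var)


for n in (10, 50, 200):
    mean, var, sd = sample_mean_moments(n)
    print(f"n={n:3d}  E={mean:.2f}  Var={var:.5f}  sd={sd:.4f}")

# The ratio of standard deviations depends only on the ratio of sample sizes.
ratio = sample_mean_moments(50)[2] / sample_mean_moments(10)[2]
print(f"sd(n=50) / sd(n=10) = {ratio:.4f}")  # equals 1 / sqrt(5)
```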

Problem 2:

Suppose that the variable ${\displaystyle X}$ = “gross hourly earnings of a (randomly selected) worker of this company” is normally distributed. Hence, ${\displaystyle X\sim N(27.3;34.81)}$. A sample of size ${\displaystyle n}$ is randomly drawn without replacement. The sample mean gives the average gross hourly earnings of the ${\displaystyle n}$ workers in the sample. Calculate the expected value, variance, and standard deviation of ${\displaystyle {\bar {x}}}$ for the following sample sizes:

a) ${\displaystyle n=10}$
b) ${\displaystyle n=50}$
c) ${\displaystyle n=1000}$

All random samples without replacement, regardless of ${\displaystyle n}$, have the same expected value as in the first problem: ${\displaystyle E({\bar {x}})=\mu =\$27.30}$.

In the case of sampling without replacement, the variance of the sample mean is reduced by a 'finite sample correction factor'. Specifically, the variance of the sample mean is given by ${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})={\frac {\sigma ^{2}}{n}}\cdot {\frac {N-n}{N-1}}\,.}$ However, the finite sample correction can be neglected if ${\displaystyle n}$ is sufficiently small relative to ${\displaystyle N}$, for example if ${\displaystyle n/N\leq 0.05}$.

Thus:

a) For ${\displaystyle n=10}$ we have ${\displaystyle n/N=0.002\leq 0.05}$, so the correction can be neglected: ${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})=5.9^{2}/10=34.81/10=3.481}$ and ${\displaystyle \sigma ({\bar {x}})=\$1.8657}$. In comparison, the finite sample correction yields ${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})=3.4747}$ and ${\displaystyle \sigma ({\bar {x}})=\$1.8641}$, which demonstrates the negligibility of the correction.

b) For ${\displaystyle n=50}$ we have ${\displaystyle n/N=0.01\leq 0.05}$, so again ${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})=\sigma ^{2}/n}$ can be used. This leads to the same result as in problem 1: ${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})=5.9^{2}/50=34.81/50=0.6962}$ and ${\displaystyle \sigma ({\bar {x}})=\$0.8344}$, which is very similar to the finite sample corrected result ${\displaystyle \sigma ({\bar {x}})=\$0.8303}$.

c) For ${\displaystyle n=1000}$ we have ${\displaystyle n/N=0.2>0.05}$, so the finite sample correction has to be applied: ${\displaystyle {\begin{aligned}Var({\bar {x}})=\sigma ^{2}({\bar {x}})&={\frac {\sigma ^{2}}{n}}\cdot {\frac {N-n}{N-1}}\\&={\frac {5.9^{2}}{1000}}\cdot {\frac {5000-1000}{5000-1}}=0.0279\\\sigma ({\bar {x}})&=\$0.1669\,.\end{aligned}}}$
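The comparison between the corrected and uncorrected variance in Problem 2 can be checked numerically. The sketch below assumes the values of the example (${\displaystyle N=5000}$, ${\displaystyle \sigma =5.90}$); the function name `moments_without_replacement` is hypothetical.

```python
import math

N_POP = 5000  # population size (from the example)
SIGMA = 5.90  # population standard deviation (from the example)


def moments_without_replacement(n, N=N_POP, sigma=SIGMA):
    """Variance and standard deviation of the sample mean under sampling
    without replacement, including the factor (N - n) / (N - 1)."""
    var = sigma ** 2 / n * (N - n) / (N - 1)
    return var, math.sqrt(var)


for n in (10, 50, 1000):
    var_corr, sd_corr = moments_without_replacement(n)
    sd_plain = SIGMA / math.sqrt(n)  # uncorrected value, for comparison
    print(f"n={n:4d}  n/N={n / N_POP:.3f}  "
          f"sd corrected={sd_corr:.4f}  sd uncorrected={sd_plain:.4f}")
```

For ${\displaystyle n=10}$ and ${\displaystyle n=50}$ the two standard deviations differ only in the third decimal place, whereas for ${\displaystyle n=1000}$ the uncorrected value would noticeably overstate the standard deviation.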

Problem 3:

Suppose, more realistically, that the distribution of ${\displaystyle X}$ = “gross hourly earnings of a (randomly selected) worker from this company” is unknown. Hence, all that is known is ${\displaystyle E(X)=\mu =\$27.30}$ and ${\displaystyle \sigma (X)=\$5.90}$. A sample of size ${\displaystyle n}$ is randomly drawn. The sample mean gives the average gross hourly earnings of the ${\displaystyle n}$ workers in the sample. Calculate the expected value, variance, standard deviation and find the specific form of the distribution of ${\displaystyle {\bar {x}}}$ for the following sample sizes:

a) ${\displaystyle n=10}$
b) ${\displaystyle n=50}$
c) ${\displaystyle n=200}$

How the expected value ${\displaystyle E({\bar {x}})}$ is calculated does not depend on the distribution of ${\displaystyle X}$ in the population. Hence, there are no new aspects in the present situation and the results are identical to the previous two problems: ${\displaystyle E({\bar {x}})=\mu =\$27.30}$.

How the variance of ${\displaystyle {\bar {x}}}$ is calculated does not depend on the distribution of ${\displaystyle X}$ in the population but it does depend on the type and size of the random sample. In the statement of problem 3 the sampling scheme has not been specified. However, for all three sample sizes ${\displaystyle n/N<0.05}$ and, hence, if the sample is drawn without replacement the formula ${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})=\sigma ^{2}/n}$ can be used as an approximation.

${\displaystyle n}$ ${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})}$ ${\displaystyle \sigma ({\bar {x}})}$
10 3.481 $1.8657
50 0.6962 $0.8344
200 0.17405 $0.4172

Since the distribution of ${\displaystyle X}$ in the population is unknown no exact statement can be made about the distribution of ${\displaystyle {\bar {x}}.}$

However, the central limit theorem implies that the standardized random variable ${\displaystyle z={\sqrt {n}}{\frac {{\bar {x}}-\mu }{\sigma }}}$ is approximately standard normal if the sample size satisfies ${\displaystyle n>30}$ and, in random sampling without replacement, the size of the population ${\displaystyle N}$ is sufficiently large. This is satisfied for the cases b) ${\displaystyle n=50}$ and c) ${\displaystyle n=200}$.

Example of sampling distribution

${\displaystyle N=7}$ students take part in an exam for a graduate course and obtain the following scores: Table 1:

Student A B C D E F G
Score 10 11 11 12 12 12 16

The variable ${\displaystyle X}$ = “score of an exam” has the following population frequency distribution: Table 2:

${\displaystyle x}$ ${\displaystyle h(x)}$ ${\displaystyle f(x)=h(x)/N}$ ${\displaystyle F(x)}$
10 1 1/7 1/7
11 2 2/7 3/7
12 3 3/7 6/7
16 1 1/7 7/7

with population parameters ${\displaystyle \mu =12,\sigma ^{2}=3.143}$ and ${\displaystyle \sigma =1.773}$.
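The population parameters can be verified directly from the scores in Table 1. The sketch below uses exact fractions so that ${\displaystyle \sigma ^{2}=22/7}$ appears without rounding; the variable names are chosen here for illustration.

```python
from fractions import Fraction
import math

scores = [10, 11, 11, 12, 12, 12, 16]  # Table 1
N = len(scores)

mu = Fraction(sum(scores), N)                    # population mean
sigma2 = sum((x - mu) ** 2 for x in scores) / N  # population variance (divide by N)
sigma = math.sqrt(sigma2)                        # population standard deviation

print(mu, sigma2, round(float(sigma2), 3), round(sigma, 3))
```

Note that the population variance divides by ${\displaystyle N}$, not by ${\displaystyle N-1}$, because all seven scores form the entire population.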

Random sampling with replacement

${\displaystyle n=2}$ exams are sampled with replacement from the population. Table 3 contains all possible samples of size ${\displaystyle n=2}$ with replacement and paying attention to the order of the draws: Table 3:

(rows: first exam drawn; columns: second exam drawn)

1. exam 10 11 11 12 12 12 16
10 10;10 10;11 10;11 10;12 10;12 10;12 10;16
11 11;10 11;11 11;11 11;12 11;12 11;12 11;16
11 11;10 11;11 11;11 11;12 11;12 11;12 11;16
12 12;10 12;11 12;11 12;12 12;12 12;12 12;16
12 12;10 12;11 12;11 12;12 12;12 12;12 12;16
12 12;10 12;11 12;11 12;12 12;12 12;12 12;16
16 16;10 16;11 16;11 16;12 16;12 16;12 16;16

For each possible sample, the sample mean can be calculated and is recorded in Table 4. Table 4:

(rows: first exam drawn; columns: second exam drawn)

1. exam 10 11 11 12 12 12 16
10 10 10.5 10.5 11 11 11 13
11 10.5 11 11 11.5 11.5 11.5 13.5
11 10.5 11 11 11.5 11.5 11.5 13.5
12 11 11.5 11.5 12 12 12 14
12 11 11.5 11.5 12 12 12 14
12 11 11.5 11.5 12 12 12 14
16 13 13.5 13.5 14 14 14 16

${\displaystyle {\bar {x}}}$ therefore can take on various values with certain probabilities. From Table 4 the distribution of ${\displaystyle {\bar {x}}}$ can be determined as given in the first two columns of Table 5. Table 5:

${\displaystyle {\bar {x}}}$ ${\displaystyle P({\bar {x}})}$ ${\displaystyle {\bar {x}}-E({\bar {x}})}$ ${\displaystyle [{\bar {x}}-E({\bar {x}})]^{2}}$ ${\displaystyle [{\bar {x}}-E({\bar {x}})]^{2}\cdot P({\bar {x}})}$
10 1 / 49 - 2 4 4 / 49
10.5 4 / 49 - 1.5 2.25 9 / 49
11 10 / 49 - 1 1 10 / 49
11.5 12 / 49 - 0.5 0.25 3 / 49
12 9 / 49 0 0 0
13 2 / 49 1 1 2 / 49
13.5 4 / 49 1.5 2.25 9 / 49
14 6 / 49 2 4 24 / 49
16 1 / 49 4 16 16 / 49

The mean of this distribution, i.e. the expected value of ${\displaystyle {\bar {x}}}$, is given by ${\displaystyle E({\bar {x}})=588/49=12\,,}$ which is equal to the expected value of the variable ${\displaystyle X}$ in the population: ${\displaystyle E(X)=12}$. Using the intermediate results in columns three to five of Table 5 allows one to calculate the variance of ${\displaystyle {\bar {x}}}$: ${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})=77/49=11/7=1.5714\,.}$ This result is in agreement with the formula for ${\displaystyle \sigma ^{2}({\bar {x}})}$ given above: ${\displaystyle \sigma ^{2}({\bar {x}})=\sigma ^{2}/n=(22/7)/2=11/7\,.}$ It is easy to see that the variance of ${\displaystyle {\bar {x}}}$ is indeed smaller than the variance of ${\displaystyle X}$ in the population.
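The enumeration behind Tables 3 to 5 can be reproduced mechanically. The following sketch lists all 49 ordered samples of size 2 drawn with replacement and builds the exact distribution of the sample mean; the variable names are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

scores = [10, 11, 11, 12, 12, 12, 16]  # population from Table 1

# All ordered samples of size 2 with replacement (7 * 7 = 49 samples).
samples = list(product(scores, repeat=2))

# Count how often each sample mean occurs and turn counts into probabilities.
counts = Counter(Fraction(a + b, 2) for a, b in samples)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mean = sum(m * p for m, p in dist.items())
var = sum((m - mean) ** 2 * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), p)
print("E =", mean, " Var =", var)
```

`Fraction` reduces automatically, so a probability such as 6/49 is shown unchanged, while the variance 77/49 is reported as 11/7.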

Random sampling without replacement

From the population, ${\displaystyle n=2}$ exams are randomly drawn without replacement. Table 6 displays all possible samples of size ${\displaystyle n=2}$ from sampling without replacement, paying attention to the order of the draws. Table 6:

(rows: first exam drawn; columns: second exam drawn; "-" marks the excluded case of drawing the same exam twice)

1. exam 10 11 11 12 12 12 16
10 - 10;11 10;11 10;12 10;12 10;12 10;16
11 11;10 - 11;11 11;12 11;12 11;12 11;16
11 11;10 11;11 - 11;12 11;12 11;12 11;16
12 12;10 12;11 12;11 - 12;12 12;12 12;16
12 12;10 12;11 12;11 12;12 - 12;12 12;16
12 12;10 12;11 12;11 12;12 12;12 - 12;16
16 16;10 16;11 16;11 16;12 16;12 16;12 -

For each possible sample, the sample mean is calculated and reported in Table 7: Table 7:

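(rows: first exam drawn; columns: second exam drawn; "-" marks the excluded case of drawing the same exam twice)

1. exam 10 11 11 12 12 12 16
10 - 10.5 10.5 11 11 11 13
11 10.5 - 11 11.5 11.5 11.5 13.5
11 10.5 11 - 11.5 11.5 11.5 13.5
12 11 11.5 11.5 - 12 12 14
12 11 11.5 11.5 12 - 12 14
12 11 11.5 11.5 12 12 - 14
16 13 13.5 13.5 14 14 14 -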

The first two columns of Table 8 contain the probability distribution of the sample mean. Table 8:

${\displaystyle {\bar {x}}}$ ${\displaystyle P({\bar {x}})}$ ${\displaystyle {\bar {x}}-E({\bar {x}})}$ ${\displaystyle [{\bar {x}}-E({\bar {x}})]^{2}}$ ${\displaystyle [{\bar {x}}-E({\bar {x}})]^{2}\cdot P({\bar {x}})}$
10.5 4 / 42 - 1.5 2.25 9 / 42
11 8 / 42 - 1 1 8 / 42
11.5 12 / 42 - 0.5 0.25 3 / 42
12 6 / 42 0 0 0
13 2 / 42 1 1 2 / 42
13.5 4 / 42 1.5 2.25 9 / 42
14 6 / 42 2 4 24 / 42

The expected value ${\displaystyle E({\bar {x}})}$ is ${\displaystyle E({\bar {x}})=504/42=12}$ and is equal to the expected value of ${\displaystyle X}$ in the population. The variance is equal to ${\displaystyle Var({\bar {x}})=\sigma ^{2}({\bar {x}})=55/42=1.3095\,,}$ which is in agreement with the formula for calculating ${\displaystyle \sigma ^{2}({\bar {x}})}$ given earlier: ${\displaystyle {\begin{aligned}Var({\bar {x}})=\sigma ^{2}({\bar {x}})&={\frac {\sigma ^{2}}{n}}\cdot {\frac {N-n}{N-1}}\\&={\frac {22/7}{2}}\cdot {\frac {7-2}{7-1}}={\frac {22\cdot 5}{7\cdot 2\cdot 6}}={\frac {55}{42}}\,.\end{aligned}}}$
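The same check can be carried out for sampling without replacement. This sketch enumerates the 42 ordered samples of Table 6 and compares the resulting variance with the formula that includes the finite sample correction; the variable names are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

scores = [10, 11, 11, 12, 12, 12, 16]  # population from Table 1
N, n = len(scores), 2

# Ordered samples without replacement: permutations works by position,
# so the two exams with score 11 count as different exams (7 * 6 = 42 samples).
samples = list(permutations(scores, n))

counts = Counter(Fraction(sum(s), n) for s in samples)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mean = sum(m * p for m, p in dist.items())
var = sum((m - mean) ** 2 * p for m, p in dist.items())

# Formula from the text: sigma^2 / n * (N - n) / (N - 1).
mu = Fraction(sum(scores), N)
sigma2 = sum((x - mu) ** 2 for x in scores) / N
formula = sigma2 / n * Fraction(N - n, N - 1)

print("E =", mean, " Var =", var, " formula =", formula)
```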

Consider a population with distribution function ${\displaystyle F(x)}$, expected value ${\displaystyle E(X)=\mu }$ and variance ${\displaystyle Var(X)=\sigma ^{2}}$. The random variables ${\displaystyle X_{i},i=1,\dots ,n}$ all have the same distribution function ${\displaystyle F(x_{i})=F(x)}$, expectation ${\displaystyle E(X_{i})=\mu }$ and variance ${\displaystyle Var(X_{i})=\sigma ^{2}}$.

Expectation of the sample mean ${\displaystyle {\bar {x}}}$

Using the rules for the expectation of a linear combination of random variables it is easy to calculate that ${\displaystyle E({\bar {x}})=E\left({\frac {1}{n}}\sum \limits _{i=1}^{n}X_{i}\right)={\frac {1}{n}}E\left(\sum \limits _{i=1}^{n}X_{i}\right)={\frac {1}{n}}\sum \limits _{i=1}^{n}E(X_{i})={\frac {1}{n}}\cdot n\cdot \mu =\mu \,,}$ with ${\displaystyle E(X_{i})=\mu }$. This result holds under random sampling with or without replacement and is valid for any positive sample size ${\displaystyle n.}$

Variance of the sample mean ${\displaystyle {\bar {X}}}$

(1) ${\displaystyle {\begin{aligned}Var({\bar {x}})&=E[({\bar {x}}-E({\bar {x}}))^{2}]=E[({\bar {x}}-\mu )^{2}]\\&=E\left[\left({\frac {1}{n}}\sum \limits _{i=1}^{n}(X_{i}-\mu )\right)^{2}\right]\\&=E\left[\left({\frac {1}{n}}(X_{1}-\mu )+\dots +{\frac {1}{n}}(X_{n}-\mu )\right)^{2}\right]\\&={\frac {1}{n^{2}}}\left[E[(X_{1}-\mu )^{2}]+\dots +E[(X_{n}-\mu )^{2}]+\sum \limits _{i}\sum \limits _{j\neq i}E[(X_{i}-\mu )(X_{j}-\mu )]\right]\\&={\frac {1}{n^{2}}}\left[Var(X_{1})+\dots +Var(X_{n})+\sum \limits _{i}\sum \limits _{j\neq i}Cov(X_{i},X_{j})\right]\end{aligned}}}$

For each ${\displaystyle i=1,\dots ,n}$, ${\displaystyle Var(X_{i})=\sigma ^{2}}$. Furthermore, under random sampling with replacement the random variables are independent and therefore have ${\displaystyle Cov(X_{i},X_{j})=0}$ for ${\displaystyle i\neq j}$. The variance of the sample mean thus simplifies to ${\displaystyle Var({\bar {x}})={\frac {1}{n^{2}}}n\sigma ^{2}={\frac {\sigma ^{2}}{n}}\,.}$

Note that the variance of ${\displaystyle {\bar {x}}}$ is equal to the variance of the population variable ${\displaystyle X}$ divided by ${\displaystyle n.}$ This implies that, for ${\displaystyle n>1}$, ${\displaystyle Var({\bar {x}})}$ is smaller than ${\displaystyle Var(X)}$ and that ${\displaystyle Var({\bar {x}})}$ is decreasing with increasing ${\displaystyle n.}$ In other words, for large ${\displaystyle n}$ the distribution of ${\displaystyle {\bar {x}}}$ is tightly concentrated around its expected value ${\displaystyle \mu }$.

(2) The derivation of ${\displaystyle Var({\bar {x}})}$ in the case of random sampling without replacement is similar but more complicated because of the dependency of the random variables.

Regarding the finite sample correction, for large populations the following approximation is quite accurate: ${\displaystyle {\frac {N-n}{N-1}}\approx {\frac {N-n}{N}}\,,}$ and the approximate correction ${\displaystyle 1-n/N}$ can be used. In sampling without replacement ${\displaystyle n}$ cannot exceed ${\displaystyle N}$. For fixed ${\displaystyle n}$, the finite sample correction approaches 1 with increasing ${\displaystyle N}$: ${\displaystyle \lim _{N\rightarrow \infty }{\frac {N-n}{N-1}}=1\,.}$ In applications, the correction can be ignored if ${\displaystyle n}$ is small relative to ${\displaystyle N}$. Rule of thumb: ${\displaystyle n/N\leq 0.05}$. However, this will only give an approximation to ${\displaystyle Var({\bar {x}})}$.

On the distribution of ${\displaystyle {\bar {x}}}$

Suppose that ${\displaystyle X}$ follows a normal distribution in the population with expectation ${\displaystyle \mu }$ and variance ${\displaystyle \sigma ^{2}}$: ${\displaystyle X\sim N(\mu ,\sigma ^{2})}$. In this case, the random variables ${\displaystyle X_{i},i=1,\dots ,n}$ are all normally distributed: ${\displaystyle X_{i}\sim N(\mu ,\sigma ^{2})}$ for each ${\displaystyle i=1,\dots ,n}$. The sum of ${\displaystyle n}$ independent and identically normally distributed random variables also follows a normal distribution: ${\displaystyle \sum \limits _{i=1}^{n}X_{i}\sim N(n\mu ,n\sigma ^{2})\,.}$ The statistic ${\displaystyle {\bar {x}}}$ differs from this sum only by the constant factor ${\displaystyle 1/n}$ and, hence, is also normally distributed: ${\displaystyle {\bar {x}}\sim N(\mu ,\sigma ^{2}({\bar {x}}))}$.
Since only the standard normal distribution is tabulated the following standardized version of ${\displaystyle {\bar {x}}}$ is considered: ${\displaystyle z={\frac {{\bar {x}}-\mu }{\sigma ({\bar {x}})}}={\sqrt {n}}{\frac {{\bar {x}}-\mu }{\sigma }}\,,}$ which follows the standard normal distribution: ${\displaystyle z\sim N(0,1)}$. Evidently, using the standardized variable ${\displaystyle z}$ hinges on knowing the population variance ${\displaystyle \sigma ^{2}.}$

If the population variance ${\displaystyle \sigma ^{2}}$ is unknown: The unknown variance ${\displaystyle \sigma ^{2}}$ is estimated by ${\displaystyle s^{2}={\frac {\sum \limits _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}{n-1}}}$ Dividing both sides by ${\displaystyle \sigma ^{2}}$ and then multiplying by ${\displaystyle n-1}$ gives ${\displaystyle {\begin{aligned}{\frac {s^{2}}{\sigma ^{2}}}&={\frac {1}{\sigma ^{2}}}{\frac {\sum \limits _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}{n-1}}\\{\frac {n-1}{\sigma ^{2}}}s^{2}&=\sum \limits _{i=1}^{n}{\frac {(x_{i}-{\bar {x}})^{2}}{\sigma ^{2}}}\,.\end{aligned}}}$ To simplify, set ${\displaystyle y={\frac {(n-1)s^{2}}{\sigma ^{2}}}}$. In random sampling with replacement, the ${\displaystyle X_{i},i=1,\dots ,n}$ are independent and ${\displaystyle y}$ is therefore a sum of squared standardized normal deviations. Because the deviations are measured from ${\displaystyle {\bar {x}}}$ rather than from ${\displaystyle \mu }$, one degree of freedom is lost, and it follows that ${\displaystyle y}$ is chi-square distributed with degrees of freedom ${\displaystyle n-1}$. Using the standardized random variable ${\displaystyle z}$ to construct the ratio ${\displaystyle t={\frac {z}{\sqrt {\frac {y}{n-1}}}}\,,}$ gives rise to the random variable ${\displaystyle t}$ which follows the t-distribution with degrees of freedom ${\displaystyle n-1.}$ (Recall from Chapter 6 that a ${\displaystyle t}$ random variable is the ratio of a standard normal to the square root of an independent chi-square divided by its degrees of freedom.)

Inserting the expressions for ${\displaystyle z}$ and ${\displaystyle y}$ and rearranging terms yields: ${\displaystyle t={\frac {{\sqrt {n}}{\frac {{\bar {x}}-\mu }{\sigma }}}{\sqrt {{\frac {1}{n-1}}\left({\frac {n-1}{\sigma ^{2}}}s^{2}\right)}}}={\sqrt {n}}{\frac {{\bar {x}}-\mu }{s}}}$
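That the unknown ${\displaystyle \sigma }$ cancels in this ratio can be confirmed numerically. The sketch below computes ${\displaystyle t}$ once as ${\displaystyle z/{\sqrt {y/(n-1)}}}$ and once directly as ${\displaystyle {\sqrt {n}}({\bar {x}}-\mu )/s}$. The function name `t_two_ways`, the sample and the values of ${\displaystyle \mu }$ and ${\displaystyle \sigma }$ are hypothetical and serve only as an illustration.

```python
import math


def t_two_ways(sample, mu, sigma):
    """Return t computed as z / sqrt(y / (n - 1)) and as sqrt(n)(xbar - mu)/s."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # sample variance

    z = math.sqrt(n) * (xbar - mu) / sigma  # standardized mean
    y = (n - 1) * s2 / sigma ** 2           # scaled sample variance
    t_ratio = z / math.sqrt(y / (n - 1))

    t_direct = math.sqrt(n) * (xbar - mu) / math.sqrt(s2)
    return t_ratio, t_direct


sample = [10, 11, 11, 12, 12, 12, 16]  # illustrative data
for sigma in (2.0, 5.0):               # any positive sigma gives the same t
    print(sigma, t_two_ways(sample, mu=11.0, sigma=sigma))
```

Both values agree up to floating-point rounding for every choice of ${\displaystyle \sigma }$, which is why ${\displaystyle t}$ can be computed from the sample alone.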

Probability statements about ${\displaystyle {\bar {x}}}$:

If the sampling distribution of ${\displaystyle {\bar {x}}}$, including all its parameters, is known, then probability statements about ${\displaystyle {\bar {x}}}$ can be made in the usual way. Suppose one wants to find a symmetric interval around the true mean which will contain ${\displaystyle {\bar {x}}}$ with probability ${\displaystyle 1-\alpha .}$ That is, we need to find ${\displaystyle c}$ such that ${\displaystyle P[\mu -c\leq {\bar {x}}\leq \mu +c]=1-\alpha }$. It will be convenient to use the standardized random variable ${\displaystyle z,}$ the distribution of which we will assume to be symmetric. ${\displaystyle {\begin{aligned}P(\mu -c\leq {\bar {x}}\leq \mu +c)&=1-\alpha \\P(-c\leq {\bar {x}}-\mu \leq c)&=1-\alpha \\P\left({\frac {-c}{\sigma ({\bar {x}})}}\leq {\frac {{\bar {x}}-\mu }{\sigma ({\bar {x}})}}\leq {\frac {c}{\sigma ({\bar {x}})}}\right)&=1-\alpha \\P\left({\frac {-c}{\sigma ({\bar {x}})}}\leq z\leq {\frac {c}{\sigma ({\bar {x}})}}\right)&=1-\alpha \\P\left(-z_{1-{\frac {\alpha }{2}}}\leq z\leq z_{1-{\frac {\alpha }{2}}}\right)&=1-\alpha \\{\frac {c}{\sigma ({\bar {x}})}}&=z_{1-{\frac {\alpha }{2}}}\\c&=z_{1-{\frac {\alpha }{2}}}\cdot \sigma ({\bar {x}})\end{aligned}}}$ Thus, the deviation ${\displaystyle c}$ from ${\displaystyle \mu }$ is a multiple of ${\displaystyle \sigma ({\bar {x}})}$.
Inserting ${\displaystyle \sigma ({\bar {x}})}$ leads to the interval ${\displaystyle \left[\mu -z_{1-{\frac {\alpha }{2}}}\cdot {\frac {\sigma }{\sqrt {n}}}\leq {\bar {x}}\leq \mu +z_{1-{\frac {\alpha }{2}}}\cdot {\frac {\sigma }{\sqrt {n}}}\right]}$ with probability ${\displaystyle P\left(\mu -z_{1-{\frac {\alpha }{2}}}\cdot {\frac {\sigma }{\sqrt {n}}}\leq {\bar {x}}\leq \mu +z_{1-{\frac {\alpha }{2}}}\cdot {\frac {\sigma }{\sqrt {n}}}\right)=1-\alpha }$ If ${\displaystyle X}$ is normally distributed then the central interval of variation with pre-specified probability ${\displaystyle 1-\alpha }$ is determined by reading ${\displaystyle z_{1-\alpha /2}}$ from the standard normal table. The probability ${\displaystyle 1-\alpha }$ is approximately valid if ${\displaystyle X}$ has an arbitrary distribution and the sample size ${\displaystyle n}$ is sufficiently large.
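The central interval can be computed without a table by using the inverse of the standard normal distribution function. The sketch below relies on `statistics.NormalDist` from the Python standard library; the function name `central_interval` is hypothetical, and the numbers (${\displaystyle \mu =27.30}$, ${\displaystyle \sigma =5.90}$, ${\displaystyle n=50}$, ${\displaystyle \alpha =0.05}$) reuse the earnings example under the assumption of sampling with replacement.

```python
import math
from statistics import NormalDist


def central_interval(mu, sigma, n, alpha):
    """Interval [mu - c, mu + c] that contains the sample mean with
    probability 1 - alpha, where c = z_{1 - alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # quantile of the standard normal
    c = z * sigma / math.sqrt(n)
    return mu - c, mu + c


lower, upper = central_interval(mu=27.30, sigma=5.90, n=50, alpha=0.05)
print(f"[{lower:.4f}; {upper:.4f}]")

# Cross-check: the probability mass of N(mu, sigma^2 / n) inside the interval.
xbar_dist = NormalDist(27.30, 5.90 / math.sqrt(50))
print(xbar_dist.cdf(upper) - xbar_dist.cdf(lower))
```

If ${\displaystyle X}$ is not normally distributed, the same calculation gives only an approximate interval, and the approximation relies on ${\displaystyle n}$ being sufficiently large.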