# Testing the Difference of Two Population Means


The unknown parameter to be tested now is the difference of two expectations in two distinguishable populations, ${\displaystyle (\mu _{1}-\mu _{2})}$. Our parameter tests will be based on individual samples arising from these two populations; we will thus be dealing with two-sample tests. There are many different ways of constructing tests for the difference in two population expectations. Our tests will be suited to the following assumptions:

• There are two populations. The random variable observed in the first, ${\displaystyle X_{1}}$ has expectation ${\displaystyle E\left(X_{1}\right)=\mu _{1}}$ and variance ${\displaystyle Var\left(X_{1}\right)=\sigma _{1}^{2}}$; the parameters of the random variable observed in the second population, ${\displaystyle X_{2}}$, are ${\displaystyle E\left(X_{2}\right)=\mu _{2}}$ and ${\displaystyle Var\left(X_{2}\right)=\sigma _{2}^{2}}$. We test for the difference in their expected values, because we have to regard ${\displaystyle \mu _{1}}$ and ${\displaystyle \mu _{2}}$ as unknown.
• The sizes of the two populations, ${\displaystyle N_{1}}$ and ${\displaystyle N_{2}}$, are sufficiently large to base the test procedures on simple random samples drawn without replacement. The sample sizes are denoted by ${\displaystyle n_{1}}$ and ${\displaystyle n_{2}}$, respectively.
• The two samples are independent. This means they are drawn independently of each other so as to not convey any cross-sample information.
• Either the random variates ${\displaystyle X_{1}}$ and ${\displaystyle X_{2}}$ are normally distributed (${\displaystyle X_{1}\thicksim \mathbb {N} \left(\mu _{1};\,\sigma _{1}\right)}$ and ${\displaystyle X_{2}\thicksim \mathbb {N} \left(\mu _{2};\,\sigma _{2}\right)}$), or their distributions can be approximated sufficiently accurately by a normal distribution via the central limit theorem. For this to be feasible, the sample sizes ${\displaystyle n_{1}}$ and ${\displaystyle n_{2}}$ have to be sufficiently large.

There is a hypothesis about the difference, expressed in terms of ${\displaystyle \omega _{0}=\mu _{1}-\mu _{2}}$. A special case of particular practical interest is that of hypothetical equality of the two population means, i.e. ${\displaystyle \omega _{0}=0}$. The test will be conducted at a significance level of ${\displaystyle \alpha }$.

### Hypotheses

Depending on the application at hand, a two- or one-sided test will be carried out:

1) Two-sided test: ${\displaystyle {\text{H}}_{0}:\mu _{1}-\mu _{2}=\omega _{0}\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}-\mu _{2}\neq \omega _{0}.}$
2) Right-sided test: ${\displaystyle {\text{H}}_{0}:\mu _{1}-\mu _{2}\leq \omega _{0}\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}-\mu _{2}>\omega _{0}.}$
3) Left-sided test: ${\displaystyle {\text{H}}_{0}:\mu _{1}-\mu _{2}\geq \omega _{0}\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}-\mu _{2}<\omega _{0}.}$

The choice of the appropriate test should be guided by the considerations laid out in the section on one-sample tests of ${\displaystyle \mu }$.

### Test statistic and its distribution; decision regions

We have already shown that the estimator of the difference of two expectations, ${\displaystyle D={\overline {X}}_{1}-{\overline {X}}_{2},}$ where ${\displaystyle {\overline {X}}_{1}}$ and ${\displaystyle {\overline {X}}_{2}}$ are the sample means, that is, ${\displaystyle {\overline {X}}_{1}={\frac {1}{n_{1}}}\sum _{i=1}^{n_{1}}\,X_{1i}\quad {\overline {X}}_{2}={\frac {1}{n_{2}}}\sum _{i=1}^{n_{2}}\,X_{2i},}$ has a normal distribution with expectation ${\displaystyle E\left(D\right)=\omega =\mu _{1}-\mu _{2}}$. Independence of the sample variables implies that the variance of the difference of the sample means is the sum of the variances of the sample means: ${\displaystyle Var\left(D\right)=\sigma _{D}^{2}={\frac {\sigma _{1}^{2}}{n_{1}}}+{\frac {\sigma _{2}^{2}}{n_{2}}}.}$ Assume that ${\displaystyle \omega _{0}}$ is the true difference between the population expectations: ${\displaystyle \omega =\omega _{0}}$. Then ${\displaystyle D}$ follows a normal distribution with expectation ${\displaystyle E\left(D\right)=\omega _{0}}$ and variance ${\displaystyle \sigma _{D}^{2}}$. In constructing an appropriate test statistic, we have to make the same distinction concerning our knowledge of the standard deviations ${\displaystyle \sigma _{1}}$ and ${\displaystyle \sigma _{2}}$ as in the one-sample case. Let’s start with the simplifying (and unrealistic) assumption that, for some miraculous reason, we know the standard deviations in both populations, ${\displaystyle \sigma _{1}}$ and ${\displaystyle \sigma _{2}}$.
If we know ${\displaystyle \sigma _{1}}$ and ${\displaystyle \sigma _{2}}$, the distribution of ${\displaystyle D}$ is fully specified as above, and we can standardize ${\displaystyle D}$ to ensure the applicability of numerical tables for the standard normal distribution: ${\displaystyle V={\frac {D-\omega _{0}}{\sigma _{D}}}={\frac {\left({\overline {X}}_{1}-{\overline {X}}_{2}\right)-\omega _{0}}{\sqrt {{\frac {\sigma _{1}^{2}}{n_{1}}}+{\frac {\sigma _{2}^{2}}{n_{2}}}}}}.}$ Under ${\displaystyle {\text{H}}_{0}}$, ${\displaystyle V}$ has (at least approximately) standard normal distribution, and the table of numerical values of the cumulative standard normal distribution can be used to determine critical values. These normal quantiles translate into the following decision regions for tests at a significance level ${\displaystyle \alpha }$:
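As a sketch of this computation, the standardized statistic ${\displaystyle V}$ for known standard deviations can be written as a short Python function (the function name and the sample values below are our own illustrations, not from the source):

```python
from math import sqrt

def z_statistic(xbar1, xbar2, sigma1, sigma2, n1, n2, omega0=0.0):
    """Standardize the difference of two sample means when sigma1, sigma2 are known."""
    sigma_d = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # standard deviation of D
    return (xbar1 - xbar2 - omega0) / sigma_d

# hypothetical sample results: xbar1 = 10, xbar2 = 8, sigma1 = sigma2 = 2, n1 = n2 = 25
v = z_statistic(10.0, 8.0, 2.0, 2.0, 25, 25)
print(round(v, 2))  # 3.54
```

The resulting value ${\displaystyle v}$ is then compared with standard normal quantiles as in the decision-region table below.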

Test Rejection region for H${\displaystyle _{0}}$ Non-rejection region for H${\displaystyle _{0}}$
Two-sided ${\displaystyle \left\{v\,|\,v<-z_{1-\alpha /2}\,,{\text{ or }}\,v>z_{1-\alpha /2}\right\}}$ ${\displaystyle \left\{v\,|\,-z_{1-\alpha /2}\leq v\leq z_{1-\alpha /2}\right\}}$
Right-sided ${\displaystyle \left\{v\,|\,v>z_{1-\alpha }\right\}}$ ${\displaystyle \left\{v\,|\,v\leq z_{1-\alpha }\right\}}$
Left-sided ${\displaystyle \left\{v\,|\,v<-z_{1-\alpha }\right\}}$ ${\displaystyle \left\{v\,|\,v\geq -z_{1-\alpha }\right\}}$

In practice, however, the variances are unknown, and we have to estimate ${\displaystyle \sigma _{1}}$ and ${\displaystyle \sigma _{2}}$ using their sample counterparts: ${\displaystyle S_{1}^{2}={\frac {1}{n_{1}-1}}\,\sum _{i=1}^{n_{1}}\left(X_{1i}-{\overline {X}}_{1}\right)^{2},\quad S_{2}^{2}={\frac {1}{n_{2}-1}}\,\sum _{i=1}^{n_{2}}\left(X_{2i}-{\overline {X}}_{2}\right)^{2}.}$ Assuming homogeneity of variances, i.e. that the random variable under consideration has the same dispersion in both populations, ${\displaystyle \sigma _{1}^{2}=\sigma _{2}^{2}}$, the estimator ${\displaystyle S^{2}}$ of the common variance ${\displaystyle \sigma ^{2}}$ is a weighted arithmetic average of the two variance estimators ${\displaystyle S_{1}^{2}}$ and ${\displaystyle S_{2}^{2}}$: ${\displaystyle S^{2}={\frac {\left(n_{1}-1\right)\,S_{1}^{2}+\left(n_{2}-1\right)\,S_{2}^{2}}{n_{1}+n_{2}-2}}.}$ Thus, we can write the estimator ${\displaystyle S_{D}^{2}}$ of ${\displaystyle \sigma _{D}^{2}}$ as ${\displaystyle S_{D}^{2}=S^{2}\,\left({\frac {1}{n_{1}}}+{\frac {1}{n_{2}}}\right)={\frac {n_{1}+n_{2}}{n_{1}\,n_{2}}}\,{\frac {\left(n_{1}-1\right)\,S_{1}^{2}+\left(n_{2}-1\right)\,S_{2}^{2}}{n_{1}+n_{2}-2}}.}$ The test statistic ${\displaystyle V}$ is then calculated as ${\displaystyle V={\frac {D-\omega _{0}}{S_{D}}}={\frac {\left({\overline {X}}_{1}-{\overline {X}}_{2}\right)-\omega _{0}}{\sqrt {{\frac {n_{1}+n_{2}}{n_{1}\,n_{2}}}\,{\frac {\left(n_{1}-1\right)\,S_{1}^{2}+\left(n_{2}-1\right)\cdot S_{2}^{2}}{n_{1}+n_{2}-2}}}}},}$ and has a ${\displaystyle t-}$distribution with ${\displaystyle n_{1}+n_{2}-2}$ degrees of freedom.
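The pooled-variance computation just described can be sketched as follows (the helper name is our own; the inputs are the two sample means, sample variances, and sample sizes):

```python
from math import sqrt

def pooled_t_statistic(xbar1, s1_sq, n1, xbar2, s2_sq, n2, omega0=0.0):
    """Two-sample t statistic under the assumption sigma1^2 == sigma2^2."""
    # weighted average of the two variance estimators (pooled variance S^2)
    s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    s_d = sqrt(s_sq * (1.0 / n1 + 1.0 / n2))  # estimated standard deviation of D
    df = n1 + n2 - 2                          # degrees of freedom of the t-distribution
    return (xbar1 - xbar2 - omega0) / s_d, df
```

The returned pair is the realized statistic and the degrees of freedom with which the critical ${\displaystyle t}$-quantile is looked up.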
Under the assumption of heterogeneous variances, ${\displaystyle \sigma _{1}^{2}\neq \sigma _{2}^{2}}$, the estimator ${\displaystyle S_{D}^{2}}$ can only be approximated as ${\displaystyle S_{D}^{2}={\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}.}$ Welch suggested basing the test statistic on this approximation and using ${\displaystyle V={\frac {D-\omega _{0}}{S_{D}}}={\frac {\left({\overline {X}}_{1}-{\overline {X}}_{2}\right)-\omega _{0}}{\sqrt {{\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}}}}}$ as the test statistic. Under the null hypothesis, ${\displaystyle V}$ can be approximated by a ${\displaystyle t-}$distribution with ${\displaystyle f}$ degrees of freedom calculated as follows: ${\displaystyle f={\frac {\left({\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}\right)^{2}}{{\frac {1}{n_{1}-1}}\,\left({\frac {S_{1}^{2}}{n_{1}}}\right)^{2}+{\frac {1}{n_{2}-1}}\,\left({\frac {S_{2}^{2}}{n_{2}}}\right)^{2}}}.}$ In both cases (homogeneous and heterogeneous variances) critical values can be taken from the ${\displaystyle t-}$distribution table. The following table shows the derived decision regions for the three test situations (for significance level ${\displaystyle \alpha }$).

Test Rejection region for H${\displaystyle _{0}}$ Non-rejection region for H${\displaystyle _{0}}$
Two-sided ${\displaystyle \left\{v\,|\,v<-t_{1-\alpha /2;n_{1}+n_{2}-2}\,,{\text{ or }}\,v>t_{1-\alpha /2;n_{1}+n_{2}-2}\right\}}$ ${\displaystyle \left\{v\,|\,-t_{1-\alpha /2;n_{1}+n_{2}-2}\leq v\leq t_{1-\alpha /2;n_{1}+n_{2}-2}\right\}}$
Right-sided ${\displaystyle \left\{v\,|\,v>t_{1-\alpha ;n_{1}+n_{2}-2}\right\}}$ ${\displaystyle \left\{v\,|\,v\leq t_{1-\alpha ;n_{1}+n_{2}-2}\right\}}$
Left-sided ${\displaystyle \left\{v\,|\,v<-t_{1-\alpha ;n_{1}+n_{2}-2}\right\}}$ ${\displaystyle \left\{v\,|\,v\geq -t_{1-\alpha ;n_{1}+n_{2}-2}\right\}}$
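Welch's statistic and its approximate degrees of freedom ${\displaystyle f}$ can be sketched analogously (the helper name is our own illustration):

```python
from math import sqrt

def welch_t_statistic(xbar1, s1_sq, n1, xbar2, s2_sq, n2, omega0=0.0):
    """Welch's approximate t statistic and its estimated degrees of freedom."""
    a, b = s1_sq / n1, s2_sq / n2            # estimated variances of the two sample means
    v = (xbar1 - xbar2 - omega0) / sqrt(a + b)
    f = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return v, f
```

Note that ${\displaystyle f}$ is generally not an integer; when working with a printed ${\displaystyle t}$-table one usually rounds it down to the nearest integer.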

Note that the ${\displaystyle t-}$distribution quantiles in the above table can be approximated by standard normal quantiles if both sample sizes are large enough to justify the application of the central limit theorem (${\displaystyle n_{1}>30}$ and ${\displaystyle n_{2}>30}$). The resulting decision regions are then similar to those in the case of known variances.

### Sampling and computing the test statistic

On the basis of an observed sample, the two sample means ${\displaystyle {\overline {x}}_{1}}$ and ${\displaystyle {\overline {x}}_{2}}$ and, if needed, the empirical standard deviations ${\displaystyle s_{1}}$ and ${\displaystyle s_{2}}$ can be computed. Plugging these values into the test statistic formula gives the realized test statistic value ${\displaystyle v}$.

### Test decision and interpretation

Test decision and interpretation are carried out analogously to the one-sample mean test. Consider a population of 3,100 supermarket branches with both cheese and meat counters. Define ${\displaystyle X_{1}:=}$ ‘Queuing duration in minutes at the cheese counter’ and ${\displaystyle X_{2}:=}$ ‘Queuing duration in minutes at the meat counter’. Assume that ${\displaystyle X_{1}}$ and ${\displaystyle X_{2}}$ have normal distributions with unknown expectations ${\displaystyle \mu _{1}}$ and ${\displaystyle \mu _{2}}$ and unknown, but equal variances ${\displaystyle \sigma _{1}^{2}=\sigma _{2}^{2}}$ (variance homogeneity). On the basis of two simple random samples of sizes ${\displaystyle n_{1}}$ and ${\displaystyle n_{2}}$, we want to test at a significance level of ${\displaystyle \alpha }$ whether the average times customers queue at the two counters before being served are equal, i.e. whether the difference in the true parameters ${\displaystyle \mu _{1}-\mu _{2}}$ equals ${\displaystyle \omega _{0}=0}$: ${\displaystyle {\text{H}}_{0}:\mu _{1}-\mu _{2}=0\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}-\mu _{2}\neq 0.}$ In this interactive example you can conduct this test as often as you like. Each repetition is based on freshly simulated random samples of ${\displaystyle X_{1}}$ and ${\displaystyle X_{2}}$ and carried out using your specified test parameters. You can:

• repeatedly observe test decisions on the basis of unchanged significance level ${\displaystyle \alpha }$ and sample sizes ${\displaystyle n_{1}}$ and ${\displaystyle n_{2}}$;
• alter ${\displaystyle \alpha ,}$ for constant ${\displaystyle n_{1}}$ and ${\displaystyle n_{2}}$;
• vary the sample sizes ${\displaystyle n_{1}}$ and ${\displaystyle n_{2}}$, holding the significance level ${\displaystyle \alpha }$ constant; or
• vary ${\displaystyle \alpha }$, ${\displaystyle n_{1}}$ and ${\displaystyle n_{2}}$ simultaneously.

Mr. Schmidt and Mr. Maier, two senior bank managers, enjoy lunch hours that are long enough to start arguing about the average age of their colleagues.

1st dispute: Mr. Schmidt claims that the average age of the female employees differs from that of the male employees—an opinion Mr. Maier cannot and, more importantly, doesn’t want to share.

2nd dispute: Mr. Schmidt even believes he knows the direction of the deviation: female workers are older on average, it appears to him. Being opposed to Schmidt’s first claim, Maier cannot but disagree with his second.

3rd dispute: The above is not enough confrontation to override the boredom that has spread after numerous discussions about the fair value of the Euro and the best national football team coach. Mr. Schmidt cannot help himself and switches to attack: ‘On average, the women in our bank are ${\displaystyle 5}$ years older than the men!’ Mr. Maier is more than happy to disagree, even though he suddenly concedes that the average male colleague might be younger than the average female. But he cannot rule out the possibility that these subjective impressions are subject to a focus bias arising from a more critical examination of their female colleagues (Maier and Schmidt are both married).

To settle their disputes and hence make space for other future discussions, Maier and Schmidt decide to carry out a statistical investigation. They are both surprised that they can agree on the following settings: The statistical test will be based on the difference of two population means ${\displaystyle \mu _{1}-\mu _{2}}$; the significance level is ${\displaystyle \alpha }$. Random variable ${\displaystyle X_{1}}$ captures the age of a female banker, ${\displaystyle X_{2}}$ the age of a male banker. Expectations ${\displaystyle E\left(X_{1}\right)=\mu _{1}}$, ${\displaystyle E\left(X_{2}\right)=\mu _{2}}$ and variances ${\displaystyle Var\left(X_{1}\right)=\sigma _{1}^{2}}$, ${\displaystyle Var\left(X_{2}\right)=\sigma _{2}^{2}}$ are unknown.
Homogeneity of variances cannot be assumed, Maier and Schmidt agree. Furthermore, there is no prior knowledge about the shape of the distributions of ${\displaystyle X_{1}}$ and ${\displaystyle X_{2}}$. Consequently, the sample sizes ${\displaystyle n_{1}}$ and ${\displaystyle n_{2}}$ will have to be sufficiently large to justify the application of the central limit theorem. Maier and Schmidt know that there are approximately as many female as male workers in the bank, and they thus choose equal sample sizes: ${\displaystyle n_{1}=n_{2}=50}$. They ask human resources for support in their ground-breaking investigations. Of course, personnel could simply provide them with the exact data, but they agree to draw two samples of size ${\displaystyle 50}$ at random, without replacing the sampled entity after each draw. They ensure that the two samples from the male and female populations can be regarded as independent. Sample averages and variances are computed for both samples.

### Test statistic and its distribution; decision regions

As ${\displaystyle \sigma _{1}}$ and ${\displaystyle \sigma _{2}}$ are unknown and Maier & Schmidt have to assume heterogeneity of variances, they employ the test statistic ${\displaystyle V={\frac {\left({\overline {X}}_{1}-{\overline {X}}_{2}\right)-\omega _{0}}{\sqrt {{\frac {S_{1}^{2}}{n_{1}}}+{\frac {S_{2}^{2}}{n_{2}}}}}},}$ where ${\displaystyle {\overline {X}}_{1}={\frac {1}{n_{1}}}\sum _{i=1}^{n_{1}}\,X_{1i},\quad {\overline {X}}_{2}={\frac {1}{n_{2}}}\sum _{i=1}^{n_{2}}\,X_{2i}}$ are the sample means and ${\displaystyle S_{1}^{2}={\frac {1}{n_{1}-1}}\,\sum _{i=1}^{n_{1}}\left(X_{1i}-{\overline {X}}_{1}\right)^{2},\quad S_{2}^{2}={\frac {1}{n_{2}-1}}\,\sum _{i=1}^{n_{2}}\left(X_{2i}-{\overline {X}}_{2}\right)^{2}}$ are estimators of the population variances ${\displaystyle \sigma _{1}^{2}}$ and ${\displaystyle \sigma _{2}^{2}}$. As the sample sizes satisfy ${\displaystyle n_{1}>30}$ and ${\displaystyle n_{2}>30}$, the central limit theorem can be applied, and the distribution of ${\displaystyle V}$ can, under ${\displaystyle {\text{H}}_{0}}$, be approximated by the standard normal distribution (bell curve). Maier & Schmidt will thus apply an asymptotic or approximate test for ${\displaystyle \mu _{1}-\mu _{2}}$.

## 1st dispute

### Hypothesis

Mr. Schmidt’s first claim is general in that it doesn’t specify the direction or size of the proposed average age differential. Thus, a two-sided test with ${\displaystyle \omega _{0}=0}$ has to be specified: ${\displaystyle {\text{H}}_{0}:\mu _{1}-\mu _{2}=\omega _{0}=0\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}-\mu _{2}\neq \omega _{0}=0,}$ or, equivalently, ${\displaystyle {\text{H}}_{0}:\mu _{1}=\mu _{2}\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}\neq \mu _{2}.}$

### Determining the decision regions for H${\displaystyle _{0}}$

The upper critical value satisfying ${\displaystyle P\left(V\geq c_{u}\right)=1-\alpha /2=0.975}$ can be looked up in the normal distribution table as the ${\displaystyle 97.5}$ per cent quantile: ${\displaystyle c_{u}=z_{0.975}=1.96}$. From the symmetry of the normal distribution around zero it follows that the lower critical value is ${\displaystyle c_{l}=-z_{1-\alpha /2}=-1.96}$, such that ${\displaystyle P\left(V\leq c_{l}\right)=\alpha /2=0.025}$. We thus have the following decision regions: Approximate non-rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,-1.96\leq v\leq 1.96\right\}}$. Approximate rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v<-1.96\,{\text{ or }}\,v>1.96\right\}}$.
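Instead of a printed table, the critical values can be reproduced with Python's standard library; a minimal sketch:

```python
from statistics import NormalDist

alpha = 0.05
z_upper = NormalDist().inv_cdf(1 - alpha / 2)  # 97.5 per cent quantile of N(0, 1)
z_lower = -z_upper                             # by symmetry around zero
print(round(z_upper, 2), round(z_lower, 2))  # 1.96 -1.96
```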

### Sampling and computing the test statistic

Personnel submits the following data computed from the two samples: Female bank clerks: ${\displaystyle {\overline {x}}_{1}=47.71,\quad s_{1}^{2}=260.875.}$ Male bank clerks: ${\displaystyle {\overline {x}}_{2}=41.80,\quad s_{2}^{2}=237.681.}$ Using ${\displaystyle \omega _{0}=0}$, Maier & Schmidt derive a test statistic value of ${\displaystyle v=1.87}$.
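Plugging the reported sample characteristics into the test statistic with ${\displaystyle \omega _{0}=0}$ reproduces this value; a sketch:

```python
from math import sqrt

xbar1, s1_sq, n1 = 47.71, 260.875, 50  # female bank clerks
xbar2, s2_sq, n2 = 41.80, 237.681, 50  # male bank clerks
omega0 = 0.0
v = (xbar1 - xbar2 - omega0) / sqrt(s1_sq / n1 + s2_sq / n2)
print(round(v, 2))  # 1.87
```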

### Test decision and interpretation

The test statistic value of ${\displaystyle v=1.87}$ falls into the non-rejection region for H${\displaystyle _{0}}$, and consequently the null hypothesis is not rejected. Based on two independent random samples of sizes ${\displaystyle n_{1}=n_{2}=50}$, Maier & Schmidt couldn’t prove statistically the existence of a significant difference in the population averages of female and male bank clerks’ ages, ${\displaystyle \mu _{1}}$ and ${\displaystyle \mu _{2}}$. Having not rejected the null hypothesis, Maier & Schmidt may have made a wrong decision. This is the case if, in reality, the two population means do differ. The probability of the occurrence of a type II error (${\displaystyle '{\text{H}}_{0}^{'}|{\text{H}}_{1}}$) can only be computed for ’hypothetical’ true parameter values, i.e. when the parameter region of the alternative hypothesis is narrowed to a single parameter point.

## 2nd dispute

### Hypothesis

Mr. Schmidt believes he has subsequently come up with some substantial new arguments in favour of his proposition and insists on putting it forward as the alternative hypothesis in a further test to be conducted. If the null hypothesis is rejected and his hypothesis thus supported, he can bound the type I error probability by ${\displaystyle \alpha }$ and so has scientific backing for maintaining his position. The resulting test is a right-sided one, still without quantification of the suggested positive difference: ${\displaystyle \omega _{0}=0}$: ${\displaystyle {\text{H}}_{0}:\mu _{1}-\mu _{2}\leq \omega _{0}=0\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}-\mu _{2}>\omega _{0}=0,}$ or, equivalently, ${\displaystyle {\text{H}}_{0}:\mu _{1}\leq \mu _{2}\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}>\mu _{2}.}$

### Determining the decision regions for H${\displaystyle _{0}}$

The critical value satisfying ${\displaystyle P\left(V\leq c\right)=1-\alpha =0.95}$ can be found in the normal distribution table to be ${\displaystyle c=z_{0.95}=1.645}$. The decision regions are then: Approximate non-rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v\leq 1.645\right\}}$. Approximate rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v>1.645\right\}}$.

### Sampling and computing the test statistic

Human resources supplies Mr. Maier and Mr. Schmidt with the following sample characteristics: Female bank clerks: ${\displaystyle {\overline {x}}_{1}=51.71,\quad s_{1}^{2}=385.509.}$ Male bank clerks: ${\displaystyle {\overline {x}}_{2}=45.16,\quad s_{2}^{2}=283.985.}$ Using ${\displaystyle \omega _{0}=0}$, Maier & Schmidt compute the test statistic value as ${\displaystyle v=1.79}$.
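The same computation as in the first dispute, with the new sample characteristics, can be sketched as:

```python
from math import sqrt

xbar1, s1_sq, n1 = 51.71, 385.509, 50  # female bank clerks
xbar2, s2_sq, n2 = 45.16, 283.985, 50  # male bank clerks
omega0 = 0.0
v = (xbar1 - xbar2 - omega0) / sqrt(s1_sq / n1 + s2_sq / n2)
print(round(v, 2))  # 1.79
```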

### Test decision and interpretation

As the test statistic value of ${\displaystyle v=1.79}$ falls into the rejection region for H${\displaystyle _{0}}$, the null hypothesis is rejected. Maier & Schmidt could show, on the basis of two independent random samples of sizes ${\displaystyle n_{1}=n_{2}=50}$, that the difference ${\displaystyle \mu _{1}-\mu _{2}}$ is significant at the ${\displaystyle \alpha =0.05}$ level. Thus, Schmidt has reason to maintain his claim that the average female bank clerk is older than the average male. The probability of having made a wrong conclusion in a repeated test context, i.e. the type I error probability ${\displaystyle P\left('{\text{H}}_{1}^{'}|{\text{H}}_{0}\right)}$, is bounded by the significance level ${\displaystyle \alpha =0.05}$. Compared to the two-sided test, the rejection region for H${\displaystyle _{0}}$ doesn’t consist of two segments, but lies entirely to the right of ${\displaystyle E\left(V\right)=0}$. As the area under the normal curve corresponding to this region has to equal the ‘entire’ quantity ${\displaystyle \alpha }$, the critical value is smaller than that of the two-sided version. For the same significance level ${\displaystyle \alpha }$, the same sample sizes ${\displaystyle n_{1}}$ and ${\displaystyle n_{2}}$, and an equal deviation of the test statistic from the hypothetical boundary value in the direction of the alternative, the null hypothesis is therefore more likely to be rejected in the one-sided test than in the two-sided test.

## 3rd dispute

### Hypothesis

In his third claim, Mr. Schmidt has gone one step further in that he has quantified the average age of his female colleagues to be at least ${\displaystyle 5}$ years higher than the average age of his male coworkers. Translated into our test formalization, the hypothetical difference is ${\displaystyle \omega _{0}=5}$. Maier agrees to adopt the same test structure as in the second dispute, leaving Schmidt’s claim as the alternative hypothesis. The resulting right-sided test is: ${\displaystyle {\text{H}}_{0}:\mu _{1}-\mu _{2}\leq \omega _{0}=5\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}-\mu _{2}>\omega _{0}=5.}$

### Determining the decision regions for H${\displaystyle _{0}}$

The critical value for ${\displaystyle P\left(V\leq c\right)=1-\alpha =0.95}$ is looked up in the normal distribution table: ${\displaystyle c=z_{0.95}=1.645}$. The resulting approximate decision regions are the same as in the second dispute: Approximate non-rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v\leq 1.645\right\}}$. Approximate rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v>1.645\right\}}$.

### Sampling and computing the test statistic

Human resources submit the following statistics: Female bank clerks: ${\displaystyle {\overline {x}}_{1}=52.22,\quad s_{1}^{2}=321.914.}$ Male bank clerks: ${\displaystyle {\overline {x}}_{2}=43.13,\quad s_{2}^{2}=306.527.}$ This time Maier & Schmidt compute the test statistic value using ${\displaystyle \omega _{0}=5}$, yielding ${\displaystyle v=1.154}$.
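The only change relative to the earlier disputes is the nonzero hypothetical difference ${\displaystyle \omega _{0}=5}$, which is subtracted in the numerator; a sketch:

```python
from math import sqrt

xbar1, s1_sq, n1 = 52.22, 321.914, 50  # female bank clerks
xbar2, s2_sq, n2 = 43.13, 306.527, 50  # male bank clerks
omega0 = 5.0                           # hypothetical difference under test
v = (xbar1 - xbar2 - omega0) / sqrt(s1_sq / n1 + s2_sq / n2)
print(round(v, 3))  # 1.154
```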

### Test decision and interpretation

The test statistic value ${\displaystyle v=1.154}$ belongs to the non-rejection region for H${\displaystyle _{0}}$, and the null hypothesis is thus not rejected. On the basis of two independent random samples of sizes ${\displaystyle n_{1}=n_{2}=50}$, Maier & Schmidt couldn’t verify statistically that the difference ${\displaystyle \mu _{1}-\mu _{2}}$ is significantly greater than ${\displaystyle 5}$. Schmidt hence couldn’t prove statistically, at a significance level of ${\displaystyle \alpha =0.05}$, that the average female bank clerk is ${\displaystyle 5}$ years older than the average male bank worker. The test delivers an objective decision basis only for the proposed difference of exactly ${\displaystyle 5}$: nothing can be said about any smaller positive difference (nor about true differences greater than ${\displaystyle 5}$, owing to the possibility of a type II error). Thus, if the average female banker is older than the average male banker in the population, Mr. Schmidt has either overstated the difference or is a victim of the type II error, ${\displaystyle '{\text{H}}_{0}^{'}|{\text{H}}_{1}}$, the probability of which can only be computed for specific values of the true population parameter differential.

Student Sabine visits two farms to buy fresh eggs. The farms are populated by two different breeds of hens—one on each. Sabine randomly picks ${\displaystyle 10}$ eggs from the first and ${\displaystyle 15}$ eggs from the second farm. Back home, she has the impression that the eggs produced by the hens on the first farm are heavier than those from the second. To verify this suspicion, she conducts a statistical test at a significance level of ${\displaystyle \alpha }$. Sabine compares two (weight) averages by testing for the difference ${\displaystyle \mu _{1}-\mu _{2}}$ of two means.

### Hypothesis

As Sabine has reason to believe that the average weight of one egg variety is greater than that of the other, a one-sided test is indicated. She wants to prove statistically that the first farm produces heavier eggs and consequently puts her conjecture as the alternative hypothesis, hoping that her sample will reject the null hypothesis, which states the negation of the statement she wants to verify. But Sabine has no idea how great the average weight difference could be and thus sets the hypothetical difference that has to be exceeded to prove her right to zero: ${\displaystyle \mu _{1}-\mu _{2}=\omega _{0}=0}$. She can formalize her test as ${\displaystyle {\text{H}}_{0}:\mu _{1}-\mu _{2}\leq 0\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}-\mu _{2}>0,}$ or, equivalently, ${\displaystyle {\text{H}}_{0}:\mu _{1}\leq \mu _{2}\quad {\text{ versus }}\quad {\text{H}}_{1}:\mu _{1}>\mu _{2}.}$

### Test statistic and its distribution; decision regions

Sabine has picked the eggs at random—in particular, she hasn’t tried to get hold of the biggest ones on either farm. Naturally, she sampled without replacement, but we may assume that the population of daily produced eggs on both farms is sufficiently large to justify the assumption of a simple random sample. Clearly, Sabine has drawn the samples independently, for she sampled on two unrelated farms. Sabine assumes that the random variables ${\displaystyle X_{1}:}$ ‘egg weight of first breed’ and ${\displaystyle X_{2}:}$ ‘egg weight of second breed’ are normally distributed: ${\displaystyle X_{1}\thicksim \mathbb {N} \left(\mu _{1};\,\sigma _{1}\right)}$ and ${\displaystyle X_{2}\thicksim \mathbb {N} \left(\mu _{2};\,\sigma _{2}\right)}$. Expectations ${\displaystyle E\left(X_{1}\right)=\mu _{1}}$ and ${\displaystyle E\left(X_{2}\right)=\mu _{2}}$ and variances ${\displaystyle Var\left(X_{1}\right)=\sigma _{1}^{2}}$ and ${\displaystyle Var\left(X_{2}\right)=\sigma _{2}^{2}}$ are unknown. To simplify matters, Sabine assumes that the population variances are homogeneous: ${\displaystyle \sigma _{1}^{2}=\sigma _{2}^{2}}$. This implies that a difference in the expectations doesn’t induce a difference in the variances—a rather adventurous assumption. Nevertheless, acknowledging the above assumptions (and the possibility of their violation), Sabine can base her test on the test statistic ${\displaystyle V={\frac {\left({\overline {X}}_{1}-{\overline {X}}_{2}\right)-\omega _{0}}{\sqrt {{\frac {n_{1}+n_{2}}{n_{1}\,n_{2}}}\,{\frac {\left(n_{1}-1\right)\,S_{1}^{2}+\left(n_{2}-1\right)\cdot S_{2}^{2}}{n_{1}+n_{2}-2}}}}}.}$ Here, ${\displaystyle n_{1}=10}$ and ${\displaystyle n_{2}=15}$ are the sample sizes, ${\displaystyle {\overline {X}}_{1}}$ and ${\displaystyle {\overline {X}}_{2}}$ are the sample means and ${\displaystyle S_{1}^{2}}$ and ${\displaystyle S_{2}^{2}}$ are the estimators of ${\displaystyle \sigma _{1}^{2}}$ and ${\displaystyle \sigma _{2}^{2}}$.
Under ${\displaystyle {\text{H}}_{0}}$, ${\displaystyle V}$ has ${\displaystyle t-}$distribution with ${\displaystyle n_{1}+n_{2}-2=10+15-2=23}$ degrees of freedom. In the corresponding t-table we find the quantile ${\displaystyle t_{0.95;23}=1.714}$ to be the critical value ${\displaystyle c}$ satisfying ${\displaystyle P\left(V\leq c\right)=1-\alpha =0.95}$ and hence have the following decision regions: Non-rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v\leq 1.714\right\}}$. Rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v>1.714\right\}}$.

### Sampling and computing the test statistic

Sabine weighs the eggs and computes the sample-specific arithmetic averages and variances: 1st breed: ${\displaystyle {\overline {x}}_{1}=65.700,\quad s_{1}^{2}=50.35.}$ 2nd breed: ${\displaystyle {\overline {x}}_{2}=60.433,\quad s_{2}^{2}=42.46.}$ Using ${\displaystyle \omega _{0}=0}$ she calculates a test statistic value of ${\displaystyle v=1.91}$.
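Since Sabine assumes equal variances, her statistic uses the pooled variance estimator; a sketch of the computation with her sample values:

```python
from math import sqrt

xbar1, s1_sq, n1 = 65.700, 50.35, 10  # 1st breed
xbar2, s2_sq, n2 = 60.433, 42.46, 15  # 2nd breed
# pooled variance S^2 (weighted average of the two sample variances)
s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
v = (xbar1 - xbar2) / sqrt(s_sq * (1 / n1 + 1 / n2))
print(round(v, 2))  # 1.91
```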

### Test decision and interpretation

The test statistic realization ${\displaystyle v=1.91}$ falls into the rejection region for H${\displaystyle _{0}}$, and the null hypothesis is thus rejected. On the basis of two independent random samples of sizes ${\displaystyle n_{1}=10}$ and ${\displaystyle n_{2}=15}$ and at a significance level of ${\displaystyle \alpha =0.05}$, Sabine could show statistically that the difference ${\displaystyle \mu _{1}-\mu _{2}}$ of the population averages of the eggs’ weights is significantly positive. As the type I error probability ${\displaystyle P\left('{\text{H}}_{1}^{'}|{\text{H}}_{0}\right)}$ cannot exceed ${\displaystyle \alpha }$, Sabine has scientific backing for her claim that the eggs from breed 1 hens are heavier than those from the second farm—on average!