# Finding the Sample Size


The length of a confidence interval generally depends on the confidence level and on the sample size $n$. Increasing the confidence level $1-\alpha$ (keeping the sample size $n$ constant) yields a wider confidence interval. Increasing the sample size $n$ (keeping the confidence level $1-\alpha$ constant) enhances precision and yields a narrower interval. Hence, by adjusting the confidence level and the sample size, we can control the width of the confidence interval. Until now we have assumed that the confidence level and the sample size are given. In some applications, however, it is necessary to find the sample size that yields a confidence interval of pre-specified width at a confidence level $1-\alpha$. The problem will be illustrated using confidence intervals for a mean $\mu$ and a proportion $\pi$. We will assume sampling with replacement from a large population.

(a) Confidence interval for $\mu$

We assume that the population is normally distributed. The exact sample size can be found if the length of the confidence interval is not random, i.e. does not depend on the data. This is the case if the variance $\sigma ^{2}$ of the population is known. The length of the confidence interval for $\mu$ is then given by $L=2\cdot z_{1-{\frac {\alpha }{2}}}\cdot {\frac {\sigma }{\sqrt {n}}}$ and depends on the confidence level $1-\alpha$ as well as the sample size $n$. If the length $L$ and the confidence level $1-\alpha$ are given, we may solve the above equation for $n$. More precisely, the required sample size is the smallest integer for which the following condition holds: $n\geq {\frac {4\sigma ^{2}z_{1-{\frac {\alpha }{2}}}^{2}}{L^{2}}}\,.$ In order to obtain a confidence interval of length not exceeding $L$ at confidence level $1-\alpha$, $n$ has to be at least as large as this integer.

If the variance $\sigma ^{2}$ is unknown, the length of the interval for $\mu$, $L=2\cdot t_{n-1;1-{\frac {\alpha }{2}}}\cdot {\frac {s}{\sqrt {n}}}$, is random since it depends on the standard deviation $s$, which is a function of the sample. There are procedures which ensure that the expected length of the confidence interval equals some value, but these will not be considered here.
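The condition for $n$ in (a) can be evaluated directly. The following is a minimal sketch in Python, assuming a known standard deviation; the function names `sample_size_mean` and `interval_length` and the input values $\sigma =10$, $L=5$, $1-\alpha =0.95$ are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma: float, length: float, conf_level: float) -> int:
    """Smallest integer n with n >= 4 * sigma^2 * z^2 / L^2 (sigma known)."""
    alpha = 1.0 - conf_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # quantile z_{1-alpha/2}
    return math.ceil(4.0 * sigma**2 * z**2 / length**2)


def interval_length(sigma: float, n: int, conf_level: float) -> float:
    """Length L = 2 * z * sigma / sqrt(n) of the interval for the mean."""
    alpha = 1.0 - conf_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * z * sigma / math.sqrt(n)


# Hypothetical inputs: sigma = 10, desired length L = 5, confidence level 0.95.
n = sample_size_mean(sigma=10.0, length=5.0, conf_level=0.95)
print("required n:", n)
print("length with n:    ", interval_length(10.0, n, 0.95))
print("length with n - 1:", interval_length(10.0, n - 1, 0.95))
```

With the returned $n$ the interval length does not exceed $L$, while with $n-1$ it does, which is what "smallest integer" means in the condition.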

(b) Confidence interval for $\pi$

Suppose we have a sufficiently large sample so that the sample proportion ${\widehat {\pi }}$ is approximately normally distributed. The length of the confidence interval for $\pi$ is given by $L=2\cdot z_{1-{\frac {\alpha }{2}}}\cdot {\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}}\,.$ Given a pre-specified value $L$ and confidence level $1-\alpha$, we could solve the above equation for the required sample size. More precisely, the required sample size would be the smallest integer for which the following condition holds: $n\geq {\frac {4\cdot z_{1-{\frac {\alpha }{2}}}^{2}\cdot {\widehat {\pi }}\cdot (1-{\widehat {\pi }})}{L^{2}}}\,.$

However, ${\widehat {\pi }}$ is random, so the required sample size would vary from sample to sample. Fortunately, we can arrive at a conservative minimum sample size as follows. Note first that $\pi (1-\pi )$ is maximal when $\pi =0.5$ (and hence $1-\pi =0.5$). This is the situation which requires, ceteris paribus, the largest sample size. Thus if we select $n\geq {\frac {4\cdot z_{1-{\frac {\alpha }{2}}}^{2}\cdot {\frac {1}{2}}\cdot {\frac {1}{2}}}{L^{2}}}={\frac {z_{1-{\frac {\alpha }{2}}}^{2}}{L^{2}}}$, then it will also be the case that $n\geq {\frac {4\cdot z_{1-{\frac {\alpha }{2}}}^{2}\cdot \pi \cdot (1-\pi )}{L^{2}}}$ for any other $\pi$. We need to take some care to ensure that the sample size $n$ is sufficiently large for the normal approximation to apply.

We are given a population of employees of an insurance company and we observe the following variables: $X_{1}$ = annual income, $X_{2}$ = number of signed contracts, $X_{3}$ = sick days per calendar year, $X_{4}$ = weekly working hours. We assume that the variables $X_{1},\dots ,X_{4}$ are normally distributed with unknown means and known variances for each variable:

$\sigma _{1}^{2}=18.92$, $\sigma _{2}^{2}=7.54$, $\sigma _{3}^{2}=4.03$, $\sigma _{4}^{2}=12.24$

How large does the sample size need to be so that the confidence interval for $\mu$ at a given confidence level $1-\alpha$ is of length $L$? In this example one can assess how the confidence level and the width of the confidence interval influence the required sample size. We recommend that only one of these features be varied at a time. Please select:

• the variable to be analyzed
• the length of the confidence interval $L$
• the level of confidence $1-\alpha$ (as a decimal number, e.g. 0.95)
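The selection described above can be mimicked with a short script. This is only a sketch: the variances are those given for $X_{1},\dots ,X_{4}$, whereas the chosen length $L=2$ and confidence level $1-\alpha =0.95$ are hypothetical selections, not values from the example.

```python
import math
from statistics import NormalDist

# Known variances of the four variables, as stated in the example.
variances = {"X1": 18.92, "X2": 7.54, "X3": 4.03, "X4": 12.24}

# Hypothetical selections (assumptions made for this illustration only).
length = 2.0       # desired interval length L
conf_level = 0.95  # confidence level 1 - alpha

# Standard normal quantile z_{1-alpha/2}.
z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)

# Smallest integer n with n >= 4 * sigma^2 * z^2 / L^2 for each variable.
required_n = {
    name: math.ceil(4.0 * var * z**2 / length**2)
    for name, var in variances.items()
}

for name, n in required_n.items():
    print(f"{name}: sigma^2 = {variances[name]:5.2f}  ->  n >= {n}")
```

For fixed $L$ and $1-\alpha$, the required sample size grows in proportion to the variance, so the variable with the largest variance needs the largest sample.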

The leader of a small political party would like to know whether the party would receive more than 5% of the vote if the election were held tomorrow. He has appointed a statistician to perform the analysis. During their conversation the statistician highlights the following issues:

• In order to find the exact proportion of supporters, one would have to ask all the voters (i.e. the whole population).
• The proportion of votes in the sample is but an estimate or approximation of the true proportion.
• The confidence interval provides a measure of the uncertainty associated with the estimate.
• The length of the interval and the level of confidence may be chosen by the politician.
• The shorter the required interval and the higher the confidence level, the larger the sample size.
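The last point can be checked numerically. The sketch below uses the conservative bound $n\geq z_{1-{\frac {\alpha }{2}}}^{2}/L^{2}$ from (b), i.e. it assumes $\pi =0.5$; the function name `conservative_n` and the grid of lengths and confidence levels are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def conservative_n(length: float, conf_level: float) -> int:
    """Smallest integer n with n >= z^2 / L^2 (worst case pi = 0.5)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    return math.ceil(z**2 / length**2)


# Hypothetical grid of interval lengths and confidence levels.
lengths = [0.10, 0.05, 0.02]
levels = [0.90, 0.95, 0.99]

table = {(lvl, L): conservative_n(L, lvl) for lvl in levels for L in lengths}

for lvl in levels:
    row = "  ".join(f"L={L:.2f}: n>={table[(lvl, L)]:>6}" for L in lengths)
    print(f"1-alpha={lvl:.2f}  {row}")
```

Reading the output along a row shows the effect of a shorter interval; reading down a column shows the effect of a higher confidence level.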

The statistician calculates the required sample size using $n\geq {\frac {4\cdot z_{1-{\frac {\alpha }{2}}}^{2}\cdot {\widehat {\pi }}\cdot (1-{\widehat {\pi }})}{L^{2}}}\,.$ Since ${\widehat {\pi }}$ is unknown before the sample is drawn, the statistician uses the largest imaginable proportion of votes for his party, which is 10%. (This is because $\pi \cdot (1-\pi )$ increases with $\pi$ for $\pi \leq 0.5$.) This yields a conservative value for the minimum sample size.

The Bimmelbahn Corporation would like to make a statement about the timeliness of its trains, in particular the average delay and the proportion of timely trains. Confidence intervals based on a random sample will be used.

1. Question: What should the sample size be in order to obtain a confidence interval for the mean delay at a confidence level $1-\alpha =0.90$ and width 60 min?

We assume that the random variable $X$ = duration of delays is normally distributed with mean $E(X)=\mu$ and known standard deviation $\sigma =68.8$, i.e. variance $Var(X)=\sigma ^{2}=68.8^{2}$. We want a confidence interval for $\mu$. Note that $z_{1-\alpha /2}=z_{0.95}=1.645$. Hence the required sample size is $n\geq {\frac {4\sigma ^{2}z_{1-{\frac {\alpha }{2}}}^{2}}{L^{2}}}={\frac {4\cdot 68.8^{2}\cdot 1.645^{2}}{60^{2}}}=14.23\,.$ Thus if $n\geq 15$, the confidence interval will have the desired confidence level and width.

2. Question: What should the sample size be in order that the confidence interval for $\pi$ (the proportion of timely trains) be of length not exceeding 0.1 at a confidence level $1-\alpha =0.95$?

We assume the normal approximation holds for the distribution of ${\widehat {\pi }}$ (rule of thumb: $n\geq 100$). Note that $z_{1-\alpha /2}=z_{0.975}=1.96$. We need to find $n$ to satisfy $n\geq {\frac {4\cdot z_{1-{\frac {\alpha }{2}}}^{2}\cdot {\widehat {\pi }}\cdot (1-{\widehat {\pi }})}{L^{2}}}\,.$ A conservative bound for the minimum sample size may be obtained by setting ${\widehat {\pi }}=0.5$. We obtain: $n\geq {\frac {z_{1-{\frac {\alpha }{2}}}^{2}}{L^{2}}}={\frac {1.96^{2}}{0.1^{2}}}=384.16\,.$ Thus to achieve the desired width and confidence level, we need $n\geq 385$.
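Both Bimmelbahn results can be reproduced with a few lines of Python. This is a sketch under stated assumptions: it takes $\sigma =68.8$ as used in the calculation above, computes the normal quantiles exactly instead of using the rounded values 1.645 and 1.96, and the function names and the optional parameter `pi_upper` (an assumed upper bound on $\pi$, in the spirit of the statistician's 10% argument) are hypothetical.

```python
import math
from statistics import NormalDist


def z_quantile(conf_level: float) -> float:
    """Standard normal quantile z_{1-alpha/2} for confidence level 1-alpha."""
    return NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)


def n_for_mean(sigma: float, length: float, conf_level: float) -> int:
    """Smallest integer n with n >= 4 * sigma^2 * z^2 / L^2 (sigma known)."""
    z = z_quantile(conf_level)
    return math.ceil(4.0 * sigma**2 * z**2 / length**2)


def n_for_proportion(length: float, conf_level: float,
                     pi_upper: float = 0.5) -> int:
    """Conservative n for a proportion, given an upper bound on pi.

    pi * (1 - pi) increases on [0, 0.5], so over [0, pi_upper] its
    maximum is attained at min(pi_upper, 0.5).
    """
    p = min(pi_upper, 0.5)
    z = z_quantile(conf_level)
    return math.ceil(4.0 * z**2 * p * (1.0 - p) / length**2)


# Question 1: mean delay, sigma = 68.8, L = 60 min, 1 - alpha = 0.90.
n1 = n_for_mean(sigma=68.8, length=60.0, conf_level=0.90)

# Question 2: proportion of timely trains, L = 0.1, 1 - alpha = 0.95,
# conservative choice pi = 0.5.
n2 = n_for_proportion(length=0.1, conf_level=0.95)

print("Question 1: n >=", n1)
print("Question 2: n >=", n2)
```

Using exact quantiles changes the intermediate values slightly (for example, roughly 14.23 and 384.15 before rounding up) but leads to the same minimum sample sizes as in the text.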