# Confidence Interval for Proportion

Suppose we are drawing from a dichotomous population, where ${\displaystyle \pi }$ denotes the proportion of elements with a given property. We want to construct a confidence interval for the unknown parameter ${\displaystyle \pi }$. We draw a random sample of size ${\displaystyle n}$ in such a manner that ${\displaystyle X_{1},\dots ,X_{n}}$ are independently and identically Bernoulli distributed (see Chapter Binomial distribution). The sample proportion is the fraction of "successes" in the sample, that is, the mean of the Bernoulli variables ${\displaystyle X_{1},\dots ,X_{n}}$. It is worth emphasizing that the sample proportion is a sample mean and as such inherits the properties and behavior of a sample mean. Thus, ${\displaystyle {\widehat {\pi }}={\frac {1}{n}}\sum \limits _{i=1}^{n}X_{i}}$ with expectation and variance ${\displaystyle E({\widehat {\pi }})=\pi \,,\quad Var({\widehat {\pi }})={\frac {\pi (1-\pi )}{n}}\,.}$ It is an unbiased and consistent estimator of ${\displaystyle \pi }$ (see Chapter properties of estimators).

Since it is quite difficult to construct confidence intervals for small samples, we restrict ourselves to the case where the sample size ${\displaystyle n}$ is sufficiently large that the central limit theorem can be used to obtain the distribution of the estimator. In particular, ${\displaystyle z={\frac {{\widehat {\pi }}-\pi }{\sigma ({\widehat {\pi }})}}={\frac {{\widehat {\pi }}-\pi }{\sqrt {\frac {\pi (1-\pi )}{n}}}}}$ is approximately standard normal: ${\displaystyle z\sim N(0;1)}$. Hence we conclude that ${\displaystyle P\left(-z_{1-{\frac {\alpha }{2}}}\leq {\frac {{\widehat {\pi }}-\pi }{\sigma ({\widehat {\pi }})}}\leq z_{1-{\frac {\alpha }{2}}}\right)\approx 1-\alpha \,,}$ where ${\displaystyle z_{1-\alpha /2}}$ is the ${\displaystyle (1-\alpha /2)}$ quantile of the standard normal distribution, obtained from standard normal tables.

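
The probability statement for the standardized sample proportion can be checked by simulation. The following is a minimal sketch, not part of the original text: the function name `simulate_pivot_coverage`, the values ${\displaystyle \pi =0.3}$ and ${\displaystyle n=200}$, the number of replications and the seed are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_pivot_coverage(pi, n, conf_level=0.95, reps=2000, seed=1):
    """Share of simulated samples whose standardized proportion lies in [-z, z].

    pi, n, reps and seed are hypothetical inputs chosen for illustration.
    """
    rng = random.Random(seed)
    # Quantile z_{1 - alpha/2} of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # True standard deviation of the sample proportion (uses the known pi).
    sigma = sqrt(pi * (1 - pi) / n)
    hits = 0
    for _ in range(reps):
        # Draw n independent Bernoulli(pi) variables and count the successes.
        successes = sum(rng.random() < pi for _ in range(n))
        pi_hat = successes / n
        z = (pi_hat - pi) / sigma
        if -z_crit <= z <= z_crit:
            hits += 1
    return hits / reps


coverage = simulate_pivot_coverage(pi=0.3, n=200)
print(f"share of samples with |z| <= z_crit: {coverage:.3f}")
```

Under these assumptions the printed share should lie close to the nominal level ${\displaystyle 1-\alpha =0.95}$, though not exactly on it, because the sample proportion is discrete and the number of replications is finite.
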
We still cannot construct a confidence interval for ${\displaystyle \pi }$ from this statement, since the standard deviation of ${\displaystyle {\widehat {\pi }}}$ depends on ${\displaystyle \pi }$, which is unknown. We therefore replace ${\displaystyle \sigma ({\widehat {\pi }})}$ with the estimate ${\displaystyle {\widehat {\sigma }}({\widehat {\pi }})={\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}}\,.}$ This is a consistent estimator of ${\displaystyle {\sqrt {\frac {\pi (1-\pi )}{n}}}}$ since ${\displaystyle {\widehat {\pi }}}$ is a consistent estimator of ${\displaystyle \pi }$. The above probability statement becomes ${\displaystyle P\left(-z_{1-{\frac {\alpha }{2}}}\leq {\frac {{\widehat {\pi }}-\pi }{{\widehat {\sigma }}({\widehat {\pi }})}}\leq z_{1-{\frac {\alpha }{2}}}\right)\approx 1-\alpha \,.}$

We isolate ${\displaystyle \pi }$ in the middle of the probability statement to obtain: ${\displaystyle P\left({\widehat {\pi }}-z_{1-{\frac {\alpha }{2}}}\cdot {\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}}\,\leq \,\pi \,\leq \,{\widehat {\pi }}+z_{1-{\frac {\alpha }{2}}}\cdot {\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}}\right)\approx 1-\alpha \,.}$ Hence for large sample sizes an approximate confidence interval is given by: ${\displaystyle \left[{\widehat {\pi }}-z_{1-{\frac {\alpha }{2}}}\cdot {\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}},\,\,{\widehat {\pi }}+z_{1-{\frac {\alpha }{2}}}\cdot {\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}}\right]\,.}$

The normal distribution provides a reasonable approximation so long as ${\displaystyle \pi }$ is not too close to zero or one. Typically, the sample size should be no smaller than ${\displaystyle 30}$, and preferably substantially larger, e.g. ${\displaystyle n\geq 100}$.
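
The interval formula translates directly into code. The sketch below is one possible implementation using only the Python standard library; the function name `proportion_confidence_interval` and the example counts (40 successes in 100 draws) are illustrative assumptions and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, conf_level=0.95):
    """Approximate confidence interval for a proportion (normal approximation).

    Intended for large n and a proportion not too close to 0 or 1.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    if not 0 < conf_level < 1:
        raise ValueError("conf_level must lie strictly between 0 and 1")
    pi_hat = successes / n                       # point estimate
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)      # quantile z_{1 - alpha/2}
    se_hat = sqrt(pi_hat * (1 - pi_hat) / n)     # estimated standard deviation
    return pi_hat - z * se_hat, pi_hat + z * se_hat


# Hypothetical data: 40 successes in a sample of size 100.
lower, upper = proportion_confidence_interval(40, 100)
print(f"[{lower:.4f}; {upper:.4f}]")
```

The quantile is computed with `NormalDist().inv_cdf` rather than read from a table, so it is slightly more precise than the rounded table value 1.96 at the 95% level.
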

• The two-sided confidence intervals we have constructed assign roughly equal probabilities to the tails: ${\displaystyle P\left(\pi <{\widehat {\pi }}-z_{1-{\frac {\alpha }{2}}}\cdot {\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}}\right)\approx {\frac {\alpha }{2}}\,,\,\,\,\,\,\,P\left({\widehat {\pi }}+z_{1-{\frac {\alpha }{2}}}\cdot {\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}}<\pi \right)\approx {\frac {\alpha }{2}}\,.}$
• By construction the confidence interval is symmetric around the point estimate ${\displaystyle {\widehat {\pi }}}$.
• The length of the interval, ${\displaystyle 2z_{1-{\frac {\alpha }{2}}}\cdot {\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}}}$, is a random variable, since it depends on the random sample through ${\displaystyle {\widehat {\pi }}}$.
• The length of the confidence interval also depends on the confidence level ${\displaystyle 1-\alpha }$ and on ${\displaystyle n}$: it increases with the confidence level and decreases as ${\displaystyle n}$ grows.
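
The dependence of the interval length on ${\displaystyle n}$ and ${\displaystyle 1-\alpha }$ can be tabulated with a short script. This is only a sketch: the function name `interval_length`, the fixed value ${\displaystyle {\widehat {\pi }}=0.3}$ and the grids of sample sizes and confidence levels are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def interval_length(pi_hat, n, conf_level):
    """Length 2 * z_{1 - alpha/2} * sqrt(pi_hat * (1 - pi_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return 2 * z * sqrt(pi_hat * (1 - pi_hat) / n)


pi_hat = 0.3  # hypothetical point estimate, held fixed for comparison
for n in (100, 400, 1600):
    for conf_level in (0.90, 0.95, 0.99):
        length = interval_length(pi_hat, n, conf_level)
        print(f"n={n:5d}  level={conf_level:.2f}  length={length:.4f}")
```

For a fixed ${\displaystyle {\widehat {\pi }}}$, quadrupling ${\displaystyle n}$ halves the length, because the length is proportional to ${\displaystyle 1/{\sqrt {n}}}$, while raising the confidence level lengthens the interval.
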

We are given a sample of ${\displaystyle N=3250}$ employees of an insurance company. The following dichotomous variables are measured:

• ${\displaystyle X1}$ = 1 if the employee is a client of Accident Ltd
• ${\displaystyle X2}$ = 1 if the employee belongs to the field staff
• ${\displaystyle X3}$ = 1 if the employee has a company car
• ${\displaystyle X4}$ = 1 if the employee has professional experience

In each case, the dichotomous variable takes a value of 1 or 0. The population proportion for each of these characteristics is unknown. We will obtain a point and interval estimate of the corresponding population proportion, using confidence intervals based on the normal approximation. You will have the opportunity to examine the effect of the confidence level and the sample size on the confidence interval. We recommend that you alter only one of these features at a time while holding the other constant. Please select:

• the variable to be analyzed
• the sample size ${\displaystyle n}$
• the confidence level ${\displaystyle 1-\alpha }$ (as a decimal number e.g. 0.95)

Results: This interactive example provides

1. the confidence interval for the selected confidence level

If you choose the same variable repeatedly, but enter different confidence levels/sample sizes, the previous results are also displayed for comparison purposes.

The leader of a political party "F" is interested in knowing what fraction of citizens would vote for it if an election were held. A survey of ${\displaystyle n=2000}$ citizens is performed which asks the question: "If there were an election tomorrow, which party would receive your vote?" According to the survey, 103 citizens declared that they would vote for F. We wish to construct a 95% confidence interval for ${\displaystyle \pi }$, the proportion of voters who would vote for F. Note the following:

• In order to ensure that a citizen who has already been asked is not sampled a second time, we sample without replacement (though the probability of drawing the same citizen twice would be low in any case, given the sample size).
• Since interest is focused on party F, the event ${\displaystyle A}$ is defined as "the individual votes for F" and the complementary event ${\displaystyle {\bar {A}}}$ as "the individual does not vote for F". Thus for our purposes the population is dichotomous. The proportion of votes for party F is ${\displaystyle \pi =P(A)}$.
• The sample size is sufficiently large (${\displaystyle n=2000}$), so that one may construct an approximate confidence interval using the normal approximation: ${\displaystyle \left[{\widehat {\pi }}-z_{1-{\frac {\alpha }{2}}}\cdot {\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}},\,\,{\widehat {\pi }}+z_{1-{\frac {\alpha }{2}}}\cdot {\sqrt {\frac {{\widehat {\pi }}(1-{\widehat {\pi }})}{n}}}\right]}$ which has an approximate confidence level of 95%. With ${\displaystyle \alpha =0.05}$ we obtain ${\displaystyle z_{1-{\frac {\alpha }{2}}}=z_{0.975}=1.96}$.

The results of the survey yield ${\displaystyle {\widehat {\pi }}=103/2000=0.0515}$ and a 95% confidence interval: ${\displaystyle \left[0.0515-1.96\cdot {\sqrt {\frac {0.0515\cdot 0.9485}{2000}}},\,\,0.0515+1.96\cdot {\sqrt {\frac {0.0515\cdot 0.9485}{2000}}}\right]=[0.0418\,;\,0.0612]\,.}$
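
The arithmetic of this example can be reproduced in a few lines. The sketch below uses the figures from the survey (${\displaystyle n=2000}$, 103 votes for F) and the rounded table value ${\displaystyle z_{0.975}=1.96}$; the variable names are illustrative choices.

```python
from math import sqrt

n = 2000            # number of citizens surveyed
votes_for_f = 103   # respondents who would vote for party F
z = 1.96            # rounded table value of z_{0.975}

pi_hat = votes_for_f / n                    # point estimate 0.0515
se_hat = sqrt(pi_hat * (1 - pi_hat) / n)    # estimated standard deviation
lower = pi_hat - z * se_hat
upper = pi_hat + z * se_hat

print(f"[{lower:.4f}; {upper:.4f}]")  # prints [0.0418; 0.0612]
```

The half-width ${\displaystyle 1.96\cdot {\widehat {\sigma }}({\widehat {\pi }})}$ is roughly 0.0097, so the survey pins down the share of party F to within about one percentage point at the 95% level.
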