Confidence Interval for Proportion
From MM*Stat International
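Suppose we are drawing from a dichotomous population, where $\pi$ denotes the proportion of elements with a given property. We want to construct a confidence interval for the unknown parameter $\pi$. We draw a random sample of size $n$ in such a manner that $X_1, \dots, X_n$ are independently and identically Bernoulli distributed (see Chapter Binomial distribution). The sample proportion $\hat{\pi}$ is the number of 'successes' in the sample, $X = \sum_{i=1}^{n} X_i$, divided by $n$, and is therefore the mean of the Bernoulli variables:

$$\hat{\pi} = \frac{X}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

It is worth emphasizing that the sample proportion is a sample mean and as such inherits the properties and behavior of a sample mean. Thus, $\hat{\pi}$ has expectation and variance

$$E(\hat{\pi}) = \pi, \qquad Var(\hat{\pi}) = \frac{\pi (1 - \pi)}{n}$$

It is an unbiased and consistent estimator of $\pi$ (see Chapter properties of estimators).

Since it is quite difficult to construct confidence intervals for small samples, we will restrict ourselves to the case where the sample size is sufficiently large so that we may use the central limit theorem to obtain the distribution of the estimator. In particular, $\hat{\pi}$ is approximately normal:

$$\hat{\pi} \approx N\left(\pi, \frac{\pi (1 - \pi)}{n}\right)$$

Hence we conclude that

$$P\left(-z_{1-\alpha/2} \leq \frac{\hat{\pi} - \pi}{\sqrt{\pi (1 - \pi)/n}} \leq z_{1-\alpha/2}\right) \approx 1 - \alpha$$

where $z_{1-\alpha/2}$ is obtained from standard normal tables.

Still we cannot construct a confidence interval for $\pi$, since the variance of $\hat{\pi}$ depends on $\pi$, which is unknown. We simply replace the standard deviation of $\hat{\pi}$ with the estimate

$$\hat{\sigma}(\hat{\pi}) = \sqrt{\frac{\hat{\pi} (1 - \hat{\pi})}{n}}$$

This is a consistent estimator of the standard deviation of $\hat{\pi}$, since $\hat{\pi}$ is a consistent estimator of $\pi$. The above probability statement becomes

$$P\left(-z_{1-\alpha/2} \leq \frac{\hat{\pi} - \pi}{\sqrt{\hat{\pi} (1 - \hat{\pi})/n}} \leq z_{1-\alpha/2}\right) \approx 1 - \alpha$$

We isolate $\pi$ in the middle of the probability statement to obtain:

$$P\left(\hat{\pi} - z_{1-\alpha/2} \sqrt{\frac{\hat{\pi} (1 - \hat{\pi})}{n}} \leq \pi \leq \hat{\pi} + z_{1-\alpha/2} \sqrt{\frac{\hat{\pi} (1 - \hat{\pi})}{n}}\right) \approx 1 - \alpha$$

Hence for large sample sizes an approximate $(1 - \alpha)$ confidence interval is given by:

$$\left[\hat{\pi} - z_{1-\alpha/2} \sqrt{\frac{\hat{\pi} (1 - \hat{\pi})}{n}}, \; \hat{\pi} + z_{1-\alpha/2} \sqrt{\frac{\hat{\pi} (1 - \hat{\pi})}{n}}\right]$$

The normal distribution provides a reasonable approximation so long as $\pi$ is not too close to zero or one. Typically, the sample size $n$ should not fall below a minimum threshold, and should preferably be substantially larger.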
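- The two-sided confidence intervals we have constructed assign roughly equal probabilities to the two tails: $P\left(\pi < \hat{\pi} - z_{1-\alpha/2} \sqrt{\hat{\pi} (1 - \hat{\pi})/n}\right) \approx P\left(\pi > \hat{\pi} + z_{1-\alpha/2} \sqrt{\hat{\pi} (1 - \hat{\pi})/n}\right) \approx \alpha/2$.
- By construction the confidence interval is symmetric around the point estimate $\hat{\pi}$.
- The length of the interval is a random variable, since it depends through $\hat{\pi}$ on the random sample.
- The length of the confidence interval also depends on the confidence level $1 - \alpha$ (through $z_{1-\alpha/2}$) and on the sample size $n$.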
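The construction above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original text: the function name `wald_interval` and the numbers in the usage line (40 successes in 200 draws) are assumptions chosen for demonstration, and the quantile $z_{1-\alpha/2}$ is taken from the standard library's `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, conf_level=0.95):
    """Approximate large-sample confidence interval for a proportion.

    Hypothetical helper: uses the normal approximation with the
    estimated standard error sqrt(p_hat * (1 - p_hat) / n).
    """
    if not 0 < conf_level < 1:
        raise ValueError("conf_level must lie strictly between 0 and 1")
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")

    alpha = 1 - conf_level
    p_hat = successes / n                      # sample proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)    # quantile z_{1 - alpha/2}
    se = sqrt(p_hat * (1 - p_hat) / n)         # estimated standard error
    return p_hat - z * se, p_hat + z * se


# Illustrative numbers only (not taken from the text).
lower, upper = wald_interval(successes=40, n=200, conf_level=0.95)
print(f"95% interval: [{lower:.4f}, {upper:.4f}]")
```

The returned interval is symmetric around the point estimate by construction, and its length changes with both `n` and `conf_level`, in line with the remarks above.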
We are given a sample of employees of an insurance company. Several dichotomous variables are measured. Each variable takes the value 1 if the stated condition holds and 0 otherwise:
- the employee is a client of Accident Ltd
- the employee belongs to the field staff
- the employee has a company car
- the employee has professional experience

The population proportion for each of these characteristics is unknown. We will obtain a point and interval estimate of the corresponding population proportion. We will be using confidence intervals based on the normal approximation.

You will have the opportunity to examine the effect of the confidence level and the sample size on the confidence interval. We recommend that you alter only one of these features at a time while holding the other constant. Please select:
- the variable to be analyzed
- the sample size
- the confidence level (as a decimal number e.g. 0.95)
Results: This interactive example provides
- the confidence interval for the selected confidence level
If you choose the same variable repeatedly, but enter different confidence levels/sample sizes, the previous results are also displayed for comparison purposes.
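The kind of comparison the interactive example invites can be sketched offline. The snippet below is an illustration under stated assumptions: the helper name `interval_length` and the sample proportion of 0.3 are hypothetical and do not come from the employee data. It changes one setting at a time and prints the length $2 z_{1-\alpha/2} \sqrt{\hat{\pi}(1 - \hat{\pi})/n}$ of the normal-approximation interval.

```python
from math import sqrt
from statistics import NormalDist


def interval_length(p_hat, n, conf_level):
    """Length of the normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return 2 * z * sqrt(p_hat * (1 - p_hat) / n)


P_HAT = 0.3  # hypothetical sample proportion, held fixed for comparison

# Vary the confidence level while holding the sample size constant.
for conf_level in (0.90, 0.95, 0.99):
    print(f"n=200, level={conf_level:.2f}: "
          f"length={interval_length(P_HAT, 200, conf_level):.4f}")

# Vary the sample size while holding the confidence level constant.
for n in (50, 200, 800):
    print(f"n={n}, level=0.95: "
          f"length={interval_length(P_HAT, n, 0.95):.4f}")
```

For a fixed sample proportion, quadrupling the sample size halves the length, while raising the confidence level widens the interval. In a real sample the proportion itself changes from draw to draw, so the observed lengths will not follow this pattern exactly.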
The leader of a political party 'F' is interested in knowing what fraction of citizens would vote for it if an election were held. A survey of $n$ citizens is performed, which asks the question: "If there were an election tomorrow, which party would receive your vote?" According to the survey, 103 citizens declared that they would vote for F. We wish to construct a 95% confidence interval for $\pi$, the proportion of voters who would vote for F. Note the following:
- In order to ensure that a citizen who has already been asked is not sampled a second time, we sample without replacement (though the probability of drawing the same citizen twice would be low in any case, given the sample size).
- Since interest is focused on party F, the event of interest is defined as "the individual votes for F" and the complementary event as "the individual does not vote for F". Thus for our purposes the population is dichotomous. The proportion of votes for party F is $\pi$.
- The sample size $n$ is sufficiently large, so that one may construct an approximate confidence interval using the normal approximation: $\left[\hat{\pi} - z_{1-\alpha/2} \sqrt{\hat{\pi} (1 - \hat{\pi})/n}, \; \hat{\pi} + z_{1-\alpha/2} \sqrt{\hat{\pi} (1 - \hat{\pi})/n}\right]$, which has an approximate confidence level of $1 - \alpha = 0.95$. From the standard normal table we obtain $z_{0.975} = 1.96$.
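The results of the survey yield the point estimate $\hat{\pi} = 103/n$ and, with $z_{0.975} = 1.96$, a 95% confidence interval:

$$\left[\hat{\pi} - 1.96 \sqrt{\frac{\hat{\pi} (1 - \hat{\pi})}{n}}, \; \hat{\pi} + 1.96 \sqrt{\frac{\hat{\pi} (1 - \hat{\pi})}{n}}\right]$$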
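The arithmetic for the survey can be sketched as follows. Only the count of 103 supporters comes from the text. The survey sizes looped over below are purely hypothetical placeholders, since the actual number of citizens surveyed is not reproduced here, and the function name `survey_interval` is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

VOTES_FOR_F = 103  # number of respondents who would vote for F (from the survey)


def survey_interval(n, successes=VOTES_FOR_F, conf_level=0.95):
    """Normal-approximation interval for the share of F voters.

    n is the survey size; it must be supplied by the reader.
    """
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # about 1.96 at 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# HYPOTHETICAL survey sizes, used only to show the computation.
for n in (500, 1000, 2000):
    lower, upper = survey_interval(n)
    print(f"n={n}: p_hat={VOTES_FOR_F / n:.4f}, "
          f"95% interval=[{lower:.4f}, {upper:.4f}]")
```

Substituting the actual survey size for `n` gives the point estimate and interval for party F.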