# Testing the Proportion in a Binary Population

### From MM*Stat International


Assume a random variable $X$ has only two possible outcomes. We then call the statistical population binary. If $X$ is an indicator variable storing the information about the existence (or non-existence) of a feature, we can carry out statistical inference about the proportion $\pi$ of elements within the population possessing the property of interest ($X = 1$) or not ($X = 0$). As in other parametric tests, the inference relates to a hypothetical value, here $\pi_0$, that represents a hypothetical proportion of population elements having the property of interest.
We will introduce statistical test procedures based on a *simple random sample* of size $n$. This ensures that the sample variables $X_1, \dots, X_n$, which are indicator variables with outcomes measured as either $0$ or $1$, are independent and identically distributed Bernoulli variables with $P(X_i = 1) = \pi$. As usual, the significance level is denoted by $\alpha$.

### Hypotheses

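Depending on the application at hand, one- or two-sided tests are formulated:

1) Two-sided test: $H_0: \pi = \pi_0$ versus $H_1: \pi \neq \pi_0$
2) Right-sided test: $H_0: \pi \le \pi_0$ versus $H_1: \pi > \pi_0$
3) Left-sided test: $H_0: \pi \ge \pi_0$ versus $H_1: \pi < \pi_0$

Our earlier remarks on the choice of null and alternative hypothesis in the section on testing population means also apply in this environment.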

### Test statistic and its distribution; decision regions

The sample proportion $\hat{\pi} = X/n$ is a suitable estimator of the population parameter $\pi$. The estimator is a simple transformation of $X = \sum_{i=1}^{n} X_i$, which contains all the important information: it counts the number of elements in the sample possessing the property of interest. As has already been shown, $X$ follows a Binomial distribution with parameters $n$ and $\pi$: $X \sim B(n, \pi)$. As $n$ is chosen by the decision-maker, $\pi$ is the only remaining parameter needed to completely specify the Binomial distribution. Following the logic applied in all parametric hypothesis testing problems, we assume $\pi$ to be $\pi_0$, that is, we determine the distribution of the test statistic given that the hypothetical proportion is the one prevailing in the population: $\pi = \pi_0$. Hence, $X$ becomes our test statistic, since it has a Binomial distribution with parameters $n$ and $\pi_0$ under $H_0$:
$$X \sim B(n, \pi_0).$$
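The rejection region of the null hypothesis contains all realizations of $X$ for which the cumulated probabilities don't exceed the significance level $\alpha$. Writing $F_B(x) = P(X \le x \mid \pi_0)$ for the cumulative distribution function of $B(n, \pi_0)$, the critical values can be read from its numerical table by following these rules:
1) Two-sided test
The lower critical value $x_l$ is the realization of $X$ for which the cumulative distribution function just exceeds the value $\alpha/2$: $F_B(x_l - 1) \le \alpha/2$ and $F_B(x_l) > \alpha/2$.
The upper critical value $x_u$ is the smallest argument of the cumulative distribution function that returns a probability equal to or greater than $1 - \alpha/2$: $F_B(x_u - 1) < 1 - \alpha/2$ and $F_B(x_u) \ge 1 - \alpha/2$.
The *rejection region* for $H_0$ is given by $\{x \mid x < x_l \text{ or } x > x_u\}$, such that
$$P(X < x_l \mid \pi_0) + P(X > x_u \mid \pi_0) \le \alpha.$$
For the *non-rejection region* for $H_0$ we have $\{x \mid x_l \le x \le x_u\}$, such that
$$P(x_l \le X \le x_u \mid \pi_0) \ge 1 - \alpha.$$
2) Right-sided test
The critical value $x_c$ is the smallest realization of the test statistic that occurs with cumulated probability of at least $1 - \alpha$: $F_B(x_c - 1) < 1 - \alpha$ and $F_B(x_c) \ge 1 - \alpha$.
The *rejection region* for $H_0$ is then $\{x \mid x > x_c\}$, such that
$$P(X > x_c \mid \pi_0) \le \alpha.$$
The *non-rejection region* for $H_0$ is $\{x \mid x \le x_c\}$, such that
$$P(X \le x_c \mid \pi_0) \ge 1 - \alpha.$$
3) Left-sided test
The critical value $x_c$ is determined as the smallest realization of the test statistic that occurs with cumulated probability of at least $\alpha$: $F_B(x_c - 1) < \alpha$ and $F_B(x_c) \ge \alpha$.
The *rejection region* for $H_0$ is $\{x \mid x < x_c\}$, such that
$$P(X < x_c \mid \pi_0) \le \alpha.$$
The *non-rejection region* for $H_0$ is given by $\{x \mid x \ge x_c\}$, such that
$$P(X \ge x_c \mid \pi_0) \ge 1 - \alpha.$$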
As $X$ is a discrete random variable, the given significance level $\alpha$ will generally not be fully utilized (exhausted). The actual significance level will reach $\alpha$ only by chance and will usually be smaller. The above tests are thus conservative with respect to the utilization of the allowance for the maximum probability of the type I error.
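The critical-value rules above can be sketched in Python using only the standard library. This is a minimal illustration, not MM*Stat's own implementation: the function names (`binom_cdf`, `left_critical_value` and so on) are hypothetical, the cumulative Binomial probabilities are computed with `math.comb` instead of being read from a printed table, and the values $n = 20$, $\pi_0 = 0.5$, $\alpha = 0.05$ are chosen purely for illustration.

```python
from math import comb
from typing import Tuple


def binom_cdf(x: int, n: int, p: float) -> float:
    """F_B(x) = P(X <= x) for X ~ B(n, p)."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def left_critical_value(n: int, p0: float, alpha: float) -> int:
    """Smallest x_c with F_B(x_c) >= alpha; H0 is rejected if X < x_c."""
    return next(x for x in range(n + 1) if binom_cdf(x, n, p0) >= alpha)


def right_critical_value(n: int, p0: float, alpha: float) -> int:
    """Smallest x_c with F_B(x_c) >= 1 - alpha; H0 is rejected if X > x_c."""
    return next(x for x in range(n + 1) if binom_cdf(x, n, p0) >= 1 - alpha)


def two_sided_critical_values(n: int, p0: float, alpha: float) -> Tuple[int, int]:
    """x_l: smallest x with F_B(x) > alpha/2; x_u: smallest x with F_B(x) >= 1 - alpha/2.

    H0 is rejected if X < x_l or X > x_u.
    """
    x_l = next(x for x in range(n + 1) if binom_cdf(x, n, p0) > alpha / 2)
    x_u = right_critical_value(n, p0, alpha / 2)
    return x_l, x_u


if __name__ == "__main__":
    n, p0, alpha = 20, 0.5, 0.05  # illustrative values only

    x_c = left_critical_value(n, p0, alpha)
    print("left-sided:  reject if X <", x_c, "| actual alpha =", binom_cdf(x_c - 1, n, p0))

    x_c = right_critical_value(n, p0, alpha)
    print("right-sided: reject if X >", x_c, "| actual alpha =", 1 - binom_cdf(x_c, n, p0))

    x_l, x_u = two_sided_critical_values(n, p0, alpha)
    actual = binom_cdf(x_l - 1, n, p0) + 1 - binom_cdf(x_u, n, p0)
    print("two-sided:   reject if X <", x_l, "or X >", x_u, "| actual alpha =", actual)
```

With these illustrative values the sketch gives $x_c = 6$ for the left-sided test (reject if $X \le 5$), $x_c = 14$ for the right-sided test and $(x_l, x_u) = (6, 14)$ for the two-sided test. The actual significance levels (about 0.021, 0.021 and 0.041) all stay below $\alpha = 0.05$, which is the conservativeness just described.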
Given the sample size $n$ is sufficiently large, the estimator $\hat{\pi}$ can be standardized to give the test statistic
$$V = \frac{\hat{\pi} - \pi_0}{\sigma_0(\hat{\pi})} = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}.$$
Here, $\sigma_0(\hat{\pi}) = \sqrt{\pi_0(1 - \pi_0)/n}$ is the standard deviation of the estimator $\hat{\pi}$ under $H_0$.
Under $H_0$, $V$ has approximately a standard normal distribution (i.e. normal with mean 0 and variance 1). Critical values for the given significance level can be taken from the cumulative standard normal distribution table. Decision regions for the one- and two-sided tests are determined in the same way as those for the approximate population mean test for unknown $\sigma$. In fact, a hypothesis about a proportion is a hypothesis about an expectation (of a binary indicator variable): $E(X_i) = \pi$.
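For large $n$, the approximate test can be sketched with the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the role of the standard normal table. The function name `proportion_z_test` and its `alternative` argument are hypothetical choices for this illustration, and the data in the usage line are invented.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0, alpha, alternative="two-sided"):
    """Approximate test of a hypothesis about a proportion.

    Returns the realized statistic v and whether H0 is rejected.
    alternative: "two-sided" (H1: pi != p0), "greater" (H1: pi > p0)
    or "less" (H1: pi < p0).
    """
    p_hat = x / n                     # sample proportion
    sigma0 = sqrt(p0 * (1 - p0) / n)  # standard deviation of p_hat under H0
    v = (p_hat - p0) / sigma0         # standardized test statistic
    std_normal = NormalDist()         # mean 0, standard deviation 1
    if alternative == "two-sided":
        reject = abs(v) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":
        reject = v > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":
        reject = v < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return v, reject


# Invented data: 60 of 100 sampled elements have the property, pi0 = 0.5
v, reject = proportion_z_test(60, 100, 0.5, 0.05)
print(f"v = {v:.3f}, reject H0: {reject}")
```

Here $v = (0.6 - 0.5)/0.05 = 2$, which exceeds $z_{0.975} \approx 1.96$, so the two-sided test rejects $H_0$ at $\alpha = 0.05$.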

### Sampling and computing the test statistic

Once a sample of size $n$ has been drawn, we have $n$ realizations of the sampling variables, $x_1, \dots, x_n$, and can compute the realized value of the test statistic.

### Test decision and interpretation

See the corresponding remarks for the tests on the population mean.

### Power Curve

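The *power* curve of the large-sample test based on $V$ can be calculated explicitly for all test situations in the same manner as the power curve for the population mean tests.
The power curve of the exact test based on $X$ is computed using the *Binomial distribution* (as this is the distribution underlying the test statistic) for all $0 \le \pi \le 1$ and fixed $n$.
From the definition $G(\pi) = P(\text{reject } H_0 \mid \pi)$ it follows
1) for the two-sided test: $G(\pi) = P(X < x_l \mid \pi) + P(X > x_u \mid \pi) = P(X \le x_l - 1 \mid \pi) + 1 - P(X \le x_u \mid \pi)$
2) for the right-sided test: $G(\pi) = P(X > x_c \mid \pi) = 1 - P(X \le x_c \mid \pi)$
3) for the left-sided test: $G(\pi) = P(X < x_c \mid \pi) = P(X \le x_c - 1 \mid \pi)$
Given the respective critical values, the probabilities can be looked up in the numerical table of the cumulative Binomial distribution $B(n, \pi)$.
For $\pi = \pi_0$, the power curve equals the actual significance level.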
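The three power formulas translate directly into code. The sketch below is a minimal illustration, assuming the critical values have already been determined by the rules given earlier; it recomputes cumulative Binomial probabilities with `math.comb` rather than looking them up in a table. The helper names are hypothetical, and the critical values $x_l = 6$, $x_u = 14$ in the example belong to the exact two-sided test for the illustrative setting $n = 20$, $\pi_0 = 0.5$, $\alpha = 0.05$.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def power_two_sided(pi, n, x_l, x_u):
    """G(pi) = P(X <= x_l - 1 | pi) + 1 - P(X <= x_u | pi)."""
    return binom_cdf(x_l - 1, n, pi) + 1 - binom_cdf(x_u, n, pi)


def power_right(pi, n, x_c):
    """G(pi) = 1 - P(X <= x_c | pi)."""
    return 1 - binom_cdf(x_c, n, pi)


def power_left(pi, n, x_c):
    """G(pi) = P(X <= x_c - 1 | pi)."""
    return binom_cdf(x_c - 1, n, pi)


# Illustrative two-sided test: n = 20, pi0 = 0.5, alpha = 0.05 -> x_l = 6, x_u = 14
n, x_l, x_u = 20, 6, 14
for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"pi = {pi:.1f}   G(pi) = {power_two_sided(pi, n, x_l, x_u):.4f}")
```

At $\pi = \pi_0 = 0.5$ the printed value is the actual significance level of this test (about 0.0414), in line with the statement above.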
Imagine a ‘binary population’ of economics students, an unknown proportion $\pi$ of which is enthusiastic about statistics. We define the random variable $X$ to assume one if the statistical element (‘economics student’) likes statistics and zero if not.
We believe that half of the students fancy learning statistical concepts (our hypothetical proportion is thus $\pi_0 = 0.5$) and want to test whether this informed guess is true in statistical terms, at a significance level $\alpha$ on the basis of a random sample of size $n$:
$$H_0: \pi = 0.5 \quad \text{versus} \quad H_1: \pi \neq 0.5$$
In this interactive example you can repeat this test as often as you like. In each run a new sample is simulated (drawn). You can interact by choosing $\alpha$ and $n$ in each repetition. In particular, you can try the following combinations:

- Hold the significance level $\alpha$ and the sample size $n$ constant;
- vary the significance level $\alpha$ and keep the sample size $n$ constant;
- set $n$ to a new level and hold $\alpha$ constant; or
- change both the significance level $\alpha$ and the sample size $n$.
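The interactive experiment can be imitated offline. The sketch below is a hypothetical stand-in for the applet, assuming the exact two-sided test with $\pi_0 = 0.5$: it repeatedly simulates samples of $n$ students from a population with a chosen true proportion and records how often $H_0$ is rejected. When the true proportion equals $\pi_0$, the rejection rate approximates the actual significance level; otherwise it approximates the power. The function names, the seed and the number of runs are arbitrary illustrative choices.

```python
import random
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def two_sided_critical_values(n, p0, alpha):
    """Exact two-sided test: reject H0 if X < x_l or X > x_u."""
    x_l = next(x for x in range(n + 1) if binom_cdf(x, n, p0) > alpha / 2)
    x_u = next(x for x in range(n + 1) if binom_cdf(x, n, p0) >= 1 - alpha / 2)
    return x_l, x_u


def simulate_rejection_rate(true_pi, n, alpha, p0=0.5, runs=10_000, seed=1):
    """Share of simulated samples in which H0: pi = p0 is rejected."""
    rng = random.Random(seed)  # reproducible pseudo-random numbers
    x_l, x_u = two_sided_critical_values(n, p0, alpha)
    rejections = 0
    for _ in range(runs):
        # Each sampled student likes statistics (X_i = 1) with probability true_pi
        x = sum(rng.random() < true_pi for _ in range(n))
        if x < x_l or x > x_u:
            rejections += 1
    return rejections / runs


# Same alpha and n, two different "true" populations
print(simulate_rejection_rate(true_pi=0.5, n=20, alpha=0.05))  # near the actual alpha
print(simulate_rejection_rate(true_pi=0.8, n=20, alpha=0.05))  # near the power G(0.8)
```

Changing `n` and `alpha` in the calls mirrors the four combinations listed above.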

One of the raisons d’être of financial intermediaries is their ability to efficiently assess the credit standing (‘creditworthiness’) of potential borrowers. The management of ABC bank decides to introduce an extended credit checking scheme if the proportion of customers with repayment irregularities isn't below a critical threshold $\pi_0$ (stated in per cent). The in-house statistician conducting the statistical test is asked to keep low the probability of not deciding to improve the credit rating procedure even though the proportion is ‘really’ not below this threshold (i.e. to keep $\alpha$ low). The random variable $X$, ‘credit event’ or ‘repayment problems’, is defined as an indicator variable taking on zero (‘no’) or one (‘yes’). The actual proportion $\pi$ of clients having trouble servicing their debt is unknown. The hypothetical boundary value for testing this population proportion is $\pi_0$.

### Hypothesis

Deviations from the hypothetical parameter in one direction are of interest; thus, a one-sided test will be employed. As the bank hopes to prove that the evaluation processes in place are sufficient, i.e. that the proportion of debtors displaying irregularities in repaying their loans is less than $\pi_0$, this claim is formulated as the alternative hypothesis:
$$H_0: \pi \ge \pi_0 \quad \text{versus} \quad H_1: \pi < \pi_0$$
The properties of this test with respect to the bank managers’ requirements have to be evaluated to ensure the test really meets their needs.
The type I error, which can be made if the null hypothesis is rejected, is here: ‘no review of the credit procedures ($H_1$ accepted) although $\pi \ge \pi_0$ ($H_0$ true)’.
If the test results in the non-rejection of the null hypothesis, a type II error might occur: ‘review of the credit procedures ($H_0$ not rejected) although $\pi < \pi_0$ ($H_1$ true)’.
The type I error represents the risk the managers of the ABC bank want to cap. Its maximum level is given by the significance level $\alpha$, which has been set to a sufficiently low level.
The type II error represents the risk of a costly introduction of new credit evaluation processes without management-approved need. The impact of this scenario on the bank's profitability is difficult to assess, as the new process will lead to a repricing of credits and thus may also generate cost *savings*.
The following two alternatives are both based on the above test.
A random sample of size $n$ is drawn without replacement from the population of $N$ debtors. This is reasonable if the sampling fraction $n/N$ is small (a common rule of thumb is $n/N \le 0.05$), as the random sample can then be regarded as ‘simple’.

### 1st alternative

To curb costs, a sample size of $n = 30$ is chosen. The sampling-theoretical requirement on $n/N$ is fulfilled.

### Test statistic and its distribution; decision regions

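The estimator ‘number of clients with irregularities in debt servicing in a sample of size 30’ can directly serve as our test statistic $X$. Under $H_0$, $X$ has the Binomial distribution $B(30, \pi_0)$. A small $X$ supports the alternative hypothesis. The critical value $x_c$ is the smallest realization of $X$ for which the cumulative distribution function $F_B$ of $B(30, \pi_0)$ equals or exceeds $\alpha$, i.e. it has to satisfy $F_B(x_c - 1) < \alpha$ and $F_B(x_c) \ge \alpha$. We find $x_c$ in the numerical table of the cumulative distribution function of $B(30, \pi_0)$, and thus we have the following decision regions:
Rejection region for $H_0$: $\{x \mid x < x_c\}$, with $P(X < x_c \mid \pi_0) \le \alpha$.
Non-rejection region for $H_0$: $\{x \mid x \ge x_c\}$, with $P(X \ge x_c \mid \pi_0) \ge 1 - \alpha$.
Because $X$ is a discrete random variable, the given significance level isn't exhausted, i.e. $P(X < x_c \mid \pi_0) < \alpha$.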

### Sampling and computing the test statistic

Thirty randomly selected debtors are investigated with respect to reliability in debt servicing. Assume that $x$ of them haven't always fulfilled their contractual obligations; this count is the realized value of the test statistic $X$.

### Test decision and interpretation

As $x$ belongs to the non-rejection region for $H_0$, the null hypothesis is not rejected. Even though the sample proportion $\hat{\pi} = x/30$ is smaller than the hypothetical boundary proportion $\pi_0$, which should favour $H_1$, we cannot conclude that $H_0$ is false: at a significance level of $\alpha$, the difference cannot be regarded as statistically significant. In other words, given the small sample size, it is far too likely that the difference has arisen from sampling variability for the null hypothesis to be rejected.

It is important to observe that it is not merely the value of the point estimator compared to the hypothetical value that leads to a non-rejection or rejection of the null hypothesis. Rather, the decision regions take into account the random character of the estimator: the difference is compared to an appropriate, case-specific statistical yardstick that determines which deviations are statistically significantly large and which are small.

Based on a random sample of size $n = 30$ and a significance level $\alpha$, we were unable to show statistically that the proportion of trouble debtors is smaller than $\pi_0$. Consequently, the ABC bank will review and try to improve its credit approval procedures.

### Power

Not having rejected the null hypothesis, we are vulnerable to a type II error, which occurs when the alternative hypothesis is a true statement: $\pi < \pi_0$. Let's calculate the type II error probability for a true parameter value $\pi_1 < \pi_0$. What is the probability of not rejecting the null hypothesis in a left-sided test with $\pi_0$, $n = 30$ and $\alpha$, given the true population proportion is $\pi_1$ and hence the null hypothesis is actually wrong? We compute
$$\beta(\pi_1) = P(X \ge x_c \mid \pi_1) = 1 - P(X \le x_c - 1 \mid \pi_1),$$
where $P(X \le x_c - 1 \mid \pi_1)$ is taken from the table of the cumulative distribution function of $B(30, \pi_1)$.

Interpretation: given the true proportion is $\pi_1$, a share $\beta(\pi_1)$ of all samples of size 30 will not be able to discriminate between the true parameter $\pi_1$ and the hypothetical $\pi_0$, inducing the bank to undertake unnecessary improvements of the credit assessment process with probability $\beta(\pi_1)$. In deciding to control the maximum type I error probability, the bank is accepting type II error probabilities of such magnitude. Statisticians can provide management with power function graphs for any desired true parameter value $\pi$.

Of course, not rejecting the null hypothesis can also be the right decision: $\pi \ge \pi_0$. Suppose, for example, that the true proportion of unreliable debtors is $\pi_2 \ge \pi_0$. The probability of not rejecting the null hypothesis and hence (unknowingly) making the right decision given our current test setting (left-sided with $\pi_0$, $n = 30$, $\alpha$ and thus $x_c$) is
$$1 - G(\pi_2) = P(X \ge x_c \mid \pi_2) = 1 - P(X \le x_c - 1 \mid \pi_2),$$
where $P(X \le x_c - 1 \mid \pi_2)$ can be looked up in a numerical table of $B(30, \pi_2)$ as the cumulative probability for values less than or equal to $x_c - 1$.

These calculations can be carried out for any desired parameter value within the overall parameter space (here: $0 \le \pi \le 1$). Depending on which hypothesis the individual parameter value belongs to, the power curve $G(\pi)$ or its complement $1 - G(\pi)$ returns the probability of making a right decision or a type I or type II error.

*Table: probabilities $G(\pi)$ and $1 - G(\pi)$ of the test decisions, by true hypothesis.*

The following display shows the graph of the power curve in the left-sided test with parameters $\pi_0$, $n = 30$ and $\alpha$.

### 2nd alternative

Now the statistician tries both to satisfy the requirement set by management to contain the probability of the crucial type I error *and* to keep the type II error as low as possible. She is aware of the trade-off between the $\alpha$ and $\beta$ errors and focuses on reducing both probabilities simultaneously by increasing the sample size, thus making the decision an economic one. Cost projections in conjunction with a valuation of the benefit of higher reliability lead to the choice of a larger sample size $n$, still small enough relative to $N$ to serve as the basis for simple random sampling without replacement.

### Test statistic and its distribution; decision regions

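The standardized test statistic $V = \dfrac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$ is used. Under $H_0$, it is approximately normally distributed with parameters $0$ and $1$. Large sample theory suggests that the approximation is sufficiently accurate for the chosen sample size $n$. From the cumulative standard normal distribution table we can take $z_{1-\alpha}$ to satisfy $\Phi(z_{1-\alpha}) = 1 - \alpha$. From symmetry it follows that $z_\alpha = -z_{1-\alpha}$, and we have $\{v \mid v < -z_{1-\alpha}\}$ as the approximate rejection region for $H_0$ and $\{v \mid v \ge -z_{1-\alpha}\}$ as the approximate non-rejection region for $H_0$.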

### Sampling and computing the test statistic

From the universe of $N$ debtors, $n$ are selected at random, of which $x$ turn out to have displayed problems in debt servicing at least once in their repayment history. Their proportion within the sample is thus $\hat{\pi} = x/n$. Plugging this into the test statistic yields
$$v = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}.$$

### Test decision and interpretation

As $v$ falls into the non-rejection region for $H_0$, the null hypothesis is not rejected. On the basis of this particular sample of size $n$, it cannot be statistically claimed that the proportion of problematic debtors is less than $\pi_0$. The ABC bank management will thus initiate a review of its credit procedures.

### Type II error probability

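As the bank management has been induced not to reject the statement in the null hypothesis, it may have made a type II error, which occurs if the true proportion amongst all debtors is actually smaller than $\pi_0$: $\pi < \pi_0$. Let's examine the probability of this happening for a ‘hypothetical’ true population proportion $\pi_1$, i.e. $\pi = \pi_1 < \pi_0$.
First we must determine the critical proportion $\hat{\pi}_c$ corresponding to the critical value $-z_{1-\alpha}$ calculated using the normal approximation. From
$$-z_{1-\alpha} = \frac{\hat{\pi}_c - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$$
follows
$$\hat{\pi}_c = \pi_0 - z_{1-\alpha}\sqrt{\frac{\pi_0(1 - \pi_0)}{n}}.$$
$\beta(\pi_1)$ is the probability of the sample function assuming a value from the non-rejection region of the null hypothesis, given the true parameter belongs to the alternative hypothesis:
$$\beta(\pi_1) = P(\hat{\pi} \ge \hat{\pi}_c \mid \pi_1).$$
In order to determine this probability on the basis of a numerical table for the *standard* normal distribution, we must standardize using $\pi_1$ and $\sigma(\hat{\pi}) = \sqrt{\pi_1(1 - \pi_1)/n}$:
$$\beta(\pi_1) = P\left(\frac{\hat{\pi} - \pi_1}{\sqrt{\pi_1(1 - \pi_1)/n}} \ge \frac{\hat{\pi}_c - \pi_1}{\sqrt{\pi_1(1 - \pi_1)/n}}\right) = 1 - \Phi\left(\frac{\hat{\pi}_c - \pi_1}{\sqrt{\pi_1(1 - \pi_1)/n}}\right).$$
The value of $\Phi$ is read from the standard normal distribution table, which gives $\beta(\pi_1)$.
Thus, compared to $\beta(\pi_1)$ from the 1st alternative, the increase in the sample size has resulted in a sizeable reduction in the type II error probability for a true population proportion of $\pi_1$.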
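The two alternatives can be compared numerically. The following sketch computes the exact type II error probability of the left-sided Binomial test for $n = 30$ and the normal-approximation value $\beta(\pi_1) = 1 - \Phi\big((\hat{\pi}_c - \pi_1)/\sqrt{\pi_1(1 - \pi_1)/n}\big)$ for a larger sample. The figures $\pi_0 = 0.2$, $\pi_1 = 0.1$, $\alpha = 0.05$ and $n = 300$ are hypothetical and chosen only for illustration; the function names are likewise illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def beta_exact_left(pi0, pi1, n, alpha):
    """Exact left-sided test: x_c is the smallest x with F_B(x; n, pi0) >= alpha.

    H0 is not rejected if X >= x_c, so beta(pi1) = 1 - F_B(x_c - 1; n, pi1).
    """
    x_c = next(x for x in range(n + 1) if binom_cdf(x, n, pi0) >= alpha)
    return x_c, 1 - binom_cdf(x_c - 1, n, pi1)


def beta_normal_left(pi0, pi1, n, alpha):
    """Normal approximation: critical proportion p_c = pi0 - z_{1-alpha} * sigma_0,
    then beta(pi1) = 1 - Phi((p_c - pi1) / sigma_1)."""
    std_normal = NormalDist()
    p_c = pi0 - std_normal.inv_cdf(1 - alpha) * sqrt(pi0 * (1 - pi0) / n)
    sigma1 = sqrt(pi1 * (1 - pi1) / n)  # standard deviation of p_hat under pi1
    return p_c, 1 - std_normal.cdf((p_c - pi1) / sigma1)


pi0, pi1, alpha = 0.2, 0.1, 0.05  # hypothetical values
x_c, beta_small = beta_exact_left(pi0, pi1, 30, alpha)
p_c, beta_large = beta_normal_left(pi0, pi1, 300, alpha)
print(f"n = 30:  x_c = {x_c}, exact beta = {beta_small:.4f}")
print(f"n = 300: p_c = {p_c:.4f}, approximate beta = {beta_large:.4f}")
```

Under these assumptions the exact type II error probability at $n = 30$ is roughly 0.59, whereas the approximate value at $n = 300$ is below 0.001, the kind of reduction described above.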

A statistics professor has the impression that in the last year the university library has bought proportionally fewer new statistics books than in the past. Over the last couple of years the proportion of statistics books amongst new purchases has consistently been more than a certain percentage, denoted $\pi_0$ below. He asks one of his assistants to investigate whether this has changed in favour of other departments. Acting on behalf of his students, for whom he wants to secure as many new books as possible, he asks his assistant to minimize the risk of not complaining to the head of the library when the proportion of statistics books *has* decreased.
The assistant decides to have a sample of $n$ books taken from the file containing the new purchases over the last months. He wants to know how many of these are statistics books. He is thus dichotomizing the random variable ‘subject matter’ into the outcomes ‘statistics’ and ‘not statistics’. Of course, if you regard the purchases as the outcome of a decision-making process conducted by the librarians, this is anything but a random variable. But to statisticians, who rely on a sample because they don't have access to all relevant information, it appears to be one. From the proportion of statistics books in the sample the assistant wants to draw inferences about the population of all newly purchased books, using a statistical test to allow for deviations of the sample proportion from the population proportion. In particular, he wants to verify whether the proportion has indeed dropped below the past average of $\pi_0$. He will thus test the population proportion $\pi$ and chooses a ‘standard’ significance level $\alpha$.

### Hypothesis

As the assistant wants to verify whether the proportion has dropped below $\pi_0$, he has to employ a one-sided test. He recalls that the professor wants him to minimize the probability of not disclosing that the proportion has decreased below $\pi_0$ when in reality it has. He thus opts for a right-sided test, i.e. he puts the professor's claim as the null hypothesis in the hope of not rejecting it:
$$H_0: \pi \le \pi_0 \quad \text{versus} \quad H_1: \pi > \pi_0$$
The assistant investigates the properties of this test with respect to the professor's intention of minimizing the probability of not detecting a relative decrease in the statistics book supply. A real-world decrease goes undetected only if $H_0$ has been rejected even though it is really true. This situation is called a type I error: ‘$H_1$ accepted | $H_0$ true’. The maximum probability of this situation is given by the significance level $\alpha$. Thus, the risk the professor wanted to ‘minimize’ is under control.

If the null hypothesis is not rejected, then a type II error can arise: ‘$H_0$ not rejected | $H_1$ true’. The probability of this happening, $\beta$, is unknown, because the true proportion $\pi$ (which is an element of the parameter set specified by $H_1$) is unknown. As we have already seen in other examples, it can be substantial, but the professor's priority is to keep the type I error under control, accepting a possibly large type II error in exchange.

### Test statistic and its distribution; decision regions

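The estimator ‘number of statistics books in a sample of $n$ books’ can serve as test statistic $X$. Under $H_0$, $X$ has a Binomial distribution with parameters $n$ and $\pi_0$: $X \sim B(n, \pi_0)$. A relatively high number of statistics books in the sample supports the alternative hypothesis that the proportion of statistics books has not decreased. The critical value $x_c$ is the smallest realization of $X$ for which the cumulative distribution function $F_B$ of $B(n, \pi_0)$ equals or exceeds $1 - \alpha$, that is, we require $F_B(x_c - 1) < 1 - \alpha$ and $F_B(x_c) \ge 1 - \alpha$.
The value of $x_c$ can be found in the table of the cumulative distribution function of $B(n, \pi_0)$.
The *rejection region* for $H_0$ is thus $\{x \mid x > x_c\}$, such that $P(X > x_c \mid \pi_0) \le \alpha$.
As $X$ is a discrete random variable, the given significance level isn't fully utilized: $P(X > x_c \mid \pi_0) < \alpha$.
The *non-rejection region* for $H_0$ is given by $\{x \mid x \le x_c\}$, such that $P(X \le x_c \mid \pi_0) \ge 1 - \alpha$.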

### Sampling and computing the test statistic

A subset of $n$ books is selected at random from the list of last year's new purchases and categorized into statistics and non-statistics books. As the total number of new books is sufficiently large from a sampling-theoretical point of view, the sample, although drawn without replacement, can be regarded as a simple random sample. The number of statistics books in the sample is counted as $x$, which serves as the realized value of the test statistic $X$.

### Test decision and interpretation

As $x$ falls into the non-rejection region for $H_0$, the null hypothesis cannot be rejected. On the basis of a random sample of size $n$ and a significance level $\alpha$, the assistant couldn't verify statistically that the proportion of statistics books is still above $\pi_0$. This test result means that a complaint to the library seems to be merited.

### Power

Given our test parameters ($\pi_0$, $n$ and $\alpha$), what is the probability of not rejecting the null hypothesis if the true proportion of statistics books is $\pi_1 > \pi_0$? That is, we want to calculate the probability of the type II error given a specific element of the parameter set associated with the alternative hypothesis:
$$\beta(\pi_1) = P(X \le x_c \mid \pi_1).$$
This probability can be found in the table of the cumulative Binomial distribution $B(n, \pi_1)$. Alas, even if the true proportion has increased to $\pi_1$, there is still a chance $\beta(\pi_1)$ of not discovering a significant deviation from the hypothetical boundary proportion $\pi_0$. This is the probability of an unjustified complaint issued by the professor given the proportion has risen to $\pi_1$, a substantial relative increase. The probability of making a type II error contingent on alternative true proportions can be computed via the power curve. Levels of $G(\pi)$ and $1 - G(\pi)$ for several values of $\pi$ are listed in the following table.

*Table: $G(\pi)$ and $1 - G(\pi)$ for several values of the true proportion $\pi$, by true hypothesis.*

For example, if the true proportion (and therefore absolute amount) of statistics books is $\pi = 0$, the sample cannot contain any statistics books, so we will observe $x = 0$ and won't reject the null hypothesis. The rejection of the null hypothesis ($X > x_c$) is an impossible event with an associated probability of zero. The power is the conditional probability of rejecting the null hypothesis given the relative amount is zero:
$$G(0) = P(X > x_c \mid \pi = 0) = 0.$$
If, on the other hand, the true proportion of statistics books is $\pi_1 > \pi_0$, the power is calculated as
$$G(\pi_1) = P(X > x_c \mid \pi_1) = 1 - P(X \le x_c \mid \pi_1),$$
where $P(X \le x_c \mid \pi_1)$ can be looked up in the table of the cumulative distribution function of $B(n, \pi_1)$ as the value for $x = x_c$. $G(\pi_1)$ is the probability of correctly rejecting the null hypothesis: ‘$H_1$ accepted | $H_1$ true’. The probabilities of not rejecting and rejecting the null hypothesis must always sum to one for any given true parameter value within the range specified by the alternative hypothesis:
$$P(X \le x_c \mid \pi_1) + P(X > x_c \mid \pi_1) = 1.$$
For a true proportion of $\pi_1$, the former sampling result amounts to making a type II error, the probability of which is denoted by $\beta(\pi_1)$. Thus, we can write
$$\beta(\pi_1) + G(\pi_1) = 1 \quad \text{or} \quad \beta(\pi_1) = 1 - G(\pi_1).$$
As $G(\pi_1)$ is the value of the power curve at the point $\pi_1$, the probability of making a type II error follows directly from it. If the true proportion of statistics books is $\pi_1$, a share $\beta(\pi_1)$ of all samples of size $n$ will lead to a non-rejection of the null hypothesis, i.e. won't detect the significant difference between $\pi_1$ and $\pi_0$. The following display depicts the graph of the power curve for the right-sided test we have just discussed, with parameters $\pi_0$, $n$ and $\alpha$.
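The relationship $\beta(\pi_1) = 1 - G(\pi_1)$ and the boundary case $G(0) = 0$ can be checked numerically for a right-sided exact test. The sketch below tabulates $G(\pi)$ and $1 - G(\pi)$ over a grid of true proportions, which is one way a table like the one above could be produced. It assumes hypothetical parameters $n = 20$, $\pi_0 = 0.5$ and $\alpha = 0.05$ rather than the library example's own figures, and the function names are illustrative.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def right_critical_value(n, p0, alpha):
    """Smallest x_c with F_B(x_c) >= 1 - alpha; H0 is rejected if X > x_c."""
    return next(x for x in range(n + 1) if binom_cdf(x, n, p0) >= 1 - alpha)


def power_right(pi, n, x_c):
    """G(pi) = P(X > x_c | pi), the probability of rejecting H0."""
    return 1 - binom_cdf(x_c, n, pi)


n, p0, alpha = 20, 0.5, 0.05  # hypothetical test parameters
x_c = right_critical_value(n, p0, alpha)

print(f"critical value x_c = {x_c}")
print("true hypothesis   pi    G(pi)    1 - G(pi)")
for i in range(11):
    pi = i / 10
    g = power_right(pi, n, x_c)
    # For pi <= p0, G(pi) is a type I error probability;
    # for pi > p0, 1 - G(pi) is the type II error probability beta(pi).
    label = "H0" if pi <= p0 else "H1"
    print(f"{label:>15}   {pi:.1f}   {g:.4f}   {1 - g:.4f}")
```

For $\pi = 0$ the sketch returns $G(0) = 0$, matching the argument above, and in every row the two probabilities sum to one.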