# Chi-Square Test of Independence

### From MM*Stat International


The chi-square test of independence allows us to test for statistical (stochastic) independence. It is a nonparametric test applicable to all measurement scales.
We assume that two random variables $X$ and $Y$ are observed simultaneously on $n$ statistical elements, the observed pairs being mutually independent (simple random sample). If $X$ and $Y$ are discrete random variables, they can be observed in the realizations $x_1, \dots, x_k$ and $y_1, \dots, y_m$, respectively. If $X$ and $Y$ are continuous (including quasi-continuous discrete variables), the sample space has to be partitioned into disjoint exhaustive classes (intervals). In this case, $x_i$ and $y_j$ denote representative values within the classes (usually the class midpoints), and $k$ and $m$ denote the overall numbers of classes. A suitable representation of the observed joint *frequency distribution* is the two-dimensional frequency table, also known as a *bivariate contingency table* (see Chapter 10 for additional material on contingency tables).
Two-dimensional contingency table

| | $y_1$ | $\dots$ | $y_j$ | $\dots$ | $y_m$ | md of $X$ |
|---|---|---|---|---|---|---|
| $x_1$ | $h_{11}$ | $\dots$ | $h_{1j}$ | $\dots$ | $h_{1m}$ | $h_{1\cdot}$ |
| $\vdots$ | $\vdots$ | | $\vdots$ | | $\vdots$ | $\vdots$ |
| $x_i$ | $h_{i1}$ | $\dots$ | $h_{ij}$ | $\dots$ | $h_{im}$ | $h_{i\cdot}$ |
| $\vdots$ | $\vdots$ | | $\vdots$ | | $\vdots$ | $\vdots$ |
| $x_k$ | $h_{k1}$ | $\dots$ | $h_{kj}$ | $\dots$ | $h_{km}$ | $h_{k\cdot}$ |
| md of $Y$ | $h_{\cdot 1}$ | $\dots$ | $h_{\cdot j}$ | $\dots$ | $h_{\cdot m}$ | $n$ |

Here, $h_{ij}$ denotes the absolute frequency of the observed pair $(x_i, y_j)$, i.e. that $X$ assumes $x_i$ or a value from the $i$th class, *and* $Y$ assumes $y_j$ or a value within the $j$th class.
The last column contains the observed *marginal distribution* (md) of $X$, composed of the absolute marginal frequencies $h_{i\cdot}$, denoting the frequencies with which $X$ has been observed in $x_i$ (discrete realization or class midpoint) regardless of the value of $Y$. In the last row you find the observed marginal distribution of $Y$, given by the absolute marginal frequencies $h_{\cdot j}$, the frequencies of $Y$ being observed in $y_j$ regardless of $X$. The following definitions are used in the construction of the two-dimensional contingency table:
$$h_{i\cdot} = \sum_{j=1}^{m} h_{ij}, \qquad h_{\cdot j} = \sum_{i=1}^{k} h_{ij}, \qquad \sum_{i=1}^{k}\sum_{j=1}^{m} h_{ij} = n.$$
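As a concrete illustration, the joint and marginal absolute frequencies can be tallied in a few lines of Python. All data below are invented for the sketch; the variable names are my own:

```python
import numpy as np

# Hypothetical paired observations of two discrete variables X and Y,
# coded 0..k-1 and 0..m-1 (all values are illustrative)
x = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
y = np.array([0, 1, 0, 0, 1, 1, 1, 1, 0, 0])

k = x.max() + 1  # number of realizations/classes of X
m = y.max() + 1  # number of realizations/classes of Y

# Joint absolute frequencies h_ij of the pairs (x_i, y_j)
h = np.zeros((k, m), dtype=int)
for xi, yj in zip(x, y):
    h[xi, yj] += 1

h_row = h.sum(axis=1)  # marginal frequencies h_i. of X (last column)
h_col = h.sum(axis=0)  # marginal frequencies h_.j of Y (last row)
n = h.sum()            # overall sample size
```

The nested sums in the definitions above correspond to `h.sum(axis=1)`, `h.sum(axis=0)` and `h.sum()`.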

### Hypothesis

The null hypothesis in a chi-square test of independence states that $X$ and $Y$ are statistically (stochastically) independent; the alternative hypothesis negates this:
$$H_0: X \text{ and } Y \text{ are independent} \quad \text{versus} \quad H_1: X \text{ and } Y \text{ are not independent.}$$
If the null hypothesis is true, the multiplication rule for independent events gives
$$p_{ij} = p_{i\cdot}\, p_{\cdot j} \qquad (i = 1, \dots, k;\; j = 1, \dots, m).$$
In the above formula,
$p_{ij} = P(X = x_i,\, Y = y_j)$ denotes the probability of $X$ assuming $x_i$ (or a value belonging to the class represented by $x_i$) *and* $Y$ assuming $y_j$ (or a value within the $j$th class),
$p_{i\cdot}$ is the probability of $X$ being observed in $x_i$, respectively the $i$th class (marginal probabilities of $X$),
$p_{\cdot j}$ is the probability that $Y$ assumes the value $y_j$ or is observed in the $j$th class (marginal probabilities of $Y$).
The pair of hypotheses can thus be written
$$H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j} \text{ for all } (i, j) \quad \text{versus} \quad H_1: p_{ij} \neq p_{i\cdot}\,p_{\cdot j} \text{ for at least one pair } (i, j).$$
As usual, the significance level $\alpha$ and the sample size $n$ have to be fixed before the test is conducted.

### Test statistic and its distribution; decision regions

As the test is based on a comparison between observed absolute frequencies and absolute frequencies expected under the null hypothesis, the test statistic is built around absolute frequencies.
An observed sample is summarized in the bivariate contingency table in terms of the joint absolute frequencies $h_{ij}$ ($i = 1, \dots, k$; $j = 1, \dots, m$). These quantities are outcomes of a random experiment and thus vary across samples. They are realizations of their theoretical counterparts, the random variables denoted by $H_{ij}$.
If the null hypothesis is true, the expected joint frequencies are $E(H_{ij}) = n\,p_{ij} = n\,p_{i\cdot}\,p_{\cdot j}$. The joint probabilities $p_{ij}$ and marginal probabilities $p_{i\cdot}$ and $p_{\cdot j}$ are unknown and have to be estimated from the sample. Unbiased and consistent estimators for $p_{i\cdot}$ and $p_{\cdot j}$ are the relative marginal frequencies (sample proportions) $\hat p_{i\cdot} = h_{i\cdot}/n$ and $\hat p_{\cdot j} = h_{\cdot j}/n$. This implies that we are assuming fixed marginal frequencies in the two-dimensional contingency table. Our estimators for the expected joint absolute frequencies under $H_0$ are given by
$$\hat h_{ij} = n\,\hat p_{i\cdot}\,\hat p_{\cdot j} = \frac{h_{i\cdot}\, h_{\cdot j}}{n}.$$
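A minimal sketch of this estimation step, using a small made-up frequency table:

```python
import numpy as np

# Hypothetical joint absolute frequencies h_ij (rows: X, columns: Y)
h = np.array([[20, 30],
              [10, 40]])
n = h.sum()

# Expected joint absolute frequencies under H0: h_i. * h_.j / n,
# computed for all cells at once via the outer product of the marginals
expected = np.outer(h.sum(axis=1), h.sum(axis=0)) / n
```

The outer product reproduces the cell-by-cell formula $\hat h_{ij} = h_{i\cdot} h_{\cdot j} / n$ in one step.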
The comparison between the joint absolute frequencies encountered in the sample and those expected under the null hypothesis is based on the differences $h_{ij} - \hat h_{ij}$ ($i = 1, \dots, k$; $j = 1, \dots, m$). A test statistic weighting these differences is the sum
$$\chi^2 = \sum_{i=1}^{k}\sum_{j=1}^{m} \frac{(h_{ij} - \hat h_{ij})^2}{\hat h_{ij}}.$$
Under $H_0$, the test statistic has approximately a chi-square distribution with $(k-1)(m-1)$ degrees of freedom. The approximation is sufficient if $\hat h_{ij} \geq 5$ for all pairs $(i, j)$. When these conditions aren't fulfilled, adjoining realizations (or classes) have to be combined into larger sets of (possible) observations; $k$ and $m$ then denote the numbers of classes in both variables *after* such necessary (re-)grouping.
The *critical value* $c$ satisfying $P(\chi^2 \leq c \mid H_0) = 1 - \alpha$ has to be looked up in the table of the cumulative chi-square distribution function with the appropriate degrees of freedom $(k-1)(m-1)$. The decision regions are
Rejection region for $H_0$: $\{v \mid v > c\}$.
Non-rejection region for $H_0$: $\{v \mid v \leq c\}$.
Under the null, the probability of the test statistic assuming a value from the rejection region for $H_0$ equals the significance level $\alpha$; the probability of the test statistic being observed in the non-rejection region for $H_0$ is $1 - \alpha$.
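The table lookup can be replaced by the chi-square quantile function in scipy; the significance level and the class counts below are illustrative choices, not values from the text:

```python
from scipy.stats import chi2

# Critical value c with P(chi-square <= c) = 1 - alpha under H0
alpha = 0.05                # illustrative significance level
k, m = 3, 4                 # illustrative class counts after any re-grouping
df = (k - 1) * (m - 1)      # degrees of freedom
c = chi2.ppf(1 - alpha, df)
```

`chi2.ppf` is the inverse of the cumulative distribution function, i.e. exactly the quantile one would otherwise read from a chi-square table.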

### Sampling and computing the test statistic

After a sample of size $n$ has been drawn, the absolute frequencies $h_{ij}$ of all observed realization pairs can be calculated. We can consolidate these into the empirical marginal frequencies $h_{i\cdot}$ for $X$ and $h_{\cdot j}$ for $Y$ and derive the expected absolute frequencies $\hat h_{ij}$ from these according to the above formulae. If the approximation conditions are violated, further grouping is necessary, and the frequencies $h_{ij}$, $h_{i\cdot}$, $h_{\cdot j}$ and $\hat h_{ij}$ have to be recalculated. Plugging $h_{ij}$ and $\hat h_{ij}$ into the test statistic formula yields the realized test statistic value $v$.
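The whole computation can be sketched with scipy's contingency-table routine; the table entries and the significance level are invented for illustration:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical contingency table of observed frequencies h_ij
h = np.array([[20, 30],
              [10, 40]])

# chi2_contingency returns the realized statistic v, the p-value,
# the degrees of freedom and the expected frequencies under H0;
# correction=False gives the plain chi-square statistic described here
v, pvalue, df, expected = chi2_contingency(h, correction=False)

alpha = 0.05                 # illustrative significance level
c = chi2.ppf(1 - alpha, df)  # critical value
reject = v > c               # does v fall into the rejection region?
```

Comparing the p-value with $\alpha$ leads to the same decision as comparing $v$ with $c$.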

### Test decision and interpretation

If $v$ falls into the rejection region for $H_0$, the null hypothesis is rejected on the basis of a random sample of size $n$ at a significance level of $\alpha$ ($v > c$). In this case it couldn't be shown that the random variables $X$ and $Y$ are statistically independent. If they *are* actually independent in the population, a type I error has been made, the probability of which in repeated samples (tests) equals the significance level $\alpha$.
If $v$ belongs to the non-rejection region for $H_0$, the null hypothesis is not rejected on the basis of a random sample of size $n$ ($v \leq c$). The sample doesn't statistically contradict the assumption of independence. A type II error has been made in this case if the alternative hypothesis is actually true.

In 1991 and 1996, randomly selected German citizens above a minimum age were presented with the following two questions: Q1) How do you assess the current economic situation? Q2) What is the economic outlook for the upcoming year? The participants were asked to express their opinion on the following scales: Q1) 1 = Very good, 2 = Good, 3 = Satisfactory, 4 = Fair, 5 = Poor; Q2) 1 = Significantly improved, 2 = Improved, 3 = Unchanged, 4 = Deteriorated, 5 = Significantly deteriorated. The questions are translated into the random variables $X$ = 'Current economic situation' and $Y$ = 'Economic outlook', with the above realizations. In addition, a third variable $Z$ = 'Survey region' with the categories 'West Germany' and 'East Germany' has been recorded. We want to test, at a given significance level $\alpha$, whether the random variables $X$ and $Z$, respectively $Y$ and $Z$, as surveyed in 1991 and 1996 are statistically independent.

### Hypothesis; Test statistic and its distribution

The independence of the random variables has to be stated in $H_0$ to facilitate the computation of the expected absolute joint frequencies and thus the test statistic:
$$H_0: X \text{ and } Z \text{ are independent} \quad \text{versus} \quad H_1: X \text{ and } Z \text{ are not independent}$$
and
$$H_0: Y \text{ and } Z \text{ are independent} \quad \text{versus} \quad H_1: Y \text{ and } Z \text{ are not independent.}$$
We use the test statistic for the chi-square test of independence, which, under $H_0$, has approximately a chi-square distribution with $(k-1)(m-1)$ degrees of freedom. The decision regions of the null hypothesis cannot be determined before the sample has been drawn and analyzed, because we have to follow a sequential approach:

- First, we estimate the expected joint absolute frequencies.
- On this basis we can check the approximation conditions and, if necessary, combine values or classes.
- Now we can determine the degrees of freedom and retrieve the critical values.
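The re-grouping step from this sequence can be sketched as follows. All frequencies are invented; in this sketch the two sparsest adjacent rows are merged by hand:

```python
import numpy as np

# Hypothetical 3x2 table whose first row is too sparse
h = np.array([[ 2,  3],
              [ 8, 12],
              [30, 45]])
n = h.sum()

expected = np.outer(h.sum(axis=1), h.sum(axis=0)) / n
if not (expected >= 5).all():
    # Combine the two adjacent sparse rows into one class and recompute
    h = np.vstack([h[0] + h[1], h[2:]])
    expected = np.outer(h.sum(axis=1), h.sum(axis=0)) / n

ok = bool((expected >= 5).all())  # approximation conditions satisfied?
```

Only after this check are the final class counts, and hence the degrees of freedom and the critical value, fixed.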

### Sampling and computing the test statistic; test decision

The tables below contain the joint absolute frequencies $h_{ij}$ in the samples of the years 1991 and 1996, as well as the expected absolute joint frequencies for a true null hypothesis, calculated as $\hat h_{ij} = h_{i\cdot} h_{\cdot j}/n$, and the differences $h_{ij} - \hat h_{ij}$.

Current economic situation ($X$) versus region ($Z$), 1991

| | | West | East |
|---|---|---|---|
| Very good | observed | | |
| | expected | | |
| | difference | | |
| Good | observed | | |
| | expected | | |
| | difference | | |
| Satisfactory | observed | | |
| | expected | | |
| | difference | | |
| Fair | observed | | |
| | expected | | |
| | difference | | |
| Poor | observed | | |
| | expected | | |
| | difference | | |

Current economic situation ($X$) versus region ($Z$), 1996

| | | West | East |
|---|---|---|---|
| Very good | observed | | |
| | expected | | |
| | difference | | |
| Good | observed | | |
| | expected | | |
| | difference | | |
| Satisfactory | observed | | |
| | expected | | |
| | difference | | |
| Fair | observed | | |
| | expected | | |
| | difference | | |
| Poor | observed | | |
| | expected | | |
| | difference | | |

Economic outlook ($Y$) versus region ($Z$), 1991

| | | West | East |
|---|---|---|---|
| Significantly improved | observed | | |
| | expected | | |
| | difference | | |
| Improved | observed | | |
| | expected | | |
| | difference | | |
| Unchanged | observed | | |
| | expected | | |
| | difference | | |
| Deteriorated | observed | | |
| | expected | | |
| | difference | | |
| Significantly deteriorated | observed | | |
| | expected | | |
| | difference | | |

Economic outlook ($Y$) versus region ($Z$), 1996

| | | West | East |
|---|---|---|---|
| Significantly improved | observed | | |
| | expected | | |
| | difference | | |
| Improved | observed | | |
| | expected | | |
| | difference | | |
| Unchanged | observed | | |
| | expected | | |
| | difference | | |
| Deteriorated | observed | | |
| | expected | | |
| | difference | | |
| Significantly deteriorated | observed | | |
| | expected | | |
| | difference | | |

The approximation conditions are fulfilled for all tests to be conducted, i.e. $\hat h_{ij} \geq 5$ for all pairs $(i, j)$. Since $X$ and $Y$ each have $k = 5$ categories and $Z$ has $m = 2$, there are $(5-1)(2-1) = 4$ degrees of freedom, and the critical value $c$ satisfying $P(\chi^2 \leq c \mid H_0) = 1 - \alpha$ is looked up in the chi-square table for 4 degrees of freedom. The decision regions are thus: Rejection region for $H_0$: $\{v \mid v > c\}$. Non-rejection region for $H_0$: $\{v \mid v \leq c\}$. Chi-square values and resulting decisions for the tests are:

| Year | Random variables | Test statistic value | Test decision |
|---|---|---|---|
| 1991 | $X$, $Z$ | | Reject $H_0$ |
| 1996 | $X$, $Z$ | | Do not reject $H_0$ |
| 1991 | $Y$, $Z$ | | Reject $H_0$ |
| 1996 | $Y$, $Z$ | | Reject $H_0$ |

### Interpretation

While the 1991 data rejects the null hypothesis of statistical independence, the proposition that the random variables 'Current economic situation' and 'Survey region' are statistically independent is not rejected for the 1996 data. But we can extract more qualitative information if we look at the contingency tables. As can be seen from the comparatively high positive differences $h_{ij} - \hat h_{ij}$ for the positive statements, in 1991 West Germans tended to classify the economic situation more positively than East Germans. In 1996 there are still positive differences, but their sum isn't significant anymore. Some kind of convergence in the assessment of the (then) current economic situation has taken place. Both surveys' data reject the null hypothesis that the random variables 'Economic outlook' and 'Survey region' are statistically independent. Observe that in both years the East Germans were more positive about the future of the economy than the West Germans. If you compare the differences for both years, you will notice the same qualitative tendency towards homogeneity of opinions across (East and West) Germany as in the assessment of the current economic environment. Yet quantitatively the differences are still large enough (in total) to be significant in 1996, and we cannot but conclude (at least within the assumed test parameter setting) that East and West Germans hold structurally different opinions. The type of dependency between $Y$ and $Z$ can be explored using suitable statistical tools for dependence analysis (e.g. categorical regression).

Someone suggests that the number of defects on a car is statistically independent of its age. We want to test this hypothesis at a given significance level $\alpha$ using the chi-square test of independence. The random variable $X$ = 'number of defects' is measured in the realizations 'no defect', 'one defect' and 'two or more defects'; the random variable $Y$ = 'car's age' is categorized into three age classes.

### Hypothesis

As the test statistic underlying the chi-square test of independence uses as inputs the expected joint frequencies, which are in turn calculated using the assumption of independence, the independence hypothesis must be stated as the null hypothesis:
$$H_0: X \text{ and } Y \text{ are independent} \quad \text{versus} \quad H_1: X \text{ and } Y \text{ are not independent,}$$
or equivalently
$$H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j} \text{ for all } (i, j) \quad \text{versus} \quad H_1: p_{ij} \neq p_{i\cdot}\,p_{\cdot j} \text{ for at least one } (i, j).$$

### Test statistic and its distribution; decision regions

We use the test statistic of the chi-square test of independence:
$$\chi^2 = \sum_{i=1}^{k}\sum_{j=1}^{m} \frac{(h_{ij} - \hat h_{ij})^2}{\hat h_{ij}}.$$
Under $H_0$, it is approximately chi-square distributed with $(k-1)(m-1)$ degrees of freedom. The decision regions of the null hypothesis can only be determined after the sample has been drawn and analyzed:

- First, the expected joint absolute frequencies have to be estimated.
- Then the approximation conditions must be checked and any necessary combinations of classes (or values) established.
- Once the two above steps have been concluded, and not before, the degrees of freedom can be determined and the critical values looked up.

### Sampling and computing the test statistic

Police officers positioned at various locations randomly stop cars and record age and number of defects. In the table below, the absolute joint and marginal frequencies in this sample are listed together with the expected frequencies under the null hypothesis, calculated as $\hat h_{ij} = h_{i\cdot} h_{\cdot j}/n$.

| Number of defects | | Age class 1 | Age class 2 | Age class 3 | $h_{i\cdot}$ |
|---|---|---|---|---|---|
| No defect | observed | | | | |
| | expected | | | | |
| One defect | observed | | | | |
| | expected | | | | |
| Two or more | observed | | | | |
| | expected | | | | |

The approximation conditions are fulfilled, as all expected absolute joint frequencies are equal to or greater than five: $\hat h_{ij} \geq 5$. We are observing $X$ and $Y$ in $k = 3$ and $m = 3$ classes respectively and thus have $(3-1)(3-1) = 4$ degrees of freedom. The critical value $c$ satisfying $P(\chi^2 \leq c \mid H_0) = 1 - \alpha$ is looked up in the table of the chi-square distribution, implying the following decision regions: Rejection region for $H_0$: $\{v \mid v > c\}$. Non-rejection region for $H_0$: $\{v \mid v \leq c\}$. Plugging the observed and expected frequencies into the test statistic formula yields the realized test statistic value $v$.
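For this 3×3 table the degrees of freedom follow directly from the class counts; the critical value below assumes, for illustration only, a significance level of $\alpha = 0.05$ (the level used in the text is not restated here):

```python
from scipy.stats import chi2

# 'Number of defects' and 'car age' each have 3 classes
k, m = 3, 3
df = (k - 1) * (m - 1)   # = 4 degrees of freedom
c = chi2.ppf(0.95, df)   # critical value, assuming alpha = 0.05
```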

### Test decision and interpretation

Since the test statistic value $v$ falls into the rejection region, the null hypothesis is rejected. Given our test parameters (sample size $n$ and significance level $\alpha$), we could verify the random variables 'number of defects' and 'car's age' to be statistically dependent. If this is not true in the population, we have made a type I error ($H_0$ rejected although true); in repeated samples (tests) the probability of this happening is given by the significance level $\alpha$.

The principle underlying independence tests resembles that of parametric tests. A test statistic is constructed to summarize (consolidate) the distance of the relevant information about the theoretical distribution under the null hypothesis from the corresponding structure in the sample (i.e. to measure the distance between the two distributions). The distribution of the test statistic has to be determined, either exactly or approximately. The null hypothesis is being tested, and the decision can result in a type I error with probability $\alpha$ if the null hypothesis has been rejected, or in a type II error if it has not been rejected although the alternative is true. The type I error probability is controlled by setting the significance level $\alpha$, but the type II error probability cannot be calculated, as there are infinitely many probability models different from the one claimed to be true in the null hypothesis. For this reason one will try to reject the null hypothesis and thus back a possible rejection by a known maximum probability of making a wrong decision (in repeated samples).

### Hypothesis

If the random variables *are* independent in the population, we expect this to be reflected in the sample. But a sample cannot convey all the information embedded in the population, and we have to account for random variation introduced by the sampling process. If the hypothesis is true, we expect it to be reflected accurately only on statistical average, and we have to determine the expected deviation of the sample characteristics from the hypothetical ones arising from sampling noise. Deviations of the observed joint absolute frequencies $h_{ij}$ from those implied by independence, $\hat h_{ij}$, will occur with probability one. The task is to quantify them relative to the expected variation; an excess disagreement, i.e. a significant deviation, leads to a rejection of the null hypothesis. As it is always the null hypothesis that is being tested, the independence of $X$ and $Y$ has to be proposed as the null hypothesis. Only this way can the expected absolute frequencies be calculated; after all, we need some probability model that allows us to derive the distribution of the test statistic and thus assess its intrinsic variation. Large deviations of the observed joint absolute frequencies from those expected if $X$ and $Y$ *are* independent contradict the independence assumption and thus increase the likelihood of rejecting the null hypothesis (everything else being equal).
The test statistic underlying the chi-square test of independence is calculated using observed frequencies and the theoretical probabilities $p_{ij}$, $p_{i\cdot}$ and $p_{\cdot j}$ ($i = 1, \dots, k$; $j = 1, \dots, m$). If $X$ and $Y$ are discrete random variables, the joint probabilities are related to exactly one pair of realizations:
$$p_{ij} = P(X = x_i,\, Y = y_j).$$
Continuous random variables assume specific values with probability zero. Thus, the sample space has to be partitioned into exhaustive disjoint intervals. In the continuous case the probabilities are defined as follows:
$p_{ij}$ is the probability of $X$ assuming a value belonging to the $i$th class *and* $Y$ assuming a value from the $j$th class,
$p_{i\cdot}$ is the probability of $X$ being observed in the $i$th class (marginal probabilities of $X$),
$p_{\cdot j}$ is the probability that $Y$ takes values from the $j$th class (marginal probabilities of $Y$).
Formally:
$$p_{i\cdot} = \sum_{j=1}^{m} p_{ij}, \qquad p_{\cdot j} = \sum_{i=1}^{k} p_{ij}.$$
To simplify and unify the exposition for discrete and continuous random variables, $x_i$ and $y_j$ are taken to be values representative of the classes in the continuous case (e.g. the midpoints), and $k$ and $m$ denote the numbers of classes constructed for $X$ and $Y$.
Note that it may prove necessary to group observations from discrete variables into classes, if only to improve approximation accuracy (at the price of a less detailed probability model).

### Test statistic

We want to illustrate why the joint absolute frequencies are random variables. Our argumentation is valid both for discrete and continuous variables. Suppose we sample one statistical element from the population with respect to the random variables $X$ and $Y$ and check whether the observation pair equals $(x_i, y_j)$, i.e. whether the event $\{X = x_i,\, Y = y_j\}$ has been realized. There are only two possible outcomes to this random experiment. The probability of the event happening is $p_{ij}$, and the probability of one single element not being observed in this particular pair of $X$ and $Y$ realizations is $1 - p_{ij}$. If we draw a sample of $n$ independent pairs of observations, we repeat this random experiment $n$ times under the same conditions and thus with constant $p_{ij}$. In other words, we are carrying out a Bernoulli experiment with $n$ replications. In doing so, we are interested in the total number of occurrences of the event $\{X = x_i,\, Y = y_j\}$, i.e. the absolute frequency of the value pair $(x_i, y_j)$ in the sample. This frequency is the outcome of a Bernoulli experiment and thus varies across samples. Thus, $H_{ij}$ = 'number of occurrences of $(x_i, y_j)$ in a simple random sample of size $n$' is a discrete random variable with possible outcomes $0, 1, \dots, n$. The random variable $H_{ij}$ has a Binomial distribution with parameters $n$ and $p_{ij}$: $H_{ij} \sim B(n, p_{ij})$. Its expectation is given by $E(H_{ij}) = n\,p_{ij}$. If the null hypothesis is true and thus $X$ and $Y$ are statistically independent, the joint probability is calculated according to the multiplication rule for independent events as the product of the marginal probabilities $p_{i\cdot}$ and $p_{\cdot j}$: $p_{ij} = p_{i\cdot}\,p_{\cdot j}$. The expected joint absolute frequencies are then given by $E(H_{ij}) = n\,p_{i\cdot}\,p_{\cdot j}$. This result applies to all $i = 1, \dots, k$ and $j = 1, \dots, m$. The test statistic is based on a comparison of the joint absolute frequencies encountered in the sample with those to be expected given the null hypothesis is true. The probabilities underlying the expected frequencies are unknown and have to be estimated from the sample. The comparison is based on the differences $h_{ij} - \hat h_{ij}$ as distance measures.
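The binomial character of the cell counts can be checked numerically; the sample size, the probability and the seed below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)

n = 1000       # sample size
p_ij = 0.12    # hypothetical probability of the event {X = x_i, Y = y_j}

# Simulate many samples; each count H_ij is one Binomial(n, p_ij) draw,
# i.e. the number of occurrences of the event in n Bernoulli trials
counts = rng.binomial(n, p_ij, size=100_000)

# The simulated mean should be close to E(H_ij) = n * p_ij = 120
mean_count = counts.mean()
```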
To prevent negative differences from offsetting positive ones (or vice versa), the differences are squared: $(h_{ij} - \hat h_{ij})^2$. To account for the varying importance of these squared deviations, they are weighted by dividing by $\hat h_{ij}$: a given difference receives a higher weighting when the expected frequency is small than when it is large. Summing over all pairs $(i, j)$ summarizes (condenses) all weighted squared deviations into one test statistic:
$$\chi^2 = \sum_{i=1}^{k}\sum_{j=1}^{m} \frac{(h_{ij} - \hat h_{ij})^2}{\hat h_{ij}}.$$
As the $H_{ij}$ are random variables, so is the test statistic. Under the null hypothesis, for sufficiently large sample size and validity of the approximation conditions, it is approximately chi-square distributed with $(k-1)(m-1)$ degrees of freedom. If the approximation requirements aren't fulfilled, bordering classes or values have to be combined in a suitable way; the outcomes of discretely measured random experiments are then grouped into classes, and $k$ and $m$ are the numbers of classes remaining after such a re-grouping.

Determining the degrees of freedom: there is a total of $k \cdot m$ probabilities $p_{ij}$ constituting the bivariate distribution of the random variables $X$ and $Y$ as categorized in the two-dimensional contingency table. We lose one degree of freedom because the probabilities aren't independent of each other: from $\sum_{i}\sum_{j} p_{ij} = 1$ it follows that any one probability is determined by the other $km - 1$ joint probabilities. If we could derive all joint probabilities from both variables' marginal distributions (probabilities) applying $p_{ij} = p_{i\cdot}\,p_{\cdot j}$, we would thus have $km - 1$ degrees of freedom. Unfortunately the marginal probabilities $p_{i\cdot}$ and $p_{\cdot j}$ are unknown and have to be estimated from the data, further reducing the degrees of freedom. The marginal distribution of $X$ encompasses $k$ probabilities $p_{i\cdot}$, of which only $k - 1$ have to be estimated because $\sum_{i} p_{i\cdot} = 1$. The same applies to the marginal distribution of $Y$: as $\sum_{j} p_{\cdot j} = 1$, only $m - 1$ marginal probabilities have to be estimated. Thus, a total of $(k-1) + (m-1)$ marginal probabilities has to be estimated, and the overall degrees of freedom are
$$km - 1 - (k-1) - (m-1) = (k-1)(m-1).$$
As $(h_{ij} - \hat h_{ij})^2/\hat h_{ij}$ is nonnegative for all pairs $(i, j)$, the test statistic will always be nonnegative. Large deviations translate into a high test statistic value.
The null hypothesis is thus rejected for high values of the test statistic. Hence, the chi-square test of independence is a right-sided test.
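The construction just described (squared differences, weighted by the expected frequencies, summed over all cells) can be written out directly; the table entries below are invented for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed joint absolute frequencies h_ij
observed = np.array([[18, 22],
                     [12, 48]])
n = observed.sum()

# Expected frequencies under H0: h_i. * h_.j / n
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# Sum of weighted squared deviations (h_ij - h_ij_hat)^2 / h_ij_hat
v = ((observed - expected) ** 2 / expected).sum()

# Cross-check against scipy's implementation (no continuity correction)
v_scipy = chi2_contingency(observed, correction=False)[0]
```

The hand-written sum and scipy's statistic agree, confirming that the formula above is exactly what the library computes.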