# Chi-Square Test of Independence


The chi-square test of independence allows us to test for statistical (stochastic) independence. It is a nonparametric test applicable to all measurement scales. We assume that two random variables ${\displaystyle X}$ and ${\displaystyle Y}$ are observed simultaneously on ${\displaystyle i=1,\ldots ,n}$ statistical elements, the observed pairs being mutually independent (simple random sample). If ${\displaystyle X}$ and ${\displaystyle Y}$ are discrete random variables, they can be observed in the realizations ${\displaystyle x_{k},\,k=1,\ldots ,K}$ and ${\displaystyle y_{j},\,j=1,\ldots ,J}$, respectively. If ${\displaystyle X}$ and ${\displaystyle Y}$ are continuous (including quasi-continuous discrete variables), the sample space has to be partitioned into disjoint exhaustive classes (intervals). In this case, ${\displaystyle x_{k},\,k=1,\ldots ,K}$ and ${\displaystyle y_{j},\,j=1,\ldots ,J}$ denote representative values within the classes (usually the class midpoints), and ${\displaystyle K}$ and ${\displaystyle J}$ denote the overall number of classes. A suitable representation of the observed joint frequency distribution is the two-dimensional frequency table, also known as the bivariate contingency table (see Chapter 10 for additional material on contingency tables).

Two-dimensional contingency table:

| ${\displaystyle x}$ \ ${\displaystyle y}$ | ${\displaystyle y_{1}}$ | ${\displaystyle \cdots }$ | ${\displaystyle y_{j}}$ | ${\displaystyle \cdots }$ | ${\displaystyle y_{J}}$ | md ${\displaystyle x}$ |
|---|---|---|---|---|---|---|
| ${\displaystyle x_{1}}$ | ${\displaystyle h_{11}}$ | ${\displaystyle \cdots }$ | ${\displaystyle h_{1j}}$ | ${\displaystyle \cdots }$ | ${\displaystyle h_{1J}}$ | ${\displaystyle h_{1\bullet }}$ |
| ${\displaystyle \vdots }$ | ${\displaystyle \vdots }$ | | ${\displaystyle \vdots }$ | | ${\displaystyle \vdots }$ | ${\displaystyle \vdots }$ |
| ${\displaystyle x_{k}}$ | ${\displaystyle h_{k1}}$ | ${\displaystyle \cdots }$ | ${\displaystyle h_{kj}}$ | ${\displaystyle \cdots }$ | ${\displaystyle h_{kJ}}$ | ${\displaystyle h_{k\bullet }}$ |
| ${\displaystyle \vdots }$ | ${\displaystyle \vdots }$ | | ${\displaystyle \vdots }$ | | ${\displaystyle \vdots }$ | ${\displaystyle \vdots }$ |
| ${\displaystyle x_{K}}$ | ${\displaystyle h_{K1}}$ | ${\displaystyle \cdots }$ | ${\displaystyle h_{Kj}}$ | ${\displaystyle \cdots }$ | ${\displaystyle h_{KJ}}$ | ${\displaystyle h_{K\bullet }}$ |
| md ${\displaystyle y}$ | ${\displaystyle h_{\bullet 1}}$ | ${\displaystyle \cdots }$ | ${\displaystyle h_{\bullet j}}$ | ${\displaystyle \cdots }$ | ${\displaystyle h_{\bullet J}}$ | ${\displaystyle h_{\bullet \bullet }=n}$ |

Here, ${\displaystyle h_{kj}}$ denotes the absolute frequency of the observed pair ${\displaystyle \left(x_{k},y_{j}\right)}$, i.e. that ${\displaystyle X}$ assumes ${\displaystyle x_{k}}$ or a value from the ${\displaystyle k}$th class, and ${\displaystyle Y}$ assumes ${\displaystyle y_{j}}$ or a value within the ${\displaystyle j}$th class: ${\displaystyle h_{kj}=h\left(\left\{X=x_{k}\right\}\cap \left\{Y=y_{j}\right\}\right)\,;\quad k=1,\ldots ,K\,,\,j=1,\ldots ,J.}$ The last column contains the observed marginal distribution (md) of ${\displaystyle X}$, composed of the absolute marginal frequencies ${\displaystyle h_{k\bullet }=h\left(X=x_{k}\right)\,;k=1,\ldots ,K}$, denoting the frequencies with which ${\displaystyle X}$ has been observed in ${\displaystyle x_{k}}$ (discrete realization or class midpoint) regardless of the value of ${\displaystyle Y}$. In the last row you find the observed marginal distribution of ${\displaystyle Y}$, given by the absolute marginal frequencies ${\displaystyle h_{\bullet j}=h\left(Y=y_{j}\right)\,;j=1,\ldots ,J}$, the frequencies of ${\displaystyle Y}$ being observed in ${\displaystyle y_{j}}$ regardless of ${\displaystyle X}$. The following definitions are used in the construction of the two-dimensional contingency table: ${\displaystyle h_{k\bullet }=\sum _{j=1}^{J}h_{kj}\,;\quad k=1,\ldots ,K;}$ ${\displaystyle h_{\bullet j}=\sum _{k=1}^{K}h_{kj}\,;\quad j=1,\ldots ,J;}$ ${\displaystyle h_{\bullet \bullet }=\sum _{k=1}^{K}h_{k\bullet }=\sum _{j=1}^{J}h_{\bullet j}=\sum _{k=1}^{K}\sum _{j=1}^{J}h_{kj}=n.}$
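The marginal sums defined above are straightforward to compute. A minimal Python sketch (the joint frequencies are illustrative numbers, not taken from the text):

```python
# Hypothetical joint absolute frequencies h_kj for a K = 2 by J = 3 table
# (illustrative values only).
h = [
    [10, 20, 30],
    [15,  5, 20],
]

# Row marginals h_k. (last column of the contingency table)
row_marginals = [sum(row) for row in h]          # [60, 40]

# Column marginals h_.j (last row of the contingency table)
col_marginals = [sum(col) for col in zip(*h)]    # [25, 25, 50]

# Grand total h_.. = n; both marginal vectors must sum to it
n = sum(row_marginals)
assert n == sum(col_marginals) == sum(sum(row) for row in h)
print(n)  # 100
```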

### Hypothesis

The null hypothesis in a chi-square test of independence states that ${\displaystyle X}$ and ${\displaystyle Y}$ are statistically (stochastically) independent; the alternative hypothesis negates this. ${\displaystyle {\text{H}}_{0}:X{\text{ and }}Y{\text{ are statistically independent}}}$ versus ${\displaystyle {\text{H}}_{1}:X{\text{ and }}Y{\text{ are not statistically independent}}.}$ If the null hypothesis is true, the multiplication rule for independent events gives ${\displaystyle p_{kj}=P\left(\left\{X=x_{k}\right\}\cap \left\{Y=y_{j}\right\}\right)=P\left(X=x_{k}\right)\cdot P\left(Y=y_{j}\right)=p_{k\bullet }\cdot p_{\bullet j}.}$ In the above formula, ${\displaystyle p_{kj}}$ denotes the probability of ${\displaystyle X}$ assuming ${\displaystyle x_{k}}$ (or a value belonging to the class represented by ${\displaystyle x_{k}}$) and ${\displaystyle Y}$ assuming ${\displaystyle y_{j}}$ (or a value within the ${\displaystyle j}$th class), ${\displaystyle p_{k\bullet }}$ is the probability of ${\displaystyle X}$ being observed in ${\displaystyle x_{k}}$ respectively the ${\displaystyle k}$th class (marginal probabilities of ${\displaystyle X}$), ${\displaystyle p_{\bullet j}}$ is the probability that ${\displaystyle Y}$ assumes the value ${\displaystyle y_{j}}$ or is observed in the ${\displaystyle j}$th class (marginal probabilities of ${\displaystyle Y}$). The pair of hypotheses can thus be written ${\displaystyle {\text{H}}_{0}:p_{kj}=p_{k\bullet }\cdot p_{\bullet j}\quad \forall \left(k,j\right)}$ versus ${\displaystyle {\text{H}}_{1}:p_{kj}\neq p_{k\bullet }\cdot p_{\bullet j}\quad {\text{for at least one pair }}\left(k,j\right).}$ As usual, the significance level ${\displaystyle \alpha }$ and sample size ${\displaystyle n}$ have to be fixed before the test is conducted.

### Test statistic and its distribution; decision regions

As the test is based on a comparison between observed absolute frequencies and absolute frequencies expected under the null hypothesis, the test statistic is built around absolute frequencies. An observed sample is summarized in the bivariate contingency table in terms of joint absolute frequencies ${\displaystyle h_{kj}}$ (${\displaystyle k=1,\ldots ,K\,,\,j=1,\ldots ,J}$). These quantities are outcomes of a random experiment and thus vary across samples. They are realizations of their theoretical counterparts, the random variables denoted by ${\displaystyle H_{kj}}$. If the null hypothesis is true, the expected joint frequencies are ${\displaystyle e_{kj}=n\cdot p_{k\bullet }\cdot p_{\bullet j}}$. The joint probabilities ${\displaystyle p_{kj}}$ and marginal probabilities ${\displaystyle p_{k\bullet }}$ and ${\displaystyle p_{\bullet j}}$ are unknown and have to be estimated from the sample. Unbiased and consistent estimators for ${\displaystyle p_{k\bullet }}$ and ${\displaystyle p_{\bullet j}}$ are the relative marginal frequencies (sample proportions) ${\displaystyle f_{k\bullet }=h_{k\bullet }/n}$ and ${\displaystyle f_{\bullet j}=h_{\bullet j}/n}$. This implies that we are assuming fixed marginal frequencies in the two-dimensional contingency table. Our estimators for the expected joint absolute frequencies under ${\displaystyle {\text{H}}_{0}}$ are given by ${\displaystyle {\widehat {e}}_{kj}=n\cdot f_{k\bullet }\cdot f_{\bullet j}=n\cdot {\frac {h_{k\bullet }}{n}}\cdot {\frac {h_{\bullet j}}{n}}={\frac {h_{k\bullet }\cdot h_{\bullet j}}{n}}.}$ The comparison between the joint absolute frequencies encountered in the sample and those expected under the null hypothesis is based on the differences ${\displaystyle H_{kj}-{\widehat {e}}_{kj}}$ (${\displaystyle k=1,\ldots ,K\,;j=1,\ldots ,J}$).
A test statistic weighting these differences is the sum ${\displaystyle V=\sum _{k=1}^{K}\sum _{j=1}^{J}{\frac {\left(H_{kj}-{\widehat {e}}_{kj}\right)^{2}}{{\widehat {e}}_{kj}}}.}$ Under ${\displaystyle {\text{H}}_{0}}$, the test statistic ${\displaystyle V}$ has approximately a chi-square distribution with ${\displaystyle (K-1)\cdot \left(J-1\right)}$ degrees of freedom. The approximation is sufficient if ${\displaystyle {\widehat {e}}_{kj}\geq 5}$ for all pairs ${\displaystyle \left(k,j\right)}$. When these conditions aren’t fulfilled, adjoining realizations (or classes) have to be combined into larger sets of (possible) observations. ${\displaystyle K}$ and ${\displaystyle J}$ denote the numbers of classes in both variables after such necessary (re-)grouping. The critical value ${\displaystyle c}$ satisfying ${\displaystyle P\left(V\leq c\right)=1-\alpha }$ has to be looked up in the table of the cumulative chi-square distribution function with the appropriate degrees of freedom (${\displaystyle (K-1)\cdot \left(J-1\right)}$). The decision regions are:

Rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v>\chi _{1-\alpha ;(K-1)\cdot \left(J-1\right)}^{2}\right\}}$.

Non-rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v\leq \chi _{1-\alpha ;(K-1)\cdot \left(J-1\right)}^{2}\right\}}$.

Under the null hypothesis, the probability of the test statistic ${\displaystyle V}$ assuming a value from the rejection region for H${\displaystyle _{0}}$ equals the significance level ${\displaystyle \alpha =P\left(V>\chi _{1-\alpha ;(K-1)\cdot \left(J-1\right)}^{2}\,|\,{\text{H}}_{0}\right)}$. The probability of the test statistic ${\displaystyle V}$ being observed in the non-rejection region for H${\displaystyle _{0}}$ is ${\displaystyle P\left(V\leq \chi _{1-\alpha ;(K-1)\cdot \left(J-1\right)}^{2}\,|\,{\text{H}}_{0}\right)=1-\alpha }$.
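The statistic ${\displaystyle V}$ can be computed directly from a table of observed frequencies. A minimal Python sketch, using only the standard library (the function name is our own); for the defects-by-age table analysed at the end of this chapter it reproduces the reported value up to rounding:

```python
def chi_square_independence(h):
    """Chi-square statistic V and degrees of freedom for a K x J table
    of observed absolute frequencies h_kj (a sketch of the formula above).
    Expected frequencies are estimated as e_kj = h_k. * h_.j / n."""
    K, J = len(h), len(h[0])
    row = [sum(h[k]) for k in range(K)]                       # h_k.
    col = [sum(h[k][j] for k in range(K)) for j in range(J)]  # h_.j
    n = sum(row)
    v = 0.0
    for k in range(K):
        for j in range(J):
            e = row[k] * col[j] / n
            v += (h[k][j] - e) ** 2 / e
    return v, (K - 1) * (J - 1)

# Defects-by-age table from the example at the end of this chapter
v, df = chi_square_independence([[30, 14, 5], [18, 10, 4], [12, 6, 11]])
print(round(v, 2), df)  # 10.52 4  (the text reports 10.5 after rounding e_kj)
```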

### Sampling and computing the test statistic

After a sample of size ${\displaystyle n}$ has been drawn, the absolute frequencies ${\displaystyle h_{kj}}$ of all observed realization pairs ${\displaystyle \left(x_{k},y_{j}\right)}$ can be calculated. We can consolidate these into the empirical marginal frequencies for ${\displaystyle X}$ and ${\displaystyle Y}$ and derive the expected absolute frequencies ${\displaystyle {\widehat {e}}_{kj}}$ from these according to the above formulae. If the approximation conditions are violated, further grouping is necessary, and the frequencies ${\displaystyle h_{kj}}$, ${\displaystyle h_{k\bullet }}$, ${\displaystyle h_{\bullet j}}$ and ${\displaystyle {\widehat {e}}_{kj}}$ have to be recalculated. Plugging ${\displaystyle h_{kj}}$ and ${\displaystyle {\widehat {e}}_{kj}}$ into the test statistic formula yields the realized test statistic value ${\displaystyle v}$.
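The check of the approximation condition and the regrouping step can be sketched as follows; the rule of thumb ${\displaystyle {\widehat {e}}_{kj}\geq 5}$ is tested, and if it fails, adjoining classes are combined and everything is recomputed (the table values are made up for illustration):

```python
def expected_frequencies(h):
    """Estimated expected frequencies e_kj = h_k. * h_.j / n under H0."""
    K, J = len(h), len(h[0])
    row = [sum(h[k]) for k in range(K)]
    col = [sum(h[k][j] for k in range(K)) for j in range(J)]
    n = sum(row)
    return [[row[k] * col[j] / n for j in range(J)] for k in range(K)]

def approximation_ok(h):
    """Rule of thumb: every estimated expected frequency is at least 5."""
    return all(e >= 5 for r in expected_frequencies(h) for e in r)

# A sparsely populated last row violates the condition ...
h = [[40, 30], [20, 25], [2, 3]]
print(approximation_ok(h))          # False

# ... so it is merged into the adjoining row and everything is recomputed.
h_grouped = h[:1] + [[a + b for a, b in zip(h[1], h[2])]]
print(approximation_ok(h_grouped))  # True
```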

### Test decision and interpretation

If ${\displaystyle v}$ falls into the rejection region for H${\displaystyle _{0}}$, the null hypothesis is rejected on the basis of a random sample of size ${\displaystyle n}$ at a significance level of ${\displaystyle \alpha }$ (${\displaystyle '{\text{H}}_{1}^{'}}$). In this case it couldn’t be shown that the random variables ${\displaystyle X}$ and ${\displaystyle Y}$ are statistically independent. If they are actually independent in the population, a type I error has been made (${\displaystyle '{\text{H}}_{1}^{'}|{\text{H}}_{0}}$), the probability of which in repeated samples (tests) equals the significance level: ${\displaystyle P\left('{\text{H}}_{1}^{'}|{\text{H}}_{0}\right)=\alpha }$. If ${\displaystyle v}$ belongs to the non-rejection region for H${\displaystyle _{0}}$, the null hypothesis is not rejected on the basis of a random sample of size ${\displaystyle n}$ (${\displaystyle '{\text{H}}_{0}^{'}}$). The sample doesn’t statistically contradict the assumption of independence. A type II error has been made in this case if the alternative hypothesis is actually true (${\displaystyle '{\text{H}}_{0}^{'}|{\text{H}}_{1}}$).

In 1991 and 1996, randomly selected German citizens over ${\displaystyle 18}$ years of age were presented with the following two questions:

• Q1) Assess the current economic situation.
• Q2) What is the economic outlook for the upcoming year?

The participants were asked to express their opinion on the following scales:

• Q1) 1 = Very good, 2 = Good, 3 = Satisfactory, 4 = Fair, 5 = Poor
• Q2) 1 = Significantly improved, 2 = Improved, 3 = Unchanged, 4 = Deteriorated, 5 = Significantly deteriorated

The questions are translated into the random variables ${\displaystyle X_{1}:}$ ‘Current economic situation’ and ${\displaystyle X_{2}:}$ ‘Economic outlook’, with the above realizations. In addition, a third variable ${\displaystyle Y:}$ ‘Survey region’ with the categories ‘West Germany’ and ‘East Germany’ has been recorded. We want to test at a significance level of ${\displaystyle \alpha =0.05}$ whether the random variables ${\displaystyle X_{1}}$ and ${\displaystyle Y}$, and ${\displaystyle X_{2}}$ and ${\displaystyle Y}$ respectively, as surveyed in 1991 and 1996, are statistically independent.

### Hypothesis; Test statistic and its distribution

The independence of the random variables has to be stated in ${\displaystyle {\text{H}}_{0}}$ to facilitate the computation of the expected absolute joint frequencies and thus the test statistic: ${\displaystyle {\text{H}}_{0}:X_{1}{\text{ and }}Y{\text{ are statistically independent}}}$ versus ${\displaystyle {\text{H}}_{1}:X_{1}{\text{ and }}Y{\text{ are not statistically independent}}}$ and ${\displaystyle {\text{H}}_{0}:X_{2}{\text{ and }}Y{\text{ are statistically independent}}}$ versus ${\displaystyle {\text{H}}_{1}:X_{2}{\text{ and }}Y{\text{ are not statistically independent}}.}$ We use the test statistic for the chi-square test of independence, ${\displaystyle V=\sum _{k=1}^{K}\sum _{j=1}^{J}{\frac {\left(H_{kj}-{\widehat {e}}_{kj}\right)^{2}}{{\widehat {e}}_{kj}}},}$ which, under ${\displaystyle {\text{H}}_{0}}$, has approximately a chi-square distribution with ${\displaystyle \left(K-1\right)\cdot \left(J-1\right)}$ degrees of freedom. The decision regions of the null hypothesis cannot be determined before the sample has been drawn and analyzed, because we have to follow a sequential approach:

• First, we estimate the expected joint absolute frequencies.
• On this basis we can check the approximation conditions and, if necessary, combine values or classes.
• Now we can determine the degrees of freedom and retrieve the critical values.
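Once the degrees of freedom are fixed, the critical value in the third step can also be computed rather than looked up. For even degrees of freedom, such as the ${\displaystyle (K-1)\cdot \left(J-1\right)=4}$ used below, the chi-square survival function has a simple closed form; a sketch using only the standard library (helper names are our own, and a statistics library would cover the general case):

```python
import math

def chi2_sf(x, df):
    """Survival function P(V > x) of the chi-square distribution.
    Closed form valid for even degrees of freedom only."""
    assert df % 2 == 0 and df > 0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i   # successive terms (x/2)^i / i!
        total += term
    return math.exp(-x / 2) * total

def chi2_critical(alpha, df, lo=0.0, hi=200.0):
    """Bisection search for the critical value c with P(V > c) = alpha."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical(0.05, 4), 2))  # 9.49
```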

### Sampling and computing the test statistic; test decision

Tables ${\displaystyle 1}$ to ${\displaystyle 4}$ contain the joint absolute frequencies in the samples of the years ${\displaystyle 1991}$ and ${\displaystyle 1996}$ as well as the expected absolute joint frequencies under a true null hypothesis, calculated as ${\displaystyle {\widehat {e}}_{kj}={\frac {h_{k\bullet }\cdot h_{\bullet j}}{n}},}$ and the differences ${\displaystyle h_{kj}-{\widehat {e}}_{kj}}$.

Current economic situation (${\displaystyle X_{1}}$) versus region (${\displaystyle Y}$), 1991

| | | West | East | Total |
|---|---|---|---|---|
| Very good | observed | 209 | 165 | 374 |
| | expected | 184.8 | 189.2 | |
| | difference | 24.2 | -24.2 | |
| Good | observed | 744 | 592 | 1,336 |
| | expected | 660.1 | 675.9 | |
| | difference | 83.9 | -83.9 | |
| Satisfactory | observed | 431 | 647 | 1,078 |
| | expected | 532.6 | 545.5 | |
| | difference | -101.6 | 101.6 | |
| Fair | observed | 36 | 39 | 75 |
| | expected | 37.1 | 37.9 | |
| | difference | -1.1 | 1.1 | |
| Poor | observed | 4 | 15 | 19 |
| | expected | 9.4 | 9.6 | |
| | difference | -5.4 | 5.4 | |
| Total | | 1,424 | 1,458 | 2,882 |

Current economic situation (${\displaystyle X_{1}}$) versus region (${\displaystyle Y}$ ), 1996

| | | West | East | Total |
|---|---|---|---|---|
| Very good | observed | 20 | 6 | 26 |
| | expected | 17.2 | 8.8 | |
| | difference | 2.8 | -2.8 | |
| Good | observed | 264 | 116 | 380 |
| | expected | 251.3 | 128.7 | |
| | difference | 12.7 | -12.7 | |
| Satisfactory | observed | 1,006 | 557 | 1,563 |
| | expected | 1,033.7 | 529.3 | |
| | difference | -27.7 | 27.7 | |
| Fair | observed | 692 | 335 | 1,027 |
| | expected | 679.2 | 347.8 | |
| | difference | 12.8 | -12.8 | |
| Poor | observed | 141 | 73 | 214 |
| | expected | 141.5 | 72.5 | |
| | difference | -0.5 | 0.5 | |
| Total | | 2,123 | 1,087 | 3,210 |

Economic outlook (${\displaystyle X_{2}}$) versus region (${\displaystyle Y}$), 1991

| | | West | East | Total |
|---|---|---|---|---|
| Significantly improved | observed | 75 | 203 | 278 |
| | expected | 137.4 | 140.6 | |
| | difference | -62.4 | 62.4 | |
| Improved | observed | 449 | 763 | 1,212 |
| | expected | 598.9 | 613.1 | |
| | difference | -149.9 | 149.9 | |
| Unchanged | observed | 684 | 414 | 1,108 |
| | expected | 547.5 | 560.5 | |
| | difference | 136.5 | -136.5 | |
| Deteriorated | observed | 200 | 62 | 262 |
| | expected | 129.5 | 132.5 | |
| | difference | 70.5 | -70.5 | |
| Significantly deteriorated | observed | 16 | 6 | 22 |
| | expected | 10.9 | 11.1 | |
| | difference | 5.1 | -5.1 | |
| Total | | 1,424 | 1,458 | 2,882 |

Economic outlook (${\displaystyle X_{2}}$) versus region (${\displaystyle Y}$), 1996

| | | West | East | Total |
|---|---|---|---|---|
| Significantly improved | observed | 9 | 6 | 15 |
| | expected | 9.9 | 5.1 | |
| | difference | -0.9 | 0.9 | |
| Improved | observed | 190 | 131 | 321 |
| | expected | 212.3 | 108.7 | |
| | difference | -22.3 | 22.3 | |
| Unchanged | observed | 809 | 444 | 1,253 |
| | expected | 828.7 | 424.3 | |
| | difference | -19.7 | 19.7 | |
| Deteriorated | observed | 960 | 426 | 1,386 |
| | expected | 916.7 | 469.3 | |
| | difference | 43.3 | -43.3 | |
| Significantly deteriorated | observed | 155 | 80 | 235 |
| | expected | 155.4 | 79.6 | |
| | difference | -0.4 | 0.4 | |
| Total | | 2,123 | 1,087 | 3,210 |

The approximation conditions are fulfilled for all ${\displaystyle 4}$ tests to be conducted, i.e. ${\displaystyle {\widehat {e}}_{kj}\geq 5}$ for all pairs ${\displaystyle \left(k,j\right)}$. The critical value satisfying ${\displaystyle P\left(V\leq c\right)=0.95}$ is ${\displaystyle \chi _{1-\alpha ;\left(K-1\right)\cdot \left(J-1\right)}^{2}=\chi _{0.95;4}^{2}=9.49}$, as we have ${\displaystyle (K-1)\cdot \left(J-1\right)=4}$ degrees of freedom. The decision regions are thus:

Rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v>9.49\right\}}$.

Non-rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v\leq 9.49\right\}}$.

Chi-square values and resulting decisions for the ${\displaystyle 4}$ tests are:

| Year | Random variables | Test statistic value ${\displaystyle v}$ | Test decision |
|---|---|---|---|
| 1991 | ${\displaystyle X_{1}}$, ${\displaystyle Y}$ | 71.85 | Reject ${\displaystyle {\text{H}}_{0}}$ |
| 1996 | ${\displaystyle X_{1}}$, ${\displaystyle Y}$ | 6.15 | Do not reject ${\displaystyle {\text{H}}_{0}}$ |
| 1991 | ${\displaystyle X_{2}}$, ${\displaystyle Y}$ | 278.17 | Reject ${\displaystyle {\text{H}}_{0}}$ |
| 1996 | ${\displaystyle X_{2}}$, ${\displaystyle Y}$ | 14.61 | Reject ${\displaystyle {\text{H}}_{0}}$ |
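As a cross-check, the 1991 value for ${\displaystyle X_{1}}$ and ${\displaystyle Y}$ can be recomputed from Table 1 with a few lines of Python (standard library only):

```python
# Observed frequencies from Table 1 (X1 versus Y, 1991); rows are the five
# answer categories, columns are West and East.
h = [[209, 165], [744, 592], [431, 647], [36, 39], [4, 15]]

row = [sum(r) for r in h]        # marginal distribution of X1
col = [sum(c) for c in zip(*h)]  # marginal distribution of Y
n = sum(row)                     # 2,882

# V = sum over all cells of (h_kj - e_kj)^2 / e_kj with e_kj = h_k.*h_.j/n
v = sum((h[k][j] - row[k] * col[j] / n) ** 2 / (row[k] * col[j] / n)
        for k in range(len(h)) for j in range(len(h[0])))

print(round(v, 2))  # 71.85
print(v > 9.49)     # True, so H0 is rejected
```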

### Interpretation

While the ${\displaystyle 1991}$ data reject the null hypothesis of statistical independence at a significance level of ${\displaystyle 0.05}$, the proposition that the random variables ${\displaystyle X_{1}:}$ ‘Current economic situation’ and ${\displaystyle Y:}$ ‘Survey region’ are statistically independent is not rejected for the ${\displaystyle 1996}$ data. But we can extract more qualitative information if we look at the contingency tables. As can be seen from the comparatively high positive differences ${\displaystyle h_{kj}-{\widehat {e}}_{kj}}$ for the positive statements in table ${\displaystyle 1}$, in ${\displaystyle 1991}$ West Germans tended to assess the economic situation more positively than East Germans. In ${\displaystyle 1996}$, there are still positive differences ${\displaystyle h_{kj}-{\widehat {e}}_{kj}}$, but their sum isn’t significant anymore. Some kind of convergence in the assessment of the (then) current economic situation has taken place. The data of both surveys reject the null hypothesis that the random variables ${\displaystyle X_{2}:}$ ‘Economic outlook’ and ${\displaystyle Y:}$ ‘Survey region’ are statistically independent at a significance level of ${\displaystyle \alpha =0.05}$. Observe that in both years the East Germans have been more positive about the future of the economy than the West Germans. If you compare the differences ${\displaystyle h_{kj}-{\widehat {e}}_{kj}}$ for both years, you will notice the same qualitative tendency towards homogeneity of opinions across (East and West) Germany as in the assessment of the current economic environment. Yet quantitatively they are still large enough (in total) to be significant in ${\displaystyle 1996}$, and we cannot but conclude (at least within the assumed test parameter setting) that East and West Germans hold structurally different opinions.
The type of dependency between ${\displaystyle X_{2}}$ and ${\displaystyle Y}$ can be explored using suitable statistical tools for dependence analysis (e.g. categorical regression).

Someone suggests that the number of defects on a car is statistically independent of its age. We want to test this hypothesis at a significance level of ${\displaystyle \alpha =0.05}$ using the chi-square test of independence. The random variable ${\displaystyle X:}$ ‘number of defects’ is measured in the realizations ${\displaystyle x_{1}:}$ ‘${\displaystyle 0}$ defects’, ${\displaystyle x_{2}:}$ ‘${\displaystyle 1}$ defect’ and ${\displaystyle x_{3}:}$ ‘${\displaystyle 2}$ or more defects’; the random variable ${\displaystyle Y:}$ ‘car’s age’ is categorized as ${\displaystyle y_{1}:}$ ‘${\displaystyle \leq 1}$ year’, ${\displaystyle y_{2}:}$ ‘${\displaystyle >1}$ year and ${\displaystyle \leq 2}$ years’ and ${\displaystyle y_{3}:}$ ‘${\displaystyle >2}$ years’.

### Hypothesis

As the test statistic underlying the chi-square test of independence uses as inputs the expected joint frequencies, which are in turn calculated under the assumption of independence, the independence hypothesis must be stated as the null hypothesis: ${\displaystyle {\text{H}}_{0}:X{\text{ and }}Y{\text{ are statistically independent}}}$ versus ${\displaystyle {\text{H}}_{1}:X{\text{ and }}Y{\text{ are not statistically independent}}}$ or ${\displaystyle {\text{H}}_{0}:p_{kj}=p_{k\bullet }\cdot p_{\bullet j}\quad \forall \left(k,j\right)}$ versus ${\displaystyle {\text{H}}_{1}:p_{kj}\neq p_{k\bullet }\cdot p_{\bullet j}\quad {\text{ for at least one pair }}\left(k,j\right).}$

### Test statistic and its distribution; decision regions

We use the test statistic of the chi-square test of independence: ${\displaystyle V=\sum _{k=1}^{K}\sum _{j=1}^{J}{\frac {\left(H_{kj}-{\widehat {e}}_{kj}\right)^{2}}{{\widehat {e}}_{kj}}}.}$ Under ${\displaystyle {\text{H}}_{0}}$, ${\displaystyle V}$ is approximately chi-square distributed with ${\displaystyle (K-1)\cdot \left(J-1\right)}$ degrees of freedom. The decision regions of the null hypothesis can only be determined after the sample has been drawn and analyzed:

• First, the expected joint absolute frequencies have to be estimated.
• Then the approximation conditions can (must) be checked and necessary combinations of classes (or values) can be established.
• Once the two above steps have been concluded, and not before, the degrees of freedom can be determined and the critical values looked up.

### Sampling and computing the test statistic

Police officers positioned at various locations randomly stop ${\displaystyle 110}$ cars and record age and number of defects. In the table below, the absolute joint and marginal frequencies in this sample are listed together with the expected frequencies under the null hypothesis, calculated as ${\displaystyle {\widehat {e}}_{kj}={\frac {h_{k\bullet }\cdot h_{\bullet j}}{n}}.}$

| Number of defects | | ${\displaystyle \leq 1}$ year | ${\displaystyle >1}$ and ${\displaystyle \leq 2}$ years | ${\displaystyle >2}$ years | Total |
|---|---|---|---|---|---|
| ${\displaystyle 0}$ defects | observed | 30 | 14 | 5 | 49 |
| | expected | 26.7 | 13.4 | 8.9 | |
| ${\displaystyle 1}$ defect | observed | 18 | 10 | 4 | 32 |
| | expected | 17.5 | 8.7 | 5.8 | |
| ${\displaystyle 2}$ or more defects | observed | 12 | 6 | 11 | 29 |
| | expected | 15.8 | 7.9 | 5.3 | |
| Total | | 60 | 30 | 20 | 110 |

The approximation conditions are fulfilled, as all expected absolute joint frequencies are equal to or greater than five: ${\displaystyle {\widehat {e}}_{kj}\geq 5}$. We are observing ${\displaystyle X}$ and ${\displaystyle Y}$ in ${\displaystyle K=3}$ and ${\displaystyle J=3}$ classes, respectively, and thus have ${\displaystyle (K-1)\cdot \left(J-1\right)=4}$ degrees of freedom. The critical value ${\displaystyle c}$ satisfying ${\displaystyle P\left(V\leq c\right)=1-\alpha =0.95}$ is looked up in the table of the chi-square distribution as ${\displaystyle c=\chi _{1-\alpha ;(K-1)\cdot \left(J-1\right)}^{2}=\chi _{0.95;4}^{2}=9.49}$, implying the following decision regions:

Rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v>9.49\right\}}$.

Non-rejection region for H${\displaystyle _{0}}$: ${\displaystyle \left\{v\,|\,v\leq 9.49\right\}}$.

The realized test statistic value is ${\displaystyle v={\frac {\left(30-26.7\right)^{2}}{26.7}}+{\frac {\left(14-13.4\right)^{2}}{13.4}}+\ldots +{\frac {\left(11-5.3\right)^{2}}{5.3}}=10.5.}$

### Test decision and interpretation

Since the test statistic value ${\displaystyle v=10.5}$ falls into the rejection region, the null hypothesis is rejected. Given our test parameters (sample size ${\displaystyle n=110}$ and significance level ${\displaystyle \alpha =0.05}$), the data indicate that the random variables ${\displaystyle X:}$ ‘number of defects’ and ${\displaystyle Y:}$ ‘car’s age’ are statistically dependent. If this is not true in the population, we have made a type I error (${\displaystyle '{\text{H}}_{1}^{'}|{\text{H}}_{0}}$). In repeated samples (tests) the probability of this happening is given by the significance level ${\displaystyle \alpha =0.05}$.

The principle underlying independence tests resembles that of the parametric tests. A test statistic is constructed to summarize (consolidate) the distance of the relevant information about the theoretical distribution under the null hypothesis from the corresponding structure in the sample (i.e. to measure the distance between the two distributions). The distribution of the test statistic has to be determined, either exactly or approximately. The null hypothesis is being tested, and the decision can result in a type I error with probability ${\displaystyle P\left('{\text{H}}_{1}^{'}|{\text{H}}_{0}\right)=\alpha }$ if the null hypothesis has been rejected, or in a type II error with probability ${\displaystyle P\left('{\text{H}}_{0}^{'}|{\text{H}}_{1}\right)=\beta }$ if it has not been rejected. The type I error probability is controlled by setting the significance level, but the type II error probability cannot be calculated, as there are infinitely many probability models different from that claimed to be the true one in the null hypothesis. For this reason one will try to reject the null hypothesis and thus back a possible rejection by a known maximum probability of making a wrong decision (in repeated samples).

### Hypothesis

If the random variables are independent in the population, we expect this to be reflected in the sample. But a sample cannot convey all the information embedded in the population, and we have to account for random variation introduced by the sampling process. If the hypothesis is true, we expect it to be reflected accurately on statistical average only and have to determine what the expected deviation of the sample characteristics from the hypothetical ones arising from sampling noise is. Deviations of the observed joint absolute frequencies from those implied by independence, ${\displaystyle {\widehat {e}}_{kj}}$, will occur with probability one. The task is to quantify them relative to the expected variation; an excess disagreement, i.e. a significant deviation, leads to a rejection of the null hypothesis. As it is always the null hypothesis that is being tested, the independence of ${\displaystyle X}$ and ${\displaystyle Y}$ has to be proposed as the null hypothesis. Only in this way can the expected absolute frequencies be calculated; after all, we need some probability model that allows us to derive the distribution of the test statistic and thus assess its intrinsic variation. Large deviations of the observed joint absolute frequencies ${\displaystyle h_{kj}}$ from those expected if ${\displaystyle X}$ and ${\displaystyle Y}$ are independent, ${\displaystyle e_{kj}}$, contradict the independence assumption and thus increase the likelihood of rejecting the null hypothesis (everything else equal). The test statistic underlying the chi-square test of independence is calculated using observed frequencies and the theoretical probabilities ${\displaystyle p_{kj}}$, ${\displaystyle p_{k\bullet }}$, and ${\displaystyle p_{\bullet j}}$ (${\displaystyle k=1,\ldots ,K\,;j=1,\ldots ,J}$).
If ${\displaystyle X}$ and ${\displaystyle Y}$ are discrete random variables, the joint probabilities are related to exactly one pair of realizations: ${\displaystyle p_{kj}=P\left(\left\{X=x_{k}\right\}\cap \left\{Y=y_{j}\right\}\right),\quad p_{k\bullet }=P\left(\left\{X=x_{k}\right\}\right),\quad p_{\bullet j}=P\left(\left\{Y=y_{j}\right\}\right).}$ Continuous random variables assume specific values with probability zero. Thus, the sample space has to be partitioned into exhaustive disjoint intervals. In the continuous case the probabilities are defined as follows: ${\displaystyle p_{kj}}$ is the probability of ${\displaystyle X}$ assuming a value belonging to the class ${\displaystyle \left(x_{k-1}^{*},x_{k}^{*}\right)}$ and ${\displaystyle Y}$ assuming a value from the class ${\displaystyle \left(y_{j-1}^{*},y_{j}^{*}\right)}$, ${\displaystyle p_{k\bullet }}$ is the probability of ${\displaystyle X}$ being observed in the ${\displaystyle k}$th class ${\displaystyle \left(x_{k-1}^{*},x_{k}^{*}\right)}$ (marginal probabilities of ${\displaystyle X}$), ${\displaystyle p_{\bullet j}}$ is the probability that ${\displaystyle Y}$ takes values from the ${\displaystyle j}$th class ${\displaystyle \left(y_{j-1}^{*},y_{j}^{*}\right)}$ (marginal probabilities of ${\displaystyle Y}$). Formally: ${\displaystyle {\begin{aligned}p_{kj}&=P\left(\left\{x_{k-1}^{*}<X\leq x_{k}^{*}\right\}\cap \left\{y_{j-1}^{*}<Y\leq y_{j}^{*}\right\}\right),\\p_{k\bullet }&=P\left(x_{k-1}^{*}<X\leq x_{k}^{*}\right),\\p_{\bullet j}&=P\left(y_{j-1}^{*}<Y\leq y_{j}^{*}\right).\end{aligned}}}$ To simplify and unify exposition for discrete and continuous random variables, ${\displaystyle x_{k},\,\left(k=1,\ldots ,K\right)}$ and ${\displaystyle y_{j},\,\left(j=1,\ldots ,J\right)}$ are taken to be values representative of the classes in the continuous case (e.g. midpoints). ${\displaystyle K}$ and ${\displaystyle J}$ denote the number of classes constructed for ${\displaystyle X}$ and ${\displaystyle Y}$. Note that it may prove necessary to group observations from discrete variables into classes, if only to improve approximation accuracy (at the price of a less detailed probability model).
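The partition into half-open classes and the resulting contingency table can be sketched in a few lines of Python (class limits and data points are illustrative assumptions):

```python
# Partition the sample spaces of X and Y into half-open classes
# (x*_{k-1}, x*_k] and (y*_{j-1}, y*_j]; limits are illustrative.
x_limits = [0.0, 1.0, 2.0]        # K = 2 classes for X
y_limits = [0.0, 0.5, 1.0, 1.5]   # J = 3 classes for Y

def class_index(value, limits):
    """Index of the half-open class (limits[i], limits[i+1]] holding value."""
    for i in range(len(limits) - 1):
        if limits[i] < value <= limits[i + 1]:
            return i
    raise ValueError("value outside the partition")

# Illustrative observed pairs (x_i, y_i), counted into the K x J table h
pairs = [(0.3, 0.2), (1.7, 1.2), (0.9, 0.6), (1.1, 0.4)]
h = [[0] * (len(y_limits) - 1) for _ in range(len(x_limits) - 1)]
for x, y in pairs:
    h[class_index(x, x_limits)][class_index(y, y_limits)] += 1

print(h)  # [[1, 1, 0], [1, 0, 1]]
```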

### Test statistic

We want to illustrate why the joint absolute frequencies ${\displaystyle H_{kj}}$ are random variables. Our argument is valid for both discrete and continuous variables. Suppose we sample one statistical element from the population with respect to the random variables ${\displaystyle X}$ and ${\displaystyle Y}$ and check whether the observation pair equals ${\displaystyle \left(x_{k},y_{j}\right)}$, i.e. whether the event ${\displaystyle \left\{X=x_{k}\right\}\cap \left\{Y=y_{j}\right\}}$ has been realized. There are only two possible outcomes to this random experiment. The probability of the event ${\displaystyle \left\{X=x_{k}\right\}\cap \left\{Y=y_{j}\right\}}$ happening is ${\displaystyle p_{kj}}$, and the probability of a single element not being observed with this particular pair of realizations of ${\displaystyle X}$ and ${\displaystyle Y}$ is ${\displaystyle 1-p_{kj}}$. If we draw a sample of ${\displaystyle n}$ independent pairs of observations, we repeat this random experiment ${\displaystyle n}$ times under the same conditions and thus with constant ${\displaystyle p_{kj}}$. In other words, we are carrying out a Bernoulli experiment with ${\displaystyle n}$ replications. In doing so, we are interested in the total number of occurrences of the event ${\displaystyle \left\{X=x_{k}\right\}\cap \left\{Y=y_{j}\right\}}$, i.e. the absolute frequency of the value pair ${\displaystyle \left(x_{k},y_{j}\right)}$ in the sample. This frequency is the outcome of a Bernoulli experiment and thus varies across samples. Thus, ${\displaystyle H_{kj}:}$ ‘Number of occurrences of ${\displaystyle \left\{X=x_{k}\right\}\cap \left\{Y=y_{j}\right\}}$ in a simple random sample of size ${\displaystyle n}$’ is a discrete random variable with possible outcomes ${\displaystyle 0,1,\ldots ,n}$. 
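The Bernoulli argument can be made concrete by simulation. The sample size and the joint probability below are hypothetical; the point is only that the count ${\displaystyle H_{kj}}$ varies from sample to sample while averaging to ${\displaystyle np_{kj}}$:

```python
import random

random.seed(0)
n, p_kj = 200, 0.15   # hypothetical sample size and joint probability

# One sample: count how often the event {X = x_k} ∩ {Y = y_j} occurs
# in n independent draws -- one realization of H_kj.
h_kj = sum(random.random() < p_kj for _ in range(n))

# Averaging H_kj over many samples approaches E(H_kj) = n * p_kj = 30.
mean_h = sum(sum(random.random() < p_kj for _ in range(n))
             for _ in range(2000)) / 2000
```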
The random variable ${\displaystyle H_{kj}}$ has a binomial distribution with parameters ${\displaystyle n}$ and ${\displaystyle p_{kj}}$: ${\displaystyle H_{kj}\thicksim B\left(n;p_{kj}\right)}$. The expectation of ${\displaystyle H_{kj}}$ is given by ${\displaystyle E\left(H_{kj}\right)=np_{kj}}$. If the null hypothesis is true and thus ${\displaystyle X}$ and ${\displaystyle Y}$ are statistically independent, the joint probability ${\displaystyle p_{kj}}$ is calculated according to the multiplication rule for independent events as the product of the marginal probabilities ${\displaystyle p_{k\bullet }}$ and ${\displaystyle p_{\bullet j}}$: ${\displaystyle p_{kj}=p_{k\bullet }\cdot p_{\bullet j}}$ . The expected joint absolute frequencies are then given by ${\displaystyle e_{kj}=n\cdot p_{kj}=n\cdot p_{k\bullet }\cdot p_{\bullet j}}$. This result applies to all ${\displaystyle k=1,\ldots ,K}$ and ${\displaystyle j=1,\ldots ,J}$. The test statistic is based on a comparison of the joint absolute frequencies encountered in the sample with those to be expected given the null hypothesis is true. The probabilities underlying the expected frequencies are unknown and have to be estimated from the sample. The comparison is based on the differences ${\displaystyle H_{kj}-{\widehat {e}}_{kj}}$ as distance measures. To prevent negative differences from offsetting positive ones (or vice versa), the difference is squared: ${\displaystyle \left(H_{kj}-{\widehat {e}}_{kj}\right)^{2}}$. To account for the varying importance of these squared deviations, they are weighted by dividing by ${\displaystyle {\widehat {e}}_{kj}}$: a difference of ${\displaystyle h_{kj}-{\widehat {e}}_{kj}=5}$ receives a higher weighting if ${\displaystyle {\widehat {e}}_{kj}=10}$ than if ${\displaystyle {\widehat {e}}_{kj}=100}$. 
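Estimating the marginal probabilities by the relative marginal frequencies gives ${\displaystyle {\widehat {e}}_{kj}=h_{k\bullet }\cdot h_{\bullet j}/n}$. A minimal sketch with a hypothetical ${\displaystyle 2\times 3}$ table of observed frequencies (in this constructed table the rows are proportional, so the expected frequencies coincide exactly with the observed ones):

```python
# Hypothetical 2x3 table of observed joint absolute frequencies h_kj.
observed = [[20, 30, 10],
            [30, 45, 15]]

n = sum(sum(row) for row in observed)              # sample size
row_totals = [sum(row) for row in observed]        # h_k.
col_totals = [sum(col) for col in zip(*observed)]  # h_.j

# Expected frequencies under independence:
# e_kj = n * (h_k./n) * (h_.j/n) = h_k. * h_.j / n
expected = [[rk * cj / n for cj in col_totals] for rk in row_totals]
```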
Summing over all pairs ${\displaystyle \left(k,j\right)}$ summarizes (condenses) all weighted squared deviations into one test statistic: ${\displaystyle V=\sum _{k=1}^{K}\sum _{j=1}^{J}{\frac {\left(H_{kj}-{\widehat {e}}_{kj}\right)^{2}}{{\widehat {e}}_{kj}}}.}$ As the ${\displaystyle H_{kj}}$ are random variables, so is ${\displaystyle V}$. Under the null hypothesis, for sufficiently large sample size ${\displaystyle n}$ and validity of the approximation conditions, ${\displaystyle V}$ is approximately chi-square distributed with ${\displaystyle (K-1)\cdot \left(J-1\right)}$ degrees of freedom. If the approximation requirements aren’t fulfilled, adjacent classes or values have to be combined in a suitable way. The outcomes of discretely measured random experiments are then grouped into classes. ${\displaystyle K}$ and ${\displaystyle J}$ are the numbers of classes remaining after such a necessary re-grouping. Determining the degrees of freedom: There is a total of ${\displaystyle K\cdot J}$ probabilities ${\displaystyle p_{kj}}$ constituting the bivariate distribution of the random variables ${\displaystyle X}$ and ${\displaystyle Y}$ as categorized in the two-dimensional contingency table. We lose one degree of freedom because the probabilities aren’t independent of each other: From ${\displaystyle \sum _{k}\sum _{j}p_{kj}=1}$ it follows that any probability ${\displaystyle p_{kj}}$ is determined by the other ${\displaystyle K\cdot J-1}$ joint probabilities. If we could derive all joint probabilities from both variables’ marginal distributions (probabilities) by applying ${\displaystyle p_{kj}=p_{k\bullet }\cdot p_{\bullet j}}$, we would thus have ${\displaystyle K\cdot J-1}$ degrees of freedom. Unfortunately, the marginal probabilities ${\displaystyle p_{k\bullet }}$ and ${\displaystyle p_{\bullet j}}$ are unknown and have to be estimated from the data, further reducing the degrees of freedom. 
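The computation of ${\displaystyle V}$ and its degrees of freedom can be sketched directly from the formula above; the observed table is again a hypothetical example:

```python
# Hypothetical 2x3 table; compute V = sum_kj (h_kj - e_kj)^2 / e_kj.
observed = [[25, 35, 10],
            [15, 40, 25]]

n = sum(map(sum, observed))
row_totals = [sum(r) for r in observed]        # h_k.
col_totals = [sum(c) for c in zip(*observed)]  # h_.j

v = sum((observed[k][j] - row_totals[k] * col_totals[j] / n) ** 2
        / (row_totals[k] * col_totals[j] / n)
        for k in range(len(observed))
        for j in range(len(observed[0])))

# Degrees of freedom: (K - 1)(J - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
```

For this table ${\displaystyle V\approx 8.63}$ with ${\displaystyle (2-1)\cdot (3-1)=2}$ degrees of freedom.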
The marginal distribution of ${\displaystyle X}$ encompasses ${\displaystyle K}$ probabilities ${\displaystyle p_{k\bullet }}$, of which only ${\displaystyle K-1}$ have to be estimated because ${\displaystyle \sum _{k}p_{k\bullet }=1}$. The same applies to the marginal distribution of ${\displaystyle Y}$: As ${\displaystyle \sum _{j}p_{\bullet j}=1}$, only ${\displaystyle J-1}$ marginal probabilities ${\displaystyle p_{\bullet j}}$ have to be estimated. Thus, a total of ${\displaystyle \left(K-1\right)+\left(J-1\right)}$ marginal probabilities has to be estimated, and the overall degrees of freedom are: ${\displaystyle K\cdot J-1-\left[\left(K-1\right)+\left(J-1\right)\right]=K\cdot J-K-J+1=\left(K-1\right)\cdot \left(J-1\right).}$ As ${\displaystyle \left(H_{kj}-{\widehat {e}}_{kj}\right)^{2}/{\widehat {e}}_{kj}}$ is nonnegative for all pairs ${\displaystyle \left(k,j\right)}$, the test statistic ${\displaystyle V}$ can never be negative. Large deviations ${\displaystyle H_{kj}-{\widehat {e}}_{kj}}$ translate into a high test statistic value. The null hypothesis is thus rejected for high values of ${\displaystyle V}$. Hence, the chi-square test of independence is a right-sided test.
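The right-sided decision rule can be sketched for the special case of 2 degrees of freedom, where the chi-square tail probability has the closed form ${\displaystyle P\left(V>v\right)=e^{-v/2}}$ and no special-function library is needed; the test statistic value is a hypothetical example:

```python
import math

# Right-sided test with df = (K-1)(J-1) = 2. For df = 2 the chi-square
# survival function is P(V > v) = exp(-v/2), so the critical value at
# level alpha is c = -2 * ln(alpha).
alpha = 0.05
critical_value = -2 * math.log(alpha)   # ≈ 5.991

v = 8.6336                              # hypothetical test statistic
p_value = math.exp(-v / 2)

# Reject H0 (independence) only for large values of V.
reject = v > critical_value
```

For other degrees of freedom the critical value would be read from a chi-square table or a statistics library.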