Marginal and Conditional Distributions
From MM*Stat International
Marginal distribution
Suppose one is given a two-dimensional frequency distribution of the variables $X$ and $Y$. The marginal distribution of $X$ (respectively $Y$) is the one-dimensional distribution of variable $X$ (respectively $Y$), in which we do not consider what happens to variable $Y$ (respectively $X$). The marginal distribution is the result of "adding up" the frequencies of the realizations over the other variable. In a frequency table, the marginal (absolute) distributions of $X$ and $Y$ appear as the row sums and column sums:
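Variable $X$ | $y_1$ | $\dots$ | $y_j$ | $\dots$ | $y_r$ | Marginal distribution of $X$ |
---|---|---|---|---|---|---|
$x_1$ | $h_{11}$ | $\dots$ | $h_{1j}$ | $\dots$ | $h_{1r}$ | $h_{1\bullet}$ |
$\vdots$ | $\vdots$ | | $\vdots$ | | $\vdots$ | $\vdots$ |
$x_i$ | $h_{i1}$ | $\dots$ | $h_{ij}$ | $\dots$ | $h_{ir}$ | $h_{i\bullet}$ |
$\vdots$ | $\vdots$ | | $\vdots$ | | $\vdots$ | $\vdots$ |
$x_m$ | $h_{m1}$ | $\dots$ | $h_{mj}$ | $\dots$ | $h_{mr}$ | $h_{m\bullet}$ |
Marginal distribution of $Y$ | $h_{\bullet 1}$ | $\dots$ | $h_{\bullet j}$ | $\dots$ | $h_{\bullet r}$ | $n$ |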
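Here $h_{ij}$ denotes the absolute frequency of the pair $(x_i, y_j)$, with $i = 1, \dots, m$ and $j = 1, \dots, r$.

Marginal absolute distribution of variable $X$ with the values $x_i$:

$$h_{i\bullet} = \sum_{j=1}^{r} h_{ij}, \quad i = 1, \dots, m$$

Marginal absolute distribution of variable $Y$ with the values $y_j$:

$$h_{\bullet j} = \sum_{i=1}^{m} h_{ij}, \quad j = 1, \dots, r$$

Total number of observations equals $n$:

$$n = \sum_{i=1}^{m} h_{i\bullet} = \sum_{j=1}^{r} h_{\bullet j} = \sum_{i=1}^{m} \sum_{j=1}^{r} h_{ij}$$

The marginal relative distribution is defined similarly using the relative frequencies ($f_{ij} = h_{ij}/n$). (Note: This may be accomplished simply by dividing all of the absolute frequencies in the marginal absolute distribution table by $n$.)

Marginal relative distribution of variable $X$ with the values $x_i$:

$$f_{i\bullet} = \sum_{j=1}^{r} f_{ij} = \frac{h_{i\bullet}}{n}, \quad i = 1, \dots, m$$

Marginal relative distribution of variable $Y$ with the values $y_j$:

$$f_{\bullet j} = \sum_{i=1}^{m} f_{ij} = \frac{h_{\bullet j}}{n}, \quad j = 1, \dots, r$$

Total of all relative frequencies equals 1:

$$\sum_{i=1}^{m} f_{i\bullet} = \sum_{j=1}^{r} f_{\bullet j} = \sum_{i=1}^{m} \sum_{j=1}^{r} f_{ij} = 1$$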
Conditional distribution
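Suppose one is given a two-dimensional frequency distribution of two variables $X$ and $Y$. The frequency distribution of $X$ given a particular value $y_j$ of $Y$ is called the conditional distribution of $X$ given $y_j$. (The conditional distribution of $Y$ given $x_i$ is defined similarly.)

Conditional relative frequency distribution of $X$ for a given $Y = y_j$:

$$f(x_i \mid y_j) = \frac{f_{ij}}{f_{\bullet j}} = \frac{h_{ij}}{h_{\bullet j}}, \quad i = 1, \dots, m$$

Conditional relative frequency distribution of $Y$ for a given $X = x_i$:

$$f(y_j \mid x_i) = \frac{f_{ij}}{f_{i\bullet}} = \frac{h_{ij}}{h_{i\bullet}}, \quad j = 1, \dots, r$$

Here $h_{ij}$ and $f_{ij}$ are the absolute and relative joint frequencies, and $h_{i\bullet}$, $h_{\bullet j}$, $f_{i\bullet}$, $f_{\bullet j}$ are the corresponding marginal frequencies. Like marginal distributions, conditional distributions are one-dimensional distributions.

Example: The starting point is the following 5 × 3 contingency table of the variables

- $X$: occupation
- $Y$: athletic activity

which have been observed for $n = 1000$ employed persons.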
occupation | rarely | sometimes | regularly | MD |
---|---|---|---|---|
worker | 240 | 120 | 70 | 430 |
salaried | 160 | 90 | 90 | 340 |
civil servant | 30 | 30 | 30 | 90 |
farmer | 37 | 7 | 6 | 50 |
self employed | 40 | 32 | 18 | 90 |
MD | 507 | 279 | 214 | 1000 |
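The MD row and column of this table can be reproduced by summing the joint frequencies. The following is a minimal sketch in plain Python, not part of the original material. The variable names (`joint`, `row_margin`, `col_margin`) are chosen here for illustration only.

```python
# Joint absolute frequencies from the occupation / athletic activity table.
# Each list holds the counts for: rarely, sometimes, regularly.
joint = {
    "worker":        [240, 120, 70],
    "salaried":      [160, 90, 90],
    "civil servant": [30, 30, 30],
    "farmer":        [37, 7, 6],
    "self employed": [40, 32, 18],
}

# Marginal absolute distribution of occupation: sum each row over the columns.
row_margin = {occ: sum(counts) for occ, counts in joint.items()}

# Marginal absolute distribution of athletic activity: sum each column over the rows.
col_margin = [sum(col) for col in zip(*joint.values())]

# Total number of observations.
n = sum(row_margin.values())

# Marginal relative distributions: divide the absolute margins by n.
row_margin_rel = {occ: h / n for occ, h in row_margin.items()}
col_margin_rel = [h / n for h in col_margin]

print(col_margin)      # [507, 279, 214]
print(n)               # 1000
print(col_margin_rel)  # [0.507, 0.279, 0.214]
```

Both relative margins sum to 1, which is a quick consistency check on the table.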
Conditional distribution of the variable $Y$ (athletic activity) for a given $X$ (occupational group):
occupation | rarely | sometimes | regularly | MD |
---|---|---|---|---|
worker | 0.56 | 0.28 | 0.16 | 1.00 |
salaried | 0.47 | 0.26 | 0.26 | 1.00 |
civil servant | 0.33 | 0.33 | 0.33 | 1.00 |
farmer | 0.74 | 0.14 | 0.12 | 1.00 |
self employed | 0.44 | 0.36 | 0.20 | 1.00 |
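Each row of the conditional table is obtained by dividing the joint counts of one occupational group by that group's total. A short Python sketch of this row-wise normalization follows. The helper name `conditional_given_row` is hypothetical and introduced only for this illustration.

```python
# Joint absolute frequencies (rarely, sometimes, regularly) per occupation.
joint = {
    "worker":        [240, 120, 70],
    "salaried":      [160, 90, 90],
    "civil servant": [30, 30, 30],
    "farmer":        [37, 7, 6],
    "self employed": [40, 32, 18],
}


def conditional_given_row(counts):
    """Divide each cell of one row by the row total (its marginal frequency)."""
    total = sum(counts)
    return [h / total for h in counts]


# Conditional distribution of athletic activity for each occupational group.
conditional = {occ: conditional_given_row(c) for occ, c in joint.items()}

for occ, shares in conditional.items():
    print(occ, [round(s, 2) for s in shares])
```

Because every row is divided by its own total, each conditional distribution sums to 1 regardless of how large the occupational group is.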
For $n = 100$ randomly selected persons it has been determined whether they smoke and whether they have had lung cancer. The variables are

- $X$: smoker, with realizations $x_1$ = yes and $x_2$ = no
- $Y$: lung cancer, with realizations $y_1$ = yes and $y_2$ = no

The two-dimensional frequency distribution is a 2 × 2 contingency table:
smoker | lung cancer yes ($y_1$) | lung cancer no ($y_2$) | MD |
---|---|---|---|
yes ($x_1$) | 10 | 15 | 25 |
no ($x_2$) | 5 | 70 | 75 |
MD | 15 | 85 | 100 |
The conditional distributions of the variable $X$ (smoker) for a given $Y$ (lung cancer) are shown in the following table:
smoker | lung cancer yes | lung cancer no |
---|---|---|
yes | 0.667 | 0.176 |
no | 0.333 | 0.824 |
total | 1.000 | 1.000 |
Each element of the conditional distribution has been calculated as the ratio of the respective cell of the joint distribution and the corresponding element of the marginal distribution of $Y$. From the table we learn that 66.7% of all persons diagnosed with lung cancer are smokers, and 82.4% of the persons not diagnosed with lung cancer are non-smokers.

The conditional distribution of the variable $Y$ (lung cancer), for a given value of $X$ (smoker/non-smoker), is constructed analogously:
smoker | lung cancer yes | lung cancer no | total |
---|---|---|---|
yes | 0.400 | 0.600 | 1.000 |
no | 0.067 | 0.933 | 1.000 |
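As a worked check of the two tables, the cell "smoker yes, lung cancer yes" can be computed in both directions. The notation is an assumption made for this illustration: $h_{11}$ is the joint count of that cell, $h_{\bullet 1}$ the column total for lung cancer = yes, and $h_{1\bullet}$ the row total for smoker = yes.

```latex
f(x_1 \mid y_1) = \frac{h_{11}}{h_{\bullet 1}} = \frac{10}{15} \approx 0.667,
\qquad
f(y_1 \mid x_1) = \frac{h_{11}}{h_{1\bullet}} = \frac{10}{25} = 0.400
```

The same joint count of 10 gives different conditional frequencies because it is divided by different marginal totals, so the two conditional distributions answer different questions.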
Hence, 40% of all smokers but only 6.7% of all non-smokers have been diagnosed with lung cancer.

In a survey of 941 persons, respondents’ age (grouped as 18-29, 30-39 and 40-49) and the highest level of education attained (university, high school, middle school, lower school) were recorded. The observed frequencies are shown in the following contingency table:
age | university | high school | middle school | lower school | MD (age) |
---|---|---|---|---|---|
18–29 | 38 | 93 | 134 | 42 | 307 |
30–39 | 23 | 94 | 168 | 70 | 355 |
40–49 | 12 | 39 | 129 | 99 | 279 |
MD (education) | 73 | 226 | 431 | 211 | 941 |
The conditional distributions of educational attainment, given age, are summarized in the following table:
age | university | high school | middle school | lower school | total |
---|---|---|---|---|---|
18–29 | 0.124 | 0.303 | 0.436 | 0.137 | 1.000 |
30–39 | 0.065 | 0.265 | 0.473 | 0.197 | 1.000 |
40–49 | 0.043 | 0.140 | 0.462 | 0.355 | 1.000 |
Each element of the distribution has been calculated as the ratio of the respective cell of the joint distribution and the corresponding element of the marginal distribution of age. The table shows that among the 18-29 year-olds 12.4% have completed a university education, 30.3% graduated from high school and 43.6% finished middle school. In the group of 40-49 year-olds the fraction of persons with a university degree is only 4.3%. The conditional distribution of age, for a given level of educational attainment, is constructed analogously:
age | university | high school | middle school | lower school |
---|---|---|---|---|
18–29 | 0.521 | 0.411 | 0.311 | 0.199 |
30–39 | 0.315 | 0.416 | 0.390 | 0.332 |
40–49 | 0.164 | 0.173 | 0.299 | 0.469 |
total | 1.000 | 1.000 | 1.000 | 1.000 |
It can be seen that among those whose highest level of education is high school, 41.1% belong to the age group 18-29, 41.6% to the age group 30-39 and 17.3% to the age group 40-49.
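Conditioning on the column variable works the same way as conditioning on the row variable, except that each cell is divided by its column total. A minimal Python sketch using the survey counts follows. The variable names are illustrative only, and values rounded by the code may differ from the printed table in the last digit (for example, 93/226 ≈ 0.4115).

```python
ages = ["18-29", "30-39", "40-49"]
levels = ["university", "high school", "middle school", "lower school"]

# Joint absolute frequencies: one row per age group, one column per level.
joint = [
    [38, 93, 134, 42],
    [23, 94, 168, 70],
    [12, 39, 129, 99],
]

# Marginal absolute distribution of education: the column totals.
col_totals = [sum(col) for col in zip(*joint)]

# Conditional distribution of age given education:
# divide every cell by the total of its column.
age_given_edu = [
    [h / total for h, total in zip(row, col_totals)]
    for row in joint
]

# Print one conditional distribution per education level.
for j, level in enumerate(levels):
    column = [age_given_edu[i][j] for i in range(len(ages))]
    print(level, [round(v, 3) for v in column])
```

Here each column of `age_given_edu` sums to 1, whereas in the table conditioned on age it is each row that sums to 1.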
In a survey of 107 students their major and gender were recorded. The responses were used to produce the following 9 × 2 contingency table:
major | female | male | MD (major) |
---|---|---|---|
social sc. | 12 | 13 | 25 |
engineering | 1 | 1 | 2 |
law | 8 | 13 | 21 |
medicine | 6 | 4 | 10 |
natural sc. | 1 | 8 | 9 |
psychology | 3 | 8 | 11 |
other | 1 | 0 | 1 |
theology | 7 | 2 | 9 |
business | 5 | 14 | 19 |
MD (gender) | 44 | 63 | 107 |
What are the shares of females and males in each major? The answer is given by the conditional distributions of gender, given the major. The frequencies of the conditional distribution are computed as the ratio of the corresponding cells of the joint distribution table and the marginal distribution (i.e. row sum in this case) of the respective major.
major | female | male | MD (major) |
---|---|---|---|
social sc. | 0.480 | 0.520 | 1.000 |
engineering | 0.500 | 0.500 | 1.000 |
law | 0.381 | 0.619 | 1.000 |
medicine | 0.600 | 0.400 | 1.000 |
natural sc. | 0.111 | 0.889 | 1.000 |
psychology | 0.273 | 0.727 | 1.000 |
other | 1.000 | 0.000 | 1.000 |
theology | 0.778 | 0.222 | 1.000 |
business | 0.263 | 0.737 | 1.000 |
total | 0.411 | 0.589 | 1.000 |
The results show that business is dominated by males who account for 73.7% of all students majoring in business. In theology, on the other hand, women are the majority comprising 77.8% of theology majors.
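The shares in the table above, including its "total" row, can be recomputed from the joint counts. The sketch below is an illustration in plain Python with names chosen here. It also shows that the "total" row is the marginal relative distribution of gender, not a conditional distribution.

```python
# (female, male) counts per major, taken from the contingency table.
counts = {
    "social sc.":  (12, 13),
    "engineering": (1, 1),
    "law":         (8, 13),
    "medicine":    (6, 4),
    "natural sc.": (1, 8),
    "psychology":  (3, 8),
    "other":       (1, 0),
    "theology":    (7, 2),
    "business":    (5, 14),
}

# Conditional distribution of gender given major:
# divide each cell by the row sum of its major.
shares = {
    major: (f / (f + m), m / (f + m))
    for major, (f, m) in counts.items()
}

# "total" row: marginal relative distribution of gender over all students.
total_f = sum(f for f, _ in counts.values())
total_m = sum(m for _, m in counts.values())
n = total_f + total_m
overall = (total_f / n, total_m / n)

print([round(s, 3) for s in shares["business"]])  # [0.263, 0.737]
print([round(s, 3) for s in shares["theology"]])  # [0.778, 0.222]
print([round(s, 3) for s in overall])             # [0.411, 0.589]
```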