Marginal and Conditional Distributions

From MM*Stat International



Marginal distribution

Suppose one is given a two-dimensional frequency distribution of the variables $X$ and $Y$. The marginal distribution of $X$ (respectively $Y$) is the one-dimensional distribution of variable $X$ (respectively $Y$), in which we do not consider what happens to variable $Y$ (respectively $X$). The marginal distribution is the result of "adding up" the frequencies of the realizations: to obtain the marginal (absolute) distribution of $X$, the frequencies are summed over all values of $Y$, and vice versa.

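[Table: two-dimensional frequency table with the values of variable $X$ in the rows and the values of variable $Y$ in the columns; the last column holds the marginal distribution of $X$ and the last row the marginal distribution of $Y$.]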

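In the following, $h_{ij}$ denotes the absolute frequency of the pair $(x_i, y_j)$, with $i = 1, \dots, m$ and $j = 1, \dots, r$.

Marginal absolute distribution of variable $X$ with the values $x_i$:

$$h_{i\cdot} = \sum_{j=1}^{r} h_{ij}$$

Marginal absolute distribution of variable $Y$ with the values $y_j$:

$$h_{\cdot j} = \sum_{i=1}^{m} h_{ij}$$

Total number of observations equals $n$:

$$n = \sum_{i=1}^{m} h_{i\cdot} = \sum_{j=1}^{r} h_{\cdot j} = \sum_{i=1}^{m} \sum_{j=1}^{r} h_{ij}$$

The marginal relative distribution is defined similarly using the relative frequencies ($f_{ij} = h_{ij}/n$). (Note: This may be accomplished simply by dividing all of the absolute frequencies in the marginal absolute distribution table by $n$.)

Marginal relative distribution of variable $X$ with the values $x_i$:

$$f_{i\cdot} = \sum_{j=1}^{r} f_{ij} = \frac{h_{i\cdot}}{n}$$

Marginal relative distribution of variable $Y$ with the values $y_j$:

$$f_{\cdot j} = \sum_{i=1}^{m} f_{ij} = \frac{h_{\cdot j}}{n}$$

Total of all relative frequencies equals 1:

$$\sum_{i=1}^{m} f_{i\cdot} = \sum_{j=1}^{r} f_{\cdot j} = \sum_{i=1}^{m} \sum_{j=1}^{r} f_{ij} = 1$$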
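The "adding up" of frequencies can be sketched in a few lines of Python. This is a minimal illustration and not part of the original text: the function name `marginals` and the nested-list layout (rows for the values of X, columns for the values of Y) are assumptions, and the counts are those of the smoking and lung cancer example later in this section.

```python
def marginals(table):
    """Return (row_sums, col_sums, n) for a table of absolute frequencies.

    `table` is a list of rows: rows index the values of X, columns the values of Y.
    """
    row_sums = [sum(row) for row in table]        # marginal absolute distribution of X
    col_sums = [sum(col) for col in zip(*table)]  # marginal absolute distribution of Y
    n = sum(row_sums)                             # total number of observations
    return row_sums, col_sums, n


# Counts from the smoking / lung cancer example
# (rows: smoker yes/no, columns: lung cancer yes/no).
table = [[10, 15],
         [5, 70]]

row_sums, col_sums, n = marginals(table)
rel_rows = [h / n for h in row_sums]  # marginal relative distribution of X
rel_cols = [h / n for h in col_sums]  # marginal relative distribution of Y

print(row_sums, col_sums, n)  # [25, 75] [15, 85] 100
print(rel_rows, rel_cols)     # [0.25, 0.75] [0.15, 0.85]
```

Dividing the absolute marginal frequencies by `n` is the shortcut mentioned in the note above: the relative marginal distributions do not have to be summed separately.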

Conditional distribution

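Suppose one is given a two-dimensional frequency distribution of two variables $X$ and $Y$. The frequency distribution of $X$ given a particular value of $Y$ is called the conditional distribution of $X$ given $Y$. (The conditional distribution of $Y$ given $X$ is defined similarly.) Below, $h_{ij}$ (respectively $f_{ij}$) is the absolute (respectively relative) frequency of the pair $(x_i, y_j)$, and a dot marks the index that has been summed over.

Conditional relative frequency distribution of $X$ for a given $Y = y_j$:

$$f(x_i \mid y_j) = \frac{h_{ij}}{h_{\cdot j}} = \frac{f_{ij}}{f_{\cdot j}}$$

Conditional relative frequency distribution of $Y$ for a given $X = x_i$:

$$f(y_j \mid x_i) = \frac{h_{ij}}{h_{i\cdot}} = \frac{f_{ij}}{f_{i\cdot}}$$

Like marginal distributions, conditional distributions are one-dimensional distributions.

Example: The starting point is the following 5 × 3 contingency table of the variables $X$ - occupation and $Y$ - athletic activity, which have been observed for $n = 1000$ employed persons.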

occupation       athletic activity                    MD
                 rarely    sometimes    regularly
worker           240       120          70            430
salaried         160       90           90            340
civil servant    30        30           30            90
farmer           37        7            6             50
self employed    40        32           18            90
MD               507       279          214           1000

Conditional distribution of the variable $Y$ (athletic activity) for a given $X$ (occupational group):

occupation       athletic activity                    MD
                 rarely    sometimes    regularly
worker           0.56      0.28         0.16          1.00
salaried         0.47      0.26         0.26          1.00
civil servant    0.33      0.33         0.33          1.00
farmer           0.74      0.14         0.12          1.00
self employed    0.44      0.36         0.20          1.00
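The table above can be reproduced by dividing every cell of the contingency table by its row sum. The following is a minimal sketch under that reading; the helper name `conditional_given_row` is hypothetical and the counts are copied from the 5 × 3 contingency table.

```python
def conditional_given_row(table):
    """Divide every cell by its row sum.

    The result is the conditional distribution of the column variable
    for each value of the row variable.
    """
    return [[cell / sum(row) for cell in row] for row in table]


occupations = ["worker", "salaried", "civil servant", "farmer", "self employed"]
# Columns: rarely, sometimes, regularly (athletic activity).
counts = [
    [240, 120, 70],
    [160, 90, 90],
    [30, 30, 30],
    [37, 7, 6],
    [40, 32, 18],
]

cond = conditional_given_row(counts)
for name, shares in zip(occupations, cond):
    print(f"{name:<14}", " ".join(f"{s:.2f}" for s in shares))
```

Rounded to two decimals, some rows (for example salaried: 0.47, 0.26, 0.26) add up to 0.99 rather than 1.00. The unrounded shares in `cond` still sum to one in every row.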

For $n = 100$ randomly selected persons it has been determined whether they smoke and whether they have had lung cancer. The variables are:
$X$ - Smoker, with realizations $x_1$ = yes and $x_2$ = no
$Y$ - Lung cancer, with realizations $y_1$ = yes and $y_2$ = no
The two-dimensional frequency distribution is a 2 × 2 contingency table:

                       lung cancer                   MD
                       yes ($y_1$)    no ($y_2$)
smoking yes ($x_1$)    10             15             25
smoking no ($x_2$)     5              70             75
MD                     15             85             100

The conditional distributions of the variable $X$ (smoker) for a given $Y$ (lung cancer) are shown in the following table:

              lung cancer
              yes      no
smoker yes    0.667    0.176
smoker no     0.333    0.824
total         1.000    1.000

Each element of the conditional distribution has been calculated as the ratio of the respective cell of the joint distribution and the corresponding element of the marginal distribution of $Y$. From the table we learn that 66.7% of all persons diagnosed with lung cancer are smokers, while 82.4% of the persons not diagnosed with lung cancer are non-smokers. The conditional distribution of the variable $Y$ (lung cancer), for a given value of $X$ (smoker/non-smoker), is constructed analogously:

              lung cancer
              yes      no       total
smoker yes    0.400    0.600    1.000
smoker no     0.067    0.933    1.000
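The two conditional tables of this example start from the same four counts and differ only in the divisor: column sums in one case, row sums in the other. A minimal Python sketch of both directions follows; the variable names `x_given_y` and `y_given_x` are chosen here for illustration.

```python
# Rows: smoker yes / no. Columns: lung cancer yes / no.
counts = [[10, 15],
          [5, 70]]

row_sums = [sum(row) for row in counts]        # [25, 75]
col_sums = [sum(col) for col in zip(*counts)]  # [15, 85]

# Smoker status given lung cancer status: divide each cell by its column sum.
x_given_y = [[cell / col_sums[j] for j, cell in enumerate(row)] for row in counts]

# Lung cancer status given smoker status: divide each cell by its row sum.
y_given_x = [[cell / row_sums[i] for cell in row] for i, row in enumerate(counts)]

print([[round(v, 3) for v in row] for row in x_given_y])  # [[0.667, 0.176], [0.333, 0.824]]
print([[round(v, 3) for v in row] for row in y_given_x])  # [[0.4, 0.6], [0.067, 0.933]]
```

The same cell (10 smokers with lung cancer) yields 0.667 in one table and 0.400 in the other, which is why the conditioning variable always has to be stated.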

Hence, 40% of all smokers but only 6.7% of all non-smokers have been diagnosed with lung cancer.

In a survey of 941 persons, respondents' age (grouped as 18–29, 30–39 and 40–49) and the highest level of education attained (university, high school, middle school, lower school) were recorded. The observed frequencies are shown in the following contingency table:

age               university    high school    middle school    lower school    MD (age)
18–29             38            93             134              42              307
30–39             23            94             168              70              355
40–49             12            39             129              99              279
MD (education)    73            226            431              211             941

The conditional distributions of educational attainment, given age, are summarized in the following table:

age      university    high school    middle school    lower school    total
18–29    0.124         0.303          0.436            0.137           1.000
30–39    0.065         0.265          0.473            0.197           1.000
40–49    0.043         0.140          0.462            0.355           1.000
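A few entries of this table can be checked by hand: each is a cell count from the contingency table above divided by the total of its age group. The notation $f(\cdot \mid \cdot)$ for a conditional relative frequency is used here as shorthand and is an assumption of this sketch.

```latex
\begin{align*}
f(\text{university} \mid \text{18--29}) &= \frac{38}{307} \approx 0.124, \\
f(\text{high school} \mid \text{18--29}) &= \frac{93}{307} \approx 0.303, \\
f(\text{middle school} \mid \text{18--29}) &= \frac{134}{307} \approx 0.436, \\
f(\text{university} \mid \text{40--49}) &= \frac{12}{279} \approx 0.043.
\end{align*}
```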

Each element of the distribution has been calculated as the ratio of the respective cell of the joint distribution and the corresponding element of the marginal distribution of age. The table shows that among the 18–29 year-olds 12.4% have completed a university education, 30.3% graduated from high school and 43.6% finished middle school. In the group of 40–49 year-olds the fraction of persons with a university degree is only 4.3%.

The conditional distribution of age, for a given level of educational attainment, is constructed analogously:

age      university    high school    middle school    lower school
18–29    0.521         0.411          0.311            0.199
30–39    0.315         0.416          0.390            0.332
40–49    0.164         0.173          0.299            0.469
total    1.000         1.000          1.000            1.000

It can be seen that among those whose highest level of education is high school, 41.1% belong to the age group 18–29, 41.6% to the age group 30–39 and 17.3% to the age group 40–49.


In a survey of 107 students their major and gender were recorded. The responses were used to produce the following 9 × 2 contingency table:


major          female    male    MD (major)
social sc.     12        13      25
engineering    1         1       2
law            8         13      21
medicine       6         4       10
natural sc.    1         8       9
psychology     3         8       11
other          1         0       1
theology       7         2       9
business       5         14      19
MD (gender)    44        63      107

What are the shares of females and males in each major? The answer is given by the conditional distributions of gender, given the major. The frequencies of the conditional distribution are computed as the ratio of the corresponding cells of the joint distribution table and the marginal distribution (i.e. row sum in this case) of the respective major.

major          female    male     MD (major)
social sc.     0.480     0.520    1.000
engineering    0.500     0.500    1.000
law            0.381     0.619    1.000
medicine       0.600     0.400    1.000
natural sc.    0.111     0.889    1.000
psychology     0.273     0.727    1.000
other          1.000     0.000    1.000
theology       0.778     0.222    1.000
business       0.263     0.737    1.000
total          0.411     0.589    1.000

The results show that business is dominated by males who account for 73.7% of all students majoring in business. In theology, on the other hand, women are the majority comprising 77.8% of theology majors.
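The shares quoted above can be recomputed from the counts of the 9 × 2 contingency table. The sketch below is illustrative only: the dictionary layout and the variable names are assumptions, while the counts are taken from the table.

```python
# (female, male) counts per major, copied from the contingency table.
counts = {
    "social sc.":  (12, 13),
    "engineering": (1, 1),
    "law":         (8, 13),
    "medicine":    (6, 4),
    "natural sc.": (1, 8),
    "psychology":  (3, 8),
    "other":       (1, 0),
    "theology":    (7, 2),
    "business":    (5, 14),
}

# Conditional distribution of gender given the major: divide by the row sum.
shares = {major: (f / (f + m), m / (f + m)) for major, (f, m) in counts.items()}

# Marginal relative distribution of gender (the "total" row).
total_f = sum(f for f, _ in counts.values())
total_m = sum(m for _, m in counts.values())
n = total_f + total_m
overall = (total_f / n, total_m / n)

for major, (share_f, share_m) in shares.items():
    print(f"{major:<12} {share_f:.3f} {share_m:.3f}")
print(f"{'total':<12} {overall[0]:.3f} {overall[1]:.3f}")
```

Comparing each major's shares with the `overall` shares shows which majors deviate from the gender mix of the whole sample.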