# Estimation Theory

Assume a given population with distribution function ${\displaystyle F(x).}$ In general, the distribution and its characteristics or parameters are not known. Suppose we are interested in, say, the expectation ${\displaystyle \mu }$ and the variance ${\displaystyle \sigma ^{2}.}$ (Alternatively, if the data are binary, we may be interested in the population proportion ${\displaystyle \pi }$.) As outlined previously, we can learn about the population, or equivalently its distribution function ${\displaystyle F}$, through (random) sampling. The data may then be used to infer properties of the population, hence the term indirect inference. At the outset, it is important to emphasize that the conclusions drawn may be incorrect, particularly if the sample is small or not representative of the underlying population. The tools of probability may be used to provide measures of the accuracy or correctness of the estimates or conclusions.

We will focus on the estimation of unknown parameters or characteristics. Let ${\displaystyle \theta }$ denote the object of interest. We then distinguish two types of procedures: point estimation and interval estimation.

## Point Estimation

The determination of a single estimate using a random sample is referred to as point estimation. It is desirable that the estimate provide the best possible approximation to the unknown parameter.

## The Estimator or Estimating Function

We will be drawing ${\displaystyle n}$ independent observations from the population. In that case ${\displaystyle X_{i},\ i=1,\dots ,n}$ are i.i.d. random variables. The estimator is defined to be a function ${\displaystyle g}$ of the ${\displaystyle X_{i}}$. We write ${\displaystyle {\widehat {\theta }}=g(X_{1},\dots ,X_{n})}$ for the estimator of ${\displaystyle \theta }$, in which case it is a random variable. The same symbol will also represent a specific estimate for a given data set. It should be clear from the context which applies.

A point estimate thus depends on the sample size ${\displaystyle n}$ and the realizations that have been drawn. The point estimate will rarely correspond to the true value of the unknown parameter. Indeed, repeated sampling will generally yield different estimates. If the sample size is large, we would expect these to be close to the true parameter value.

A crucial problem of point estimation is the selection of the best estimator. In some cases, the population parameter or characteristic of interest has a natural sample analogue. For example, one typically uses the sample mean to estimate the population mean, the sample proportion to estimate the population proportion and the sample variance to estimate the population variance. (See e.g., the discussion in Section 7.1.)

Assume a given population of ${\displaystyle N=2000}$ persons for which we observe two variables: ${\displaystyle X_{1}}$ = age (in years) and ${\displaystyle X_{2}}$ = net income (in DM). The expectation and the variance of both variables are unknown. You may draw random samples without replacement from this population. In the first window, specify

• the sample size ${\displaystyle n}$ and
• variable (age or net income).

To estimate ${\displaystyle E(X_{1})=\mu _{1}}$ or ${\displaystyle E(X_{2})=\mu _{2}}$, the following estimator is used:

${\displaystyle {\bar {x}}={\frac {1}{n}}\sum \limits _{i=1}^{n}x_{i}}$

To estimate ${\displaystyle Var(X_{1})=\sigma _{1}^{2}}$ or ${\displaystyle Var(X_{2})=\sigma _{2}^{2}}$, the following estimator is used:

${\displaystyle s^{2}={\frac {1}{n-1}}\sum \limits _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}$

As output you receive ${\displaystyle {\bar {x}}}$ and ${\displaystyle s^{2}}$ as point estimates of ${\displaystyle \mu }$ and ${\displaystyle \sigma ^{2}}$. By drawing samples repeatedly you can observe the variation in the point estimates.
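The two estimators can be tried out without the interactive window. The following is a minimal sketch, not the original application: the population of ages is invented for illustration, and the helper name `draw_point_estimates` and the sample size of 30 are assumptions.

```python
import random

def draw_point_estimates(population, n, rng):
    """Draw one sample of size n without replacement and return (x_bar, s2)."""
    sample = rng.sample(population, n)
    x_bar = sum(sample) / n
    # Sample variance with the n - 1 divisor, as in the formula for s^2.
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    return x_bar, s2

rng = random.Random(0)
# Hypothetical population of N = 2000 ages (assumed values, not real data).
population = [rng.randint(18, 90) for _ in range(2000)]

# Repeated draws give different point estimates each time.
estimates = [draw_point_estimates(population, 30, rng) for _ in range(5)]
for x_bar, s2 in estimates:
    print(f"x_bar = {x_bar:.2f}, s^2 = {s2:.2f}")
```

Each call corresponds to one draw in the interactive example, so the printed lines show how both estimates change from sample to sample.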

Consider a hypothetical population of ${\displaystyle N=2000}$ households, and let the random variable ${\displaystyle X}$ be household net income (in DM). The mean net income of this population, i.e. the expectation ${\displaystyle E(X)=\mu }$, is unknown and is the subject of our estimation. The sample mean ${\displaystyle {\bar {x}}={\frac {1}{n}}\sum \limits _{i=1}^{n}x_{i}}$ is used as the estimator. A random sample of size ${\displaystyle n}$ yields the sample values ${\displaystyle x_{1},\dots ,x_{n}}$.

1. Random samples of size ${\displaystyle n=20}$

A random sample of ${\displaystyle n=20}$ private households yields the following values:

Table 1: Data on household net income

| Household ${\displaystyle i}$ | Net income (DM) ${\displaystyle x_{i}}$ | Household ${\displaystyle i}$ | Net income (DM) ${\displaystyle x_{i}}$ |
|---|---|---|---|
| 1 | 800 | 11 | 2500 |
| 2 | 1200 | 12 | 2500 |
| 3 | 1400 | 13 | 2500 |
| 4 | 1500 | 14 | 2700 |
| 5 | 1500 | 15 | 2850 |
| 6 | 1500 | 16 | 3300 |
| 7 | 1800 | 17 | 3650 |
| 8 | 1800 | 18 | 3700 |
| 9 | 2300 | 19 | 4100 |
| 10 | 2400 | 20 | 4300 |

From the data we obtain ${\displaystyle {\bar {x}}=48300/20=2415}$ DM. The calculation is identical to that of the arithmetic mean, a measure we have already used in descriptive statistics. An important objective of inductive statistics is to provide a measure of the accuracy of this result as an estimate of the underlying population mean.
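The arithmetic can be checked directly from the 20 values in Table 1. This short sketch also computes the sample variance ${\displaystyle s^{2}}$ with the ${\displaystyle n-1}$ divisor; the variable names are chosen here for illustration.

```python
# Household net incomes (DM) from Table 1, households 1 to 20.
incomes = [800, 1200, 1400, 1500, 1500, 1500, 1800, 1800, 2300, 2400,
           2500, 2500, 2500, 2700, 2850, 3300, 3650, 3700, 4100, 4300]

n = len(incomes)
total = sum(incomes)
x_bar = total / n
# Sample variance with the n - 1 divisor.
s2 = sum((x - x_bar) ** 2 for x in incomes) / (n - 1)

print(total, x_bar)  # 48300 2415.0
print(f"s^2 = {s2:.1f}")
```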

To illustrate the point, we obtain 24 further random samples of size ${\displaystyle n=20.}$ Table 2 tabulates the sample means for all 25 samples.

Table 2: Mean household net income (DM)

| Sample | ${\displaystyle {\bar {x}}}$ | Sample | ${\displaystyle {\bar {x}}}$ | Sample | ${\displaystyle {\bar {x}}}$ |
|---|---|---|---|---|---|
| 1 | 1884.90 | 10 | 2241.15 | 18 | 2395.25 |
| 2 | 1915.30 | 11 | 2243.15 | 19 | 2413.40 |
| 3 | 2060.90 | 12 | 2267.75 | 20 | 2415.00 |
| 4 | 2062.15 | 13 | 2298.80 | 21 | 2567.50 |
| 5 | 2110.30 | 14 | 2317.00 | 22 | 2607.25 |
| 6 | 2126.50 | 15 | 2319.55 | 23 | 2635.00 |
| 7 | 2163.10 | 16 | 2361.25 | 24 | 2659.00 |
| 8 | 2168.50 | 17 | 2363.50 | 25 | 2774.30 |
| 9 | 2203.85 | | | | |

In Table 2 the samples are reordered so that the sample means are in increasing order.  Evidently, there is considerable variation in the sample means, which illustrates the random character of estimation, in particular that the estimator ${\displaystyle {\bar {x}}}$ is a random variable.

Consequently, point estimates need to be supplemented with a measure of their precision (e.g. the standard deviation of the estimator).
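As a rough illustration, the spread of the 25 sample means in Table 2 can be summarized by their empirical standard deviation. This is only an estimate of the standard deviation of the estimator based on 25 draws, not its exact value; the sketch uses Python's standard `statistics` module.

```python
import statistics

# The 25 sample means (DM) from Table 2, in increasing order.
means = [1884.90, 1915.30, 2060.90, 2062.15, 2110.30, 2126.50, 2163.10,
         2168.50, 2203.85, 2241.15, 2243.15, 2267.75, 2298.80, 2317.00,
         2319.55, 2361.25, 2363.50, 2395.25, 2413.40, 2415.00, 2567.50,
         2607.25, 2635.00, 2659.00, 2774.30]

center = statistics.mean(means)   # average of the 25 estimates
spread = statistics.stdev(means)  # empirical standard deviation (n - 1 divisor)
value_range = max(means) - min(means)

print(f"mean of estimates: {center:.2f} DM")
print(f"std of estimates:  {spread:.2f} DM")
print(f"range:             {value_range:.2f} DM")
```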

The following graph displays the estimated values ${\displaystyle {\bar {x}}}$ of the 25 samples. In order to depict the deviation of the estimated values from the true mean of the population, the actual value ${\displaystyle \mu }$ is illustrated as a dashed line.

Fig. 1: Estimated Values ${\displaystyle {\bar {x}}}$ from 25 random samples of size ${\displaystyle n=20}$

2. Random samples of size ${\displaystyle n=100}$

From the same population, 100 random samples of size ${\displaystyle n=100}$ were drawn and the mean household net incomes were calculated. The results are provided in the following graph. The actual value ${\displaystyle \mu }$ appears as a dashed line.

Fig. 2: Estimated Values ${\displaystyle {\bar {x}}}$ from 100 random samples of size ${\displaystyle n=100}$
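The effect of the sample size on the scatter of the estimates can be reproduced with a small simulation. This is a sketch under stated assumptions: the population is synthetic (lognormal incomes with invented parameters, not the data behind Fig. 1 and Fig. 2), the helper `sample_means` is hypothetical, and 200 repetitions per sample size are used so that the two spreads are comparable.

```python
import random
import statistics

def sample_means(population, n, repeats, rng):
    """Means of `repeats` samples of size n, each drawn without replacement."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

rng = random.Random(42)
# Hypothetical income population with N = 2000; the lognormal shape and its
# parameters are assumptions made for this illustration only.
population = [rng.lognormvariate(7.7, 0.5) for _ in range(2000)]
mu = statistics.mean(population)  # known here only because we built the population

for n in (20, 100):
    means = sample_means(population, n, 200, rng)
    print(f"n = {n:3d}: mu = {mu:.1f}, "
          f"std of sample means = {statistics.stdev(means):.1f}")
```

In such a simulation the sample means for ${\displaystyle n=100}$ typically scatter much more tightly around ${\displaystyle \mu }$ than those for ${\displaystyle n=20}$.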