# Properties of Estimators


When estimating a specific parameter or characteristic of a population, there are often several possible estimators ${\displaystyle {\hat {\theta }}}$.

Example 1: Suppose that the underlying population distribution is symmetric. In this case the population expectation equals the population median, so the unknown expectation can be estimated using either the sample mean or the sample median. In general, the two estimators will provide different estimates. Which estimator should be used?

Example 2: To estimate the variance ${\displaystyle \sigma ^{2}}$ we may use either of the following:

${\displaystyle s^{2}={\frac {1}{n-1}}\sum \limits _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}$ or ${\displaystyle MSD={\frac {1}{n}}\sum \limits _{i=1}^{n}(x_{i}-{\bar {x}})^{2}}$

Which estimator should be used?

Example 3: Suppose that the underlying population distribution is Poisson. For the Poisson distribution ${\displaystyle E(X)=Var(X)=\lambda }$. Therefore the unknown parameter ${\displaystyle \lambda }$ could be estimated using either the sample mean or the sample variance. Again, the two estimators will in general yield different estimates.

To compare estimators objectively, we need to examine their properties.
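As a quick illustration of Examples 1 and 2, the following sketch applies the competing estimators to the n = 12 sample used in the worked example later in this section. It relies only on Python's standard `statistics` module: `statistics.variance` divides by n - 1 and therefore computes s², while `statistics.pvariance` divides by n (around the sample mean) and therefore computes MSD.

```python
import statistics

# Sample of size n = 12 from the worked example in this section
data = [1, 5, 3, 8, 7, 2, 1, 4, 3, 5, 3, 6]

# Example 1: two estimators of the expectation
sample_mean = statistics.mean(data)
sample_median = statistics.median(data)

# Example 2: two estimators of the variance
s2 = statistics.variance(data)    # divides by n - 1
msd = statistics.pvariance(data)  # divides by n, deviations taken from the sample mean

print(f"mean = {sample_mean}, median = {sample_median}")
print(f"s^2 = {s2:.4f}, MSD = {msd:.4f}")
```

On the same data the two pairs of estimates disagree: 4 versus 3.5 for the expectation, and 56/11 ≈ 5.09 versus 56/12 ≈ 4.67 for the variance.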

Mean Squared Error

A general measure of the accuracy of an estimator is the Mean Squared Error (MSE). The MSE measures the average squared distance between the estimator ${\displaystyle {\hat {\theta }}}$ and the true parameter ${\displaystyle \theta }$: ${\displaystyle MSE=E[({\hat {\theta }}-\theta )^{2}]\,.}$ It is straightforward to show that the MSE can be separated into two components: ${\displaystyle MSE=E[({\hat {\theta }}-\theta )^{2}]=E[({\hat {\theta }}-E({\hat {\theta }}))^{2}]+[E({\hat {\theta }})-\theta ]^{2}\,.}$ The first term on the right side is the variance of ${\displaystyle {\hat {\theta }}}$: ${\displaystyle E[({\hat {\theta }}-E({\hat {\theta }}))^{2}]=Var({\hat {\theta }})\,.}$ The second term is the square of the bias ${\displaystyle E({\hat {\theta }})-\theta }$. Hence the MSE is the sum of the variance and the squared bias of the estimator: ${\displaystyle MSE=Var({\hat {\theta }})+[bias\,({\hat {\theta }})]^{2}\,.}$ If several estimators are available for an unknown parameter of the population, one would thus select the one with the smallest MSE. Starting from the MSE, three important properties of estimators are described below, which should facilitate the search for the "best" estimator.
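The decomposition can be checked numerically. The sketch below is a Monte Carlo simulation under settings assumed purely for illustration (a normal population with σ = 2, samples of size n = 5, 20,000 replications). It applies the MSD estimator of σ² from Example 2 to each simulated sample and compares the simulated MSE with the simulated variance plus squared bias. The helper names `msd` and `mse_decomposition` are hypothetical.

```python
import random
import statistics


def msd(sample):
    """Mean squared deviation: squared deviations from the sample mean, divided by n."""
    xbar = statistics.fmean(sample)
    return sum((x - xbar) ** 2 for x in sample) / len(sample)


def mse_decomposition(n=5, sigma=2.0, reps=20_000, seed=1):
    """Simulate the MSD estimator of sigma^2 and return (mse, variance, bias)."""
    rng = random.Random(seed)
    theta = sigma ** 2  # true parameter: the population variance
    estimates = [
        msd([rng.gauss(0.0, sigma) for _ in range(n)]) for _ in range(reps)
    ]
    mse = statistics.fmean((e - theta) ** 2 for e in estimates)
    variance = statistics.pvariance(estimates)  # spread around the average estimate
    bias = statistics.fmean(estimates) - theta
    return mse, variance, bias


mse, variance, bias = mse_decomposition()
print(f"MSE = {mse:.4f}, Var + bias^2 = {variance + bias ** 2:.4f}, bias = {bias:.4f}")
```

Because the same identity holds for the empirical moments of the simulated estimates, the first two numbers agree up to rounding. The simulated bias should be close to the theoretical value -σ²/n = -0.8 derived in the Sample Variance discussion.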

Unbiasedness

An estimator ${\displaystyle {\hat {\theta }}}$ of the unknown parameter ${\displaystyle \theta }$ is unbiased if the expectation of the estimator equals the true parameter value: ${\displaystyle E({\hat {\theta }})=\theta \,.}$ That is, the mean of the sampling distribution of ${\displaystyle {\hat {\theta }}}$ equals the true parameter value ${\displaystyle \theta }$. For an unbiased estimator the MSE equals the variance of the estimator: ${\displaystyle MSE=Var({\hat {\theta }})\,.}$ Thus the variance of the estimator provides a good measure of its precision. If the estimator is biased, then its expectation differs from the true parameter value, that is, ${\displaystyle bias({\hat {\theta }})=E({\hat {\theta }})-\theta \neq 0\,.}$ An estimator ${\displaystyle {\hat {\theta }}}$ is called asymptotically unbiased if ${\displaystyle \lim _{n\rightarrow \infty }E({\hat {\theta }})=\theta \,,}$ i.e., if the bias converges to zero with increasing sample size ${\displaystyle n}$.

Efficiency

Often there are several unbiased estimators available for the same parameter. In this case, one would like to select the one with the smallest variance (which for unbiased estimators equals the MSE). Let ${\displaystyle {\hat {\theta }}_{n}}$ and ${\displaystyle {\hat {\theta }}_{n}^{\star }}$ be two unbiased estimators of ${\displaystyle \theta }$ based on a sample of size ${\displaystyle n}$. The estimator ${\displaystyle {\hat {\theta }}_{n}}$ is called relatively efficient compared with ${\displaystyle {\hat {\theta }}_{n}^{\star }}$ if the variance of ${\displaystyle {\hat {\theta }}_{n}}$ is smaller than the variance of ${\displaystyle {\hat {\theta }}_{n}^{\star }}$, i.e., ${\displaystyle Var({\hat {\theta }}_{n})<Var({\hat {\theta }}_{n}^{\star })\,.}$ The estimator ${\displaystyle {\hat {\theta }}_{n}}$ is called efficient if no other unbiased estimator has a smaller variance.

Consistency

The consistency of an estimator is a property that concerns the behavior of the estimator in large samples. In particular, consistency requires that the estimator be close to the true parameter value with high probability in large samples. It is sufficient that the bias and the variance of the estimator converge to zero. Formally, suppose ${\displaystyle \lim _{n\rightarrow \infty }[E({\hat {\theta }}_{n})-\theta ]=0}$ and ${\displaystyle \lim _{n\rightarrow \infty }Var({\hat {\theta }}_{n})=0\,.}$ Then the estimator is consistent. Equivalently, the two conditions may be summarized as ${\displaystyle \lim _{n\rightarrow \infty }MSE({\hat {\theta }}_{n})=0\,.}$ This notion of consistency is also referred to as 'mean square consistency'. An alternative version, known as weak consistency, is defined by ${\displaystyle \lim _{n\rightarrow \infty }P(|{\hat {\theta }}_{n}-\theta |<\epsilon )=1}$ for every ${\displaystyle \epsilon >0}$. That is, the probability that the estimator ${\displaystyle {\hat {\theta }}_{n}}$ yields values within an arbitrarily small interval around the true parameter value ${\displaystyle \theta }$ converges to one with increasing sample size ${\displaystyle n}$. Equivalently, the probability that ${\displaystyle {\hat {\theta }}_{n}}$ differs from ${\displaystyle \theta }$ by at least ${\displaystyle \epsilon }$ converges to zero with increasing sample size: ${\displaystyle \lim _{n\rightarrow \infty }P(|{\hat {\theta }}_{n}-\theta |\geq \epsilon )=0\,.}$

Example: The unknown mean ${\displaystyle E(X)=\mu }$ and variance ${\displaystyle \sigma ^{2}}$ are to be estimated. A random sample of size ${\displaystyle n=12}$ was drawn from the population, yielding the following data: 1; 5; 3; 8; 7; 2; 1; 4; 3; 5; 3; 6. The sample mean ${\displaystyle {\bar {x}}={\frac {1}{n}}\sum \limits _{i=1}^{n}X_{i}}$ is an unbiased estimator of ${\displaystyle \mu }$. Substituting the sample values yields ${\displaystyle {\bar {x}}={\frac {1}{12}}(1+5+3+8+7+2+1+4+3+5+3+6)={\frac {48}{12}}=4\,.}$ This result constitutes a point estimate of ${\displaystyle \mu }$. The variance ${\displaystyle \sigma ^{2}}$ is estimated by the unbiased estimator ${\displaystyle s^{2}={\frac {1}{n-1}}\sum \limits _{i=1}^{n}(X_{i}-{\bar {x}})^{2}\,.}$ Substituting the sample values yields the point estimate

$${\displaystyle {\begin{aligned}s^{2}&={\frac {1}{n-1}}\sum \limits _{i=1}^{12}(x_{i}-{\bar {x}})^{2}\\&={\frac {1}{11}}[(1-4)^{2}+(5-4)^{2}+\dots +(3-4)^{2}+(6-4)^{2}]={\frac {1}{11}}\cdot 56\approx 5.09\,.\end{aligned}}}$$

Example: Assume a population with mean ${\displaystyle \mu }$ and variance ${\displaystyle \sigma ^{2}}$. Let ${\displaystyle (X_{1},X_{2},X_{3})}$ be a random sample drawn from the population. Each random variable ${\displaystyle X_{i},i=1,2,3,}$ has ${\displaystyle E(X_{i})=\mu }$ and ${\displaystyle Var(X_{i})=\sigma ^{2}}$. Consider the following three estimators of the population mean:

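1. ${\displaystyle {\hat {\mu }}_{1}={\frac {1}{3}}(X_{1}+X_{2}+X_{3})}$
2. ${\displaystyle {\hat {\mu }}_{2}={\frac {1}{4}}(2X_{1}+2X_{3})}$
3. ${\displaystyle {\hat {\mu }}_{3}={\frac {1}{3}}(2X_{1}+X_{2})}$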
• Which estimators are unbiased?
• Which estimator is most efficient?

All of them are unbiased, since ${\displaystyle E(X_{i})=\mu }$:

$${\displaystyle {\begin{aligned}E({\hat {\mu }}_{1})&=E[{\frac {1}{3}}(X_{1}+X_{2}+X_{3})]={\frac {1}{3}}[E(X_{1})+E(X_{2})+E(X_{3})]={\frac {1}{3}}(\mu +\mu +\mu )=\mu \\E({\hat {\mu }}_{2})&=E[{\frac {1}{4}}(2X_{1}+2X_{3})]={\frac {1}{4}}[2E(X_{1})+2E(X_{3})]={\frac {1}{4}}(2\mu +2\mu )=\mu \\E({\hat {\mu }}_{3})&=E[{\frac {1}{3}}(2X_{1}+X_{2})]={\frac {1}{3}}[2E(X_{1})+E(X_{2})]={\frac {1}{3}}(2\mu +\mu )=\mu \end{aligned}}}$$

Since the observations of a random sample are independent, the variance of each estimator is given by:

$${\displaystyle {\begin{aligned}Var({\hat {\mu }}_{1})&=Var[{\frac {1}{3}}(X_{1}+X_{2}+X_{3})]={\frac {1}{9}}Var(X_{1}+X_{2}+X_{3})={\frac {1}{9}}[Var(X_{1})+Var(X_{2})+Var(X_{3})]={\frac {1}{9}}(\sigma ^{2}+\sigma ^{2}+\sigma ^{2})={\frac {1}{3}}\sigma ^{2}\\Var({\hat {\mu }}_{2})&=Var[{\frac {1}{4}}(2X_{1}+2X_{3})]={\frac {1}{16}}Var(2X_{1}+2X_{3})={\frac {1}{16}}[4Var(X_{1})+4Var(X_{3})]={\frac {1}{16}}(4\sigma ^{2}+4\sigma ^{2})={\frac {1}{2}}\sigma ^{2}\\Var({\hat {\mu }}_{3})&=Var[{\frac {1}{3}}(2X_{1}+X_{2})]={\frac {1}{9}}Var(2X_{1}+X_{2})={\frac {1}{9}}[4Var(X_{1})+Var(X_{2})]={\frac {1}{9}}(4\sigma ^{2}+\sigma ^{2})={\frac {5}{9}}\sigma ^{2}\end{aligned}}}$$

The first estimator, because it uses all the data, is the most efficient of the three. This estimator is, of course, the sample mean. Note that even though the second and third estimators each use two observations, the third is less efficient than the second because it does not weight the observations equally.
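A small Monte Carlo sketch can confirm these results. It assumes, for illustration only, a normal population with μ = 10 and σ = 1 (so the theoretical variances are 1/3, 1/2 and 5/9) and 20,000 simulated random samples of size three; the function name `simulate_three_estimators` is hypothetical.

```python
import random
import statistics


def simulate_three_estimators(mu=10.0, sigma=1.0, reps=20_000, seed=42):
    """Return {name: (simulated mean, simulated variance)} for the three estimators of mu."""
    rng = random.Random(seed)
    estimators = {
        "mu1": lambda x1, x2, x3: (x1 + x2 + x3) / 3,
        "mu2": lambda x1, x2, x3: (2 * x1 + 2 * x3) / 4,
        "mu3": lambda x1, x2, x3: (2 * x1 + x2) / 3,
    }
    draws = {name: [] for name in estimators}
    for _ in range(reps):
        # One random sample (X1, X2, X3) of independent normal observations
        x1, x2, x3 = (rng.gauss(mu, sigma) for _ in range(3))
        for name, f in estimators.items():
            draws[name].append(f(x1, x2, x3))
    return {
        name: (statistics.fmean(v), statistics.pvariance(v))
        for name, v in draws.items()
    }


for name, (mean, var) in simulate_three_estimators().items():
    print(f"{name}: mean = {mean:.3f}, variance = {var:.3f}")
```

All three simulated means should lie close to μ = 10 (unbiasedness), while the simulated variances should reproduce the ordering 1/3 < 1/2 < 5/9.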

Mean Squared Error (MSE):

Recall that the MSE is defined as ${\displaystyle MSE=E[({\hat {\theta }}-\theta )^{2}]}$. Expanding the expression, one obtains:

$${\displaystyle {\begin{aligned}MSE&=E[({\hat {\theta }}-\theta )^{2}]\\&=E[({\hat {\theta }}-E({\hat {\theta }})+E({\hat {\theta }})-\theta )^{2}]\\&=E[({\hat {\theta }}-E({\hat {\theta }}))^{2}+2({\hat {\theta }}-E({\hat {\theta }}))(E({\hat {\theta }})-\theta )+(E({\hat {\theta }})-\theta )^{2}]\\&=E[({\hat {\theta }}-E({\hat {\theta }}))^{2}]+2E[({\hat {\theta }}-E({\hat {\theta }}))(E({\hat {\theta }})-\theta )]+[E({\hat {\theta }})-\theta ]^{2}\,.\end{aligned}}}$$

Since ${\displaystyle E({\hat {\theta }})-\theta }$ is a constant, it can be taken outside the expectation, and the middle term vanishes:

${\displaystyle 2E[({\hat {\theta }}-E({\hat {\theta }}))(E({\hat {\theta }})-\theta )]=2[E({\hat {\theta }})-E({\hat {\theta }})][E({\hat {\theta }})-\theta ]=0\,.}$

Consequently,

$${\displaystyle {\begin{aligned}MSE&=E[({\hat {\theta }}-E({\hat {\theta }}))^{2}]+[E({\hat {\theta }})-\theta ]^{2}\\&=Var({\hat {\theta }})+[bias({\hat {\theta }})]^{2}\,.\end{aligned}}}$$

The MSE does not measure the actual estimation error that has occurred in a particular sample. It measures the average squared error that would occur in repeated samples.

Unbiasedness

The following figure displays the sampling distributions of three estimators of a parameter ${\displaystyle \theta }$.

The estimators ${\displaystyle {\hat {\theta }}_{1}}$ and ${\displaystyle {\hat {\theta }}_{2}}$ are unbiased since their expectations coincide with the true parameter ${\displaystyle \theta }$ (denoted by the vertical dashed line). In contrast, the estimator ${\displaystyle {\hat {\theta }}_{3}}$ is biased. For both unbiased estimators ${\displaystyle MSE=Var({\hat {\theta }})}$ holds, as the bias equals zero. However, ${\displaystyle {\hat {\theta }}_{1}}$ has lower variance and is therefore preferred to ${\displaystyle {\hat {\theta }}_{2}}$. It is also preferred to ${\displaystyle {\hat {\theta }}_{3}}$, which has the same variance but exhibits substantial positive bias. Each of the following widely used estimators is unbiased.

Sample Mean ${\displaystyle {\bar {x}}}$

The sample mean ${\displaystyle {\bar {x}}={\frac {1}{n}}\sum \limits _{i=1}^{n}X_{i}}$ is an unbiased estimator of the unknown expectation ${\displaystyle E(X)=\mu }$, since ${\displaystyle E({\bar {x}})=\mu }$; see Section Distribution of the Sample Mean.

Sample Proportion ${\displaystyle {\widehat {\pi }}}$

The sample proportion ${\displaystyle {\widehat {\pi }}={\frac {1}{n}}\sum \limits _{i=1}^{n}X_{i}}$, where each ${\displaystyle X_{i}}$ equals 1 if the ${\displaystyle i}$-th observation has the property of interest and 0 otherwise, is an unbiased estimator of the population proportion ${\displaystyle \pi }$, since ${\displaystyle E({\widehat {\pi }})=\pi }$; see Section Distribution of the Sample Fraction.
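For sampling with replacement, the number of elements with the property is binomially distributed, so the unbiasedness of π̂ (and the variance π(1 - π)/n used in the Consistency list) can be verified exactly for a small case by summing over all possible counts. The sketch uses exact rational arithmetic; the values π = 3/10 and n = 8 are arbitrary choices for illustration, and `proportion_moments` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb


def proportion_moments(p, n):
    """Exact E and Var of the sample proportion when the count of successes is Binomial(n, p)."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)  # P(sum of the X_i equals k)
        estimate = Fraction(k, n)                      # value of the sample proportion
        mean += prob * estimate
        second_moment += prob * estimate**2
    return mean, second_moment - mean**2


p, n = Fraction(3, 10), 8
mean, var = proportion_moments(p, n)
print(mean == p, var == p * (1 - p) / n)  # True True
```

Using `Fraction` instead of floating point makes both equalities hold exactly rather than only up to rounding error.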

Sample Variance

Assume a random sample of size ${\displaystyle n}$.

1. If the expectation ${\displaystyle E(X)=\mu }$ of the population is unknown and is estimated by the sample mean, the estimator ${\displaystyle s^{2}={\frac {1}{n-1}}\sum \limits _{i=1}^{n}(X_{i}-{\bar {x}})^{2}}$ is an unbiased estimator of ${\displaystyle \sigma ^{2}}$, since ${\displaystyle E(s^{2})=\sigma ^{2}}$; see Section Distribution of the Sample Variance. The sample standard deviation ${\displaystyle s={\sqrt {s^{2}}}}$, however, is not an unbiased estimator of ${\displaystyle \sigma }$: it tends to underestimate the population standard deviation.

The estimator ${\displaystyle MSD={\frac {1}{n}}\sum \limits _{i=1}^{n}(X_{i}-{\bar {x}})^{2}}$ is not unbiased, since ${\displaystyle E(MSD)=E\left[{\frac {1}{n}}\sum \limits _{i=1}^{n}(X_{i}-{\bar {x}})^{2}\right]={\frac {1}{n}}E\left[\sum \limits _{i=1}^{n}(X_{i}-{\bar {x}})^{2}\right]={\frac {n-1}{n}}\sigma ^{2}}$; see Section Distribution of the Sample Variance. The bias is given by: ${\displaystyle E(MSD)-\sigma ^{2}={\frac {n-1}{n}}\sigma ^{2}-\sigma ^{2}=-{\frac {\sigma ^{2}}{n}}\,.}$ Using the estimator ${\displaystyle MSD}$, one will therefore tend to underestimate the unknown variance. The estimator is, however, asymptotically unbiased, since the bias converges to zero with increasing sample size ${\displaystyle n}$.

Division by ${\displaystyle n-1}$ (as in ${\displaystyle s^{2}}$) rather than by ${\displaystyle n}$ (as in ${\displaystyle MSD}$) ensures unbiasedness.
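These expectations can be checked by simulation. The sketch assumes a normal population with σ = 2 (so σ² = 4), samples of size n = 5 and 20,000 replications, all chosen only for illustration; `average_variance_estimates` is a hypothetical helper name. It averages s², MSD and s over the replications: the first average should be close to σ² = 4, the second close to (n - 1)/n · σ² = 3.2, and the average of s should fall below σ = 2.

```python
import math
import random
import statistics


def average_variance_estimates(n=5, mu=0.0, sigma=2.0, reps=20_000, seed=7):
    """Average s^2, MSD and s over simulated normal samples of size n."""
    rng = random.Random(seed)
    sum_s2 = sum_msd = sum_s = 0.0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(x)
        ss = sum((xi - xbar) ** 2 for xi in x)  # sum of squared deviations
        sum_s2 += ss / (n - 1)                  # unbiased sample variance s^2
        sum_msd += ss / n                       # mean squared deviation MSD
        sum_s += math.sqrt(ss / (n - 1))        # sample standard deviation s
    return sum_s2 / reps, sum_msd / reps, sum_s / reps


mean_s2, mean_msd, mean_s = average_variance_estimates()
print(f"E(s^2) ~ {mean_s2:.3f}, E(MSD) ~ {mean_msd:.3f}, E(s) ~ {mean_s:.3f}")
```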

Efficiency

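• Among all linear unbiased estimators of the unknown population expectation ${\displaystyle \mu }$ (weighted averages of the observations whose weights sum to one), the sample mean ${\displaystyle {\bar {x}}}$ has the smallest variance. This is true for any distribution with finite variance.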
• Suppose data are drawn from a ${\displaystyle N(\mu ;\sigma ^{2})}$ distribution. The sample mean ${\displaystyle {\bar {x}}}$ is an efficient estimator of ${\displaystyle \mu }$.  It can be shown that no unbiased estimator of ${\displaystyle \mu }$ exists which has a smaller variance.
• The sample mean ${\displaystyle {\bar {x}}}$ is an efficient estimator of the unknown parameter ${\displaystyle \lambda }$ of a Poisson distribution.
• The sample proportion ${\displaystyle {\widehat {\pi }}}$ is an efficient estimator of the unknown population proportion ${\displaystyle \pi }$ for a dichotomous population, i.e. the underlying random variables have a common Bernoulli distribution.
• For a normally distributed population, both the sample mean ${\displaystyle {\bar {x}}}$ and the sample median ${\displaystyle md}$ are unbiased estimators of the unknown expectation ${\displaystyle \mu }$. For random samples (with replacement) we have ${\displaystyle \sigma ^{2}({\bar {x}})={\frac {\sigma ^{2}}{n}}\,.}$ Furthermore, one can show that for large ${\displaystyle n}$ ${\displaystyle \sigma ^{2}(md)\approx {\frac {\pi }{2}}{\frac {\sigma ^{2}}{n}}\approx 1.571\,\sigma ^{2}({\bar {x}})}$ and hence ${\displaystyle \sigma ^{2}({\bar {x}})<\sigma ^{2}(md)\,.}$ The sample mean ${\displaystyle {\bar {x}}}$ is therefore relatively efficient compared with the sample median ${\displaystyle md}$.
• The relative efficiency of various estimators of the same parameter in general depends on the distribution from which one is drawing observations.
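The large-sample factor π/2 for the normal distribution can be approximated by simulation. The sketch assumes standard normal samples of odd size n = 101 and 4,000 replications (illustrative choices only); `median_to_mean_variance_ratio` is a hypothetical helper name. The ratio of the simulated variances of md and x̄ should come out near π/2 ≈ 1.571.

```python
import random
import statistics


def median_to_mean_variance_ratio(n=101, reps=4_000, seed=3):
    """Simulated Var(median) / Var(mean) for samples of size n from N(0, 1)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(x))
        medians.append(statistics.median(x))
    return statistics.pvariance(medians) / statistics.pvariance(means)


ratio = median_to_mean_variance_ratio()
print(f"Var(md) / Var(x_bar) ~ {ratio:.3f}")
```

Replacing `rng.gauss` with draws from another distribution changes the ratio, in line with the last point of the list: relative efficiency depends on the population distribution.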

Consistency

• Consistency is usually considered to be a minimum requirement for an estimator. Of course, consistency does not preclude the estimator having large bias and variance in small or moderately sized samples. Consistency only guarantees that bias and variance go to zero for sufficiently large samples. Since the sample size usually cannot be increased at will, consistency may therefore provide a poor guide to the finite-sample properties of the estimator.

• For random samples, the sample mean ${\displaystyle {\bar {x}}_{n}}$ is a consistent estimator of the population expectation ${\displaystyle \mu }$, since ${\displaystyle bias({\bar {x}}_{n})=0}$ and the variance ${\displaystyle Var({\bar {x}}_{n})=\sigma ^{2}/n}$ converges to zero, i.e., ${\displaystyle \lim _{n\rightarrow \infty }{\frac {\sigma ^{2}}{n}}=0\,.}$

• For random samples, the sample proportion ${\displaystyle {\widehat {\pi }}_{n}}$ is a consistent estimator of the population proportion ${\displaystyle \pi }$, since the estimator is unbiased, ${\displaystyle bias({\widehat {\pi }}_{n})=0}$, and the variance ${\displaystyle Var({\widehat {\pi }}_{n})=\pi (1-\pi )/n}$ converges to zero, i.e., ${\displaystyle \lim _{n\rightarrow \infty }{\frac {\pi (1-\pi )}{n}}=0\,.}$

• For a Gaussian-distributed population, the sample median ${\displaystyle md}$ is a consistent estimator of the unknown parameter ${\displaystyle \mu }$.

• For a Gaussian distribution, the estimator ${\displaystyle s^{2}={\frac {1}{n-1}}\sum \limits _{i=1}^{n}(X_{i}-{\bar {x}})^{2}}$ is consistent for the unknown variance ${\displaystyle \sigma ^{2}}$, since the estimator is unbiased, ${\displaystyle bias(s^{2})=0}$, and its variance ${\displaystyle Var(s^{2})=2\sigma ^{4}/(n-1)}$ converges to zero: ${\displaystyle \lim _{n\rightarrow \infty }{\frac {2\sigma ^{4}}{n-1}}=0\,.}$
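The shrinking variances in the first and last items of the list can be seen in a simulation. The sketch below assumes normal samples with σ = 1 and compares, for n = 5 and n = 50 with 4,000 replications each (illustrative values), the simulated variances of x̄_n and s² with σ²/n and 2σ⁴/(n - 1). `simulated_variances` and `SIGMA` are hypothetical names.

```python
import random
import statistics


def simulated_variances(n, sigma=1.0, reps=4_000, seed=11):
    """Simulated Var(x_bar) and Var(s^2) for N(0, sigma^2) samples of size n."""
    rng = random.Random(seed)
    means, s2s = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = statistics.fmean(x)
        means.append(xbar)
        s2s.append(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # s^2
    return statistics.pvariance(means), statistics.pvariance(s2s)


SIGMA = 1.0  # assumed population standard deviation
results = {}
for n in (5, 50):
    var_mean, var_s2 = simulated_variances(n, SIGMA)
    results[n] = (var_mean, var_s2)
    print(
        f"n={n}: Var(x_bar) ~ {var_mean:.4f} (theory {SIGMA**2 / n:.4f}), "
        f"Var(s^2) ~ {var_s2:.4f} (theory {2 * SIGMA**4 / (n - 1):.4f})"
    )
```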

The sample variance ${\displaystyle s^{2}}$ is also a consistent estimator of the population variance, at least in the weak sense, for arbitrary distributions with finite mean and variance.
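The weak-consistency definition can be illustrated for a non-normal population. The sketch assumes an exponential population with rate 1 (so σ² = 1), a tolerance ε = 0.5 and 500 replications per sample size, all chosen only for illustration; `exceedance_probability` is a hypothetical helper name. It estimates P(|s²_n - σ²| ≥ ε) for n = 10, 100 and 1000, and these relative frequencies should decrease toward zero as n grows.

```python
import random
import statistics


def exceedance_probability(n, eps=0.5, reps=500, seed=5):
    """Relative frequency of |s^2 - sigma^2| >= eps for Exponential(1) samples of size n."""
    rng = random.Random(seed)
    sigma2 = 1.0  # variance of the Exponential(1) distribution
    hits = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        xbar = statistics.fmean(x)
        s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # sample variance s^2
        if abs(s2 - sigma2) >= eps:
            hits += 1
    return hits / reps


probs = [exceedance_probability(n) for n in (10, 100, 1000)]
print(probs)
```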