From MM*Stat International



  • Absolute frequency Number of occurrences of a certain value, or combination of certain values, of the investigated variable.

  • Absolute scale Metric scale containing a natural unit of measurement and a natural zero point. See also scale.
  • Acceptance region The acceptance region of a null hypothesis is the set of values of the test statistic which do not lead to rejecting the null hypothesis.
  • Alternative hypothesis The hypothesis opposing the null hypothesis (see hypothesis testing).
  • Approximation Under some assumptions, we are allowed to substitute a well-known simple distribution (typically the Normal) for the true, complicated one.
  • Arithmetic average This value is obtained by spreading the sum of all observed realizations uniformly across all statistical elements. The arithmetic average makes sense only for metrically scaled variables.
  • Asymptotic unbiasedness Property of an estimator. With an increasing number of observations, the expected value of the estimator converges towards the true value of the estimated parameter.

[B]


  • Binary variable (also dichotomous variable) Random variable whose result is always one of two distinct values, most often “0”/“1” or “true”/“false”.
  • Binomial distribution Distribution of a discrete random variable: the number of occurrences of an event in n repetitions of an experiment, if the probability of occurrence of the event in a single trial is p. The Binomial distribution has parameters n and p.
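As a quick numerical sketch of the Binomial probability function (the function name `binomial_pmf` is ours, not from the source; Python standard library only):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) variable: the probability of exactly
    k occurrences of the event in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# probability of exactly 2 heads in 4 fair-coin tosses: C(4,2) / 2**4
print(binomial_pmf(2, 4, 0.5))  # → 0.375
```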

  • Bravais-Pearson correlation coefficient It measures the strength and the direction of the dependence between two metrically scaled random variables. It is the ratio of the covariance (common variability) and the product of the standard deviations (variability of the individual variables). Its value lies between -1 and 1.

[C]


  • Cardinal scale See Metric scale.

  • Census Sampling and investigation of all elements of the sample space.

  • Central value See Median.
  • Central limit theorem The theorem concerns the approximation of a sum of random variables by the Normal distribution for a sufficiently large number of summands.
  • Central probability interval An interval with fixed boundaries x_u and x_o (x_u \leq x_o) such that the probability that a random variable falls into this interval is 1 - \alpha, and such that the two intervals lying outside the central probability interval each have probability \alpha/2.
  • Chi-square goodness-of-fit test Statistical test. The null hypothesis states that the true distribution function of the observed data is equal to a given distribution function. The test statistic has a Chi-square distribution.
  • Chi-square test of independence Statistical test. The null hypothesis says that two random variables are independent. The test statistic has a Chi-square distribution.
  • Chi-square distribution The distribution of a sum of squares of n independent random variables with standard Normal distribution. The parameter n is called the degrees of freedom.
  • Class boundary The boundary of a class of a metrically scaled variable is a value which bounds a given class from above (upper bound) or from below (lower bound). The difference between the upper and lower bound is called the width of the class.
  • Class midpoint The value representing the class, obtained as the arithmetic average of its upper and lower boundaries.

  • Coefficient of determination The coefficient of determination measures the quality and suitability of the chosen regression function for the given data. It is defined as the ratio of the variability explained by the regression function to the total variability of the dependent variable, i.e., it can be interpreted as the proportion of variability explained by the regression model. Its values lie between 0 and 1; higher values mean that the model explains the data better. In simple linear regression, the coefficient of determination is equal to the square of the correlation coefficient.
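The ratio of explained to total variability can be sketched in a few lines (a minimal illustration with illustrative names such as `r_squared`; Python standard library only):

```python
from statistics import mean

def r_squared(x, y):
    """Coefficient of determination of a simple linear fit y ≈ a + b*x:
    the share of the variability of y explained by the fitted line."""
    mx, my = mean(x), mean(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    a = my - b * mx
    ss_res = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))  # residual variability
    ss_tot = sum((v - my) ** 2 for v in y)                      # total variability of y
    return 1 - ss_res / ss_tot

# nearly linear data give a coefficient of determination close to 1
r2 = r_squared([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.9])
```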

  • Combination A choice of k elements out of n elements in total, where the order is not important, is called a combination of the k-th class of n elements. We distinguish combinations with and without repetition. See also combinatorics.
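Both counts can be obtained directly from the standard library (a minimal sketch; `math.comb` computes the binomial coefficient):

```python
from math import comb

# choosing k = 2 out of n = 5 elements, order not important:
print(comb(5, 2))          # without repetition → 10
print(comb(5 + 2 - 1, 2))  # with repetition: C(n + k - 1, k) → 15
```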

  • Combinatorics It investigates the various ways of sorting and/or grouping certain elements. It is very important for probability theory. See also permutation, variation, combination.
  • Complete survey Sampling and investigation of all elements of the sample space.
  • Components of a time series We distinguish the systematic components (trend, periodic fluctuations) and the irregular random residual fluctuations.
  • Complementary event See event.
  • Conditional probability Probability of an occurrence of a certain event under the condition that some other event also occurs.
  • Conditional distribution In the framework of a two-dimensional frequency distribution, it is the distribution of the variable X (resp. Y) for a fixed value (outcome) of the variable Y (resp. X).
  • Confidence interval Random interval, the result of an interval estimate of some unknown parameter.
  • Confidence level Probability that the confidence interval calculated from our data covers the true unknown value of the estimated parameter.
  • Consistency A property of an estimator of some unknown parameter. With an increasing number of observations, the expected value of a consistent estimator converges towards the true value of the unknown parameter, and its variance converges to zero.
  • Contingency coefficient It measures the intensity of the relation between two nominal random variables. It is calculated using the quadratic contingency, and its value lies between 0 and 1, where 0 means statistical independence. The contingency coefficient is practically never equal to 1 (complete dependence); therefore, the adjusted contingency coefficient was introduced.
  • Contingency table A two-dimensional contingency table (or cross-table) is used to display the joint frequency distribution of two nominal or ordinal random variables.

  • Correlation table A two-dimensional correlation table displays the joint frequency distribution of two metrically scaled random variables.
  • Covariance A measure of the joint variability of a pair of metrically scaled variables. It measures both the strength and the direction of the dependency. The correlation coefficient can be used to compare different covariances.
  • Critical value Value(s) of the test statistic separating the critical and acceptance regions of the null hypothesis. It depends on the probability distribution of the test statistic and on the chosen level of significance.
  • Critical region Values of the test statistic that lead to rejection of the null hypothesis.
  • Cross-table See contingency table.
  • Cross-sectional data Data collected at the same point in time, or for the same period of time, on different elements.
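To make the relation between the covariance and the correlation coefficient concrete, here is a minimal sketch (function names are ours; Python standard library only):

```python
from statistics import mean, pstdev

def covariance(x, y):
    """Joint variability of two metrically scaled variables."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Covariance standardized by the product of the standard deviations;
    unlike the covariance itself, it always lies between -1 and 1."""
    return covariance(x, y) / (pstdev(x) * pstdev(y))

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]        # y = 2x: perfect positive linear dependence
print(covariance(x, y))  # → 2.5
r = correlation(x, y)    # close to 1
```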

  • Cumulated frequency Frequency of the observations smaller than or equal to a given value or, for grouped variables, to the upper bound of the class in which this value lies. It is defined for at least ordinal variables. We can have absolute or relative cumulated frequency.
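A small worked example of absolute and relative cumulated frequencies (variable names are ours; Python standard library only):

```python
from itertools import accumulate

values = [1, 1, 2, 2, 2, 3, 4, 4]               # sorted observations
distinct = sorted(set(values))
absolute = [values.count(v) for v in distinct]  # absolute frequencies [2, 3, 1, 2]
cum_abs = list(accumulate(absolute))            # cumulated absolute: [2, 5, 6, 8]
cum_rel = [c / len(values) for c in cum_abs]    # cumulated relative: ends at 1.0
```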


[D]
  • Dependent variable The dependent variable in the regression model; often also called the explained variable. See also regression function.

  • Descriptive statistics Statistical methods oriented towards the collection of data and its basic description. The results concern only the investigated set of data.

  • Dichotomous variable See binary variable.
  • Disjoint events See intersection (of events).

  • Distribution function The distribution function F(x) of a random variable X is equal to the probability that the random variable is smaller than or equal to x.
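The empirical counterpart of F(x) can be sketched in one line (the name `ecdf` is ours; Python standard library only):

```python
def ecdf(sample, x):
    """Empirical counterpart of F(x): the share of observations <= x."""
    return sum(v <= x for v in sample) / len(sample)

data = [3, 1, 4, 1, 5]
print(ecdf(data, 3))  # → 0.6  (three of the five observations are <= 3)
```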

  • Dotplot Two-dimensional graphical display of one-dimensional data. On the horizontal axis, you find the observed value. The value on the vertical axis is arbitrary (usually randomly chosen).


[E]
  • Efficiency A property of an unbiased estimator. An estimator is called efficient if its variance is smaller than the variance of any other unbiased estimator of the same parameter.

  • Equivalence Equivalence of events means their equality: whenever we observe A, we also observe B, and the other way around. In this case A is a subset of B and B is a subset of A. See also implication.

  • Elementary event See event.
  • Error of the first kind Rejection of the null hypothesis when it is true.
  • Error of the second kind Acceptance of the null hypothesis when it is false.
  • Event An event is any possible outcome of a random experiment. An elementary event is an event which cannot be split into partial events; elementary events are disjoint. A complementary event is the set of all elementary events of the sample space S which are not contained in the investigated event. Events are subsets of the sample space, and therefore we can use the common set relations and operations here. (See also implication, equivalence, union, intersection, logical difference.)
  • Estimate Realization of the estimator.
  • Estimator Function of the sample variables which is suitable for estimating some unknown parameter of the investigated distribution.
  • Expected value The value of the random variable which we expect to obtain before the random experiment is carried out. It corresponds to the arithmetic average of the frequency distribution.
  • Exponential distribution A distribution of a continuous random variable. It has the parameter \lambda, and it represents the probability distribution of the distance of two subsequent events in a Poisson process.

[F]


  • F-distribution A distribution of a continuous random variable which arises as the ratio of two independent random variables with Chi-square distributions, each divided by its degrees of freedom f_1 and f_2. The distribution has two parameters, the above mentioned degrees of freedom f_1 and f_2.

  • Filter A set of weights which are used to calculate moving averages of a given time series. The choice of the filter depends on the type of seasonal fluctuations and on the desired level of smoothing. Symmetric filters are often used.

  • Fitted value The average value of the dependent variable lying on the regression function for the given values of the explanatory variables (regressors). See also regression function.
  • Frequency density For grouped data: the ratio of the absolute or relative frequency of a certain class and the width of the class.

  • Frequency distribution The sorted results of an experiment together with their absolute frequencies are called the frequency distribution of the investigated variable. Depending on the number of variables, we distinguish one- and more-dimensional frequency distributions. A frequency table provides systematic and accessible information about the data.

  • Frequency table See frequency distribution.

[G]


  • Geometric average It can be used to calculate the mean of (at least) ratio-scaled random variables with positive values which are multiplicatively interrelated. The logarithm of the geometric average is equal to the arithmetic average of the logarithms of the observed values.
  • Goodness-of-fit test Statistical test designed for the hypothesis that the unknown true distribution is equal to a specified distribution.
  • Group Interval of values of a metrically scaled variable (see also grouping).
  • Grouping Joining of equal or similar observations of some variable into one group or class. See also class boundaries.

[H]
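The log identity stated in the geometric average entry above can be checked numerically (a minimal sketch with illustrative data; Python standard library only):

```python
from math import log, prod
from statistics import mean

growth = [1.02, 1.10, 0.97]              # multiplicative growth factors
geo = prod(growth) ** (1 / len(growth))  # geometric average
# the log of the geometric average equals the arithmetic average of the logs
assert abs(log(geo) - mean(log(g) for g in growth)) < 1e-12
```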


  • Hypergeometric distribution Discrete distribution with parameters M, N, and n. It describes the probability of occurrence of an event in n draws without replacement from a set of N elements, of which M have the investigated property.

[I]


  • Identification characteristic Characteristic which clearly defines the sample space and which identifies the statistical elements (so that we know whether they belong to the sample space under investigation). Its value is the same for all statistical elements in the sample space, and it does not change during the investigation.
  • Implication Relation between events: if the event A occurs, then the event B also has to occur. This is the same as saying that A is a subset of B (see also equivalence).
  • Impossible event See sample space.

  • Inductive statistics According to the theory of probability, these are the methods which allow us to make, with a given accuracy, statements about the population based on the information from a random sample.
  • Interpolation Method of calculating unknown function value from known “close” values of that function.
  • Interquartile range It is the difference between the upper and the lower quartile. It is the width (size) of the region in which the central 50% of the observed values lie.
  • Intersection (of events) A set of all elementary events which belong to all events under consideration (i.e. the events involved in the intersection). Two events with an empty intersection are called disjoint events.
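The interquartile range defined above can be computed with the standard library (a minimal sketch; variable names are ours, and `statistics.quantiles` uses the exclusive method by default):

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9, 11, 13, 15]
q1, q2, q3 = quantiles(data, n=4)  # lower quartile, median, upper quartile
iqr = q3 - q1                      # width of the region holding the central 50%
```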

  • Interval estimate The unknown parameter is estimated by an interval which covers the true value of the parameter with a prescribed probability.
  • Interval scale We can measure and interpret differences between the values of random variables measured on the interval scale. Such variables have neither a natural zero nor a natural unit of measurement (see also scale).
  • Investigated variable Characteristic of interest in the statistical investigation, whose (varying) values are observed on all statistical elements of the sample space.

[J]


[K]
  • Kendall correlation coefficient The Kendall rank correlation coefficient is based on the comparison of the ordering of all possible pairs of the observed values. Pairs of observations with the same (or opposite) ordering are called concordant (resp. discordant). Apart from this, some pairs can have equal values. The Kendall correlation coefficient is the ratio of the difference between the number of concordant and discordant pairs and the sum of the concordant and discordant pairs.
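The pair-counting definition above translates directly into code (a minimal sketch; the function name `kendall_tau` is ours, and tied pairs are simply left out, following the entry's definition):

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / (concordant + discordant), comparing
    the ordering of every possible pair of observations."""
    conc = disc = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            conc += 1   # pair ordered the same way in x and y
        elif s < 0:
            disc += 1   # pair ordered the opposite way
        # pairs with equal values (s == 0) are left out
    return (conc - disc) / (conc + disc)

print(kendall_tau([1, 2, 3], [3, 2, 1]))  # → -1.0 (all pairs discordant)
```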


[L]
  • Least squares method Principle for the construction of estimators of an unknown parameter based on the minimization of the sum of squared differences between the sample values and some function of the parameter.

  • Level of significance Probability that the test statistic falls into the critical region if the null hypothesis is true.
  • Likelihood function Function which assigns, with respect to the observations, values (probability or density) to all possible values of the estimated parameter.


[M]
  • Marginal distribution For a two-dimensional frequency distribution, the marginal distribution is the one-dimensional distribution of the variable X (or Y) which does not contain any information about the distribution of the other random variable Y (or X).
  • Maximum likelihood method General principle for the construction of estimators of unknown parameters. The estimator is the value which maximizes the probability (or density) of the realized sample.

  • Mean value Characteristic of the location of a frequency distribution. Each mean value is a value of the investigated property, i.e., it is measured on the same scale. Commonly used mean values are the mode, the median, and the arithmetic average.

  • Mean squared error Expected value of the squared deviation of the estimator from the true value of the estimated parameter.
  • Median The value which splits the sorted realizations of (at least ordinal) random variable into two equal parts. It is robust with respect to outliers and it corresponds to the second quartile.
  • Metric scale (also cardinal scale) The metric scale is used if we can naturally sort the realizations of the variable and measure the differences between them. See also scale, interval scale, ratio scale, and absolute scale.
  • Mode It is the most frequently observed realization of the variable. It can be determined for any scale. For nominal variables it represents the only reasonable mean value. The mode is not sensitive to outlying observations.
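The three common mean values above are available directly in the standard library; this sketch also illustrates the robustness of the median mentioned in its entry (data are illustrative):

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 9]
print(mode(data))    # → 3    the most frequent value
print(median(data))  # → 3    the middle of the sorted values
print(mean(data))    # → 4.4  the arithmetic average
# the median is robust: an outlier shifts the mean but not the median
assert median([2, 3, 3, 5, 90]) == median(data)
```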


[N]
  • Nominal scale We say that the scale is nominal if only the equivalence of the results can be determined, i.e., the various results of the experiment cannot be sorted. See also scale.

  • Normal distribution A bell-shaped distribution of a continuous random variable with parameters \mu and \sigma. The parameter \mu determines the expected value and the parameter \sigma the standard deviation of the normally distributed random variable.
  • Null hypothesis Statistical formulation of some statement concerning the sample space, which can be tested (and rejected) by a statistical test.

[O]


  • Observation The actual values assumed by statistical variables.
  • Ordinal scale The scale is ordinal if the outcomes of the experiment can be represented by natural numbers, we can determine the equivalence of two elements, and the results can be naturally sorted. Attention: on an ordinal scale, you cannot interpret the size of differences between the classes. See also scale.


[P]
  • Probability function Function giving the probability that the random variable X equals the value x_j.
  • Parametric test Statistical test about a hypothesis concerning some unknown parameter of the sample.

  • Periodic fluctuations (also seasonal fluctuations) Short-term influences which affect the behavior of a time series in regular patterns. The length of the period in economic data is often one year.

  • Permutation Each ordering of all n elements of some set is called a permutation. We distinguish permutations with repetition, permutations without repetition, and permutations involving several groups of identical elements. See also combinatorics.

  • Point estimate Realization of the estimator of some unknown parameter, calculated from the realized random sample.
  • Poisson distribution Distribution of a discrete random variable describing the number of occurrences of an event; the event occurs repeatedly, but randomly and independently, in a fixed time period. The Poisson distribution has the parameter \lambda.
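A numerical sketch of the Poisson probability function (the name `poisson_pmf` is ours; Python standard library only):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson-distributed count with parameter lambda."""
    return lam**k * exp(-lam) / factorial(k)

# probability of no occurrence in a period averaging lambda = 2 events
print(poisson_pmf(0, 2.0))  # → exp(-2) ≈ 0.1353
```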

  • Population Set of all statistical elements relevant for the statistical investigation of at least one chosen characteristic.

  • Power function Function which gives the dependency of the probability of rejecting the null hypothesis on the true value of the tested parameter.
  • Predicted value See fitted value.
  • Probability theory Theory concentrating on quantitative models of experiments with random outcomes (random experiments).

  • Probability of coverage See confidence level.
  • Probability interval Interval with given bounds x_l and x_u (x_l \leq x_u) such that the random variable falls into this interval with probability 1 - \alpha.
  • Probability distribution It is obtained by assigning probabilities to the sorted values of a random variable (discrete probability distribution).

[Q]


  • Quadratic contingency Auxiliary variable used in the calculation of the contingency coefficient of nominal variables. It is the sum of the squared deviations of the observed absolute (relative) frequencies from the absolute (relative) frequencies expected under the assumption of independence.
  • Quantile The quantile x_p is the value which splits the upwards-sorted realizations of an (at least ordinal) variable in the ratio p : (1 - p), where p lies between 0 and 1. Special cases are quartiles, quintiles, and deciles.
  • Quartile Special case of the quantile for p = 0.25, p = 0.5, and p = 0.75. The sorted observations are split by the quartiles into four parts of equal size. The quartile x_{0.25} is the lower quartile, x_{0.75} is the upper quartile, and x_{0.5} is the median.
  • Quintile Special case of the quantile for p = 0.2, 0.4, 0.6, and 0.8. The sorted observations are split by the quintiles into five equally large parts.

[R]


  • Random experiment This is a real or constructed experiment which can be repeated arbitrarily many times under the same conditions and whose result cannot be determined in advance.
  • Ratio scale The ratio scale is characterized by the fact that ratios of the observations have a natural interpretation. Variables with a ratio scale have a natural zero, but they do not have a natural measurement unit.
  • Random sampling Method of choosing elements of the sample space. Each element has a nonzero probability of being selected. The probabilities do not have to be equal.
  • Random variable A random variable is the (real) number which is assigned to every elementary event.
  • Range Parameter of scale; it is the difference between the largest and smallest observation (for classified data, the difference between the highest and lowest bound of the classes).
  • Regression function Description of the dependency of the explained variable (dependent variable) on one or more explanatory variables (independent variables, regressors) via a (usually linear) function based on n observations. The regression function assigns to the values of the explanatory variable some average value (fitted value), which can be very different from the value which was really observed. The difference between the fitted value and the observation is called the residual.
  • Regressor The independent variable in the regression model; also: explanatory variable. See also regression function.
  • Relative frequency The ratio of the absolute frequency and the total number of observations.
  • Representative random sample Method of random sampling. Each element has the same probability of being chosen. Random sample consisting of identically distributed random variables X_1, \dots, X_n which do not have to be independent.

[S]


  • Sample survey Subset of the sample space; the elements which have been chosen for the statistical investigation.

  • Sample distribution Distribution of a sample function.
  • Sample function Random variable, a function of the random variables (observed on the elements of the sample) X_1, \dots, X_n.
  • Sample mean Arithmetic average of the sample variables X_1, \dots, X_n.
  • Sample fraction Ratio of the sample size n and the number of elements N in the sample space.
  • Sample proportion Arithmetic average of dichotomous (0-1) sample variables X_1, \dots, X_n.
  • Sample size Number of elements in the sample.
  • Sample space A set of all possible events of a random experiment. Each event is thus a subset of the sample space. The impossible event is the empty set; the sure event is the complete sample space.
  • Sample values Realizations (values) of the observed random variables X_1, \dots, X_n (after sampling the n elements).
  • Sample variable Random variable X_i which is defined as the value of the random variable X that will be observed on the i-th element of the sample space.
  • Sample variance Empirical variance of the sample variables X_1, \dots, X_n.
  • Sampling without replacement Sampling procedure. The selected elements are taken out of the sample space before the choice of the next element. It corresponds to the representative random sample.
  • Sampling with replacement Sampling procedure. Each selected element is returned before the next element is chosen. It corresponds to the simple random sample.
  • Scale Projection of some numerical set (scale) onto the set of investigated statistical elements, such that the relations are preserved. See also nominal scale, ordinal scale, metric scale, interval scale, ratio scale, and absolute scale.
  • Scatterplot Graphical display of the observed values of a pair of metrically scaled random variables. The values are displayed as points in the Cartesian coordinate system. It allows one to visualize the dependency between the variables. A 3D scatterplot can be used for 3 variables.

  • Scatterplot matrix It is used for the graphical display of more than two metrically scaled variables. It contains scatterplots of all pairs of the variables. Attention: with a large number of variables, the scatterplot matrix becomes too complex to interpret.

  • Seasonal fluctuations See periodic fluctuations.
  • Simple random sample Method of random sampling. Each element of the set has an equal probability of being chosen, and the elements are selected independently. Random sample consisting of independent and identically distributed random variables X_1, \dots, X_n. “Identically distributed” means that the random variables have a common distribution function F(x).

  • Standard Normal distribution Normal distribution of a continuous random variable with expected value \mu = 0 and standard deviation \sigma = 1.
  • Statistical set Set of statistical elements which are, during the investigated time, in the same state.

  • Statistics Science that allows us to investigate objective empirical information obtained from (random) experiments and questionnaires, to build theoretical models for this information, and to analyze and interpret it.
  • Statistical element One object of the statistical investigation. It carries the information of interest in the experiment.
  • Statistical sequence The series of the observed values (data). The series can be sorted or unsorted.

  • Statistical population The set of all statistical elements with the corresponding defining, spatial, and temporal characteristics.
  • Statistical test A method which allows us to draw conclusions concerning an unknown distribution, or some parameter of this unknown distribution, based on the results of a random sample from this distribution.

  • Stem-and-leaf plot Half-graphical display of the values of an observed series of a metrically scaled random variable.

  • Sure event See sample space.

[T]


  • Statistical variable Characteristic of interest in the statistical investigation, whose (varying) values are observed on all statistical elements of the sample space.
  • Tchebyshev inequality Bounds the probability that a random variable falls outside an interval around its expected value.
  • t-distribution Distribution of a continuous random variable with parameter f (degrees of freedom). A random variable with a t-distribution can be obtained as the ratio of a standard Normal random variable and the square root of an independent Chi-square random variable divided by its degrees of freedom.
  • Test statistic Function of the observed values which is used in a statistical test.
  • Time series Statistical sequence whose values were obtained sequentially at different time points or in different time periods. See also components of a time series.

  • Trend The long-term development of the observed time series. The trend is usually estimated by the method of moving averages or by the Least Squares method (see also filter).


[U]
  • Unbiasedness Property of an estimator. The expected value of the estimator is equal to the true value of the estimated parameter.
  • Uniform distribution Discrete version: each possible value of the random variable has equal probability of occurrence. Continuous version: random variable with a constant density.
  • Union The union of two events A and B is the set of all elementary events which belong to A, or to B, or to both A and B.

[V]


  • Variance The variance is the mean squared deviation of the observed values from their arithmetic average.

  • Variations Each selection of k elements out of n elements in total, where we take the ordering of the elements into account, is called a variation of the k-th class of n elements. We distinguish variations with and without repetition. See also combinatorics.
  • Variation coefficient Relative measure of spread; it is the ratio of the standard deviation and the arithmetic average. It allows us to compare the spread of different distributions.
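A minimal sketch of the variation coefficient (data and names are illustrative; Python standard library only). Because it is a relative measure, rescaling all observations leaves it unchanged, which is exactly what makes different distributions comparable:

```python
from statistics import mean, pstdev

data = [10.0, 12.0, 14.0, 16.0]
cv = pstdev(data) / mean(data)  # spread relative to the level of the data
# multiplying every observation by a constant leaves it unchanged
scaled = [5 * v for v in data]
assert abs(cv - pstdev(scaled) / mean(scaled)) < 1e-12
```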