Estimation theory

Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements. In estimation theory, two approaches are generally considered:[1]

  • The probabilistic approach (described in this article) assumes that the measured data is random with a probability distribution dependent on the parameters of interest.
  • The set-membership approach assumes that the measured data vector belongs to a set which depends on the parameter vector.

Examples

For example, it is desired to estimate the proportion of a population of voters who will vote for a particular candidate. That proportion is the parameter sought; the estimate is based on a small random sample of voters. Alternatively, it is desired to estimate the probability of a voter voting for a particular candidate, based on some demographic features, such as age.

Or, for example, in radar the aim is to find the range of objects (airplanes, boats, etc.) by analyzing the two-way transit timing of received echoes of transmitted pulses. Since the reflected pulses are unavoidably embedded in electrical noise, their measured values are randomly distributed, so that the transit time must be estimated.

As another example, in electrical communication theory, the measurements which contain information regarding the parameters of interest are often associated with a noisy signal.

Basics

For a given model, several statistical "ingredients" are needed so the estimator can be implemented. The first is a statistical sample – a set of data points taken from a random vector (RV) of size N. Put into a vector,

$$\mathbf{x} = \begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{bmatrix}$$

Secondly, there are M parameters

$$\boldsymbol{\theta} = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_M \end{bmatrix}$$

whose values are to be estimated. Third, the continuous probability density function (pdf) or its discrete counterpart, the probability mass function (pmf), of the underlying distribution that generated the data must be stated conditional on the values of the parameters:

$$p(\mathbf{x} \mid \boldsymbol{\theta})$$

It is also possible for the parameters themselves to have a probability distribution (e.g., Bayesian statistics). It is then necessary to define the Bayesian probability

$$\pi(\boldsymbol{\theta})$$

After the model is formed, the goal is to estimate the parameters, with the estimates commonly denoted $\hat{\boldsymbol{\theta}}$, where the "hat" indicates the estimate.

One common estimator is the minimum mean squared error (MMSE) estimator, which utilizes the error between the estimated parameters and the actual value of the parameters

$$\mathbf{e} = \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}$$

as the basis for optimality. This error term is then squared and the expected value of this squared value is minimized for the MMSE estimator.
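
Written out for the vector case, the criterion is the expected squared norm of the error vector; the MMSE estimator is the choice of $\hat{\boldsymbol{\theta}}$ that minimizes

$$\mathrm{MSE}(\hat{\boldsymbol{\theta}}) = \mathrm{E}\left[(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^{T}(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})\right] = \mathrm{E}\left[\|\mathbf{e}\|^{2}\right]$$

where the expectation is taken over the data and, when a prior $\pi(\boldsymbol{\theta})$ is assumed, over the parameters as well.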

Estimators

Commonly used estimators (estimation methods) and topics related to them include:

  • Maximum likelihood estimators
  • Bayes estimators
  • Method of moments estimators
  • Cramér–Rao bound
  • Least squares
  • Minimum mean squared error (MMSE), also known as Bayes least squared error (BLSE)
  • Maximum a posteriori (MAP)
  • Minimum variance unbiased estimator (MVUE)
  • Nonlinear system identification
  • Best linear unbiased estimator (BLUE)
  • Unbiased estimators – see estimator bias
  • Particle filter
  • Markov chain Monte Carlo (MCMC)
  • Kalman filter and its various derivatives
  • Wiener filter

Examples

Unknown constant in additive white Gaussian noise

Consider a received discrete signal, $x[n]$, of $N$ independent samples that consists of an unknown constant $A$ with additive white Gaussian noise (AWGN) $w[n]$ with zero mean and known variance $\sigma^2$ (i.e., $\mathcal{N}(0, \sigma^2)$). Since the variance is known, the only unknown parameter is $A$.

The model for the signal is then

$$x[n] = A + w[n], \quad n = 0, 1, \dots, N-1$$

Two possible (of many) estimators for the parameter $A$ are:

  • $\hat{A}_1 = x[0]$
  • $\hat{A}_2 = \frac{1}{N} \sum_{n=0}^{N-1} x[n]$, which is the sample mean

Both of these estimators have a mean of $A$, which can be shown through taking the expected value of each estimator

$$\mathrm{E}\left[\hat{A}_1\right] = \mathrm{E}\left[x[0]\right] = A$$

and

$$\mathrm{E}\left[\hat{A}_2\right] = \mathrm{E}\left[\frac{1}{N} \sum_{n=0}^{N-1} x[n]\right] = \frac{1}{N} \sum_{n=0}^{N-1} \mathrm{E}\left[x[n]\right] = \frac{1}{N} (N A) = A$$

At this point, these two estimators would appear to perform the same. However, the difference between them becomes apparent when comparing the variances.

$$\mathrm{var}\left(\hat{A}_1\right) = \mathrm{var}\left(x[0]\right) = \sigma^2$$

and

$$\mathrm{var}\left(\hat{A}_2\right) = \mathrm{var}\left(\frac{1}{N} \sum_{n=0}^{N-1} x[n]\right) \overset{\text{independence}}{=} \frac{1}{N^2} \sum_{n=0}^{N-1} \mathrm{var}\left(x[n]\right) = \frac{1}{N^2} (N \sigma^2) = \frac{\sigma^2}{N}$$

It would seem that the sample mean is a better estimator since its variance is lower for every N > 1.
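
This difference can also be checked numerically. The sketch below is illustrative only: the constant $A$, noise level $\sigma$, sample size $N$, and trial count are arbitrary choices, and NumPy is assumed. It simulates both estimators many times and compares their empirical variances with $\sigma^2$ and $\sigma^2/N$:

```python
import numpy as np

rng = np.random.default_rng(0)
A, sigma, N, trials = 5.0, 2.0, 25, 100_000   # arbitrary example values

# Each row is one realization of x[n] = A + w[n], n = 0, ..., N-1
x = A + sigma * rng.standard_normal((trials, N))

A1 = x[:, 0]          # estimator 1: first sample only
A2 = x.mean(axis=1)   # estimator 2: sample mean

print("var(A1) ~", A1.var(), "  theory:", sigma**2)       # about 4.0
print("var(A2) ~", A2.var(), "  theory:", sigma**2 / N)   # about 0.16
```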

Maximum likelihood

Continuing the example using the maximum likelihood estimator, the probability density function (pdf) of the noise for one sample $w[n]$ is

$$p(w[n]) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2} w[n]^2\right)$$

and the probability of $x[n]$ becomes ($x[n]$ can be thought of as $\mathcal{N}(A, \sigma^2)$)

$$p(x[n]; A) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2} (x[n] - A)^2\right)$$

By independence, the probability of $\mathbf{x}$ becomes

$$p(\mathbf{x}; A) = \prod_{n=0}^{N-1} p(x[n]; A) = \frac{1}{\left(\sigma \sqrt{2\pi}\right)^N} \exp\left(-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x[n] - A)^2\right)$$

Taking the natural logarithm of the pdf

$$\ln p(\mathbf{x}; A) = -N \ln\left(\sigma \sqrt{2\pi}\right) - \frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x[n] - A)^2$$

and the maximum likelihood estimator is

$$\hat{A} = \arg\max_{A} \, \ln p(\mathbf{x}; A)$$

Taking the first derivative of the log-likelihood function

$$\frac{\partial}{\partial A} \ln p(\mathbf{x}; A) = \frac{1}{\sigma^2} \sum_{n=0}^{N-1} (x[n] - A) = \frac{1}{\sigma^2} \left(\sum_{n=0}^{N-1} x[n] - N A\right)$$

and setting it to zero

$$0 = \frac{1}{\sigma^2} \left(\sum_{n=0}^{N-1} x[n] - N A\right) = \sum_{n=0}^{N-1} x[n] - N A$$

This results in the maximum likelihood estimator

$$\hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} x[n]$$

which is simply the sample mean. From this example, it was found that the sample mean is the maximum likelihood estimator for $N$ samples of a fixed, unknown parameter corrupted by AWGN.
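
As a sanity check, the log-likelihood can also be maximized numerically and compared with the sample mean. This is a minimal sketch, assuming SciPy is available and using arbitrary values for $A$, $\sigma$, and $N$; the generic scalar optimizer stands in for the closed-form result derived above:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
A_true, sigma, N = 5.0, 2.0, 50                    # arbitrary example values
x = A_true + sigma * rng.standard_normal(N)

# Negative log-likelihood of A under the AWGN model (A-independent constants dropped)
def neg_log_likelihood(A):
    return np.sum((x - A) ** 2) / (2 * sigma**2)

A_ml = minimize_scalar(neg_log_likelihood).x
print(A_ml, x.mean())   # the two values agree up to solver tolerance
```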

Cramér–Rao lower bound

To find the Cramér–Rao lower bound (CRLB) of the sample mean estimator, it is first necessary to find the Fisher information number

$$\mathcal{I}(A) = \mathrm{E}\left[\left(\frac{\partial}{\partial A} \ln p(\mathbf{x}; A)\right)^2\right] = -\mathrm{E}\left[\frac{\partial^2}{\partial A^2} \ln p(\mathbf{x}; A)\right]$$

and copying from above

$$\frac{\partial}{\partial A} \ln p(\mathbf{x}; A) = \frac{1}{\sigma^2} \left(\sum_{n=0}^{N-1} x[n] - N A\right)$$

Taking the second derivative

$$\frac{\partial^2}{\partial A^2} \ln p(\mathbf{x}; A) = \frac{1}{\sigma^2} (-N) = \frac{-N}{\sigma^2}$$

and finding the negative expected value is trivial since it is now a deterministic constant

$$-\mathrm{E}\left[\frac{\partial^2}{\partial A^2} \ln p(\mathbf{x}; A)\right] = \frac{N}{\sigma^2}$$

Finally, putting the Fisher information into

$$\mathrm{var}\left(\hat{A}\right) \geq \frac{1}{\mathcal{I}}$$

results in

$$\mathrm{var}\left(\hat{A}\right) \geq \frac{\sigma^2}{N}$$

Comparing this to the variance of the sample mean (determined previously) shows that the variance of the sample mean is equal to the Cramér–Rao lower bound for all values of $N$ and $A$. In other words, the sample mean is the (necessarily unique) efficient estimator, and thus also the minimum variance unbiased estimator (MVUE), in addition to being the maximum likelihood estimator.
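
The Fisher information itself can be approximated by simulation, by averaging the squared score $\left(\frac{\partial}{\partial A} \ln p(\mathbf{x}; A)\right)^2$ over many realizations of the data. A minimal sketch with arbitrary values of $A$, $\sigma$, $N$, and the trial count, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
A, sigma, N, trials = 5.0, 2.0, 25, 200_000   # arbitrary example values

x = A + sigma * rng.standard_normal((trials, N))

# Score for each realization: d/dA ln p(x; A) = (sum_n x[n] - N*A) / sigma^2
score = (x.sum(axis=1) - N * A) / sigma**2

fisher_mc = np.mean(score**2)        # Monte Carlo estimate of E[score^2]
print(fisher_mc, N / sigma**2)       # both about 6.25, so the CRLB is sigma^2 / N
```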

Maximum of a uniform distribution

One of the simplest non-trivial examples of estimation is the estimation of the maximum of a uniform distribution. It is used as a hands-on classroom exercise and to illustrate basic principles of estimation theory. Further, in the case of estimation based on a single sample, it demonstrates philosophical issues and possible misunderstandings in the use of maximum likelihood estimators and likelihood functions.

Given a discrete uniform distribution $1, 2, \dots, N$ with unknown maximum, the UMVU estimator for the maximum is given by

$$\frac{k+1}{k} m - 1 = m + \frac{m}{k} - 1$$

where m is the sample maximum and k is the sample size, sampling without replacement.[2][3] This problem is commonly known as the German tank problem, due to application of maximum estimation to estimates of German tank production during World War II.

The formula may be understood intuitively as:

"The sample maximum plus the average gap between observations in the sample",

the gap being added to compensate for the negative bias of the sample maximum as an estimator for the population maximum.[note 1]

This has a variance of[2]

$$\frac{1}{k} \frac{(N-k)(N+1)}{(k+2)} \approx \frac{N^2}{k^2} \quad \text{for small samples } k \ll N$$

so a standard deviation of approximately $\frac{N}{k}$, the (population) average size of a gap between samples; compare $\frac{m}{k}$ above. This can be seen as a very simple case of maximum spacing estimation.

The sample maximum is the maximum likelihood estimator for the population maximum, but, as discussed above, it is biased.
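
A short simulation makes the negative bias of the sample maximum and the effect of the $m + \frac{m}{k} - 1$ correction visible. This is only a sketch: the true maximum, sample size, and trial count are arbitrary choices, and NumPy is assumed:

```python
import numpy as np

rng = np.random.default_rng(3)
N_true, k, trials = 250, 4, 20_000   # arbitrary example: population maximum and sample size

# Draw k serial numbers without replacement from {1, ..., N_true}, many times over
m = np.array([rng.choice(N_true, size=k, replace=False).max() + 1 for _ in range(trials)])

mle = m                  # sample maximum: the maximum likelihood estimator, biased low
umvu = m + m / k - 1     # sample maximum plus the average gap between observations

print("mean of sample maximum:", mle.mean())   # noticeably below N_true
print("mean of UMVU estimator:", umvu.mean())  # close to N_true = 250
```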

Applications

Numerous fields require the use of estimation theory. Some of these fields include:

  • Interpretation of scientific experiments
  • Signal processing
  • Clinical trials
  • Opinion polls
  • Quality control
  • Telecommunications
  • Project management
  • Software engineering
  • Control theory (in particular Adaptive control)
  • Network intrusion detection system
  • Orbit determination

Measured data are likely to be subject to noise or uncertainty and it is through statistical probability that optimal solutions are sought to extract as much information from the data as possible.

See also

  • Best linear unbiased estimator (BLUE)
  • Completeness (statistics)
  • Detection theory
  • Efficiency (statistics)
  • Expectation–maximization algorithm (EM algorithm)
  • Fermi problem
  • Grey box model
  • Information theory
  • Least-squares spectral analysis
  • Matched filter
  • Maximum entropy spectral estimation
  • Nuisance parameter
  • Parametric equation
  • Pareto principle
  • Rule of three (statistics)
  • State estimator
  • Statistical signal processing
  • Sufficiency (statistics)

Notes

  1. ^ The sample maximum is never more than the population maximum, but can be less, hence it is a biased estimator: it will tend to underestimate the population maximum.

References

Citations

  1. ^ Walter, E.; Pronzato, L. (1997). Identification of Parametric Models from Experimental Data. London, England: Springer-Verlag.
  2. ^ a b Johnson, Roger (1994), "Estimating the Size of a Population", Teaching Statistics, 16 (2 (Summer)): 50–52, doi:10.1111/j.1467-9639.1994.tb00688.x
  3. ^ Johnson, Roger (2006), "Estimating the Size of a Population", Getting the Best from Teaching Statistics, archived from the original (PDF) on November 20, 2008

External links

  • Media related to Estimation theory at Wikimedia Commons
