
Least mean squares filter

Least mean squares (LMS) algorithms are a class of adaptive filter used to mimic a desired filter by finding the filter coefficients that produce the least mean square of the error signal (the difference between the desired and the actual signal). It is a stochastic gradient descent method in that the filter is only adapted based on the error at the current time. It was invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff, based on their research in single-layer neural networks (ADALINE). Specifically, they used gradient descent to train ADALINE to recognize patterns, and called the algorithm the "delta rule". They then applied the rule to filters, resulting in the LMS algorithm.

Problem formulation

The picture shows the various parts of the filter. $x$ is the input signal, which is transformed by an unknown filter $h$ that we wish to match using $\hat{h}$. The output of the unknown filter is $y$, which is interfered with by a noise signal $\nu$, producing $d = y + \nu$. The error signal $e = d - \hat{y} = y + \nu - \hat{y}$ is then computed and fed back to the adaptive filter, which adjusts its parameters in order to minimize the mean square of the error signal $e$.

Relationship to the Wiener filter

The realization of the causal Wiener filter looks a lot like the solution to the least squares estimate, except in the signal processing domain. The least squares solution for input matrix $\mathbf{X}$ and output vector $\boldsymbol{y}$ is

$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}\boldsymbol{y}$
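As a numerical illustration, the batch least squares estimate can be computed directly. The sketch below uses NumPy with made-up data; in practice `numpy.linalg.lstsq` is usually preferred over forming the normal equations explicitly.

```python
import numpy as np

# Hypothetical example: estimate coefficients beta from data by solving
# the normal equations (X^T X) beta = X^T y; the values are made up.
rng = np.random.default_rng(0)
true_beta = np.array([0.5, -0.3, 0.1])

X = rng.standard_normal((100, 3))                    # input matrix (rows are samples)
y = X @ true_beta + 0.01 * rng.standard_normal(100)  # noisy output vector

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)         # least squares estimate
print(beta_hat)                                      # close to true_beta
```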

The FIR least mean squares filter is related to the Wiener filter, but minimizing the error criterion of the former does not rely on cross-correlations or auto-correlations. Its solution converges to the Wiener filter solution. Most linear adaptive filtering problems can be formulated using the block diagram above. That is, an unknown system $\mathbf{h}(n)$ is to be identified and the adaptive filter attempts to adapt the filter $\hat{\mathbf{h}}(n)$ to make it as close as possible to $\mathbf{h}(n)$, while using only observable signals $x(n)$, $d(n)$ and $e(n)$; but $y(n)$, $v(n)$ and $h(n)$ are not directly observable. Its solution is closely related to the Wiener filter.

Definition of symbols

$n$ is the number of the current input sample
$p$ is the number of filter taps
$\{\cdot\}^H$ denotes the Hermitian transpose (conjugate transpose)
$\mathbf{x}(n) = \left[x(n), x(n-1), \dots, x(n-p+1)\right]^T$
$\mathbf{h}(n) = \left[h_0(n), h_1(n), \dots, h_{p-1}(n)\right]^T, \quad \mathbf{h}(n) \in \mathbb{C}^p$
$y(n) = \mathbf{h}^H(n) \cdot \mathbf{x}(n)$
$d(n) = y(n) + \nu(n)$
$\hat{\mathbf{h}}(n)$ is the estimated filter; interpret it as the estimate of the filter coefficients after $n$ samples
$e(n) = d(n) - \hat{y}(n) = d(n) - \hat{\mathbf{h}}^H(n) \cdot \mathbf{x}(n)$
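To make the later algorithm summaries concrete, the following small NumPy sketch generates signals with exactly these roles. It assumes a real-valued example with made-up filter taps and noise level; none of the specific values come from the article.

```python
import numpy as np

# Illustrative setup matching the block diagram (real-valued for simplicity):
# an unknown p-tap filter h, an input x(n), and the observed d(n) = y(n) + v(n).
rng = np.random.default_rng(1)
p = 4                                  # number of filter taps
N = 5000                               # number of samples
h = np.array([0.8, -0.4, 0.2, 0.1])    # unknown system (hypothetical values)

x = rng.standard_normal(N)             # input signal x(n)
y = np.convolve(x, h)[:N]              # output of the unknown filter
v = 0.05 * rng.standard_normal(N)      # interfering noise v(n)
d = y + v                              # observed desired signal d(n)
```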

Idea

The basic idea behind the LMS filter is to approach the optimum filter weights $(\mathbf{R}^{-1}\mathbf{P})$ by updating the filter weights in a manner that converges to the optimum filter weights. This is based on the gradient descent algorithm. The algorithm starts by assuming small weights (zero in most cases) and, at each step, the weights are updated using the gradient of the mean square error. That is, if the MSE gradient is positive, the error would keep increasing if the same weight is used for further iterations, which means the weights need to be reduced. In the same way, if the gradient is negative, the weights need to be increased. The weight update equation is

$W_{n+1} = W_n - \mu \nabla \varepsilon[n]$

where $\varepsilon[n]$ represents the mean-square error and $\mu$ is a convergence coefficient.

The negative sign shows that we go down the slope of the error, $\varepsilon$, to find the filter weights, $W_i$, which minimize the error.

The mean-square error as a function of the filter weights is a quadratic function, which means it has only one extremum, which minimizes the mean-square error; this is the optimal weight. The LMS thus approaches this optimal weight by descending the mean-square-error versus filter-weight curve.
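The sign logic can be seen in a one-dimensional toy sketch (values entirely made up): on a quadratic error surface the gradient is positive to the right of the optimum and negative to its left, so subtracting $\mu$ times the gradient always moves the weight toward the minimum.

```python
# Toy 1-D illustration: MSE(w) = (w - w_opt)^2 has a single minimum at w_opt.
w_opt = 2.0
w, mu = 0.0, 0.1                 # start from a small weight, fixed step size

for _ in range(50):
    grad = 2.0 * (w - w_opt)     # derivative of the quadratic error surface
    w = w - mu * grad            # positive gradient -> decrease w, negative -> increase
print(w)                         # approaches w_opt = 2.0
```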

Derivation

The idea behind LMS filters is to use steepest descent to find filter weights $\hat{\mathbf{h}}(n)$ which minimize a cost function. We start by defining the cost function as

$C(n) = E\left\{\left|e(n)\right|^2\right\}$

where $e(n)$ is the error at the current sample $n$ and $E\{\cdot\}$ denotes the expected value.

This cost function ($C(n)$) is the mean square error, and it is minimized by the LMS. This is where the LMS gets its name. Applying steepest descent means taking the partial derivatives with respect to the individual entries of the filter coefficient (weight) vector:

$\nabla_{\hat{\mathbf{h}}^H} C(n) = \nabla_{\hat{\mathbf{h}}^H} E\left\{e(n)\,e^*(n)\right\} = 2 E\left\{\nabla_{\hat{\mathbf{h}}^H}\left(e(n)\right) e^*(n)\right\}$

where $\nabla$ is the gradient operator:

$\nabla_{\hat{\mathbf{h}}^H}\left(e(n)\right) = \nabla_{\hat{\mathbf{h}}^H}\left(d(n) - \hat{\mathbf{h}}^H \cdot \mathbf{x}(n)\right) = -\mathbf{x}(n)$
$\nabla C(n) = -2 E\left\{\mathbf{x}(n)\,e^*(n)\right\}$

Now, $\nabla C(n)$ is a vector which points towards the steepest ascent of the cost function. To find the minimum of the cost function we need to take a step in the opposite direction of $\nabla C(n)$. To express that in mathematical terms:

$\hat{\mathbf{h}}(n+1) = \hat{\mathbf{h}}(n) - \frac{\mu}{2} \nabla C(n) = \hat{\mathbf{h}}(n) + \mu\,E\left\{\mathbf{x}(n)\,e^*(n)\right\}$

where $\frac{\mu}{2}$ is the step size (adaptation constant). That means we have found a sequential update algorithm which minimizes the cost function. Unfortunately, this algorithm is not realizable until we know $E\left\{\mathbf{x}(n)\,e^*(n)\right\}$.

Generally, the expectation above is not computed. Instead, to run the LMS in an online (updating after each new sample is received) environment, we use an instantaneous estimate of that expectation. See below.

Simplifications

For most systems the expectation function $E\left\{\mathbf{x}(n)\,e^*(n)\right\}$ must be approximated. This can be done with the following unbiased estimator

$\hat{E}\left\{\mathbf{x}(n)\,e^*(n)\right\} = \frac{1}{N} \sum_{i=0}^{N-1} \mathbf{x}(n-i)\,e^*(n-i)$

where $N$ indicates the number of samples we use for that estimate. The simplest case is $N = 1$:

$\hat{E}\left\{\mathbf{x}(n)\,e^*(n)\right\} = \mathbf{x}(n)\,e^*(n)$

For that simple case the update algorithm follows as

$\hat{\mathbf{h}}(n+1) = \hat{\mathbf{h}}(n) + \mu\,\mathbf{x}(n)\,e^*(n)$

Indeed, this constitutes the update algorithm for the LMS filter.

LMS algorithm summary

The LMS algorithm for a $p$th order filter can be summarized as

Parameters: $p$ = filter order
            $\mu$ = step size
Initialisation: $\hat{\mathbf{h}}(0) = \operatorname{zeros}(p)$
Computation: For $n = 0, 1, 2, \dots$

$\mathbf{x}(n) = \left[x(n), x(n-1), \dots, x(n-p+1)\right]^T$

$e(n) = d(n) - \hat{\mathbf{h}}^H(n)\,\mathbf{x}(n)$
$\hat{\mathbf{h}}(n+1) = \hat{\mathbf{h}}(n) + \mu\,e^*(n)\,\mathbf{x}(n)$
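A direct transcription of this summary into NumPy might look as follows. It is a sketch assuming real-valued signals (so the Hermitian transpose reduces to an ordinary transpose), and the function name is illustrative.

```python
import numpy as np

def lms(x, d, p, mu):
    """Sketch of the LMS recursion: returns the estimated taps and the error signal."""
    N = len(x)
    h_hat = np.zeros(p)                  # initialisation: h_hat(0) = zeros(p)
    e = np.zeros(N)
    for n in range(p, N):                # start at n = p so the tap-delay line is full
        x_n = x[n:n - p:-1]              # x(n) = [x(n), x(n-1), ..., x(n-p+1)]^T
        e[n] = d[n] - h_hat @ x_n        # e(n) = d(n) - h_hat^H(n) x(n)
        h_hat = h_hat + mu * e[n] * x_n  # h_hat(n+1) = h_hat(n) + mu e*(n) x(n)
    return h_hat, e
```

Running it on the simulated signals from the earlier sketch, e.g. `lms(x, d, p=4, mu=0.01)`, should drive `h_hat` toward the unknown taps `h`.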

Convergence and stability in the mean

As the LMS algorithm does not use the exact values of the expectations, the weights would never reach the optimal weights in the absolute sense, but convergence in the mean is possible. That is, even though the weights may change by small amounts, they change about the optimal weights. However, if the variance with which the weights change is large, convergence in the mean would be misleading. This problem may occur if the value of the step size $\mu$ is not chosen properly.

If $\mu$ is chosen to be large, the amount by which the weights change depends heavily on the gradient estimate, and so the weights may change by a large value, so that a gradient which was negative at the first instant may now become positive. At the second instant, the weight may then change in the opposite direction by a large amount because of the negative gradient, and would thus keep oscillating with a large variance about the optimal weights. On the other hand, if $\mu$ is chosen to be too small, the time to converge to the optimal weights will be too large.

Thus, an upper bound on $\mu$ is needed, which is given as

$0 < \mu < \frac{2}{\lambda_{\mathrm{max}}}$

where $\lambda_{\mathrm{max}}$ is the greatest eigenvalue of the autocorrelation matrix $\mathbf{R} = E\left\{\mathbf{x}(n)\,\mathbf{x}^H(n)\right\}$. If this condition is not fulfilled, the algorithm becomes unstable and $\hat{\mathbf{h}}(n)$ diverges.

Maximum convergence speed is achieved when

$\mu = \frac{2}{\lambda_{\mathrm{max}} + \lambda_{\mathrm{min}}}$

where $\lambda_{\mathrm{min}}$ is the smallest eigenvalue of $\mathbf{R}$. Given that $\mu$ is less than or equal to this optimum, the convergence speed is determined by $\lambda_{\mathrm{min}}$, with a larger value yielding faster convergence. This means that faster convergence can be achieved when $\lambda_{\mathrm{max}}$ is close to $\lambda_{\mathrm{min}}$; that is, the maximum achievable convergence speed depends on the eigenvalue spread of $\mathbf{R}$.

A white noise signal has autocorrelation matrix $\mathbf{R} = \sigma^2 \mathbf{I}$, where $\sigma^2$ is the variance of the signal. In this case all eigenvalues are equal, and the eigenvalue spread is the minimum over all possible matrices. The common interpretation of this result is therefore that the LMS converges quickly for white input signals, and slowly for colored input signals, such as processes with low-pass or high-pass characteristics.
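This effect can be checked numerically. The sketch below estimates the eigenvalue spread of $\mathbf{R}$ for a white input and for a colored input produced by a simple first-order low-pass filter; the colouring filter and the tap count are illustrative choices, not taken from the article.

```python
import numpy as np

def eig_spread(x, p=8):
    """Estimate the eigenvalue spread of R = E[x(n) x^T(n)] from sample autocorrelations."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(p)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz estimate of R
    eigs = np.linalg.eigvalsh(R)
    return eigs.max() / eigs.min()

rng = np.random.default_rng(2)
white = rng.standard_normal(100_000)
colored = np.convolve(white, [1.0, 0.9])[:100_000]  # crude low-pass colouring

print(eig_spread(white))    # close to 1: all eigenvalues roughly equal
print(eig_spread(colored))  # much larger spread, hence slower LMS convergence
```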

It is important to note that the above upper bound on $\mu$ only enforces stability in the mean, but the coefficients of $\hat{\mathbf{h}}(n)$ can still grow infinitely large, i.e. divergence of the coefficients is still possible. A more practical bound is

$0 < \mu < \frac{2}{\mathrm{tr}\left[\mathbf{R}\right]}$

where $\mathrm{tr}\left[\mathbf{R}\right]$ denotes the trace of $\mathbf{R}$. This bound guarantees that the coefficients of $\hat{\mathbf{h}}(n)$ do not diverge (in practice, the value of $\mu$ should not be chosen close to this upper bound, since it is somewhat optimistic due to approximations and assumptions made in the derivation of the bound).
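Because every diagonal entry of $\mathbf{R}$ equals the input power $E\left[|x(n)|^2\right]$, the trace is $p$ times that power, so the practical bound can be estimated directly from input samples. A small sketch (function name illustrative):

```python
import numpy as np

def lms_step_size_bound(x, p):
    """Estimate the practical upper bound 2 / tr[R], using tr[R] ~ p * mean(|x|^2)."""
    input_power = np.mean(np.abs(x) ** 2)   # sample estimate of E[|x(n)|^2]
    return 2.0 / (p * input_power)

# In practice mu is chosen well below the bound, e.g.
# mu = 0.1 * lms_step_size_bound(x, p=4)
```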

Normalized least mean squares filter (NLMS)

The main drawback of the "pure" LMS algorithm is that it is sensitive to the scaling of its input $x(n)$. This makes it very hard (if not impossible) to choose a learning rate $\mu$ that guarantees stability of the algorithm (Haykin 2002). The normalised least mean squares (NLMS) filter is a variant of the LMS algorithm that solves this problem by normalising with the power of the input. The NLMS algorithm can be summarised as:

Parameters: $p$ = filter order
            $\mu$ = step size
Initialization: $\hat{\mathbf{h}}(0) = \operatorname{zeros}(p)$
Computation: For $n = 0, 1, 2, \dots$

$\mathbf{x}(n) = \left[x(n), x(n-1), \dots, x(n-p+1)\right]^T$

$e(n) = d(n) - \hat{\mathbf{h}}^H(n)\,\mathbf{x}(n)$
$\hat{\mathbf{h}}(n+1) = \hat{\mathbf{h}}(n) + \frac{\mu\,e^*(n)\,\mathbf{x}(n)}{\mathbf{x}^H(n)\,\mathbf{x}(n)}$
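A corresponding NumPy sketch of the NLMS recursion, again assuming real-valued signals; the small regularisation constant `eps` in the denominator is a common practical addition to avoid division by zero when the input is briefly silent, not part of the summary above.

```python
import numpy as np

def nlms(x, d, p, mu, eps=1e-8):
    """Sketch of the NLMS recursion with input-power normalisation."""
    N = len(x)
    h_hat = np.zeros(p)
    e = np.zeros(N)
    for n in range(p, N):
        x_n = x[n:n - p:-1]                     # x(n) = [x(n), ..., x(n-p+1)]^T
        e[n] = d[n] - h_hat @ x_n               # e(n) = d(n) - h_hat^H(n) x(n)
        norm = x_n @ x_n + eps                  # x^H(n) x(n), regularised
        h_hat = h_hat + mu * e[n] * x_n / norm  # normalised update
    return h_hat, e
```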

Optimal learning rate

It can be shown that if there is no interference ($v(n) = 0$), then the optimal learning rate for the NLMS algorithm is

$\mu_{\mathrm{opt}} = 1$

and is independent of the input $x(n)$ and the real (unknown) impulse response $\mathbf{h}(n)$. In the general case with interference ($v(n) \neq 0$), the optimal learning rate is

$\mu_{\mathrm{opt}} = \frac{E\left[\left|y(n) - \hat{y}(n)\right|^2\right]}{E\left[\left|e(n)\right|^2\right]}$

The results above assume that the signals $v(n)$ and $x(n)$ are uncorrelated with each other, which is generally the case in practice.
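In practice $y(n)$ is not observable, so this expression cannot be evaluated directly; in a simulation where the unknown system is known, however, it can be estimated from sample averages. A minimal sketch (function name illustrative):

```python
import numpy as np

def estimate_mu_opt(y, y_hat, e):
    """Sample estimate of mu_opt = E[|y - y_hat|^2] / E[|e|^2] (simulation only)."""
    return np.mean(np.abs(y - y_hat) ** 2) / np.mean(np.abs(e) ** 2)
```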

Proof

Let the filter misalignment be defined as $\Lambda(n) = \left|\mathbf{h}(n) - \hat{\mathbf{h}}(n)\right|^2$. We can derive the expected misalignment for the next sample as:

$E\left[\Lambda(n+1)\right] = E\left[\left|\hat{\mathbf{h}}(n) + \frac{\mu\,e^*(n)\,\mathbf{x}(n)}{\mathbf{x}^H(n)\mathbf{x}(n)} - \mathbf{h}(n)\right|^2\right]$
$E\left[\Lambda(n+1)\right] = E\left[\left|\hat{\mathbf{h}}(n) + \frac{\mu\left(v^*(n) + y^*(n) - \hat{y}^*(n)\right)\mathbf{x}(n)}{\mathbf{x}^H(n)\mathbf{x}(n)} - \mathbf{h}(n)\right|^2\right]$

Let $\boldsymbol{\delta}(n) = \hat{\mathbf{h}}(n) - \mathbf{h}(n)$ and $r(n) = \hat{y}(n) - y(n)$:

$E\left[\Lambda(n+1)\right] = E\left[\left|\boldsymbol{\delta}(n) + \frac{\mu\left(v^*(n) - r^*(n)\right)\mathbf{x}(n)}{\mathbf{x}^H(n)\mathbf{x}(n)}\right|^2\right]$
$E\left[\Lambda(n+1)\right] = E\left[\left(\boldsymbol{\delta}(n) + \frac{\mu\left(v^*(n) - r^*(n)\right)\mathbf{x}(n)}{\mathbf{x}^H(n)\mathbf{x}(n)}\right)^H \left(\boldsymbol{\delta}(n) + \frac{\mu\left(v^*(n) - r^*(n)\right)\mathbf{x}(n)}{\mathbf{x}^H(n)\mathbf{x}(n)}\right)\right]$

Assuming independence, we have:

$E\left[\Lambda(n+1)\right] = \Lambda(n) + E\left[\left(\frac{\mu\left(v^*(n) - r^*(n)\right)\mathbf{x}(n)}{\mathbf{x}^H(n)\mathbf{x}(n)}\right)^H \left(\frac{\mu\left(v^*(n) - r^*(n)\right)\mathbf{x}(n)}{\mathbf{x}^H(n)\mathbf{x}(n)}\right)\right] - 2 E\left[\frac{\mu\left|r(n)\right|^2}{\mathbf{x}^H(n)\mathbf{x}(n)}\right]$
$E\left[\Lambda(n+1)\right] = \Lambda(n) + \frac{\mu^2\,E\left[\left|e(n)\right|^2\right]}{\mathbf{x}^H(n)\mathbf{x}(n)} - \frac{2\mu\,E\left[\left|r(n)\right|^2\right]}{\mathbf{x}^H(n)\mathbf{x}(n)}$

The optimal learning rate is found at $\frac{dE\left[\Lambda(n+1)\right]}{d\mu} = 0$, which leads to:

$2\mu\,E\left[\left|e(n)\right|^2\right] - 2\,E\left[\left|r(n)\right|^2\right] = 0$
$\mu = \frac{E\left[\left|r(n)\right|^2\right]}{E\left[\left|e(n)\right|^2\right]}$

See also

  • Recursive least squares
  • Least squares (statistical techniques relevant to the LMS filter)
  • Similarities between Wiener and LMS
  • Multidelay block frequency domain adaptive filter
  • Zero forcing equalizer
  • Kernel adaptive filter
  • Matched filter
  • Wiener filter

References

  • Monson H. Hayes: Statistical Digital Signal Processing and Modeling, Wiley, 1996, ISBN 0-471-59431-8
  • Simon Haykin: Adaptive Filter Theory, Prentice Hall, 2002, ISBN 0-13-048434-2
  • Simon S. Haykin, Bernard Widrow (Editor): Least-Mean-Square Adaptive Filters, Wiley, 2003, ISBN 0-471-21570-8
  • Bernard Widrow, Samuel D. Stearns: Adaptive Signal Processing, Prentice Hall, 1985, ISBN 0-13-004029-0
  • Weifeng Liu, Jose Principe and Simon Haykin: Kernel Adaptive Filtering: A Comprehensive Introduction, John Wiley, 2010, ISBN 0-470-44753-2
  • Paulo S.R. Diniz: Adaptive Filtering: Algorithms and Practical Implementation, Kluwer Academic Publishers, 1997, ISBN 0-7923-9912-9

External links

  • LMS Algorithm in Adaptive Antenna Arrays www.antenna-theory.com
  • LMS Noise cancellation demo www.advsolned.com
