
Kanade–Lucas–Tomasi feature tracker

In computer vision, the Kanade–Lucas–Tomasi (KLT) feature tracker is an approach to feature extraction. It was proposed mainly to deal with the problem that traditional image registration techniques are generally costly. KLT makes use of spatial intensity information to direct the search for the position that yields the best match. It is faster than traditional techniques because it examines far fewer potential matches between the images.

The registration problem

The traditional image registration problem can be characterized as follows: given two functions $F(x)$ and $G(x)$, representing pixel values at each location $x$ in two images, respectively, where $x$ is a vector, we wish to find the disparity vector $h$ that minimizes some measure of the difference between $F(x+h)$ and $G(x)$, for $x$ in some region of interest $R$.

Some measures of the difference between $F(x+h)$ and $G(x)$:

  • L1 norm:
    $$\sum_{x \in R} \left| F(x+h) - G(x) \right|$$
  • L2 norm:
    $$\sqrt{\sum_{x \in R} \left[ F(x+h) - G(x) \right]^2}$$
  • Negative of normalized correlation:
    $$\frac{-\sum_{x \in R} F(x+h)\, G(x)}{\sqrt{\sum_{x \in R} F(x+h)^2}\, \sqrt{\sum_{x \in R} G(x)^2}}$$
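As a minimal sketch of these three measures (assuming NumPy arrays already restricted to the region of interest $R$, with the shift $h$ already applied to $F$; function names are illustrative):

```python
import numpy as np

def l1_norm(F_shifted, G):
    # Sum of absolute differences over the region of interest.
    return np.sum(np.abs(F_shifted - G))

def l2_norm(F_shifted, G):
    # Square root of the sum of squared differences.
    return np.sqrt(np.sum((F_shifted - G) ** 2))

def neg_normalized_correlation(F_shifted, G):
    # Negative of the normalized cross-correlation, so that a
    # better match gives a smaller (more negative) value.
    num = np.sum(F_shifted * G)
    den = np.sqrt(np.sum(F_shifted ** 2)) * np.sqrt(np.sum(G ** 2))
    return -num / den
```

All three are minimized by a perfect match; the correlation measure is negated precisely so that minimization is the common goal.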

Basic description of the registration algorithm

The KLT feature tracker is based on two papers.

In the first paper, Lucas and Kanade[1] developed the idea of a local search using gradients weighted by an approximation to the second derivative of the image.

One-dimensional case

If $h$ is the displacement between two images $F(x)$ and $G(x) = F(x+h)$, then the approximation is made that

$$F'(x) \approx \frac{F(x+h) - F(x)}{h} = \frac{G(x) - F(x)}{h}$$

so that

$$h \approx \frac{G(x) - F(x)}{F'(x)}$$

This approximation to the gradient of the image is accurate only if the displacement of the local area between the two images to be registered is not too large. The approximation to $h$ depends on $x$. To combine the various estimates of $h$ at various values of $x$, it is natural to average them:

$$h \approx \frac{\sum_x \dfrac{G(x) - F(x)}{F'(x)}}{\sum_x 1}$$

The average can be further improved by weighting the contribution of each term inversely proportionally to an estimate of $|F''(x)|$, where

$$F''(x) \approx \frac{G'(x) - F'(x)}{h}$$

For convenience, a weighting function is defined:

$$w(x) = \frac{1}{\left| G'(x) - F'(x) \right|}$$

The weighted average is then

$$h = \frac{\sum_x \dfrac{w(x) \left[ G(x) - F(x) \right]}{F'(x)}}{\sum_x w(x)}$$

Upon obtaining the estimate, $F(x)$ can be moved by the estimate of $h$. The procedure is applied repeatedly, yielding a type of Newton–Raphson iteration. The sequence of estimates will ideally converge to the best $h$. The iteration can be expressed by

$$\begin{cases} h_0 = 0 \\ h_{k+1} = h_k + \dfrac{\sum_x \dfrac{w(x) \left[ G(x) - F(x + h_k) \right]}{F'(x + h_k)}}{\sum_x w(x)} \end{cases}$$
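As a tiny numeric check of the one-step estimate $h \approx \frac{G(x) - F(x)}{F'(x)}$ at a single point (a sketch using exact analytic values rather than sampled images; the chosen signal and displacement are illustrative):

```python
import math

# F(x) = sin(x), G(x) = F(x + h) with a true displacement h = 0.1.
h_true = 0.1
x = 0.5
F = math.sin(x)
G = math.sin(x + h_true)
dF = math.cos(x)          # exact derivative F'(x)

# One-step estimate of the displacement from a single point.
h_est = (G - F) / dF
```

For this small displacement the single-point estimate already lands within a few percent of the true value; averaging over a region and iterating, as above, removes most of the remaining error.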

An alternative derivation

The derivation above does not generalize well to two dimensions, because the 2-D linear approximation arises differently. This can be corrected by applying the linear approximation in the form

$$F(x + h) \approx F(x) + hF'(x)$$

to find the $h$ that minimizes the L2 norm measure of the difference (or error) between the curves, where the error can be expressed as

$$E = \sum_x \left[ F(x + h) - G(x) \right]^2$$

To minimize the error with respect to $h$, partially differentiate $E$ and set the derivative to zero:

$$0 = \frac{\partial E}{\partial h} \approx \frac{\partial}{\partial h} \sum_x \left[ F(x) + hF'(x) - G(x) \right]^2 = \sum_x 2F'(x) \left[ F(x) + hF'(x) - G(x) \right]$$

$$\Rightarrow h \approx \frac{\sum_x F'(x) \left[ G(x) - F(x) \right]}{\sum_x F'(x)^2}$$

This is basically the same as the 1-D case, except that the weighting function is $w(x) = F'(x)^2$. The iteration with weighting can be expressed as

$$\begin{cases} h_0 = 0 \\ h_{k+1} = h_k + \dfrac{\sum_x w(x)\, F'(x + h_k) \left[ G(x) - F(x + h_k) \right]}{\sum_x w(x)\, F'(x + h_k)^2} \end{cases}$$
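This iteration can be sketched for sampled 1-D signals as follows (assuming NumPy, with $w(x) = 1$; the function name and the use of linear interpolation to evaluate $F$ and $F'$ at $x + h_k$ are implementation choices, not part of the original derivation):

```python
import numpy as np

def register_1d(F, G, n_iters=20):
    """Newton-Raphson iteration on the L2 error with w(x) = 1:
    h_{k+1} = h_k + sum F'(x+h_k)(G(x) - F(x+h_k)) / sum F'(x+h_k)^2."""
    x = np.arange(len(F), dtype=float)
    dF = np.gradient(F)                  # per-sample derivative of F
    h = 0.0
    for _ in range(n_iters):
        F_k = np.interp(x + h, x, F)     # F sampled at x + h_k
        dF_k = np.interp(x + h, x, dF)   # F' sampled at x + h_k
        denom = np.sum(dF_k ** 2)
        if denom < 1e-12:                # no usable gradient left
            break
        h += np.sum(dF_k * (G - F_k)) / denom
    return h
```

Dividing by the summed squared gradient rather than by individual gradient values keeps the update stable where $F'(x)$ passes through zero, which is exactly what the weighting accomplishes in the derivation above.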

Performance

To evaluate the performance of the algorithm, it is natural to ask under what conditions, and how fast, the sequence of estimates $h_k$ converges to the real $h$.

Consider the case:

$$F(x) = \sin x, \qquad G(x) = F(x + h) = \sin(x + h)$$

Both versions of the registration algorithm will converge to the correct $h$ for $|h| < \pi$, i.e. for initial misregistrations as large as one-half wavelength. The range of convergence can be extended by suppressing high spatial frequencies in the image, which can be achieved by smoothing it; however, smoothing also undesirably suppresses small details. If the smoothing window is much larger than the object being matched, the object may be suppressed entirely, so that a match is no longer possible.

Since lowpass-filtered images can be sampled at lower resolution with no loss of information, a coarse-to-fine strategy is adopted. A low-resolution smoothed version of the image can be used to obtain an approximate match. Applying the algorithm to higher resolution images will refine the match obtained at lower resolution.
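The coarse-to-fine strategy can be sketched in 1-D as follows (assuming NumPy; the function names, the box-smoothing kernel, and the one-shot gradient update per level are illustrative choices):

```python
import numpy as np

def estimate_shift(F, G):
    # One gradient-based update: h ~ sum F'(G - F) / sum F'^2.
    dF = np.gradient(F)
    denom = np.sum(dF ** 2)
    return np.sum(dF * (G - F)) / denom if denom > 1e-12 else 0.0

def coarse_to_fine_shift(F, G, levels=3):
    """Estimate the displacement h with G(x) ~ F(x + h) using a
    pyramid: smooth and downsample, match coarsely, then refine."""
    smooth = lambda s: np.convolve(s, [0.25, 0.5, 0.25], mode="same")
    pyr_F, pyr_G = [np.asarray(F, float)], [np.asarray(G, float)]
    for _ in range(levels - 1):
        # Lowpass-filter, then decimate by 2 (no information lost).
        pyr_F.append(smooth(pyr_F[-1])[::2])
        pyr_G.append(smooth(pyr_G[-1])[::2])
    h = 0.0
    for F_l, G_l in zip(reversed(pyr_F), reversed(pyr_G)):
        h *= 2.0  # a shift at one level doubles at the next finer level
        x = np.arange(len(F_l), dtype=float)
        F_warp = np.interp(x + h, x, F_l)  # apply the current estimate
        h += estimate_shift(F_warp, G_l)
    return h
```

The coarse levels see the displacement at a fraction of its full size, bringing it within the one-half-wavelength convergence range; the fine levels then refine the estimate.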

Just as smoothing extends the range of convergence, the weighting function improves the accuracy of the approximation, speeding up convergence. Without weighting, the calculated displacement $h_1$ of the first iteration with $F(x) = \sin x$ falls off to zero as the displacement approaches one-half wavelength.

Implementation

The implementation requires calculating the weighted sums of the quantities $F'G$, $F'F$, and $F'^2$ over the region of interest $R$. Although $F'(x)$ cannot be calculated exactly, it can be estimated by

$$F'(x) \approx \frac{F(x + \Delta x) - F(x)}{\Delta x},$$

where $\Delta x$ is chosen appropriately small.

More sophisticated techniques can be used for estimating the first derivatives, but in general such techniques are equivalent to first smoothing the function and then taking the difference.
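As a small illustration of this equivalence (assuming NumPy): smoothing with a symmetric kernel and then differencing gives the same interior values as differencing first and then smoothing, because convolution commutes with the difference operator:

```python
import numpy as np

signal = np.sin(0.1 * np.arange(64))
kernel = np.array([0.25, 0.5, 0.25])   # simple symmetric smoothing kernel

# Smooth first, then take the forward difference...
smoothed = np.convolve(signal, kernel, mode="same")
d_smooth_then_diff = np.diff(smoothed)

# ...which matches differencing first and then smoothing
# (away from the boundary, where "same" padding differs).
diffed = np.diff(signal)
d_diff_then_smooth = np.convolve(diffed, kernel, mode="same")
```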

Generalization to multiple dimensions

The registration algorithm for 1-D and 2-D can be generalized to more dimensions. To do so, we try to minimize the L2 norm measure of error:

$$E = \sum_{\mathbf{x} \in R} \left[ F(\mathbf{x} + \mathbf{h}) - G(\mathbf{x}) \right]^2$$

where $\mathbf{x}$ and $\mathbf{h}$ are $n$-dimensional row vectors.

An analogous linear approximation is used:

$$F(\mathbf{x} + \mathbf{h}) \approx F(\mathbf{x}) + \mathbf{h} \left[ \frac{\partial}{\partial \mathbf{x}} F(\mathbf{x}) \right]^T$$

Partially differentiating $E$ with respect to $\mathbf{h}$ and setting the result to zero:

$$0 = \frac{\partial E}{\partial \mathbf{h}} \approx \frac{\partial}{\partial \mathbf{h}} \sum_{\mathbf{x}} \left[ F(\mathbf{x}) + \mathbf{h} \left( \frac{\partial F}{\partial \mathbf{x}} \right)^T - G(\mathbf{x}) \right]^2 = \sum_{\mathbf{x}} 2 \left[ F(\mathbf{x}) + \mathbf{h} \left( \frac{\partial F}{\partial \mathbf{x}} \right)^T - G(\mathbf{x}) \right] \frac{\partial F}{\partial \mathbf{x}}$$

$$\Rightarrow \mathbf{h} \approx \left[ \sum_{\mathbf{x}} \left( G(\mathbf{x}) - F(\mathbf{x}) \right) \frac{\partial F}{\partial \mathbf{x}} \right] \left[ \sum_{\mathbf{x}} \left( \frac{\partial F}{\partial \mathbf{x}} \right)^T \frac{\partial F}{\partial \mathbf{x}} \right]^{-1}$$

which has much the same form as the 1-D version.
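A single update of this form in the 2-D case can be sketched as follows (assuming NumPy; `lk_step_2d` is an illustrative name, and the normal equations are solved directly rather than by forming the matrix inverse):

```python
import numpy as np

def lk_step_2d(F, G):
    """One linearized L2 step in 2-D: with the gradient dF and the
    displacement h treated as row vectors, solve
    h = [sum (G - F) dF] [sum dF^T dF]^(-1)."""
    gy, gx = np.gradient(np.asarray(F, float))  # partial derivatives of F
    grads = np.stack([gx.ravel(), gy.ravel()], axis=1)  # one row per pixel
    diff = (np.asarray(G, float) - F).ravel()
    A = grads.T @ grads        # 2x2 matrix: sum of dF^T dF
    b = grads.T @ diff         # 2-vector: sum of (G - F) dF
    return np.linalg.solve(A, b)   # estimated displacement (hx, hy)
```

The 2×2 matrix here is the same gradient matrix whose eigenvalues are used below to decide whether a patch is worth tracking at all.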

Further generalizations

The method can also be extended to take into account registration based on more complex transformations, such as rotation, scaling, and shearing, by considering

$$G(x) = F(Ax + h)$$

where $A$ is a linear spatial transform. The error to be minimized is then

$$E = \sum_x \left[ F(Ax + h) - G(x) \right]^2$$

To determine the amount $\Delta A$ by which to adjust $A$ and the amount $\Delta h$ by which to adjust $h$, again use the linear approximation:

$$F(x(A + \Delta A) + (h + \Delta h)) \approx F(Ax + h) + (\Delta A\, x + \Delta h) \frac{\partial}{\partial x} F(x)$$

The approximation can be used as before to obtain an error expression, which becomes quadratic in the quantities to be minimized. Differentiating the error with respect to these quantities and setting the results to zero yields a set of linear equations, which are then solved.

A further generalization accounts for the fact that the brightness may differ between the two views, due to the different viewpoints of the cameras or to differences in the processing of the two images. Assume the difference is a linear transformation:

$$F(x) = \alpha G(x) + \beta$$

where $\alpha$ represents a contrast adjustment and $\beta$ represents a brightness adjustment.

Combining this expression with the general linear-transformation registration problem gives

$$E = \sum_x \left[ F(Ax + h) - \left( \alpha G(x) + \beta \right) \right]^2$$

as the quantity to minimize with respect to $\alpha$, $\beta$, $A$, and $h$.

Detection and tracking of point features

In the second paper, Tomasi and Kanade[2] used the same basic method for finding the registration due to translation, but improved the technique by selecting features that are suitable for the tracking algorithm. A feature is selected if both eigenvalues of its gradient matrix are larger than some threshold.

By a very similar derivation, the problem is formulated as

$$\nabla d = e$$

where $\nabla$ is the gradient. This is the same as the last formula of Lucas–Kanade above. A local patch is considered a good feature to track if both eigenvalues ($\lambda_1$ and $\lambda_2$) of $\nabla$ are larger than a threshold.
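This selection criterion can be sketched as follows (assuming NumPy; `good_feature_score` is an illustrative name that returns the smaller eigenvalue of the patch's 2×2 gradient matrix, to be compared against a threshold):

```python
import numpy as np

def good_feature_score(patch):
    """Smaller eigenvalue of the 2x2 gradient matrix of an image patch.
    The patch is a good feature to track when this score exceeds a
    chosen threshold, i.e. when both eigenvalues are large."""
    gy, gx = np.gradient(patch.astype(float))
    Z = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    return np.linalg.eigvalsh(Z)[0]   # eigenvalues in ascending order
```

A flat patch scores near zero (no texture to lock onto), and an edge has one large and one small eigenvalue (the aperture problem); only corner-like patches, with strong gradients in two directions, score high in both eigenvalues.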

A tracking method based on these two papers is generally considered a KLT tracker.

Improvements and variations

In a third paper, Shi and Tomasi[3] proposed an additional stage of verifying that features were tracked correctly.

An affine transformation is fit between the image of the currently tracked feature and its image from a non-consecutive earlier frame. If the affine-compensated image is too dissimilar, the feature is dropped.

The reasoning is that between consecutive frames a translation is a sufficient model for tracking, but when frames are further apart, a more complex model is required because of more complex motion, perspective effects, and so on.

Using a derivation similar to that of the KLT, Shi and Tomasi showed that the search can be performed using the formula

$$Tz = a$$

where $T$ is a matrix of gradients, $z$ is a vector of affine coefficients, and $a$ is an error vector. Compare this to $\nabla d = e$ above.

References

  1. Bruce D. Lucas and Takeo Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision. International Joint Conference on Artificial Intelligence, pages 674–679, 1981.
  2. Carlo Tomasi and Takeo Kanade. Detection and Tracking of Point Features. Carnegie Mellon University Technical Report CMU-CS-91-132, April 1991.
  3. Jianbo Shi and Carlo Tomasi. Good Features to Track. IEEE Conference on Computer Vision and Pattern Recognition, pages 593–600, 1994.

See also

  • Kanade–Tomasi features, in the context of feature detection
  • Lucas–Kanade method, an optical flow algorithm derived from reference 1