Camera matrix

In computer vision, a camera matrix or (camera) projection matrix is a $3 \times 4$ matrix which describes the mapping of a pinhole camera from 3D points in the world to 2D points in an image.

Let $\mathbf{x}$ be a representation of a 3D point in homogeneous coordinates (a 4-dimensional vector), and let $\mathbf{y}$ be a representation of the image of this point in the pinhole camera (a 3-dimensional vector). Then the following relation holds

$$\mathbf{y} \sim \mathbf{C}\,\mathbf{x}$$

where $\mathbf{C}$ is the camera matrix and the $\sim$ sign implies that the left and right hand sides are equal except for a multiplication by a non-zero scalar $k \neq 0$:

$$\mathbf{y} = k\,\mathbf{C}\,\mathbf{x}$$

Since the camera matrix $\mathbf{C}$ is involved in the mapping between elements of two projective spaces, it too can be regarded as a projective element. This means that it has only 11 degrees of freedom: its 12 entries are determined only up to a common scale factor, since any multiplication by a non-zero scalar results in an equivalent camera matrix.
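The scale equivalence can be checked numerically. The following is a minimal sketch, not part of the original article: the matrix `C`, the point `x`, the scalar `k` and the helper names `matvec` and `dehomogenize` are illustrative assumptions. It shows that a camera matrix and a non-zero multiple of it give different homogeneous vectors but the same image point.

```python
from fractions import Fraction

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dehomogenize(y):
    """Divide by the last component to get ordinary 2D coordinates."""
    return [Fraction(c, y[-1]) for c in y[:-1]]

# Hypothetical 3x4 camera matrix and homogeneous 3D point (illustrative values).
C = [[1, 0, 0, 2],
     [0, 1, 0, -1],
     [0, 0, 1, 4]]
x = [3, 5, 2, 1]

k = -3  # any non-zero scalar
kC = [[k * entry for entry in row] for row in C]

y = matvec(C, x)          # homogeneous image point from C
y_scaled = matvec(kC, x)  # homogeneous image point from k*C

# The homogeneous vectors differ by the factor k, the image point does not.
same_image_point = dehomogenize(y) == dehomogenize(y_scaled)
```

Exact fractions are used here only so that the comparison does not depend on floating-point rounding.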

Derivation

The mapping from the coordinates of a 3D point P to the 2D image coordinates of the point's projection onto the image plane, according to the pinhole camera model, is given by

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{f}{x_3} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

where $(x_1, x_2, x_3)$ are the 3D coordinates of P relative to a camera centered coordinate system, $(y_1, y_2)$ are the resulting image coordinates, and $f$ is the camera's focal length, for which we assume $f > 0$. Furthermore, we also assume that $x_3 > 0$.

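To derive the camera matrix, the expression above is rewritten in terms of homogeneous coordinates. Instead of the 2D vector $(y_1, y_2)$ we consider the projective element (a 3D vector) $\mathbf{y} = (y_1, y_2, 1)$ and instead of equality we consider equality up to scaling by a non-zero number, denoted $\sim$. First, we write the homogeneous image coordinates as expressions in the usual 3D coordinates.

$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{f}{x_3} x_1 \\ \frac{f}{x_3} x_2 \\ 1 \end{pmatrix} \sim \begin{pmatrix} x_1 \\ x_2 \\ \frac{x_3}{f} \end{pmatrix}$$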

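Finally, the 3D coordinates are also expressed in a homogeneous representation $\mathbf{x}$, and this is how the camera matrix appears:

$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix} \quad\text{or}\quad \mathbf{y} \sim \mathbf{C}\,\mathbf{x}$$

where $\mathbf{C}$ is the camera matrix, which here is given by

$$\mathbf{C} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix},$$

and which, after multiplication by the non-zero scalar $f$, can equivalently be written as

$$\mathbf{C} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} \sim \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

The last step is a consequence of $\mathbf{C}$ itself being a projective element.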

The camera matrix derived here may appear trivial in the sense that it contains very few non-zero elements. This depends to a large extent on the particular coordinate systems which have been chosen for the 3D and 2D points. In practice, however, other forms of camera matrices are common, as will be shown below.
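As a small check of the derivation, the sketch below compares the direct pinhole formula with projection through the $3 \times 4$ matrix with diagonal entries $f, f, 1$. The function names and the numeric values of the focal length and the point are illustrative assumptions, not taken from the article.

```python
def camera_matrix(f):
    """3x4 pinhole camera matrix in the scaled form diag(f, f, 1) | 0."""
    return [[f, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dehomogenize(y):
    """Divide by the last component to recover 2D image coordinates."""
    return [c / y[-1] for c in y[:-1]]

def project(c, point3d):
    """Project a 3D point (camera centered coordinates) with camera matrix c."""
    x = list(point3d) + [1.0]  # homogeneous representation of the 3D point
    return dehomogenize(matvec(c, x))

f = 2.0                # assumed focal length, f > 0
p = (3.0, -1.0, 4.0)   # assumed 3D point with x3 > 0

via_matrix = project(camera_matrix(f), p)
direct = [f * p[0] / p[2], f * p[1] / p[2]]  # (f / x3) * (x1, x2)
```

With these values both `via_matrix` and `direct` come out as the image point (1.5, -0.5).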

Camera position

The camera matrix $\mathbf{C}$ derived in the previous section has a null space which is spanned by the vector

$$\mathbf{n} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$

This is also the homogeneous representation of the 3D point which has coordinates (0,0,0), that is, the "camera center" (also known as the entrance pupil; the position of the pinhole of a pinhole camera) is at the origin O. This means that the camera center (and only this point) cannot be mapped to a point in the image plane by the camera (or equivalently, it maps to all points in the image, since every ray to the image passes through this point).

For any other 3D point with $x_3 = 0$, the result $\mathbf{y} \sim \mathbf{C}\,\mathbf{x}$ is well-defined and has the form $\mathbf{y} = (y_1, y_2, 0)^\top$. This corresponds to a point at infinity in the projective image plane (even though, if the image plane is taken to be a Euclidean plane, no corresponding intersection point exists).
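Both statements can be verified with a few lines of code. This is a minimal sketch under assumed values: the focal length and the sample point are hypothetical, and `matvec` is a helper introduced only for illustration.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

f = 2.0  # assumed focal length
C = [[f, 0.0, 0.0, 0.0],
     [0.0, f, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

# The camera center (0, 0, 0) in homogeneous form spans the null space of C.
center = [0.0, 0.0, 0.0, 1.0]
image_of_center = matvec(C, center)   # the zero vector: no image point exists

# A point with x3 = 0 lies in the plane through the camera center that is
# parallel to the image plane.
in_plane = [1.0, 2.0, 0.0, 1.0]
image_of_in_plane = matvec(C, in_plane)

# A zero last component marks a point at infinity in the image plane.
is_point_at_infinity = image_of_in_plane[2] == 0.0
```

Dividing by the last component, as one would for an ordinary image point, is not possible for `image_of_in_plane`, which is the numerical counterpart of the missing Euclidean intersection point.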

Normalized camera matrix and normalized image coordinates

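The camera matrix derived above can be simplified even further if we assume that f = 1:

$$\mathbf{C}_0 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = \left(\begin{array}{c|c} \mathbf{I} & \mathbf{0} \end{array}\right)$$

where $\mathbf{I}$ here denotes a $3 \times 3$ identity matrix. Note that the $3 \times 4$ matrix $\mathbf{C}$ here is divided into a concatenation of a $3 \times 3$ matrix and a 3-dimensional vector. The camera matrix $\mathbf{C}_0$ is sometimes referred to as a canonical form.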

So far all points in the 3D world have been represented in a camera centered coordinate system, that is, a coordinate system which has its origin at the camera center (the location of the pinhole of a pinhole camera). In practice however, the 3D points may be represented in terms of coordinates relative to an arbitrary coordinate system (X1', X2', X3'). Assuming that the camera coordinate axes (X1, X2, X3) and the axes (X1', X2', X3') are of Euclidean type (orthogonal and isotropic), there is a unique Euclidean 3D transformation (rotation and translation) between the two coordinate systems. In other words, the camera is not necessarily at the origin looking along the z axis.

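The two operations of rotation and translation of 3D coordinates can be represented as the two $4 \times 4$ matrices

$$\left(\begin{array}{c|c} \mathbf{R} & \mathbf{0} \\ \hline \mathbf{0} & 1 \end{array}\right) \quad\text{and}\quad \left(\begin{array}{c|c} \mathbf{I} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array}\right)$$

where $\mathbf{R}$ is a $3 \times 3$ rotation matrix and $\mathbf{t}$ is a 3-dimensional translation vector. When the first matrix is multiplied onto the homogeneous representation of a 3D point, the result is the homogeneous representation of the rotated point, and the second matrix performs instead a translation. Performing the two operations in sequence, i.e. first the rotation and then the translation (with translation vector given in the already rotated coordinate system), gives a combined rotation and translation matrix

$$\left(\begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array}\right)$$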

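Assuming that $\mathbf{R}$ and $\mathbf{t}$ are precisely the rotation and translation which relate the two coordinate systems (X1,X2,X3) and (X1',X2',X3') above, this implies that

$$\mathbf{x} = \left(\begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array}\right) \mathbf{x}'$$

where $\mathbf{x}'$ is the homogeneous representation of the point P in the coordinate system (X1',X2',X3').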
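The composition of the two $4 \times 4$ matrices can be sketched as follows. The rotation (90 degrees about the third axis), the translation vector, the sample point and all function names are illustrative assumptions chosen so that the arithmetic stays in whole numbers.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rotation_4x4(r):
    """Embed a 3x3 rotation matrix in a 4x4 homogeneous matrix."""
    return [row + [0] for row in r] + [[0, 0, 0, 1]]

def translation_4x4(t):
    """4x4 homogeneous matrix that translates by the 3-vector t."""
    return [[1, 0, 0, t[0]],
            [0, 1, 0, t[1]],
            [0, 0, 1, t[2]],
            [0, 0, 0, 1]]

R = [[0, -1, 0],   # assumed rotation: 90 degrees about the third axis
     [1, 0, 0],
     [0, 0, 1]]
t = [1, 2, 3]      # assumed translation

# Rotation first, then translation: the translation matrix multiplies from the left.
M = matmul(translation_4x4(t), rotation_4x4(R))

x_world = [2, 1, 5, 1]        # homogeneous point in the (X1', X2', X3') system
x_camera = matvec(M, x_world)  # same point in the camera centered system
```

The product `M` has `R` in its upper left block and `t` as its last column, which is the combined matrix described above.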

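Assuming also that the camera matrix is given by $\mathbf{C}_0$, the mapping from the coordinates in the (X1',X2',X3') system to homogeneous image coordinates becomes

$$\mathbf{y} \sim \mathbf{C}_0\,\mathbf{x} = \left(\begin{array}{c|c} \mathbf{I} & \mathbf{0} \end{array}\right) \left(\begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array}\right) \mathbf{x}' = \left(\begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array}\right) \mathbf{x}'$$

Consequently, the camera matrix which relates points in the coordinate system (X1',X2',X3') to image coordinates is

$$\mathbf{C}_N = \left(\begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array}\right)$$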

a concatenation of a 3D rotation matrix and a 3-dimensional translation vector.

This type of camera matrix is referred to as a normalized camera matrix. It assumes a focal length of 1 and that image coordinates are measured in a coordinate system whose origin is located at the intersection between the X3 axis and the image plane and which has the same units as the 3D coordinate system. The resulting image coordinates are referred to as normalized image coordinates.
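A normalized camera matrix can be assembled by placing the translation vector next to the rotation matrix. The sketch below does this for an assumed rotation and translation and projects one assumed world point; none of the names or values come from the article.

```python
def hstack(r, t):
    """Concatenate a 3x3 matrix r and a 3-vector t into a 3x4 matrix (r | t)."""
    return [row + [ti] for row, ti in zip(r, t)]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dehomogenize(y):
    """Divide by the last component to recover 2D image coordinates."""
    return [c / y[-1] for c in y[:-1]]

R = [[0, -1, 0],   # assumed rotation: 90 degrees about the third axis
     [1, 0, 0],
     [0, 0, 1]]
t = [1, 2, 3]      # assumed translation

C_N = hstack(R, t)  # normalized camera matrix (R | t)

x_world = [2, 1, 5, 1]                # homogeneous point in the world system
y = matvec(C_N, x_world)              # homogeneous normalized image coordinates
normalized_coords = dehomogenize(y)   # normalized image coordinates
```

For these values `y` is (0, 4, 8), so the normalized image coordinates are (0.0, 0.5).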

The camera position

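Again, the null space of the normalized camera matrix $\mathbf{C}_N$ described above is spanned by the 4-dimensional vector

$$\mathbf{n} = \begin{pmatrix} -\mathbf{R}^{-1}\mathbf{t} \\ 1 \end{pmatrix} = \begin{pmatrix} \tilde{\mathbf{n}} \\ 1 \end{pmatrix}$$

This is also, again, the coordinates of the camera center, now relative to the (X1',X2',X3') system. This can be seen by applying first the rotation and then the translation to the 3-dimensional vector $\tilde{\mathbf{n}}$: since $\mathbf{R}\tilde{\mathbf{n}} + \mathbf{t} = -\mathbf{t} + \mathbf{t} = \mathbf{0}$, the result is the homogeneous representation of the 3D coordinates (0,0,0).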

This implies that the camera center (in its homogeneous representation) lies in the null space of the camera matrix, provided that it is represented in terms of 3D coordinates relative to the same coordinate system as the camera matrix refers to.

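The normalized camera matrix $\mathbf{C}_N$ can now be written as

$$\mathbf{C}_N = \mathbf{R} \left(\begin{array}{c|c} \mathbf{I} & \mathbf{R}^{-1}\mathbf{t} \end{array}\right) = \mathbf{R} \left(\begin{array}{c|c} \mathbf{I} & -\tilde{\mathbf{n}} \end{array}\right)$$

where $\tilde{\mathbf{n}}$ are the 3D coordinates of the camera relative to the (X1',X2',X3') system.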
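The camera center can be recovered from a rotation and translation as the negated inverse rotation applied to the translation. The following sketch uses the fact that the inverse of a rotation matrix is its transpose; the concrete rotation, translation and variable names are illustrative assumptions.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

R = [[0, -1, 0],   # assumed rotation: 90 degrees about the third axis
     [1, 0, 0],
     [0, 0, 1]]
t = [1, 2, 3]      # assumed translation

# For a rotation matrix the inverse equals the transpose, so the camera
# center is minus the transpose of R applied to t.
camera_center = [-sum(R[i][j] * t[i] for i in range(3)) for j in range(3)]

# Normalized camera matrix (R | t) and the homogeneous camera center.
C_N = [row + [ti] for row, ti in zip(R, t)]
n = camera_center + [1]

# The camera center lies in the null space of the camera matrix.
residual = matvec(C_N, n)
```

Here `camera_center` evaluates to (-2, 1, -3) and `residual` is the zero vector.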

General camera matrix

Given the mapping produced by a normalized camera matrix, the resulting normalized image coordinates can be transformed by means of an arbitrary 2D homography. This includes 2D translations and rotations as well as scaling (isotropic and anisotropic) but also general 2D perspective transformations. Such a transformation can be represented as a $3 \times 3$ matrix $\mathbf{H}$ which maps the homogeneous normalized image coordinates $\mathbf{y}$ to the homogeneous transformed image coordinates $\mathbf{y}'$:

$$\mathbf{y}' = \mathbf{H}\,\mathbf{y}$$

Inserting the above expression for the normalized image coordinates in terms of the 3D coordinates gives

$$\mathbf{y}' = \mathbf{H}\,\mathbf{C}_N\,\mathbf{x}'$$

This produces the most general form of camera matrix

$$\mathbf{C} = \mathbf{H}\,\mathbf{C}_N = \mathbf{H} \left(\begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array}\right)$$
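As an illustration of the general form, the sketch below multiplies a normalized camera matrix by a $3 \times 3$ matrix and checks that projecting with the product agrees with transforming the normalized image point. The particular `H` (an anisotropy-free scaling plus a 2D translation, loosely resembling a pixel coordinate conversion) and every numeric value are assumptions made for this example only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dehomogenize(y):
    """Divide by the last component to recover 2D image coordinates."""
    return [c / y[-1] for c in y[:-1]]

# Assumed normalized camera matrix (R | t).
C_N = [[0, -1, 0, 1],
       [1, 0, 0, 2],
       [0, 0, 1, 3]]

# Assumed 2D homography: scale by 800, then shift by (320, 240).
H = [[800, 0, 320],
     [0, 800, 240],
     [0, 0, 1]]

C = matmul(H, C_N)       # general camera matrix H (R | t)

x_world = [2, 1, 5, 1]   # homogeneous point in the world system

# Route 1: project with the general camera matrix directly.
y_general = matvec(C, x_world)
# Route 2: project to normalized coordinates, then apply the homography.
y_two_step = matvec(H, matvec(C_N, x_world))

transformed_coords = dehomogenize(y_general)
```

Both routes give the homogeneous vector (2560, 5120, 8), that is, the transformed image coordinates (320.0, 640.0).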

See also

  • 3D projection
  • Camera resectioning

References

  • Richard Hartley and Andrew Zisserman (2003). Multiple View Geometry in computer vision. Cambridge University Press. ISBN 0-521-54051-8.
