
Articulated body pose estimation

Articulated body pose estimation in computer vision is the study of algorithms and systems that recover the pose of an articulated body, which consists of joints and rigid parts using image-based observations. It is one of the longest-lasting problems in computer vision because of the complexity of the models that relate observation with pose, and because of the variety of situations in which it would be useful.[1][2]

Description

Perception of human beings in their surrounding environment is an important capability that robots must possess. If a person uses gestures to point to a particular object, then the interacting machine should be able to understand the situation in a real-world context. Pose estimation is thus an important and challenging problem in computer vision, and many algorithms have been proposed to solve it over the last two decades. Many solutions involve training complex models with large data sets.

Pose estimation is a difficult problem and an active subject of research because the human body has 244 degrees of freedom with 230 joints. Although not all movements between joints are evident, the human body can be treated as composed of 10 large parts with 20 degrees of freedom. Algorithms must account for large variability introduced by differences in appearance due to clothing, body shape, size, and hairstyle. Additionally, the results may be ambiguous due to partial occlusions from self-articulation, such as a person's hand covering their face, or occlusions from external objects. Finally, most algorithms estimate pose from monocular (two-dimensional) images taken with a normal camera. These images lack the three-dimensional information of an actual body pose, leading to further ambiguities. Other issues include varying lighting and camera configurations, and the difficulties are compounded when there are additional performance requirements. There is recent work in this area in which images from RGBD cameras provide both color and depth information.[3]

Sensors

The typical articulated body pose estimation system involves a model-based approach, in which the pose estimation is achieved by maximizing/minimizing a similarity/dissimilarity between an observation (input) and a template model. Different kinds of sensors have been explored for use in making the observation, including the following:

  • Visible wavelength imagery,
  • Long-wave thermal infrared imagery,[4]
  • Time-of-flight imagery, and
  • Laser range scanner imagery.

These sensors produce intermediate representations that are directly used by the model. The representations include the following:

  • Image appearance,
  • Voxel (volume element) reconstruction,
  • 3D point clouds, and sum of Gaussian kernels[5]
  • 3D surface meshes.

Classical models

Part models

The basic idea of the part-based model can be attributed to the human skeleton. Any object having the property of articulation can be broken down into smaller parts, wherein each part can take different orientations, resulting in different articulations of the same object. Different scales and orientations of the main object correspond to scales and orientations of its parts. To formulate the model so that it can be represented in mathematical terms, the parts are connected to each other by springs; as such, the model is also known as a spring model. The degree of closeness between parts is accounted for by the compression and expansion of the springs. There are geometric constraints on the orientation of the springs; for example, the limbs of the legs cannot rotate 360 degrees, so parts cannot take such extreme orientations. This reduces the number of possible configurations.[6]

The spring model forms a graph G(V,E), where V (nodes) corresponds to the parts and E (edges) represents the springs connecting neighboring parts. Each location in the image can be reached by the $x$ and $y$ coordinates of the pixel location. Let $\mathbf{p}_i(x, y)$ be the point at the $i$-th location. Then the cost associated with joining the spring between the $i$-th and the $j$-th points is given by $s_{ij}(\mathbf{p}_i, \mathbf{p}_j)$. Hence the total cost associated with placing $l$ components at locations $\mathbf{P}_l$ is given by

$$S(\mathbf{P}_l) = \sum_{i=1}^{l} \sum_{j=1}^{i} s_{ij}(\mathbf{p}_i, \mathbf{p}_j)$$

The above equation simply represents the spring model used to describe body pose. To estimate pose from images, a cost or energy function must be minimized. This energy function consists of two terms: the first is related to how well each component matches the image data, and the second measures how well the oriented (deformed) parts match one another, thus accounting for articulation along with object detection.[7]
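
As an illustration, the following sketch evaluates this two-term energy for a given placement of the parts: an appearance term summed over parts plus a quadratic spring (deformation) term summed over edges. The part names, cost arrays, and spring parameters are illustrative assumptions, not the notation of any particular implementation; in practice this energy is minimized over all placements efficiently with dynamic programming on the tree.[7]

```python
import numpy as np

def pictorial_structure_energy(locations, match_cost, edges, spring_params):
    """Total energy of one pictorial-structure (spring-model) configuration.

    locations    : dict part -> (x, y) pixel location of that part
    match_cost   : dict part -> 2D array; match_cost[p][y, x] says how badly
                   part p's appearance model fits the image at (x, y)
    edges        : list of (i, j) part pairs -- the "springs"
    spring_params: dict (i, j) -> (rest_offset, stiffness)
    """
    # First term: how well each part matches the image data at its location.
    appearance = sum(match_cost[p][loc[1], loc[0]] for p, loc in locations.items())

    # Second term: deformation cost of each spring, a squared deviation
    # from the rest offset between the two connected parts.
    deformation = 0.0
    for (i, j) in edges:
        rest_offset, stiffness = spring_params[(i, j)]
        actual = np.subtract(locations[j], locations[i])
        deformation += stiffness * np.sum((actual - np.asarray(rest_offset)) ** 2)

    return appearance + deformation
```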

The part models, also known as pictorial structures, are one of the basic models on which other, more efficient models are built by slight modification. One such example is the flexible mixture model, which reduces the database of hundreds or thousands of deformed parts by exploiting the notion of local rigidity.[8]

Articulated model with quaternion

The kinematic skeleton is constructed by a tree-structured chain.[9] Each rigid body segment has its local coordinate system that can be transformed to the world coordinate system via a 4×4 transformation matrix $T_l$,

$$T_l = T_{\operatorname{par}(l)} R_l,$$

where $R_l$ denotes the local transformation from body segment $S_l$ to its parent $\operatorname{par}(S_l)$. Each joint in the body has 3 degrees of freedom (DoF) of rotation. Given a transformation matrix $T_l$, the joint position at the T-pose can be transferred to its corresponding position in the world coordinate system. In many works, the 3D joint rotation is expressed as a normalized quaternion $(x, y, z, w)$ because its continuity facilitates gradient-based optimization in the parameter estimation.
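
A minimal sketch of this forward kinematics is shown below, assuming joint rotations are given as normalized (x, y, z, w) quaternions and that each segment additionally stores a fixed offset from its parent's joint; the data layout and function names are illustrative assumptions, not taken from any particular system.

```python
import numpy as np

def quat_to_matrix(q):
    """Convert a normalized quaternion (x, y, z, w) to a 3x3 rotation matrix."""
    x, y, z, w = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])

def local_transform(quat, offset):
    """4x4 local transform R_l: joint rotation plus the fixed offset of this
    segment's joint relative to its parent's joint (an added assumption)."""
    T = np.eye(4)
    T[:3, :3] = quat_to_matrix(np.asarray(quat, dtype=float))
    T[:3, 3] = offset
    return T

def world_transforms(parent, quats, offsets):
    """Compose T_l = T_par(l) @ R_l along the tree-structured chain.

    parent[l] is the index of segment l's parent (-1 for the root);
    segments are assumed to be ordered so that parents precede children."""
    T = [None] * len(parent)
    for l, p in enumerate(parent):
        R_l = local_transform(quats[l], offsets[l])
        T[l] = R_l if p < 0 else T[p] @ R_l
    return T

# A T-pose joint position is transferred to world coordinates by
# world_pos = (T[l] @ np.append(tpose_joint, 1.0))[:3]
```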

Deep learning based models

Since about 2016, deep learning has emerged as the dominant method for performing accurate articulated body pose estimation. Rather than building an explicit model for the parts as above, the appearances of the joints and relationships between the joints of the body are learned from large training sets. Models generally focus on extracting the 2D positions of joints (keypoints), the 3D positions of joints, or the 3D shape of the body from either a single or multiple images.

Supervised

2D joint positions

The first deep learning models that emerged focused on extracting the 2D positions of human joints in an image. Such models take in an image and pass it through a convolutional neural network to obtain a series of heatmaps (one for each joint) which take on high values where joints are detected.[10][11]
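
The decoding step that turns these heatmaps into joint coordinates is typically a per-joint argmax, as in the following sketch; the array shapes and confidence threshold are illustrative assumptions.

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps, threshold=0.1):
    """Read 2D joint locations out of per-joint heatmaps.

    heatmaps: array of shape (num_joints, H, W) produced by the network,
              taking high values where the corresponding joint is detected.
    Returns a list with one (x, y, score) per joint, or None when the peak
    score falls below `threshold` (joint treated as not visible).
    """
    keypoints = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        score = float(hm[y, x])
        keypoints.append((int(x), int(y), score) if score >= threshold else None)
    return keypoints
```

In practice the heatmaps are usually produced at a reduced resolution, so the recovered coordinates are rescaled to the input image size and often refined to sub-pixel accuracy.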

When there are multiple people per image, two main techniques have emerged for grouping joints within each person. In the first, "bottom-up" approach, the neural network is trained to also generate "part affinity fields", which indicate the locations of limbs. Using these fields, joints can be grouped limb by limb by solving a series of assignment problems.[11] In the second, "top-down" approach, an additional network is used to first detect people in the image, and the pose estimation network is then applied to each detected person.[12]
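
In the bottom-up approach, grouping reduces to one small assignment problem per limb type, between the candidate joints detected at the limb's two ends. The sketch below uses SciPy's Hungarian-algorithm solver in place of the greedy matching often used in practice; limb_score is a hypothetical stand-in for the part affinity field integrated along a candidate limb.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_limb(candidates_a, candidates_b, limb_score):
    """Solve one of the per-limb assignment problems of the bottom-up approach.

    candidates_a, candidates_b: detected joint locations for the two joint
        types this limb connects (e.g. all shoulders, all elbows).
    limb_score(a, b): scalar connection score for joining a to b, e.g. the
        part affinity field integrated along the segment between them.
    Returns the (index_a, index_b) pairs that form this limb.
    """
    scores = np.array([[limb_score(a, b) for b in candidates_b]
                       for a in candidates_a])
    # Maximizing the total connection score == minimizing its negation.
    rows, cols = linear_sum_assignment(-scores)
    # Keep only sufficiently supported connections.
    return [(r, c) for r, c in zip(rows, cols) if scores[r, c] > 0]
```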

3D joint positions

With the advent of multiple datasets with human pose annotated in multiple views,[13][14] models which detect 3D joint positions became more popular. These again fall into two categories. In the first, a neural network is used to detect 2D joint positions from each view, and these detections are then triangulated to obtain 3D joint positions.[15] The 2D network may be refined to produce better detections based on the 3D data.[16] Furthermore, such approaches often apply filters in both 2D and 3D to refine the detected points.[17][18] In the second, a neural network is trained end-to-end to predict 3D joint positions directly from a set of images, without intermediate 2D joint detections. Such approaches often project image features into a cube and then use a 3D convolutional neural network to predict a 3D heatmap for each joint.[19][16][20]
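
For the triangulation-based category, each joint's 2D detections from the calibrated views can be combined by linear (direct linear transform) triangulation, as in the following sketch, which assumes known 3×4 projection matrices for each camera.

```python
import numpy as np

def triangulate_joint(projection_matrices, points_2d):
    """Linear (DLT) triangulation of one joint from several calibrated views.

    projection_matrices: list of 3x4 camera matrices P_v.
    points_2d: list of (x, y) detections of the same joint, one per view.
    Returns the 3D joint position satisfying x_v ~ P_v X in a least-squares sense.
    """
    rows = []
    for P, (x, y) in zip(projection_matrices, points_2d):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

Filtering in 2D (e.g. smoothing detections over time) and in 3D (e.g. rejecting triangulated points with high reprojection error) is typically layered on top of this step.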

3D shape

Concurrently with the work above, scientists have been working on estimating the full 3D shape of a human or animal from a set of images. Most of this work is based on estimating the appropriate pose of the skinned multi-person linear (SMPL) model[21] within an image. Variants of the SMPL model for other animals have also been developed.[22][23][24] Generally, some keypoints and a silhouette are detected for each animal within the image, and the parameters of the 3D shape model are then fit to match the positions of the keypoints and the silhouette.
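
A heavily simplified sketch of such model fitting is shown below. The body_model and project functions are hypothetical stand-ins (they do not correspond to the actual SMPL API), and the silhouette term is omitted; the point is only to show the keypoint reprojection objective that is minimized over the pose and shape parameters.

```python
import numpy as np
from scipy.optimize import minimize

def fit_body_model(body_model, project, keypoints_2d, weights, x0):
    """Fit the parameters of a statistical body model to detected keypoints.

    body_model(params) -> (J, 3) model joint positions (hypothetical interface,
        standing in for SMPL or an animal variant of it)
    project(points_3d) -> (J, 2) projection of those joints into the image
    keypoints_2d: (J, 2) detected keypoints; weights: (J,) detection confidences
    x0: initial pose/shape parameter vector
    """
    def reprojection_error(params):
        residual = project(body_model(params)) - keypoints_2d
        return np.sum(weights * np.sum(residual ** 2, axis=1))

    # A silhouette term is typically added to the same objective; it is
    # omitted here to keep the sketch short.
    result = minimize(reprojection_error, x0, method="L-BFGS-B")
    return result.x
```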

Unsupervised

The above algorithms all rely on annotated images, which can be time-consuming to produce. To address this issue, computer vision researchers have developed new algorithms which can learn 3D keypoints given only annotated 2D images from a single view or identify keypoints given videos without any annotations.

Applications

Assisted living

Personal care robots may be deployed in future assisted living homes. For these robots, high-accuracy human detection and pose estimation is necessary to perform a variety of tasks, such as fall detection. Additionally, this application has a number of performance constraints. [citation needed]

Character animation

Traditionally, character animation has been a manual process. However, poses can be synced directly to a real-life actor through specialized pose estimation systems. Older systems relied on markers or specialized suits. Recent advances in pose estimation and motion capture have enabled markerless applications, sometimes in real time.[25]

Intelligent driver assisting system

Car accidents account for about two percent of deaths globally each year. As such, an intelligent system tracking driver pose may be useful for emergency alerts.[dubious – discuss] Along the same lines, pedestrian detection algorithms have been used successfully in autonomous cars, enabling the car to make smarter decisions. [citation needed]

Video games

Commercially, pose estimation has been used in the context of video games, popularized with the Microsoft Kinect sensor (a depth camera). These systems track the user to render their avatar in-game, in addition to performing tasks like gesture recognition to enable the user to interact with the game. As such, this application has a strict real-time requirement.[26]

Medical Applications

Pose estimation has been used to detect postural issues such as scoliosis by analyzing abnormalities in a patient's posture,[27] in physical therapy, and in the study of the cognitive brain development of young children by monitoring motor functionality.[28]

Other applications

Other applications include video surveillance, animal tracking and behavior understanding, sign language detection, advanced human–computer interaction, and markerless motion capturing.

Related technology

A commercially successful but specialized computer vision-based articulated body pose estimation technique is optical motion capture. This approach involves placing markers on the individual at strategic locations to capture the 6 degrees-of-freedom of each body part.

Research groups

A number of groups and companies are researching pose estimation, including groups at Brown University, Carnegie Mellon University, MPI Saarbruecken, Stanford University, the University of California, San Diego, the University of Toronto, the École Centrale Paris, ETH Zurich, National University of Sciences and Technology (NUST),[29] the University of California, Irvine and Polytechnic University of Catalonia.

Companies

At present, several companies are working on articulated body pose estimation.

  • Bodylabs: Bodylabs is a Manhattan-based software provider of human-aware artificial intelligence.

References

  1. ^ Moeslund, Thomas B.; Granum, Erik (2001-03-01). "A Survey of Computer Vision-Based Human Motion Capture". Computer Vision and Image Understanding. 81 (3): 231–268. doi:10.1006/cviu.2000.0897. ISSN 1077-3142.
  2. ^ "Survey of Advances in Computer Vision-based Human Motion Capture" (2006). Archived from the original on 2008-03-02. Retrieved 2007-09-15.
  3. ^ Droeschel, David, and Sven Behnke. "3D body pose estimation using an adaptive person model for articulated ICP." Intelligent Robotics and Applications. Springer Berlin Heidelberg, 2011. 157–167.
  4. ^ Han, J.; Gaszczak, A.; Maciol, R.; Barnes, S.E.; Breckon, T.P. (September 2013). "Human Pose Classification within the Context of Near-IR Imagery Tracking" (PDF). In Zamboni, Roberto; Kajzar, Francois; Szep, Attila A.; Burgess, Douglas; Owen, Gari (eds.). Proc. SPIE Optics and Photonics for Counterterrorism, Crime Fighting and Defence. Optics and Photonics for Counterterrorism, Crime Fighting and Defence IX; and Optical Materials and Biomaterials in Security and Defence Systems Technology X. Vol. 8901. SPIE. pp. 89010E. CiteSeerX 10.1.1.391.380. doi:10.1117/12.2028375. S2CID 17034080. Retrieved 5 November 2013.
  5. ^ M. Ding and G. Fan, "Generalized Sum of Gaussians for Real-Time Human Pose Tracking from a Single Depth Sensor" 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), Jan 2015
  6. ^ Fischler, Martin A., and Robert A. Elschlager. "The representation and matching of pictorial structures." IEEE Transactions on Computers C-22.1 (1973): 67–92.
  7. ^ Felzenszwalb, Pedro F., and Daniel P. Huttenlocher. "Pictorial structures for object recognition." International Journal of Computer Vision 61.1 (2005): 55–79.
  8. ^ Yang, Yi, and Deva Ramanan. "Articulated pose estimation with flexible mixtures-of-parts." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
  9. ^ M. Ding and G. Fan, "Articulated and Generalized Gaussian Kernel Correlation for Human Pose Estimation" IEEE Transactions on Image Processing, Vol. 25, No. 2, Feb 2016
  10. ^ Insafutdinov, Eldar; Pishchulin, Leonid; Andres, Bjoern; Andriluka, Mykhaylo; Schiele, Bernt (2016), "DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model", Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9910, Cham: Springer International Publishing, pp. 34–50, arXiv:1605.03170, doi:10.1007/978-3-319-46466-4_3, ISBN 978-3-319-46465-7, S2CID 6736694, retrieved 2021-06-30
  11. ^ a b Cao, Zhe; Simon, Tomas; Wei, Shih-En; Sheikh, Yaser (July 2017). "Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields". 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 1302–1310. arXiv:1611.08050. doi:10.1109/cvpr.2017.143. ISBN 978-1-5386-0457-1. S2CID 16224674.
  12. ^ Fang, Hao-Shu; Xie, Shuqin; Tai, Yu-Wing; Lu, Cewu (October 2017). "RMPE: Regional Multi-person Pose Estimation". 2017 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 2353–2362. arXiv:1612.00137. doi:10.1109/iccv.2017.256. ISBN 978-1-5386-1032-9. S2CID 6529517.
  13. ^ Ionescu, Catalin; Papava, Dragos; Olaru, Vlad; Sminchisescu, Cristian (July 2014). "Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments". IEEE Transactions on Pattern Analysis and Machine Intelligence. 36 (7): 1325–1339. doi:10.1109/tpami.2013.248. ISSN 0162-8828. PMID 26353306. S2CID 4244548.
  14. ^ Sigal, Leonid; Balan, Alexandru O.; Black, Michael J. (2009-08-05). "HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion". International Journal of Computer Vision. 87 (1–2): 4–27. doi:10.1007/s11263-009-0273-6. ISSN 0920-5691. S2CID 11279201.
  15. ^ Nath, Tanmay; Mathis, Alexander; Chen, An Chi; Patel, Amir; Bethge, Matthias; Mathis, Mackenzie Weygandt (2018-11-24). "Using DeepLabCut for 3D markerless pose estimation across species and behaviors". bioRxiv: 476531. doi:10.1101/476531. S2CID 92206469. Retrieved 2021-06-30.
  16. ^ a b Iskakov, Karim; Burkov, Egor; Lempitsky, Victor; Malkov, Yury (October 2019). "Learnable Triangulation of Human Pose". 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE. pp. 7717–7726. arXiv:1905.05754. doi:10.1109/iccv.2019.00781. ISBN 978-1-7281-4803-8. S2CID 153312868.
  17. ^ Karashchuk, Pierre; Rupp, Katie L.; Dickinson, Evyn S.; Sanders, Elischa; Azim, Eiman; Brunton, Bingni W.; Tuthill, John C. (2020-05-29). "Anipose: a toolkit for robust markerless 3D pose estimation". bioRxiv. 36 (13). doi:10.1101/2020.05.26.117325. PMC 8498918. PMID 34592148. S2CID 219167984.
  18. ^ Günel, Semih; Rhodin, Helge; Morales, Daniel; Campagnolo, João; Ramdya, Pavan; Fua, Pascal (2019-10-04). O'Leary, Timothy; Calabrese, Ronald L; Shaevitz, Josh W (eds.). "DeepFly3D, a deep learning-based approach for 3D limb and appendage tracking in tethered, adult Drosophila". eLife. 8: e48571. doi:10.7554/eLife.48571. ISSN 2050-084X. PMC 6828327. PMID 31584428.
  19. ^ Dunn, Timothy W.; Marshall, Jesse D.; Severson, Kyle S.; Aldarondo, Diego E.; Hildebrand, David G. C.; Chettih, Selmaan N.; Wang, William L.; Gellis, Amanda J.; Carlson, David E.; Aronov, Dmitriy; Freiwald, Winrich A. (2021-04-19). "Geometric deep learning enables 3D kinematic profiling across species and environments". Nature Methods. 18 (5): 564–573. doi:10.1038/s41592-021-01106-6. ISSN 1548-7091. PMC 8530226. PMID 33875887. S2CID 233310558.
  20. ^ Zimmermann, Christian; Schneider, Artur; Alyahyay, Mansour; Brox, Thomas; Diester, Ilka (2020-02-27). "FreiPose: A Deep Learning Framework for Precise Animal Motion Capture in 3D Spaces". bioRxiv. doi:10.1101/2020.02.27.967620. S2CID 213583372. Retrieved 2021-06-30.
  21. ^ Loper, Matthew; Mahmood, Naureen; Romero, Javier; Pons-Moll, Gerard; Black, Michael J. (2015-11-04). "SMPL". ACM Transactions on Graphics. 34 (6): 1–16. doi:10.1145/2816795.2818013. ISSN 0730-0301. S2CID 229365481.
  22. ^ Badger, Marc; Wang, Yufu; Modh, Adarsh; Perkes, Ammon; Kolotouros, Nikos; Pfrommer, Bernd G.; Schmidt, Marc F.; Daniilidis, Kostas (2020), "3D Bird Reconstruction: A Dataset, Model, and Shape Recovery from a Single View", Computer Vision – ECCV 2020, Lecture Notes in Computer Science, vol. 12363, Cham: Springer International Publishing, pp. 1–17, arXiv:2008.06133, doi:10.1007/978-3-030-58523-5_1, ISBN 978-3-030-58522-8, PMC 9273110, PMID 35822859, S2CID 221135758, retrieved 2021-06-30
  23. ^ Zuffi, Silvia; Kanazawa, Angjoo; Black, Michael J. (June 2018). "Lions and Tigers and Bears: Capturing Non-rigid, 3D, Articulated Shape from Images". 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE. pp. 3955–3963. doi:10.1109/cvpr.2018.00416. ISBN 978-1-5386-6420-9. S2CID 46907802.
  24. ^ Biggs, Benjamin; Roddick, Thomas; Fitzgibbon, Andrew; Cipolla, Roberto (2019), "Creatures Great and SMAL: Recovering the Shape and Motion of Animals from Video", Computer Vision – ACCV 2018, Lecture Notes in Computer Science, vol. 11365, Cham: Springer International Publishing, pp. 3–19, arXiv:1811.05804, doi:10.1007/978-3-030-20873-8_1, ISBN 978-3-030-20872-1, S2CID 53305772, retrieved 2021-06-30
  25. ^ Dent, Steven. "What you need to know about 3D motion capture". Engadget. AOL Inc. Retrieved 31 May 2017.
  26. ^ Kohli, Pushmeet; Shotton, Jamie. "Key Developments in Human Pose Estimation for Kinect" (PDF). Microsoft. Retrieved 31 May 2017.
  27. ^ Aroeira, Rozilene Maria C., Estevam B. de Las Casas, Antônio Eustáquio M. Pertence, Marcelo Greco, and João Manuel R.S. Tavares. “Non-Invasive Methods of Computer Vision in the Posture Evaluation of Adolescent Idiopathic Scoliosis.” Journal of Bodywork and Movement Therapies 20, no. 4 (October 2016): 832–43. https://doi.org/10.1016/j.jbmt.2016.02.004.
  28. ^ Khan, Muhammad Hassan, Julien Helsper, Muhammad Shahid Farid, and Marcin Grzegorzek. “A Computer Vision-Based System for Monitoring Vojta Therapy.” International Journal of Medical Informatics 113 (May 2018): 85–95. https://doi.org/10.1016/j.ijmedinf.2018.02.010.
  29. ^ "NUST-SMME RISE Research Center".

External links

  • Michael J. Black, Professor at Brown University
  • Research Project Page of German Cheung at Carnegie Mellon University
  • Computer Vision and Robotics Research Laboratory at the University of California, San Diego
  • Research Projects of David J. Fleet at the University of Toronto
  • Ronald Poppe at the University of Twente.
  • Professor Nikos Paragios at the Ecole Centrale de Paris
  • Articulated Pose Estimation with Flexible Mixtures of Parts Project at UC Irvine
  • http://screenrant.com/crazy3dtechnologyjamescameronavatarkofi3367/
  • 2D articulated human pose estimation software
  • Articulated Pose Estimation with Flexible Mixtures of Parts
