
Bag-of-words model in computer vision

In computer vision, the bag-of-words model (BoW model), sometimes called the bag-of-visual-words model,[1][2] can be applied to image classification or retrieval by treating image features as words. In document classification, a bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is a vector of occurrence counts of a vocabulary of local image features.

Image representation based on the BoW model

To represent an image using the BoW model, an image can be treated as a document. Similarly, "words" in images need to be defined too. This usually involves three steps: feature detection, feature description, and codebook generation.[1][2][3] A definition of the BoW model can be the "histogram representation based on independent features".[4] Content-based image indexing and retrieval (CBIR) appears to be an early adopter of this image representation technique.[5]

Feature representation

After feature detection, each image is abstracted by several local patches. Feature representation methods deal with how to represent the patches as numerical vectors. These vectors are called feature descriptors. A good descriptor should have the ability to handle intensity, rotation, scale and affine variations to some extent. One of the most famous descriptors is the scale-invariant feature transform (SIFT).[6] SIFT converts each patch to a 128-dimensional vector. After this step, each image is a collection of vectors of the same dimension (128 for SIFT), where the order of the vectors is of no importance.

Codebook generation

The final step for the BoW model is to convert vector-represented patches to "codewords" (analogous to words in text documents), which also produces a "codebook" (analogy to a word dictionary). A codeword can be considered as a representative of several similar patches. One simple method is performing k-means clustering over all the vectors.[7] Codewords are then defined as the centers of the learned clusters. The number of the clusters is the codebook size (analogous to the size of the word dictionary).

Thus, each patch in an image is mapped to a certain codeword through the clustering process and the image can be represented by the histogram of the codewords.
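The two steps above — clustering descriptors into a codebook and histogramming codeword assignments — can be sketched in Python. This is a toy illustration with 2-D descriptors standing in for 128-D SIFT vectors; the function names and the deterministic initialization are our own choices, not part of any standard library:

```python
def kmeans(vectors, k, iters=20):
    """Plain k-means; the k learned centers are the 'codewords'."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Simple deterministic init: first k vectors (real code would use k-means++).
    centers = [tuple(v) for v in vectors[:k]]
    for _ in range(iters):
        # Assign each vector to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[min(range(k), key=lambda i: sqdist(v, centers[i]))].append(v)
        # Recompute each center as the mean of its cluster (keep old center if empty).
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

def bow_histogram(descriptors, centers):
    """Map each patch descriptor to its nearest codeword and count occurrences."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    hist = [0] * len(centers)
    for d in descriptors:
        hist[min(range(len(centers)), key=lambda i: sqdist(d, centers[i]))] += 1
    return hist
```

With two well-separated groups of descriptors and a codebook of size 2, the resulting histogram simply counts how many patches fall near each codeword.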

Learning and recognition based on the BoW model

Computer vision researchers have developed several learning methods to leverage the BoW model for image-related tasks, such as object categorization. These methods can roughly be divided into two categories, unsupervised and supervised models. For the multiple-label categorization problem, the confusion matrix can be used as an evaluation metric.

Unsupervised models

Here is some notation for this section. Suppose the size of the codebook is V.

  • w: each patch w is a V-dimensional vector that has a single component equal to one and all other components equal to zero (in the k-means clustering setting, the component equal to one indicates the cluster that w belongs to). The v-th codeword in the codebook can be represented as w^v = 1 and w^u = 0 for u ≠ v.
  • w = [w_1, w_2, ..., w_N]: each image is represented by the collection of all N patches in the image
  • d_j: the j-th image in an image collection
  • c: category of the image
  • z: theme or topic of the patch
  • π: mixture proportion

Since the BoW model is an analogy to the BoW model in NLP, generative models developed in text domains can also be adapted to computer vision. The simple Naïve Bayes model and hierarchical Bayesian models are discussed below.

Naïve Bayes

The simplest one is the Naïve Bayes classifier.[2] Using the language of graphical models, the Naïve Bayes classifier is described by the equation below. The basic idea (or assumption) of this model is that each category has its own distribution over the codewords, and that the distributions of different categories are observably different. Take a face category and a car category as an example. The face category may emphasize the codewords which represent "nose", "eye" and "mouth", while the car category may emphasize the codewords which represent "wheel" and "window". Given a collection of training examples, the classifier learns different distributions for different categories. The categorization decision is made by

c* = argmax_c p(c|w) = argmax_c p(c) p(w|c) = argmax_c p(c) ∏_{n=1}^N p(w_n|c)

Since the Naïve Bayes classifier is simple yet effective, it is usually used as a baseline method for comparison.
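As a toy illustration of this decision rule, the sketch below estimates p(c) and p(w|c) from per-category BoW histograms and classifies with log-probabilities. Laplace smoothing is added to avoid zero probabilities; the 4-codeword vocabulary and all names are invented for the example:

```python
import math

def train_nb(histograms_by_category, alpha=1.0):
    """Estimate log p(c) and log p(codeword | c) with Laplace smoothing alpha."""
    total_images = sum(len(hs) for hs in histograms_by_category.values())
    model = {}
    for c, hists in histograms_by_category.items():
        counts = [sum(col) for col in zip(*hists)]  # codeword counts in category c
        total, v = sum(counts), len(counts)
        model[c] = (
            math.log(len(hists) / total_images),                            # log p(c)
            [math.log((n + alpha) / (total + alpha * v)) for n in counts],  # log p(w|c)
        )
    return model

def classify_nb(hist, model):
    """argmax_c log p(c) + sum_n hist[n] * log p(w_n | c)."""
    def score(c):
        log_prior, log_pw = model[c]
        return log_prior + sum(n * lp for n, lp in zip(hist, log_pw))
    return max(model, key=score)
```

In the spirit of the face/car example above, a "face" category whose training histograms concentrate on codewords 0 and 1 will attract test images with the same emphasis, and likewise for "car" on codewords 2 and 3.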

Hierarchical Bayesian models

The basic assumption of the Naïve Bayes model sometimes does not hold. For example, a natural scene image may contain several different themes. Probabilistic latent semantic analysis (pLSA)[8][9] and latent Dirichlet allocation (LDA)[10] are two popular topic models from text domains that tackle the similar multiple-"theme" problem. Take LDA for an example. To model natural scene images using LDA, an analogy is made with document analysis:

  • the image category is mapped to the document category;
  • the mixture proportion of themes is mapped to the mixture proportion of topics;
  • the theme index is mapped to the topic index;
  • the codeword is mapped to the word.

This method shows very promising results in natural scene categorization on the 13 Natural Scene Categories dataset.[3]

Supervised models

Since images are represented based on the BoW model, any discriminative model suitable for text document categorization can be tried, such as the support vector machine (SVM)[2] and AdaBoost.[11] The kernel trick is also applicable when a kernel-based classifier such as SVM is used. The pyramid match kernel is a newly developed kernel based on the BoW model. The local feature approach of using a BoW model representation learnt by machine learning classifiers with different kernels (e.g., the EMD kernel and the χ² kernel) has been vastly tested in the area of texture and object recognition.[12] Very promising results on a number of datasets have been reported. This approach[12] has achieved very impressive results in the PASCAL Visual Object Classes Challenge.
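For instance, the exponentiated χ² kernel mentioned above can be computed between two BoW histograms as follows. This is a minimal sketch; the `gamma` scaling parameter (often set from the mean χ² distance over the training set) is our own simplification:

```python
import math

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-square kernel between two BoW histograms:
    K(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)),
    skipping bins where both counts are zero."""
    d = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return math.exp(-gamma * d)
```

The kernel equals 1 for identical histograms and decays toward 0 as the histograms diverge, which makes it a drop-in similarity for a kernel SVM over BoW vectors.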

Pyramid match kernel

The pyramid match kernel[13] is a fast kernel function (linear complexity instead of the classic quadratic complexity) satisfying Mercer's condition, which maps the BoW features, or sets of features in high dimension, to multi-dimensional multi-resolution histograms. An advantage of these multi-resolution histograms is their ability to capture co-occurring features. The pyramid match kernel builds multi-resolution histograms by binning data points into discrete regions of increasing size. Thus, points that do not match at high resolutions have the chance to match at low resolutions. The pyramid match kernel performs an approximate similarity match, without explicit search or computation of distances. Instead, it intersects the histograms to approximate the optimal match. Accordingly, the computation time is only linear in the number of features. Compared with other kernel approaches, the pyramid match kernel is much faster, yet provides equivalent accuracy. The pyramid match kernel was applied to the ETH-80 database and the Caltech-101 database with promising results.[13][14]
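The binning-and-intersection idea can be sketched for 1-D feature sets (real descriptors are high-dimensional; the interface and the unnormalized score are simplifications of ours). Level l uses bins of width 2^l, and matches first found at level l receive weight 1/2^l, so finer-resolution matches count more:

```python
def pyramid_match(x, y, levels=3):
    """Sketch of the (unnormalized) pyramid match kernel for 1-D point sets."""
    def hist(points, width):
        # Bin points into cells of the given width.
        h = {}
        for p in points:
            b = int(p // width)
            h[b] = h.get(b, 0) + 1
        return h

    def intersect(h1, h2):
        # Histogram intersection: matched mass in common bins.
        return sum(min(n, h2.get(b, 0)) for b, n in h1.items())

    score, prev = 0.0, 0
    for l in range(levels + 1):
        matches = intersect(hist(x, 2 ** l), hist(y, 2 ** l))
        score += (matches - prev) / (2 ** l)  # weight matches new at this level
        prev = matches
    return score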

Limitations and recent developments

One of the notorious disadvantages of BoW is that it ignores the spatial relationships among the patches, which are very important in image representation. Researchers have proposed several methods to incorporate the spatial information. For feature-level improvements, correlogram features can capture spatial co-occurrences of features.[15] For generative models, relative positions[16][17] of codewords are also taken into account. The hierarchical shape and appearance model for human action[18] introduces a new part layer (Constellation model) between the mixture proportion and the BoW features, which captures the spatial relationships among parts in the layer. For discriminative models, spatial pyramid match[19] performs pyramid matching by partitioning the image into increasingly fine sub-regions and computing histograms of local features inside each sub-region. Recently, an augmentation of local image descriptors (i.e. SIFT) by their spatial coordinates normalised by the image width and height has proved to be a robust and simple Spatial Coordinate Coding[20][21] approach which introduces spatial information to the BoW model.
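The spatial-pyramid idea above can be sketched as per-cell codeword histograms at increasingly fine grids, concatenated into one vector (the interface is invented for illustration; the original method also weights the levels):

```python
def spatial_pyramid(points, codewords, vocab, width, height, levels=2):
    """Sketch of a spatial-pyramid representation: at level l the image is
    split into 2^l x 2^l cells; a codeword histogram is computed per cell
    and all histograms are concatenated.
    `points` are (x, y) patch positions, `codewords` their codeword indices."""
    feats = []
    for l in range(levels + 1):
        n = 2 ** l
        hists = [[0] * vocab for _ in range(n * n)]
        for (x, y), w in zip(points, codewords):
            cx = min(int(x * n / width), n - 1)   # column of the cell
            cy = min(int(y * n / height), n - 1)  # row of the cell
            hists[cy * n + cx][w] += 1
        for h in hists:
            feats.extend(h)
    return feats
```

At level 0 this degenerates to the plain BoW histogram; each finer level adds histograms that record roughly where in the image each codeword occurred.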

The BoW model has not been extensively tested yet for viewpoint invariance and scale invariance, and its performance is unclear. Also, the BoW model for object segmentation and localization is not well understood.[4]

A systematic comparison of classification pipelines found that the encoding of first- and second-order statistics (Vector of Locally Aggregated Descriptors (VLAD)[22] and Fisher Vector (FV)) considerably increased classification accuracy compared to BoW, while also decreasing the codebook size, thus lowering the computational effort for codebook generation.[23] Moreover, a recent detailed comparison of coding and pooling methods[21] for BoW has shown that second-order statistics combined with Sparse Coding and an appropriate pooling such as Power Normalisation can further outperform Fisher Vectors and even approach the results of simple convolutional neural network models on some object recognition datasets such as the Oxford Flower Dataset 102.
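The VLAD encoding mentioned above can be sketched as follows: each descriptor contributes its residual to the nearest codebook center, and the accumulated residuals are concatenated and L2-normalized. This is a bare-bones version; real implementations typically add power normalization and PCA:

```python
import math

def vlad(descriptors, centers):
    """Sketch of VLAD encoding: accumulate per-center residuals,
    then flatten and L2-normalize."""
    k, d = len(centers), len(centers[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Nearest codebook center for this descriptor.
        i = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
        for j in range(d):
            acc[i][j] += x[j] - centers[i][j]  # residual, not just a count
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]
```

Unlike the BoW histogram, which only counts assignments, the residuals record how the descriptors differ from their codewords, which is what makes the encoding more discriminative at a given codebook size.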

See also

  • Part-based models
  • Fisher Vector encoding
  • Segmentation-based object categorization
  • Vector space model
  • Bag-of-words model
  • Feature extraction

References

  1. ^ a b Sivic, Josef; Zisserman, Andrew (2003). "Video Google: A Text Retrieval Approach to Object Matching in Videos". Proceedings Ninth IEEE International Conference on Computer Vision, 13–16 October 2003.
  2. ^ a b c d G. Csurka; C. Dance; L.X. Fan; J. Willamowski & C. Bray (2004). "Visual categorization with bags of keypoints". Proc. of ECCV International Workshop on Statistical Learning in Computer Vision.
  3. ^ a b Fei-Fei Li; Perona, P. (2005). "A Bayesian Hierarchical Model for Learning Natural Scene Categories". 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Vol. 2. pp. 524–531. doi:10.1109/CVPR.2005.16. ISBN 978-0-7695-2372-9. S2CID 6387937.
  4. ^ a b L. Fei-Fei; R. Fergus & A. Torralba. "Recognizing and Learning Object Categories, CVPR 2007 short course".
  5. ^ Qiu, G. (2002). "Indexing chromatic and achromatic patterns for content-based colour image retrieval" (PDF). Pattern Recognition. 35 (8): 1675–1686. Bibcode:2002PatRe..35.1675Q. doi:10.1016/S0031-3203(01)00162-5.
  6. ^ Vidal-Naquet; Ullman (1999). "Object recognition with informative features and linear classification" (PDF). Proceedings Ninth IEEE International Conference on Computer Vision. pp. 1150–1157. CiteSeerX 10.1.1.131.1283. doi:10.1109/ICCV.2003.1238356. ISBN 978-0-7695-1950-0. S2CID 15620181.
  7. ^ T. Leung; J. Malik (2001). "Representing and recognizing the visual appearance of materials using three-dimensional textons" (PDF). International Journal of Computer Vision. 43 (1): 29–44. doi:10.1023/A:1011126920638. S2CID 14915716.
  8. ^ T. Hofmann (1999). "Probabilistic Latent Semantic Analysis" (PDF). Proc. of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Archived from the original (PDF) on 2007-07-10. Retrieved 2007-12-10.
  9. ^ Sivic, J.; Russell, B.C.; Efros, A.A.; Zisserman, A.; Freeman, W.T. (2005). "Discovering objects and their location in images" (PDF). Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1. p. 370. CiteSeerX 10.1.1.184.1253. doi:10.1109/ICCV.2005.77. ISBN 978-0-7695-2334-7. S2CID 206769491. Archived from the original (PDF) on 2020-01-31. Retrieved 2007-12-10.
  10. ^ D. Blei; A. Ng & M. Jordan (2003). Lafferty, John (ed.). "Latent Dirichlet allocation" (PDF). Journal of Machine Learning Research. 3 (4–5): 993–1022. doi:10.1162/jmlr.2003.3.4-5.993. Archived from the original (PDF) on 2008-08-22. Retrieved 2007-12-10.
  11. ^ Serre, T.; Wolf, L.; Poggio, T. (2005). "Object Recognition with Features Inspired by Visual Cortex" (PDF). 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Vol. 2. p. 994. CiteSeerX 10.1.1.71.5276. doi:10.1109/CVPR.2005.254. ISBN 978-0-7695-2372-9. S2CID 260426. Archived from the original (PDF) on 2017-07-06. Retrieved 2007-12-10.
  12. ^ a b Jianguo Zhang; Marcin Marszałek; Svetlana Lazebnik; Cordelia Schmid (2007). "Local Features and Kernels for Classification of Texture and Object Categories: a Comprehensive Study" (PDF). International Journal of Computer Vision. 73 (2): 213–238. doi:10.1007/s11263-006-9794-4. S2CID 1486613.
  13. ^ a b Grauman, K.; Darrell, T. (2005). "The pyramid match kernel: discriminative classification with sets of image features" (PDF). Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1. p. 1458. CiteSeerX 10.1.1.644.6159. doi:10.1109/ICCV.2005.239. ISBN 978-0-7695-2334-7. S2CID 13036203.
  14. ^ Jianchao Yang; Kai Yu; Yihong Gong; Huang, T. (2009). "Linear spatial pyramid matching using sparse coding for image classification". 2009 IEEE Conference on Computer Vision and Pattern Recognition. p. 1794. doi:10.1109/CVPR.2009.5206757. ISBN 978-1-4244-3992-8. S2CID 440212. Archived from the original on 2019-03-20. Retrieved 2011-09-09.
  15. ^ Savarese, S.; Winn, J.; Criminisi, A. (2006). "Discriminative Object Class Models of Appearance and Shape by Correlatons" (PDF). 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2 (CVPR'06). Vol. 2. p. 2033. CiteSeerX 10.1.1.587.8853. doi:10.1109/CVPR.2006.102. ISBN 978-0-7695-2597-6. S2CID 1457124. Archived from the original (PDF) on 2013-10-29. Retrieved 2007-12-10.
  16. ^ Sudderth, E.B.; Torralba, A.; Freeman, W.T.; Willsky, A.S. (2005). "Learning hierarchical models of scenes, objects, and parts" (PDF). Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1. p. 1331. CiteSeerX 10.1.1.128.7259. doi:10.1109/ICCV.2005.137. ISBN 978-0-7695-2334-7. S2CID 6153430. Archived from the original (PDF) on 2019-02-03. Retrieved 2007-12-10.
  17. ^ E. Sudderth; A. Torralba; W. Freeman & A. Willsky (2005). "Describing Visual Scenes using Transformed Dirichlet Processes" (PDF). Proc. of Neural Information Processing Systems.
  18. ^ Niebles, Juan Carlos; Li Fei-Fei (2007). "A Hierarchical Model of Shape and Appearance for Human Action Classification" (PDF). 2007 IEEE Conference on Computer Vision and Pattern Recognition. p. 1. CiteSeerX 10.1.1.173.2667. doi:10.1109/CVPR.2007.383132. ISBN 978-1-4244-1179-5. S2CID 9213242.
  19. ^ Lazebnik, S.; Schmid, C.; Ponce, J. (2006). "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories" (PDF). 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2 (CVPR'06). Vol. 2. p. 2169. CiteSeerX 10.1.1.651.9183. doi:10.1109/CVPR.2006.68. ISBN 978-0-7695-2597-6. S2CID 2421251. Archived from the original (PDF) on 2018-05-08. Retrieved 2007-12-10.
  20. ^ Koniusz, Piotr; Yan, Fei; Mikolajczyk, Krystian (2013-05-01). "Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection". Computer Vision and Image Understanding. 117 (5): 479–492. doi:10.1016/j.cviu.2012.10.010. ISSN 1077-3142.
  21. ^ a b Koniusz, Piotr; Yan, Fei; Gosselin, Philippe Henri; Mikolajczyk, Krystian (2017-02-24). "Higher-order occurrence pooling for bags-of-words: Visual concept detection" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 39 (2): 313–326. doi:10.1109/TPAMI.2016.2545667. hdl:10044/1/39814. ISSN 0162-8828. PMID 27019477.
  22. ^ Jégou, H.; Douze, M.; Schmid, C.; Pérez, P. (2010-06-01). "Aggregating local descriptors into a compact image representation". 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (PDF). pp. 3304–3311. doi:10.1109/CVPR.2010.5540039. ISBN 978-1-4244-6984-0. S2CID 1912782.
  23. ^ Seeland, Marco; Rzanny, Michael; Alaqraa, Nedal; Wäldchen, Jana; Mäder, Patrick (2017-02-24). "Plant species classification using flower images—A comparative study of local feature representations". PLOS ONE. 12 (2): e0170629. Bibcode:2017PLoSO..1270629S. doi:10.1371/journal.pone.0170629. ISSN 1932-6203. PMC 5325198. PMID 28234999.

External links

  • Bag of Visual Words in a Nutshell a short tutorial by Bethea Davida.
  • A demo for two bag-of-words classifiers by L. Fei-Fei, R. Fergus, and A. Torralba.
  • Caltech Large Scale Image Search Toolbox: a Matlab/C++ toolbox implementing Inverted File search for Bag of Words model. It also contains implementations for fast approximate nearest neighbor search using randomized k-d tree, locality-sensitive hashing, and hierarchical k-means.
  • DBoW2 library: a library that implements a fast bag of words in C++ with support for OpenCV.
