ABSTRACT
Bag-of-words models are among the most widely used and successful representations in multimedia retrieval. However, the quantization error introduced when mapping keypoints to visual words is one of the main drawbacks of the bag-of-words model. Although techniques such as soft assignment of keypoints to multiple visual words [23] and query expansion [27] have been introduced to address this problem, their performance gain always comes at the cost of longer query response time, which makes them difficult to apply in large-scale multimedia retrieval applications. In this paper, we propose a simple "constrained keypoint quantization" method that effectively reduces the overall quantization error of the bag-of-words representation while greatly improving retrieval efficiency. The central idea of the proposed method is that if a keypoint is far away from all visual words, we simply remove it. At first glance, this strategy seems naive and dangerous. However, we show that the proposed method has a solid theoretical grounding. Experimental results on three widely used datasets for near-duplicate image and video retrieval confirm that removing a large number of keypoints with high quantization error yields comparable or even better retrieval performance while dramatically improving retrieval efficiency.
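The central rule described above (quantize each keypoint to its nearest visual word, but discard it when even that word is too far away) can be sketched in a few lines. This is a minimal sketch, assuming Euclidean distance, a brute-force nearest-word search, and a single global distance threshold `max_dist`; the function names and the threshold are hypothetical, and the paper's actual constraint (how the cutoff is chosen, and whether it is global or per visual word) is not specified in this abstract and may differ.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def constrained_quantize(
    keypoints: Sequence[Vector],
    codebook: Sequence[Vector],
    max_dist: float,
) -> List[Tuple[int, float]]:
    """Assign each keypoint descriptor to its nearest visual word.

    Keypoints whose nearest visual word lies farther than ``max_dist``
    (a hypothetical global threshold) are discarded instead of being
    forced into a bag. Returns (word_id, quantization_error) pairs for
    the keypoints that are kept.
    """
    kept: List[Tuple[int, float]] = []
    for kp in keypoints:
        # Brute-force nearest visual word under Euclidean distance.
        word_id, dist = min(
            ((w, math.dist(kp, center)) for w, center in enumerate(codebook)),
            key=lambda pair: pair[1],
        )
        if dist <= max_dist:
            kept.append((word_id, dist))
        # Otherwise the keypoint is far from every visual word: drop it.
    return kept


def bag_of_words(word_ids: Sequence[int], vocab_size: int) -> List[int]:
    """Build a term-frequency histogram over the visual vocabulary."""
    counts = Counter(word_ids)
    return [counts.get(w, 0) for w in range(vocab_size)]


def mean_error(pairs: Sequence[Tuple[int, float]]) -> float:
    """Average quantization error of the assigned keypoints."""
    return sum(d for _, d in pairs) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (10.0, 10.0)]              # two toy visual words
    keypoints = [(0.5, 0.5), (9.0, 10.0), (5.0, 5.0)]  # last one is far from both

    unconstrained = constrained_quantize(keypoints, codebook, math.inf)
    constrained = constrained_quantize(keypoints, codebook, max_dist=2.0)

    print(bag_of_words([w for w, _ in constrained], len(codebook)))  # [1, 1]
    print(mean_error(unconstrained), mean_error(constrained))
```

In the toy run, the keypoint at (5, 5) is about 7.07 away from both visual words, so it is dropped; the mean quantization error of the remaining keypoints falls from roughly 2.93 to roughly 0.85, a miniature version of the trade-off the abstract describes. A large-scale system would replace the brute-force `min` with an approximate nearest-neighbor search (e.g., a vocabulary tree or FLANN), and fewer surviving keypoints also means shorter posting lists to scan at query time.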
REFERENCES
[1] cc_web_video: Near-duplicate web video dataset. Available: http://vireo.cs.cityu.edu.hk/webvideo/.
[2] http://www.flickr.com.
[3] http://www.robots.ox.ac.uk/~vgg/data/oxbuildings.
[4] http://www.robots.ox.ac.uk/~vgg/research/affine.
[5] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, 1999.
[6] O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In Computer Vision and Pattern Recognition, 2008.
[7] S. Boughorbel, J.-P. Tarel, and F. Fleuret. Non-Mercer kernels for SVM object recognition. In British Machine Vision Conference, 2004.
[8] Y. Cai, L. Yang, W. Ping, F. Wang, T. Mei, X.-S. Hua, and S. Li. Million-scale near-duplicate video retrieval system. In ACM Multimedia, 2011.
[9] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, 2004.
[10] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience, 2000.
[11] K. Grauman and T. Darrell. The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research, 2007.
[12] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 1963.
[13] H. Jégou, M. Douze, and C. Schmid. Improving bag-of-features for large scale image search. International Journal of Computer Vision, 2010.
[14] F. Jurie and B. Triggs. Creating efficient codebooks for visual recognition. In Computer Vision and Pattern Recognition, 2005.
[15] Y. Ke, R. Sukthankar, and L. Huston. Efficient near-duplicate detection and sub-image retrieval. In ACM Multimedia, 2004.
[16] D. Li, L. Yang, X.-S. Hua, and H.-J. Zhang. Large-scale robust visual codebook construction. In ACM Multimedia, 2010.
[17] F. Li, W. Tong, R. Jin, A. K. Jain, and J.-E. Lee. An efficient key point quantization algorithm for large scale image retrieval. In ACM Workshop on Large-Scale Multimedia Retrieval and Mining, 2009.
[18] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004.
[19] S. Lyu. Mercer kernels for object recognition with local features. In Computer Vision and Pattern Recognition, 2005.
[20] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Applications (VISAPP'09), 2009.
[21] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Computer Vision and Pattern Recognition, 2006.
[22] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Computer Vision and Pattern Recognition, 2007.
[23] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Computer Vision and Pattern Recognition, 2008.
[24] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In International Conference on Computer Vision, 2003.
[25] T. Tuytelaars and C. Schmid. Vector quantizing feature space with a regular lattice. In International Conference on Computer Vision, 2007.
[26] X. Wu, A. G. Hauptmann, and C.-W. Ngo. Practical elimination of near-duplicates from web video search. In ACM Multimedia, 2007.
[27] L. Yang, Y. Cai, A. Hanjalic, X.-S. Hua, and S. Li. Video-based image retrieval. In ACM Multimedia, 2011.
[28] L. Yang, B. Geng, Y. Cai, A. Hanjalic, and X.-S. Hua. Object retrieval using visual query context. IEEE Transactions on Multimedia, 2011.
[29] Y. Yang, F. Nie, D. Xu, J. Luo, Y. Zhuang, and Y. Pan. A multimedia retrieval framework based on semi-supervised ranking and relevance feedback. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.
[30] Y. Yang, Y.-T. Zhuang, F. Wu, and Y.-H. Pan. Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval. IEEE Transactions on Multimedia, 2008.
[31] W.-L. Zhao, S. Tan, and C.-W. Ngo. Large-scale near-duplicate web video search: Challenge and opportunity. In International Conference on Multimedia and Expo, 2009.
Index Terms
- Constrained keypoint quantization: towards better bag-of-words model for large-scale multimedia retrieval