research-article

Constrained keypoint quantization: towards better bag-of-words model for large-scale multimedia retrieval

Authors:
Yang Cai, Zhejiang University, Hangzhou, China
Wei Tong, Carnegie Mellon University, Pittsburgh
Linjun Yang, Microsoft Research Asia, Beijing, China
Alexander G. Hauptmann, Microsoft Research Asia, Beijing, China

ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, June 2012, Article No.: 16, Pages 1–8, https://doi.org/10.1145/2324796.2324816

Published: 05 June 2012

ABSTRACT

Bag-of-words models are among the most widely used and successful representations in multimedia retrieval. However, the quantization error introduced when mapping keypoints to visual words is one of the main drawbacks of the bag-of-words model. Although some techniques, such as soft-assignment to bags [23] and query expansion [27], have been introduced to deal with the problem, their performance gain always comes at the cost of longer query response time, which makes them difficult to apply to large-scale multimedia retrieval applications. In this paper, we propose a simple "constrained keypoint quantization" method that effectively reduces the overall quantization error of the bag-of-words representation and greatly improves retrieval efficiency at the same time. The central idea of the proposed quantization method is that if a keypoint is far away from all visual words, we simply remove it. At first glance, this simple strategy seems naive and dangerous. However, we show that the proposed method has a solid theoretical foundation. Our experimental results on three widely used datasets for near-duplicate image and video retrieval confirm that by removing a large number of keypoints with high quantization error, we obtain comparable or even better retrieval performance while dramatically boosting retrieval efficiency.
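The central idea stated in the abstract (drop any keypoint that is far from every visual word) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how "far away" is measured or how the cutoff is chosen, so the Euclidean distance, the brute-force nearest-word search, and the names `constrained_quantize` and `max_distance` are all assumptions made for illustration.

```python
import math


def constrained_quantize(keypoints, visual_words, max_distance):
    """Build a bag-of-words histogram, skipping keypoints far from all words.

    keypoints, visual_words: sequences of equal-length numeric tuples.
    max_distance: hypothetical cutoff; a keypoint whose nearest visual word
    is farther than this is removed instead of being quantized.
    Returns (histogram, number_of_keypoints_kept).
    """
    histogram = [0] * len(visual_words)
    kept = 0
    for kp in keypoints:
        # Brute-force nearest visual word under Euclidean distance (assumed).
        best_idx, best_dist = None, math.inf
        for idx, word in enumerate(visual_words):
            d = math.dist(kp, word)
            if d < best_dist:
                best_idx, best_dist = idx, d
        # Constrained step: quantize only if the nearest word is close enough.
        if best_idx is not None and best_dist <= max_distance:
            histogram[best_idx] += 1
            kept += 1
    return histogram, kept


if __name__ == "__main__":
    words = [(0.0, 0.0), (10.0, 0.0)]
    points = [(0.5, 0.0), (9.0, 1.0), (5.0, 5.0)]
    print(constrained_quantize(points, words, max_distance=2.0))
```

In this toy example the third keypoint lies roughly 7.07 from both visual words, so it is discarded, while the other two are counted. Because discarded keypoints never enter the histogram, fewer entries need to be matched at query time, which is the efficiency effect the abstract describes.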

References

[1] cc_web_video: Near-duplicate web video dataset. Available: http://vireo.cs.cityu.edu.hk/webvideo/.
[2] http://www.flickr.com.
[3] http://www.robots.ox.ac.uk/~vgg/data/oxbuildings.
[4] http://www.robots.ox.ac.uk/~vgg/research/affine.
[5] R. Baeza-Yates and B. Ribeiro-Neto. Modern information retrieval. ACM Press, 1999.
[6] O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In Computer Vision and Pattern Recognition, 2008.
[7] S. Boughhorbel, J.-P. Tarel, and F. Fleuret. Non-mercer kernels for svm object recognition. In British Machine Vision Conference, 2004.
[8] Y. Cai, L. Yang, W. Ping, F. Wang, T. Mei, X.-S. Hua, and S. Li. Million-scale near-duplicate video retrieval system. In ACM Multimedia, 2011.
[9] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, 2004.
[10] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience Publication, 2000.
[11] K. Grauman and T. Darrell. The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research, 2007.
[12] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 1963.
[13] H. Jégou, M. Douze, and C. Schmid. Improving bag-of-features for large scale image search. International Journal of Computer Vision, 2010.
[14] F. Jurie and B. Triggs. Creating efficient codebooks for visual recognition. In Computer Vision and Pattern Recognition, 2005.
[15] Y. Ke, R. Sukthankar, and L. Huston. Efficient near-duplicate detection and sub-image retrieval. In ACM Multimedia, 2004.
[16] D. Li, L. Yang, X.-S. Hua, and H.-J. Zhang. Large-scale robust visual codebook construction. In ACM Multimedia, 2010.
[17] F. Li, W. Tong, R. Jin, A. K. Jain, and J.-E. Lee. An efficient key point quantization algorithm for large scale image retrieval. In ACM workshop on Large-scale multimedia retrieval and mining, 2009.
[18] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004.
[19] S. Lyu. Mercer kernels for object recognition with local features. In Computer Vision and Pattern Recognition, 2005.
[20] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Application (VISAPP'09), 2009.
[21] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Computer Vision and Pattern Recognition, 2006.
[22] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Computer Vision and Pattern Recognition, 2007.
[23] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Computer Vision and Pattern Recognition, 2008.
[24] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In International Conference on Computer Vision, 2003.
[25] T. Tuytelaars and C. Schmid. Vector quantizing feature space with a regular lattice. In International Conference on Computer Vision, 2007.
[26] X. Wu, A. G. Hauptmann, and C.-W. Ngo. Practical elimination of near-duplicates from web video search. In ACM Multimedia, 2007.
[27] L. Yang, Y. Cai, A. Hanjalic, X.-S. Hua, and S. Li. Video-based image retrieval. In ACM Multimedia, 2011.
[28] L. Yang, B. Geng, Y. Cai, A. Hanjalic, and X.-S. Hua. Object retrieval using visual query context. IEEE Transactions on Multimedia, 2011.
[29] Y. Yang, F. Nie, D. Xu, J. Luo, Y. Zhuang, and Y. Pan. A multimedia retrieval framework based on semi-supervised ranking and relevance feedback. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.
[30] Y. Yang, Y.-T. Zhuang, F. Wu, and Y.-H. Pan. Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval. IEEE Transactions on Multimedia, 2008.
[31] W.-L. Zhao, S. Tan, and C.-W. Ngo. Large-scale near-duplicate web video search: challenge and opportunity. In International Conference on Multimedia and Expo, 2009.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations

Published in
ICMR '12: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
June 2012
489 pages
ISBN: 9781450313292
DOI: 10.1145/2324796
Conference Chairs: Horace H. S. Ip (City University of Hong Kong), Yong Rui (Microsoft, China)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bag-of-words model
multimedia retrieval
visual word quantization
Qualifiers
- research-article
Acceptance Rates
ICMR '12 Paper Acceptance Rate: 50 of 145 submissions, 34%
Overall Acceptance Rate: 254 of 830 submissions, 31%