ABSTRACT
Recent work on image retrieval has proposed indexing images with compact representations that encode powerful local descriptors, such as the closely related VLAD and Fisher vector. Combined with a suitable coding technique, such a representation can encode an image in a few dozen bytes while achieving excellent retrieval results. This paper revisits some assumptions made in this context regarding the handling of "visual burstiness" and shows that they implicitly involve undesirable ad-hoc choices. Focusing on VLAD without loss of generality, we propose modifications to several steps of the original design. Although simple, these modifications significantly improve VLAD and make it compare favorably with the state of the art.
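For readers unfamiliar with the representation being revisited, the following is a minimal sketch of the baseline VLAD aggregation as it is commonly described: each local descriptor is assigned to its nearest centroid, the residuals are summed per centroid, and the result is power-law and L2 normalized. It does not implement the modifications proposed in this paper. The function name `vlad`, the parameter `alpha`, and the toy inputs are illustrative assumptions, and pure Python lists stand in for real descriptor arrays.

```python
import math

def vlad(descriptors, centroids, alpha=0.5):
    """Baseline VLAD sketch (illustrative, not the paper's modified design)."""
    k, d = len(centroids), len(centroids[0])
    v = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared L2 distance).
        j = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual between the descriptor and that centroid.
        for t in range(d):
            v[j][t] += x[t] - centroids[j][t]
    flat = [c for row in v for c in row]
    # Power-law normalization (signed |z|^alpha), often used to damp bursty components.
    flat = [math.copysign(abs(z) ** alpha, z) for z in flat]
    # Final L2 normalization of the whole k*d vector.
    norm = math.sqrt(sum(z * z for z in flat))
    return [z / norm for z in flat] if norm > 0 else flat

# Toy usage: two 2-D centroids, three descriptors.
example = vlad(
    [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]],
    [[0.0, 0.0], [10.0, 10.0]],
)
```

The output has one block of `d` components per centroid, so its length is `k * d`. The value `alpha=0.5` is an assumed default for this sketch, not a setting taken from the abstract.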
Index Terms
- Revisiting the VLAD image representation