ABSTRACT
Structure-from-Motion reconstruction recovers three-dimensional structure from two-dimensional images. Recent research in this field has demonstrated the ability to reconstruct cities from images taken from a photo collection website; SIFT features are typically extracted to detect correspondences between images. To reconstruct from large-scale unsorted image sets, the system must hold all feature and point information in memory while searching for correspondences. Because a SIFT feature is a 128-dimensional real-valued vector, storing every descriptor consumes a significant amount of memory. To address this limitation, we propose projecting the high-dimensional feature into a lower-dimensional space using a newly learned projection matrix while preserving the properties of the original features. The projection shortens the distance between descriptors of the same point and lengthens the distance between descriptors of different points. The projected descriptors use the Hellinger distance to measure similarity between features. Furthermore, we learn a mapping function that maps the real-valued descriptor into a binary code, so that the method can cope with different correspondence search methods. Experiments demonstrate that our method achieves excellent results with a limited memory requirement.
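As a rough illustration of the pipeline the abstract describes, the following sketch maps a descriptor so that Euclidean distance corresponds to the Hellinger distance, projects it to fewer dimensions, and thresholds it into a binary code compared by Hamming distance. It is a minimal sketch under stated assumptions, not the method of the paper: the learned projection matrix is replaced by a random Gaussian matrix, the learned mapping function is replaced by simple per-dimension thresholding, the square-root mapping is applied before the projection, and all function names are hypothetical.

```python
import math
import random


def hellinger_map(desc):
    """L1-normalize a non-negative descriptor and take element-wise square roots.

    The Euclidean distance between two mapped vectors is proportional to the
    Hellinger distance between the two L1-normalized originals.
    """
    total = sum(desc)
    if total == 0:
        return [0.0] * len(desc)
    return [math.sqrt(v / total) for v in desc]


def random_projection(rows, cols, seed=0):
    """Placeholder for a learned projection matrix (assumption: random Gaussian)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(matrix, vec):
    """Multiply a rows x cols matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def binarize(vec, thresholds):
    """Pack one bit per dimension into an int: bit i is set when vec[i] > thresholds[i]."""
    bits = 0
    for i, (v, t) in enumerate(zip(vec, thresholds)):
        if v > t:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def encode(desc, matrix, thresholds):
    """Hellinger mapping, then projection, then thresholding."""
    return binarize(project(matrix, hellinger_map(desc)), thresholds)


if __name__ == "__main__":
    dims_in, dims_out = 128, 64  # 64 output bits is an illustrative choice
    matrix = random_projection(dims_out, dims_in)
    thresholds = [0.0] * dims_out

    rng = random.Random(1)
    a = [rng.randint(0, 255) for _ in range(dims_in)]
    b = [max(0, v + rng.randint(-5, 5)) for v in a]  # slightly perturbed copy of a
    c = [rng.randint(0, 255) for _ in range(dims_in)]  # unrelated descriptor

    print("Hamming(a, b):", hamming(encode(a, matrix, thresholds), encode(b, matrix, thresholds)))
    print("Hamming(a, c):", hamming(encode(a, matrix, thresholds), encode(c, matrix, thresholds)))
```

For a sense of scale, if each of the 128 dimensions were stored as a 4-byte float, one descriptor would occupy 512 bytes, whereas a 64-bit code occupies 8 bytes. Both the storage type and the code length here are illustrative assumptions, not figures from the paper.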
Index Terms
- Large-scale Structure-from-Motion Reconstruction with small memory consumption