Large-scale Structure-from-Motion Reconstruction with small memory consumption

Authors:
Guoyu Lu, Vincent Ly, Chandra Kambhamettu

Video/Image Modeling and Synthesis Lab, University of Delaware, Newark, Delaware, USA

MoMM '13: Proceedings of International Conference on Advances in Mobile Computing & Multimedia, December 2013, Pages 500–508
https://doi.org/10.1145/2536853.2536897

Published: 02 December 2013

ABSTRACT

Structure-from-Motion (SfM) reconstruction recovers three-dimensional structure from two-dimensional images. Recent research in this field has demonstrated the ability to reconstruct entire cities from images gathered from photo collection websites, typically using SIFT features to detect correspondences between images. To reconstruct a scene from a large, unsorted image set, the system must keep all feature and point information in memory in order to search for correspondences. Because a SIFT descriptor is a 128-dimensional real-valued vector, storing every descriptor consumes a significant amount of memory. To address this limitation, we propose projecting the high-dimensional features into a lower-dimensional space with a newly learned projection matrix that preserves the properties of the original features. The projection shortens the distance between descriptors of the same point while lengthening the distance between descriptors of different points. Similarity between the projected descriptors is computed with the Hellinger distance. Furthermore, we learn a mapping function that maps the real-valued descriptors to binary codes, to cope with variations in the correspondence search method. Experiments demonstrate that our method achieves excellent results with limited memory requirements.
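The pipeline outlined in the abstract (Hellinger-based comparison, linear projection to fewer dimensions, then binarization) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the learned projection matrix or the learned mapping function, so the code below substitutes a random Gaussian matrix `P` and a simple zero-threshold sign test as placeholders. The target size of 32 dimensions and all function names are assumptions made for illustration only.

```python
import numpy as np

def root_descriptor(d, eps=1e-12):
    """L1-normalize a non-negative descriptor, then take element-wise square roots."""
    d = np.asarray(d, dtype=np.float64)
    return np.sqrt(d / (d.sum() + eps))

def hellinger_distance(a, b):
    """Hellinger distance: Euclidean distance between root descriptors, scaled by 1/sqrt(2)."""
    return np.linalg.norm(root_descriptor(a) - root_descriptor(b)) / np.sqrt(2.0)

def project(d, P):
    """Map a descriptor into a lower-dimensional space with projection matrix P."""
    return P @ root_descriptor(d)

def binarize(v, thresholds):
    """Turn a real-valued vector into a packed binary code (one bit per dimension)."""
    return np.packbits(v > thresholds)

rng = np.random.default_rng(0)

# Placeholder for the learned projection (assumption: 128 -> 32 dimensions).
P = rng.standard_normal((32, 128))
# Placeholder for the learned mapping function (assumption: threshold at zero).
thresholds = np.zeros(32)

# Two synthetic SIFT-like descriptors with non-negative entries.
a = rng.integers(0, 256, size=128)
b = rng.integers(0, 256, size=128)

code_a = binarize(project(a, P), thresholds)
code_b = binarize(project(b, P), thresholds)

# Hamming distance between the two binary codes.
hamming = int(np.unpackbits(code_a ^ code_b).sum())

# Storage per descriptor: 128 float32 values versus a 32-bit code.
bytes_float = 128 * np.dtype(np.float32).itemsize
bytes_binary = code_a.nbytes
```

Under these assumptions, a descriptor stored as 128 single-precision floats occupies 512 bytes, while its 32-bit code occupies 4 bytes. How much matching accuracy survives such compression depends on the learned projection and mapping, which this sketch does not reproduce.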

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Reconstruction
      2. Image and video acquisition
        Motion capture
  2. Computer graphics
    1. Animation
      1. Motion capture
      2. Motion processing

Published in
MoMM '13: Proceedings of International Conference on Advances in Mobile Computing & Multimedia
December 2013
599 pages
ISBN:9781450321068
DOI:10.1145/2536853
Conference Chairs:
René Mayrhofer,
Luke Chen,
Matthias Steinbauer,
Gabriele Kotsis,
Ismail Khalil
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Dimensionality reduction
Hashing
Hellinger kernel
SIFT feature
Structure-from-motion reconstruction
Qualifiers
- research-article
- Research
- Refereed limited