What makes Paris look like Paris?

Published: 23 November 2015

Abstract

Given a large repository of geo-tagged imagery, we seek to automatically find visual elements, for example windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose to use a discriminative clustering approach able to take into account the weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically informed image retrieval.
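The "geographically informative" half of the search problem can be illustrated with a toy nearest-neighbor test. The sketch below is an illustration under stated assumptions, not the authors' pipeline: patches are stood in for by 2-D points, plain Euclidean distance replaces whatever descriptor and detector the real system uses, and the names `geo_informativeness`, `paris`, and `elsewhere` are hypothetical. A candidate scores high when most of its nearest neighbors come from the target area rather than from other places.

```python
import math


def geo_informativeness(candidate, positives, negatives, k=4):
    """Fraction of the candidate's k nearest neighbors drawn from the target area.

    `positives` holds descriptors from the target area (say, Paris) and
    `negatives` holds descriptors from everywhere else. Both are toy
    stand-ins (assumption): real patch descriptors are high-dimensional.
    """
    # Label every neighbor by where it came from: 1 = target area, 0 = elsewhere.
    labelled = [(math.dist(candidate, p), 1) for p in positives]
    labelled += [(math.dist(candidate, n), 0) for n in negatives]
    labelled.sort(key=lambda pair: pair[0])
    nearest = labelled[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Toy data (assumption): one cluster seen only in the target area,
# one cluster of generic patches seen mostly elsewhere.
paris = [(0.1, 0.0), (0.0, 0.2), (0.3, 0.0), (0.0, 0.4), (5.0, 5.2)]
elsewhere = [(5.1, 5.0), (4.9, 5.1), (5.0, 4.7), (9.0, 9.0)]

distinctive = geo_informativeness((0.0, 0.0), paris, elsewhere)
generic = geo_informativeness((5.0, 5.0), paris, elsewhere)
print(distinctive, generic)
```

Ranking candidates by this score favors patches whose look-alikes are concentrated in the target area. It covers only the "informative" criterion; checking that an element also occurs frequently would need a separate test.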

      • Published in

        Communications of the ACM, Volume 58, Issue 12
        December 2015
        115 pages
        ISSN: 0001-0782
        EISSN: 1557-7317
        DOI: 10.1145/2847579
        Editor: Moshe Y. Vardi

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States
