Abstract
Given a large repository of geo-tagged imagery, we seek to automatically find visual elements, such as windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task, as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose a discriminative clustering approach that takes weak geographic supervision into account. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically informed image retrieval.
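The search criterion stated above (patches that are both frequently occurring and geographically informative) can be made concrete with a toy sketch. This is not the discriminative clustering approach itself, only a simple nearest-neighbour proxy under stated assumptions: each patch is already described by a feature vector, and the only supervision is a weak per-patch label saying whether its source image was taken in the target area. The names `geo_informativeness`, `features`, `is_target`, and `k` are hypothetical and chosen for illustration.

```python
import numpy as np

def geo_informativeness(features, is_target, k=5):
    """Score each patch by the fraction of its k nearest neighbours
    (excluding itself) that come from the target geographic area.

    features  : (n, d) array of patch descriptors (assumed precomputed)
    is_target : (n,) boolean array, True if the patch comes from the target area
    k         : number of neighbours to inspect (illustrative default)
    """
    features = np.asarray(features, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)

    # Pairwise squared Euclidean distances between all patches.
    diff = features[:, None, :] - features[None, :, :]
    dist = (diff ** 2).sum(axis=-1)

    # A patch must not count as its own neighbour.
    np.fill_diagonal(dist, np.inf)

    # Indices of the k closest patches for every patch.
    nearest = np.argsort(dist, axis=1)[:, :k]

    # Fraction of those neighbours that carry the target-area label.
    return is_target[nearest].mean(axis=1)

# Toy data: three mutually similar patches from the target area,
# three mutually similar patches from elsewhere.
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
is_target = [True, True, True, False, False, False]
scores = geo_informativeness(features, is_target, k=2)
```

A patch whose close neighbours come mostly from the target area scores high, which captures both requirements at once: it recurs (it has close neighbours) and it is geographically informative (those neighbours share its label). A patch that looks the same everywhere would score near the overall proportion of target-area patches.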