ABSTRACT
Predicting geographic location exclusively from the visual content of images promises to greatly improve users' access to media collections. In this paper, we present a visual-content-based approach that predicts where in the world a social image was taken. We employ a ranking method that assigns a query photo the geo-location of its most likely geo-visual neighbor in the social image collection. The novelty of the approach is that the ranking makes use not only of the photos themselves, but also of their geo-visual neighbors. In contrast to other approaches, we do not restrict the locations we predict to landmarks or specific cities. The approach is evaluated on a set of 3 million geo-tagged photos from Flickr, released by MediaEval 2012. Experiments show that the proposed system delivers a substantive performance improvement compared with previously proposed, related visual-content-based approaches. The discussion illustrates how photo densities, geo-visual redundancy, and uploader patterns characteristic of social image collections impact performance.
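The ranking idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` type, the `radius_km` parameter, the summed-similarity score, and the precomputed `visual_sim` values are all assumptions made for illustration. The sketch only shows the general principle that a candidate photo is ranked by the visual evidence of its geographic neighbors as well as its own.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A geo-tagged photo from the collection (hypothetical structure)."""
    lat: float
    lon: float
    visual_sim: float  # assumed precomputed similarity to the query, higher is closer


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * earth_radius_km * math.asin(math.sqrt(min(1.0, a)))


def predict_location(candidates, radius_km=1.0):
    """Return the (lat, lon) of the best-ranked candidate, or None if there are none.

    Assumed scoring: each candidate's score is the sum of the visual
    similarities of all candidates within radius_km of it (itself included).
    """
    best, best_score = None, float("-inf")
    for c in candidates:
        score = sum(
            o.visual_sim
            for o in candidates
            if haversine_km(c.lat, c.lon, o.lat, o.lon) <= radius_km
        )
        if score > best_score:
            best, best_score = c, score
    return None if best is None else (best.lat, best.lon)


if __name__ == "__main__":
    photos = [
        Candidate(35.6586, 139.7454, 0.90),  # single strong match, isolated
        Candidate(48.8584, 2.2945, 0.50),    # three weaker matches close together
        Candidate(48.8585, 2.2950, 0.40),
        Candidate(48.8580, 2.2940, 0.45),
    ]
    print(predict_location(photos, radius_km=1.0))
    print(predict_location(photos, radius_km=0.0))
```

With `radius_km=0.0` each candidate is supported only by itself, so the sketch reduces to picking the single most similar photo. With a larger radius, a cluster of moderately similar photos taken close together can outrank one isolated strong match, which is the intuition behind using geo-visual neighbors in the ranking.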
Index Terms
- Geo-visual ranking for location prediction of social images