research-article

Location Extraction from Social Media: Geoparsing, Location Disambiguation, and Geotagging

Published: 13 June 2018

Abstract

Location extraction, also called “toponym extraction,” is a field covering geoparsing, extracting spatial representations from location mentions in text, and geotagging, assigning spatial coordinates to content items. This article evaluates five “best-of-class” location extraction algorithms. We develop a geoparsing algorithm using an OpenStreetMap database, and a geotagging algorithm using a language model constructed from social media tags and multiple gazetteers. Third-party work evaluated includes a DBpedia-based entity recognition and disambiguation approach, a named entity recognition and Geonames gazetteer approach, and a Google Geocoder API approach. We perform two quantitative benchmark evaluations, one geoparsing tweets and one geotagging Flickr posts, to compare all approaches. We also perform a qualitative evaluation recalling top N location mentions from tweets during major news events. The OpenStreetMap approach was best (F1 0.90+) for geoparsing English, and the language model approach was best (F1 0.66) for Turkish. The language model was best (F1@1km 0.49) for the geotagging evaluation. The map database was best (R@20 0.60+) in the qualitative evaluation. We report on strengths, weaknesses, and a detailed failure analysis for the approaches and suggest concrete areas for further research.
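
The geotagging result above is reported as F1 within a 1 km radius. As a minimal sketch of how such a distance-thresholded score could be computed, the Python below uses the haversine great-circle distance and treats a prediction as correct when it falls within the radius of the ground-truth coordinate. The function names, the dictionary-based input format, and the exact definitions of precision (correct over attempted items) and recall (correct over all items) are assumptions made for illustration; the article's own evaluation protocol may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def f1_at_km(predicted, truth, radius_km=1.0):
    """Distance-thresholded F1 (hypothetical helper).

    predicted: dict item_id -> (lat, lon), or None when no estimate was made
    truth:     dict item_id -> (lat, lon) ground-truth coordinate
    """
    # Items for which the system produced a coordinate at all.
    attempted = [i for i in truth if predicted.get(i) is not None]
    # A prediction is correct if it lies within radius_km of the truth.
    correct = sum(
        1 for i in attempted
        if haversine_km(predicted[i], truth[i]) <= radius_km
    )
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented purely for illustration.
truth = {"a": (51.5074, -0.1278), "b": (48.8566, 2.3522), "c": (41.0082, 28.9784)}
predicted = {"a": (51.5074, -0.1278), "b": (45.0, 5.0), "c": None}
score = f1_at_km(predicted, truth)
```

In this toy example one of two attempted predictions is within 1 km, so precision is 1/2, recall is 1/3, and the resulting score is 0.4.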

        Published in

        ACM Transactions on Information Systems, Volume 36, Issue 4
        October 2018
        365 pages
        ISSN: 1046-8188
        EISSN: 1558-2868
        DOI: 10.1145/3211967

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 13 June 2018
        • Accepted: 1 March 2018
        • Revised: 1 February 2018
        • Received: 1 April 2017

        Qualifiers

        • research-article
        • Research
        • Refereed