ABSTRACT
Our project involves building a platform to retrieve, map and analyze occurrences of place names in novels published between 1800 and 1914 whose action takes place wholly or partly in Paris. We first describe a proof of concept that extracts street names using queries run on the TXM textual analysis platform. We then propose a fully automatic process based on the named entity recognition (NER) components of the PERDIDO platform. This paper reports encouraging initial results from combining NLP approaches (NER methods) with textometric tools for the automated geoparsing of street names.
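As an illustration only, the sketch below approximates the kind of street-name query described above with a Python regular expression. It is not TXM's query language and not PERDIDO's NER components. The list of street-type nouns (`STREET_TYPES`), the linking words and the helper names (`extract_street_names`, `count_street_names`) are hypothetical assumptions chosen for the example.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately small list of French street-type nouns.
# The scoped (?i:...) group makes only this part case-insensitive,
# so both "rue" and sentence-initial "Rue" are accepted.
STREET_TYPES = r"(?i:rue|boulevard|avenue|place|quai|faubourg|impasse|passage)"

# Optional linking words; longer alternatives come first so that
# "de la" is tried before "de".
LINKERS = r"(?:de\s+la\s+|de\s+l'|du\s+|des\s+|de\s+|d')?"

# The specific name must start with an uppercase letter (accented capitals
# included); hyphens and apostrophes keep compounds such as "Saint-Honoré"
# or "Chaussée-d'Antin" together as one token.
NAME = r"[A-ZÀ-ÖØ-Ý][\w'-]*"

STREET_RE = re.compile(rf"\b{STREET_TYPES}\s+{LINKERS}{NAME}")

# Invented example sentence (not taken from the corpus).
SAMPLE = (
    "Il descendit la rue Saint-Honoré, puis traversa la place de la Concorde "
    "avant de longer le quai Voltaire. Rue Saint-Honoré, rien n'avait changé ; "
    "il gagna enfin la rue de l'Université."
)


def extract_street_names(text: str) -> List[str]:
    """Return every street-name candidate, in textual order."""
    return [m.group(0) for m in STREET_RE.finditer(text)]


def normalise(name: str) -> str:
    """Collapse whitespace (e.g. line breaks) and lower-case the street type."""
    name = " ".join(name.split())
    return name[0].lower() + name[1:]


def count_street_names(text: str) -> Counter:
    """Frequency of each normalised street name, e.g. as input for a map."""
    return Counter(normalise(n) for n in extract_street_names(text))


if __name__ == "__main__":
    for name, freq in count_street_names(SAMPLE).most_common():
        print(freq, name)
```

A hand-written pattern like this both over-generates and under-generates. For example, "à la place d'Henri" ("instead of Henri") would be matched as a street, and names used without a street-type noun are missed entirely. These are the kinds of limitations that motivate comparing query-based extraction with NER-based approaches.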
Index Terms
- Automated Geoparsing of Paris Street Names in 19th Century Novels