ABSTRACT
Textual data, ranging from corpora of digitized historical documents to large collections of news feeds, provide a rich source of temporal and geographic information. Such information has recently gained considerable interest in support of various search and exploration tasks, e.g., organizing news along a timeline or placing the origin of documents on a map. However, in such tasks, temporal and geographic information embedded in documents is often considered in isolation. We claim that combining such information into (chronologically ordered) event-like features enables interesting and meaningful search and exploration tasks. In this paper, we present a framework for the extraction, exploration, and visualization of event information in document collections. To this end, temporal and geographic expressions have to be identified in documents and combined, thus enriching a document collection with a set of normalized events. Traditional search queries can then be enriched with conditions on the events relevant to the search subject. Most important for our event-centric approach is that a search result consists of a sequence of events relevant to the search terms, not just a document hit list. Such events can originate from different documents and can be further explored; in particular, events relevant to a search query can be ordered chronologically. We demonstrate the utility of our framework with different (multilingual) search and exploration scenarios using a Wikipedia corpus.
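The event-centric pipeline described in the abstract (normalize temporal and geographic expressions, combine them into events, filter events by query conditions, return them in chronological order) can be illustrated with a minimal sketch. This is not the authors' implementation. All names (`Event`, `to_interval`, `extract_events`, `event_search`) are hypothetical, and several simplifications are assumptions: temporal and geographic tagging is taken as already done upstream, a temporal and a geographic value are paired whenever they occur in the same sentence, normalized dates are AD values of year, month, or day granularity, and term matching is a plain token lookup standing in for a real retrieval engine.

```python
import calendar
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized event: a time interval paired with a place."""
    start: date   # inclusive start of the normalized temporal value
    end: date     # inclusive end of the normalized temporal value
    place: str    # normalized geographic name
    lat: float
    lon: float
    doc_id: str   # document the event was extracted from


def to_interval(value: str) -> tuple[date, date]:
    """Map a normalized value ("1815", "1815-06", "1815-06-18") to an
    inclusive date interval. Assumption: AD years only (no leading minus)."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        (y,) = parts
        return date(y, 1, 1), date(y, 12, 31)
    if len(parts) == 2:
        y, m = parts
        return date(y, m, 1), date(y, m, calendar.monthrange(y, m)[1])
    y, m, d = parts
    return date(y, m, d), date(y, m, d)


def extract_events(doc_id, annotated_sentences):
    """Combine co-occurring expressions into events. Each sentence is given as
    (temporal_values, places) with places as (name, lat, lon) tuples.
    Assumption: every time/place pair within one sentence forms an event."""
    events = []
    for times, places in annotated_sentences:
        for t in times:
            start, end = to_interval(t)
            for name, lat, lon in places:
                events.append(Event(start, end, name, lat, lon, doc_id))
    return events


def event_search(docs, events, terms, interval=None, bbox=None):
    """Return events from documents containing all query terms, optionally
    restricted to a time interval and a (min_lat, min_lon, max_lat, max_lon)
    bounding box, ordered chronologically rather than as a document hit list."""
    terms = [t.lower() for t in terms]
    hits = {
        doc_id for doc_id, text in docs.items()
        if set(terms) <= set(re.findall(r"\w+", text.lower()))
    }
    result = []
    for e in events:
        if e.doc_id not in hits:
            continue
        if interval and (e.end < interval[0] or e.start > interval[1]):
            continue  # event interval does not overlap the query interval
        if bbox:
            min_lat, min_lon, max_lat, max_lon = bbox
            if not (min_lat <= e.lat <= max_lat and min_lon <= e.lon <= max_lon):
                continue  # event location lies outside the query region
        result.append(e)
    return sorted(result, key=lambda e: (e.start, e.end, e.place))


if __name__ == "__main__":
    docs = {
        "d1": "Napoleon was defeated at Waterloo in 1815.",
        "d2": "Napoleon was born in Ajaccio in 1769.",
        "d3": "The Congress of Vienna met in 1814.",
    }
    events = (
        extract_events("d1", [(["1815-06-18"], [("Waterloo", 50.68, 4.41)])])
        + extract_events("d2", [(["1769-08-15"], [("Ajaccio", 41.93, 8.74)])])
        + extract_events("d3", [(["1814-09"], [("Vienna", 48.21, 16.37)])])
    )
    for e in event_search(docs, events, ["napoleon"]):
        print(e.start, e.place, e.doc_id)
```

Run as a script, the query "napoleon" prints the Ajaccio event (1769-08-15, from d2) before the Waterloo event (1815-06-18, from d1): the result is a chronological sequence of events drawn from different documents. Representing each temporal value as an interval lets coarse values such as "1814-09" still match a query interval by overlap.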
Index Terms
- Event-centric search and exploration in document collections