DOI: 10.1145/2232817.2232859
research-article

Event-centric search and exploration in document collections

Published: 10 June 2012

ABSTRACT

Textual data, ranging from corpora of digitized historic documents to large collections of news feeds, provide a rich source of temporal and geographic information. Such information has recently attracted considerable interest in support of search and exploration tasks, e.g., organizing news along a timeline or placing the origin of documents on a map. In these settings, however, the temporal and geographic information embedded in documents is often considered in isolation. We claim that combining such information into (chronologically ordered) event-like features makes interesting and meaningful search and exploration tasks possible. In this paper, we present a framework for the extraction, exploration, and visualization of event information in document collections. To this end, temporal and geographic expressions are identified in documents and combined, thus enriching a document collection with a set of normalized events. Traditional search queries can then be extended with conditions on the events relevant to the search subject. Most important for our event-centric approach is that a search result consists of a sequence of events relevant to the search terms and not just a document hit list. Such events can originate from different documents and can be explored further; in particular, the events relevant to a search query can be ordered chronologically. We demonstrate the utility of our framework through different (multilingual) search and exploration scenarios using a Wikipedia corpus.
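
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that temporal and geographic expressions have already been extracted and normalized by some tagger, that an event is simply a date and a place co-occurring in one sentence, and that dates are full ISO 8601 strings (so they sort chronologically as text). The names `Event`, `build_events`, `search_events`, and the sample annotations are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: one normalized date paired with one place."""
    date: str      # assumed full ISO 8601 date, e.g. "1969-07-20"
    place: str     # assumed normalized place name
    doc_id: str    # document the event was extracted from
    sentence: str  # sentence in which date and place co-occur


def build_events(annotated_sentences):
    """Pair every date with every place found in the same sentence.

    Sentence-level co-occurrence is an assumption of this sketch,
    not a rule taken from the paper.
    """
    events = []
    for doc_id, sentence, dates, places in annotated_sentences:
        for date, place in product(dates, places):
            events.append(Event(date, place, doc_id, sentence))
    return events


def search_events(events, term, place=None):
    """Return events whose sentence contains `term`, optionally restricted
    to a place, ordered chronologically across documents."""
    term = term.lower()
    hits = [
        e for e in events
        if term in e.sentence.lower() and (place is None or e.place == place)
    ]
    # Full ISO 8601 dates sort chronologically when compared as strings.
    return sorted(hits, key=lambda e: e.date)


# Invented sample annotations: (doc_id, sentence, dates, places).
SENTENCES = [
    ("doc2", "Apollo 11 landed on the Moon on July 20, 1969.",
     ["1969-07-20"], ["Moon"]),
    ("doc1", "Apollo 11 launched from Florida on July 16, 1969.",
     ["1969-07-16"], ["Florida"]),
    ("doc3", "The Berlin Wall fell in Berlin on November 9, 1989.",
     ["1989-11-09"], ["Berlin"]),
]

events = build_events(SENTENCES)
timeline = search_events(events, "apollo")
for e in timeline:
    print(e.date, e.place, e.doc_id)
```

In this sketch the result of a query is a chronologically ordered sequence of events drawn from different documents rather than a ranked list of documents, which is the distinction the abstract emphasizes. The optional `place` argument stands in for a condition on events added to a keyword query.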


Published in

JCDL '12: Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
June 2012
458 pages
ISBN: 9781450311540
DOI: 10.1145/2232817

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 415 of 1,482 submissions, 28%
