ABSTRACT
A seed-based framework for textual information extraction allows for weakly supervised extraction of named entities from anonymized Web search queries. The extraction is guided by a small set of seed named entities, without any need for handcrafted extraction patterns or domain-specific knowledge, allowing for the acquisition of named entities pertaining to various classes of interest to Web search users. Inherently noisy search queries are shown to be a highly valuable, albeit little explored, resource for Web-based named entity discovery.
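The abstract does not spell out the extraction procedure, so the following is only an illustrative sketch of what seed-guided extraction from queries could look like, not the method of the paper. It assumes that the query text surrounding a seed (its prefix and suffix) can serve as evidence for the class, and that candidates are ranked by cosine similarity to the contexts seen with the seeds. All function names, the toy queries, and the similarity measure are assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_of(query, entity):
    """Return the (prefix, suffix) left after removing `entity` from `query`, or None."""
    padded = f" {query.lower()} "
    needle = f" {entity.lower()} "
    start = padded.find(needle)
    if start < 0:
        return None
    prefix = padded[:start].strip()
    suffix = padded[start + len(needle):].strip()
    # A query that consists of the entity alone carries no context.
    return (prefix, suffix) if (prefix or suffix) else None


def fill_of(query, context):
    """Return the phrase that `context` wraps inside `query`, or None if it does not match."""
    prefix, suffix = context
    words = query.lower().split()
    p, s = prefix.split(), suffix.split()
    if len(words) <= len(p) + len(s):
        return None
    if words[:len(p)] != p or (s and words[-len(s):] != s):
        return None
    return " ".join(words[len(p):len(words) - len(s)])


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(queries, seeds):
    """Rank non-seed phrases by how closely their query contexts match the seeds' contexts."""
    seed_set = {s.lower() for s in seeds}

    # 1. Contexts observed around the seeds form the class signature (assumption).
    signature = Counter()
    for q in queries:
        for s in seed_set:
            c = context_of(q, s)
            if c:
                signature[c] += 1

    # 2. Any other phrase occurring in one of those contexts becomes a candidate.
    vectors = defaultdict(Counter)
    for q in queries:
        for c in signature:
            cand = fill_of(q, c)
            if cand and cand not in seed_set:
                vectors[cand][c] += 1

    # 3. Score each candidate against the signature; best first, ties broken by name.
    scored = [(cosine(v, signature), cand) for cand, v in vectors.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Toy, made-up query log and seeds (hypothetical data).
queries = [
    "lipitor side effects", "zocor side effects", "aspirin side effects",
    "buy lipitor online", "buy aspirin online", "buy shoes online",
    "zocor dosage",
]
ranked = rank_candidates(queries, seeds=["lipitor", "zocor"])
for score, name in ranked:
    print(f"{name}\t{score:.3f}")
```

In this toy log, a phrase that shares several contexts with the seeds scores higher than one that shares a single generic context, which is the intuition behind using seeds instead of handcrafted patterns. A real query log is far noisier, so any practical system would need filtering and a more robust similarity measure than this sketch uses.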
Index Terms
- Weakly-supervised discovery of named entities using web search queries