Research article, CIKM conference proceedings
DOI: 10.1145/1321440.1321536

Weakly-supervised discovery of named entities using web search queries

Published: 06 November 2007

ABSTRACT

A seed-based framework for textual information extraction allows for weakly supervised extraction of named entities from anonymized Web search queries. The extraction is guided by a small set of seed named entities, without any need for handcrafted extraction patterns or domain-specific knowledge, allowing for the acquisition of named entities pertaining to various classes of interest to Web search users. Inherently noisy search queries are shown to be a highly valuable, albeit little explored, resource for Web-based named entity discovery.
