DOI: 10.3115/1119355.1119383

Improved automatic keyword extraction given more linguistic knowledge

Published: 11 July 2003

ABSTRACT

In this paper, experiments on automatic extraction of keywords from abstracts using a supervised machine learning algorithm are discussed. The main point of this paper is that by adding linguistic knowledge to the representation (such as syntactic features), rather than relying only on statistics (such as term frequency and n-grams), a better result is obtained as measured by keywords previously assigned by professional indexers. In more detail, extracting NP-chunks gives a better precision than n-grams, and by adding the PoS tag(s) assigned to the term as a feature, a dramatic improvement of the results is obtained, independent of the term selection approach applied.
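The contrast between purely statistical features and a PoS-tag feature can be sketched in a few lines. This is only an illustrative sketch, not the paper's implementation: the function names `ngram_candidates` and `candidate_features`, the hand-assigned tags, and the toy sentence are all assumptions, and the abstract does not specify the exact feature set or learner. It enumerates n-gram candidate terms, counts term frequency, and records the PoS tag sequence assigned to each candidate so that a supervised classifier could use it as an additional feature.

```python
from collections import Counter


def ngram_candidates(tagged_tokens, max_n=3):
    """Yield (term, pos_pattern) for every n-gram of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tagged_tokens) - n + 1):
            window = tagged_tokens[i:i + n]
            term = " ".join(word.lower() for word, _ in window)
            pos_pattern = " ".join(tag for _, tag in window)
            yield term, pos_pattern


def candidate_features(tagged_tokens, max_n=3):
    """Map each candidate term to its term frequency and observed PoS patterns."""
    tf = Counter()
    patterns = {}
    for term, pos in ngram_candidates(tagged_tokens, max_n):
        tf[term] += 1
        patterns.setdefault(term, set()).add(pos)
    return {t: {"tf": tf[t], "pos": sorted(patterns[t])} for t in tf}


# Hypothetical hand-tagged input; a real system would run a PoS tagger here.
tagged = [
    ("automatic", "JJ"), ("keyword", "NN"), ("extraction", "NN"),
    ("improves", "VBZ"), ("keyword", "NN"), ("extraction", "NN"),
]
features = candidate_features(tagged, max_n=2)
print(features["keyword extraction"])
```

A classifier trained on keywords assigned by indexers could then learn, for example, that candidates tagged as noun sequences are more likely to be keywords than candidates containing a verb, which frequency alone cannot express.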


Published in

EMNLP '03: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing
July 2003, 224 pages

Publisher: Association for Computational Linguistics, United States

Overall acceptance rate: 73 of 234 submissions, 31%
