Improved automatic keyword extraction given more linguistic knowledge

Author: Anette Hulth, Stockholm University, Sweden

EMNLP '03: Proceedings of the 2003 conference on Empirical methods in natural language processing, July 2003, pages 216–223. https://doi.org/10.3115/1119355.1119383

Published: 11 July 2003

ABSTRACT

In this paper, experiments on automatic extraction of keywords from abstracts using a supervised machine learning algorithm are discussed. The main point of this paper is that by adding linguistic knowledge to the representation (such as syntactic features), rather than relying only on statistics (such as term frequency and n-grams), a better result is obtained as measured by keywords previously assigned by professional indexers. In more detail, extracting NP-chunks gives a better precision than n-grams, and by adding the PoS tag(s) assigned to the term as a feature, a dramatic improvement of the results is obtained, independent of the term selection approach applied.
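
The contrast between the two term selection approaches named in the abstract can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the paper's method: the tagged sentence, the gold keywords, the Penn-style tag names, the adjective/noun chunking rule, and all function names are hypothetical, and no classifier is trained. It only shows how n-gram candidates and NP-chunk candidates differ in number, how a PoS tag sequence can be attached to each candidate as a feature, and how precision against manually assigned keywords would be computed.

```python
from typing import Dict, List, Set, Tuple

Tagged = List[Tuple[str, str]]  # (token, PoS tag); the tags here are assigned by hand

# Hypothetical toy input; real experiments would run a tagger over abstracts.
TAGGED: Tagged = [
    ("Automatic", "JJ"), ("keyword", "NN"), ("extraction", "NN"),
    ("uses", "VBZ"), ("linguistic", "JJ"), ("knowledge", "NN"),
]
# Hypothetical keywords standing in for those assigned by a professional indexer.
GOLD: Set[str] = {"keyword extraction", "linguistic knowledge"}


def ngram_candidates(tagged: Tagged, max_n: int = 3) -> Dict[str, str]:
    """Map every word n-gram (n <= max_n) to its PoS tag sequence."""
    out: Dict[str, str] = {}
    for n in range(1, max_n + 1):
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            term = " ".join(word.lower() for word, _ in window)
            out[term] = " ".join(tag for _, tag in window)
    return out


def np_chunk_candidates(tagged: Tagged) -> Dict[str, str]:
    """Crude NP chunker (an assumption): maximal adjective/noun runs ending in a noun."""
    out: Dict[str, str] = {}
    run: Tagged = []

    def flush() -> None:
        # Drop trailing adjectives so that the chunk ends in a noun.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        if run:
            term = " ".join(word.lower() for word, _ in run)
            out[term] = " ".join(tag for _, tag in run)
        run.clear()

    for token in tagged + [("", "EOS")]:  # sentinel forces a final flush
        if token[1].startswith("NN") or token[1] == "JJ":
            run.append(token)
        else:
            flush()
    return out


def precision(candidates: Dict[str, str], gold: Set[str]) -> float:
    """Fraction of candidate terms that are also gold keywords."""
    if not candidates:
        return 0.0
    return len(set(candidates) & gold) / len(candidates)


if __name__ == "__main__":
    for name, select in (("n-grams", ngram_candidates), ("NP-chunks", np_chunk_candidates)):
        cands = select(TAGGED)
        print(f"{name}: {len(cands)} candidates, precision {precision(cands, GOLD):.3f}")
```

On this toy sentence the n-gram selector proposes many more candidates than the chunker, so its precision is lower even though it finds both gold keywords. In a supervised setting, the tag sequence stored with each candidate could be passed to the learner as one feature alongside statistical features such as term frequency.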

Published in
EMNLP '03: Proceedings of the 2003 conference on Empirical methods in natural language processing
July 2003, 224 pages
Program Chairs: Michael Collins (MIT AI Lab), Mark Steedman (University of Edinburgh)
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 73 of 234 submissions, 31%