DOI: 10.5555/1857999.1858055
research-article

For the sake of simplicity: unsupervised extraction of lexical simplifications from Wikipedia

Published: 02 June 2010

ABSTRACT

We report on work in progress on extracting lexical simplifications (e.g., "collaborate" → "work together"), focusing on the use of edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to be simplification operations. We find that our methods outperform a reasonable baseline and yield many high-quality lexical simplifications not included in an independently created, manually prepared list.
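
As a rough illustration of approach (2), the sketch below keeps only edits whose metadata suggests a simplification and counts the phrase substitutions they contain. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not say which metadata is used, so the revision-comment field, the keyword "simpl", the toy records, and the function names `likely_simplification`, `count_substitutions`, and `rank_candidates` are all hypothetical.

```python
from collections import Counter

# Hypothetical revision records: (edit comment, [(old phrase, new phrase), ...]).
# The record layout and the data are illustrative assumptions only.
REVISIONS = [
    ("simplified wording", [("collaborate", "work together")]),
    ("fixed typo", [("teh", "the")]),
    ("simplify intro", [("collaborate", "work together"), ("utilize", "use")]),
    ("added reference", [("utilize", "employ")]),
]


def likely_simplification(comment, keyword="simpl"):
    """Assumed heuristic: the edit comment mentions the keyword."""
    return keyword in comment.lower()


def count_substitutions(revisions, keyword="simpl"):
    """Count substitution pairs, using only edits that pass the metadata filter."""
    counts = Counter()
    for comment, pairs in revisions:
        if likely_simplification(comment, keyword):
            counts.update(pairs)
    return counts


def rank_candidates(revisions, keyword="simpl"):
    """Return candidate simplifications, most frequent first."""
    return count_substitutions(revisions, keyword).most_common()


if __name__ == "__main__":
    for (old, new), n in rank_candidates(REVISIONS):
        print(f"{old} -> {new}: {n}")
```

Raw counts are only a stand-in here; approach (1) in the abstract instead derives simplification probabilities from an edit model, which this sketch does not attempt.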

Published in

HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
June 2010
1070 pages
ISBN: 1932432655

Publisher: Association for Computational Linguistics, United States

Acceptance Rates

Overall Acceptance Rate: 240 of 768 submissions, 31%
