For the sake of simplicity: unsupervised extraction of lexical simplifications from Wikipedia

Authors: Mark Yatskar, Bo Pang, Cristian Danescu-Niculescu-Mizil, Lillian Lee

HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, June 2010, pages 365–368

Published: 02 June 2010

ABSTRACT

We report on work in progress on extracting lexical simplifications (e.g., "collaborate" → "work together"), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to be simplification operations. We find our methods to outperform a reasonable baseline and yield many high-quality lexical simplifications not included in an independently created, manually prepared list.
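
As a rough illustration of approach (2), the sketch below keeps only revisions whose edit comment suggests a simplification and counts the phrase substitutions found in them. It is a minimal sketch under stated assumptions, not the authors' method: the keyword "simpl", the `revisions` triples, and the function names are all hypothetical, and token alignment with Python's `difflib.SequenceMatcher` stands in for whatever sentence and phrase alignment the paper actually uses. The probabilistic edit model of approach (1) is not represented at all.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_substitutions(old_sentence, new_sentence):
    """Return (old_phrase, new_phrase) pairs for every replaced token span."""
    old_tokens = old_sentence.lower().split()
    new_tokens = new_sentence.lower().split()
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" marks a span of old tokens swapped for a span of new tokens.
        if tag == "replace":
            pairs.append((" ".join(old_tokens[i1:i2]), " ".join(new_tokens[j1:j2])))
    return pairs


def count_candidate_simplifications(revisions, keyword="simpl"):
    """Count substitutions in revisions whose comment contains the keyword.

    `revisions` is an iterable of (old_sentence, new_sentence, comment) triples.
    The keyword filter is an assumed stand-in for the metadata signal.
    """
    counts = Counter()
    for old_sentence, new_sentence, comment in revisions:
        if keyword in comment.lower():
            counts.update(extract_substitutions(old_sentence, new_sentence))
    return counts


# Hypothetical toy data: only the first revision passes the comment filter.
revisions = [
    ("They collaborate on research", "They work together on research", "simplified wording"),
    ("The town is large", "The city is large", "fixed a factual error"),
]
candidates = count_candidate_simplifications(revisions)
print(candidates.most_common())
```

Raw counts like these would only be a starting point; ranking candidates by how often a substitution appears in comment-flagged edits relative to all edits is one plausible refinement.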

Published in: HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
June 2010, 1070 pages
ISBN: 1932432655
General Chair: Ronald M. Kaplan (Microsoft Bing)
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 240 of 768 submissions, 31%