ABSTRACT
We report work in progress on extracting lexical simplifications (e.g., "collaborate" → "work together") from the edit histories of Simple English Wikipedia. We consider two main approaches: (1) deriving simplification probabilities from an edit model that accounts for a mixture of different edit operations, and (2) using metadata to focus on edits that are more likely to be simplification operations. Our methods outperform a reasonable baseline and yield many high-quality lexical simplifications that are absent from an independently created, manually prepared list.
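To make the second approach concrete, the following is a minimal sketch and not the authors' implementation. It assumes each revision is available as a dictionary with hypothetical `comment`, `before`, and `after` fields, and it assumes that an editor comment containing the substring "simpl" is a usable metadata signal; the abstract does not say which metadata is used. Word-level replacements are found with Python's standard `difflib`, and raw counts stand in for the edit-model probabilities of the first approach, which this sketch does not attempt.

```python
import difflib
from collections import Counter


def lexical_substitutions(before, after):
    """Return (old_phrase, new_phrase) pairs for word-level replacements."""
    old_words, new_words = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "replace" spans are kept; pure insertions and deletions are ignored.
        if tag == "replace":
            pairs.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return pairs


def candidate_simplifications(revisions, keyword="simpl"):
    """Count substitutions in revisions whose comment mentions the keyword.

    The keyword filter is an assumed stand-in for "using metadata";
    the field names are hypothetical.
    """
    counts = Counter()
    for rev in revisions:
        if keyword in rev["comment"].lower():
            counts.update(lexical_substitutions(rev["before"], rev["after"]))
    return counts


if __name__ == "__main__":
    # Toy revisions, invented for illustration only.
    revisions = [
        {"comment": "simplified wording",
         "before": "They collaborate on the project",
         "after": "They work together on the project"},
        {"comment": "fixed typo",
         "before": "The cat sat",
         "after": "The dog sat"},
    ]
    print(candidate_simplifications(revisions))
```

A filter of this kind trades recall for precision: edits without a telling comment are dropped even when they do simplify, which is one reason a probabilistic edit model over all edits is a natural complement.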