DOI: 10.5555/2145432.2145510
Research article · Free access

Divide and conquer: crowdsourcing the creation of cross-lingual textual entailment corpora

Published: 27 July 2011

ABSTRACT

We address the creation of cross-lingual textual entailment corpora by means of crowdsourcing. Our goal is to define a cheap and replicable data collection methodology that minimizes the manual work done by expert annotators, without resorting to preprocessing tools or already annotated monolingual datasets. In line with recent work emphasizing the need for large-scale annotation efforts in textual entailment, our work aims to: i) tackle the scarcity of data available to train and evaluate systems, and ii) promote crowdsourcing as an effective way to reduce the costs of data collection without sacrificing quality. We show that a complex data creation task, on which even experts usually achieve low agreement scores, can be effectively decomposed into simple subtasks assigned to non-expert annotators. The resulting dataset, obtained from a pipeline of different jobs routed to Amazon Mechanical Turk, contains more than 1,600 aligned pairs for each text-hypothesis language combination across English, Italian and German.
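The abstract describes two ingredients: decomposing a hard annotation task into a pipeline of simple crowd jobs, and aggregating redundant non-expert judgments so that quality is preserved. The sketch below illustrates that pattern in Python; the function names, the agreement threshold, and the stage interface are illustrative assumptions, not the authors' actual Mechanical Turk setup.

```python
from collections import Counter

def majority_label(labels, min_agreement=0.5):
    """Aggregate redundant non-expert judgments for one item.

    Returns the majority label if its share of the votes exceeds
    min_agreement, otherwise None (item is discarded or re-routed).
    Hypothetical aggregation rule, shown for illustration only.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_agreement else None

def run_pipeline(items, stages):
    """Route items through a sequence of simple annotation stages.

    Each stage maps an item to an aggregated judgment or None;
    only items that survive every stage reach the final corpus.
    This mirrors the divide-and-conquer idea: several easy jobs
    in sequence instead of one hard expert task.
    """
    surviving = items
    for stage in stages:
        surviving = [item for item in surviving if stage(item) is not None]
    return surviving
```

For example, one stage might collect five crowd judgments per text-hypothesis pair and keep the pair only when a clear majority agrees, before passing it to the next (equally simple) job in the pipeline.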


Published in

EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing
July 2011, 1647 pages
ISBN: 9781937284114

Publisher: Association for Computational Linguistics, United States

Qualifiers: research-article

Overall acceptance rate: 73 of 234 submissions, 31%
