GENEVAL: a proposal for shared-task evaluation in NLG

ABSTRACT
We propose to organise a series of shared-task NLG events, in which participants build systems with the same input/output functionality, and these systems are evaluated with a range of different evaluation techniques. The main purpose of these events is to allow us to compare the evaluation techniques themselves, by correlating the results of the different evaluations of the systems entered in the events.
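As a minimal illustration of the correlation study the abstract describes (this sketch is not from the paper itself; the per-system scores below are invented), one could measure how well an automatic metric such as BLEU agrees with human ratings over the systems entered in an event:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical results from one shared-task event: an automatic
# metric score and a mean human quality rating for each system.
bleu_scores   = [0.31, 0.42, 0.27, 0.38]  # invented values
human_ratings = [3.1,  4.0,  2.8,  3.6]   # invented values

# A high correlation would suggest the automatic metric tracks
# human judgements on these systems; a low one would not.
print(f"Pearson r = {pearson(bleu_scores, human_ratings):.3f}")
```

With several evaluation techniques applied to the same set of systems, pairwise correlations of this kind would indicate which techniques agree with one another and which diverge.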