DOI: 10.5555/1706269.1706298

GENEVAL: a proposal for shared-task evaluation in NLG

Published: 15 July 2006

ABSTRACT

We propose to organise a series of shared-task NLG events, where participants are asked to build systems with similar input/output functionalities, and these systems are evaluated with a range of different evaluation techniques. The main purpose of these events is to allow us to compare different evaluation techniques, by correlating the results of different evaluations on the systems entered in the events.
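As a rough illustration of what "correlating the results of different evaluations" could look like in practice, the sketch below computes pairwise Pearson correlations between evaluation techniques, using one score per system. The technique names, the scores, and the choice of Pearson's coefficient are all assumptions made for illustration; the abstract does not specify which techniques or which correlation statistic would be used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def correlate_all(scores):
    """Correlate every pair of evaluation techniques across the same systems."""
    names = sorted(scores)
    return {
        (a, b): pearson(scores[a], scores[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical data: each list holds one score per participating system,
# in the same system order for every technique.
scores = {
    "human_rating": [4.1, 3.2, 2.5, 3.8],
    "automatic_metric": [0.62, 0.48, 0.30, 0.55],
    "task_performance": [0.70, 0.65, 0.50, 0.72],
}

if __name__ == "__main__":
    for (a, b), r in correlate_all(scores).items():
        print(f"{a} vs {b}: r = {r:.3f}")
```

A high correlation between two techniques across the entered systems would suggest that they rank systems similarly. With only a handful of systems, any such coefficient should be read cautiously.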

Published in: INLG '06: Proceedings of the Fourth International Natural Language Generation Conference, July 2006, 132 pages
ISBN: 1932432728
Publisher: Association for Computational Linguistics, United States
