Automatic Procedures in MT Evaluation

MT Summit XI, Copenhagen - 11 September 2007

Workshop organised by the ELRA Evaluation Committee

  • Gregor Thurmair, Linguatec
  • Khalid Choukri, ELDA
  • Bente Maegaard, University of Copenhagen

The purpose of this workshop was to discuss automatic evaluation procedures in MT. Among the discussion points were:

  • What do the scores really measure?
  • What kind of implicit assumptions do they make?
  • What kind of initial effort do they require (e.g. pre-translating the test corpus)?
  • What kind of resources do they need (e.g. third-party grammars)?
  • Are they biased towards specific MT technologies?
  • What kind of diagnostic support can they give (e.g. indicating where to improve the system)?
  • Which evaluation criteria (e.g. those related to the FEMTI framework) do they support (adequacy, fluency, …)?

The objective of the workshop was to gain a better understanding of the strengths and limitations of the respective approaches, and possibly to take steps towards defining a common methodology for MT output evaluation.

Programme


9.00 Welcome and introduction

9.20 The place of automatic evaluation metrics in external quality models for machine translation (PDF, 104 KB, 19 slides)
Andrei Popescu-Belis, University of Geneva

10.00 Evaluating Evaluation: Lessons from the WMT’07 Shared Task (PDF, 420 KB, 38 slides)
Philipp Koehn, University of Edinburgh

10.30 Coffee break

11.00 Investigating Why BLEU Penalizes Non-Statistical Systems (PDF, 261 KB, 10 slides)
Eduard Hovy, University of Southern California

11.30 Edit distance as an evaluation metric (PDF, 997 KB, 34 slides)
Christopher Cieri, Linguistic Data Consortium

12.00 Experience and conclusions from the CESTA evaluation project (PDF, 102 KB, 22 slides)
Olivier Hamon, ELDA

12.30 Lunch

13.30 Automatic Evaluation in MT system production (PDF, 147 KB, 28 slides)
Gregor Thurmair, Linguatec

14.00 Sensitivity of performance-based and proximity-based models for MT evaluation (PDF, 144 KB, 22 slides)
Bogdan Babych, University of Leeds

14.30 Automatic and human evaluations of MT in the framework of speech-to-speech communication (PDF, 178 KB, 33 slides)
Khalid Choukri, ELDA

15.00 Coffee break

15.30 Discussion and conclusions

17.00 Close