Open Source MT evaluation toolkit

University of Leeds
Centre for Translation Studies and School of Computing

Authors: Bogdan Babych, Anthony Hartley

v01-1, 2004-05-17

Download:

Zip: wnm-01-1.zip, Perl script only: wnm-01-1.pl

Description:

WNM (Weighted N-gram Model): an automated MT evaluation toolkit that
implements a rough model of legitimate translation variation (LTV)
v01.1, 2004-05-17
#########################################################
purpose:	scores Machine Translation output; the scores correlate with human evaluations of
		adequacy and fluency
usage: 		perl wnm-01-1.pl <evaluated-text> <human-reference-text> <corpusFrequencyFile>
example:	perl wnm-01-1.pl te01sysA-base.txt te00humanRef.txt wnm-frqEnglish-darpa94e.txt >> wnm-results.txt

requires: a corpus statistics file in the following format:
word;FrequencyInCorpus;NumberOfTextsWhereFound

the header of the corpus statistics file should be:
<CorpStat>NumberOfTokensInCorpus;NumberOfTextsInCorpus

Two corpus statistics files for English are included; they were built from two different human reference translations.
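To make the file layout concrete, here is a minimal Python sketch of a reader for a corpus statistics file. It is an illustration, not part of the toolkit: the function name `parse_corpus_stats`, the sample values, and the assumptions that the header is the first non-empty line and that there is one word entry per line are all made up for this example.

```python
from typing import Dict, Tuple


def parse_corpus_stats(text: str) -> Tuple[int, int, Dict[str, Tuple[int, int]]]:
    """Return (tokens_in_corpus, texts_in_corpus, {word: (frequency, n_texts)})."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    if not header.startswith("<CorpStat>"):
        raise ValueError("missing <CorpStat> header")
    tokens, texts = header[len("<CorpStat>"):].split(";")
    stats: Dict[str, Tuple[int, int]] = {}
    for ln in lines[1:]:
        # Split from the right so a word containing ';' keeps its two numeric fields.
        word, freq, n_texts = ln.rsplit(";", 2)
        stats[word] = (int(freq), int(n_texts))
    return int(tokens), int(texts), stats


# Hypothetical file content; the numbers are invented for the example.
sample = "<CorpStat>1000;10\nthe;60;10\ntranslation;5;2\n"
n_tokens, n_texts, stats = parse_corpus_stats(sample)
```

After the call, `stats["translation"]` holds the pair (frequency in corpus, number of texts where found) for that word.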

sample output:
MT-TEXT:tw01sysA-base.txt;wnm-RECALL-ADEQUACY:0.2221;wnm-FSCORE-FLUENCY:0.2788
;DETAILS:
;tw01sysA-base.txt;bP:0.2896;bR:0.3442;bF:0.3146
;tw01sysA-base.txt;wP:0.3745;wR:0.2221;wF:0.2788
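The sample output can be read back into numbers with a few lines of Python. The sketch below is not part of the toolkit; the names `parse_details` and `f_score` are hypothetical. In the sample, wR equals the wnm-RECALL-ADEQUACY value and wF equals the wnm-FSCORE-FLUENCY value, and each F value agrees, within rounding, with the harmonic mean of the corresponding P and R. That the F values are computed this way is an assumption drawn from the sample, not something the description states.

```python
import re
from typing import Dict

# Matches the six detail fields, e.g. ";bP:0.2896".
SCORE = re.compile(r"\b(bP|bR|bF|wP|wR|wF):([0-9.]+)")


def parse_details(output: str) -> Dict[str, float]:
    """Collect the bP/bR/bF and wP/wR/wF fields from one result block."""
    return {name: float(value) for name, value in SCORE.findall(output)}


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


sample = """MT-TEXT:tw01sysA-base.txt;wnm-RECALL-ADEQUACY:0.2221;wnm-FSCORE-FLUENCY:0.2788
;DETAILS:
;tw01sysA-base.txt;bP:0.2896;bR:0.3442;bF:0.3146
;tw01sysA-base.txt;wP:0.3745;wR:0.2221;wF:0.2788
"""
scores = parse_details(sample)
```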


#########################################################
# NOTE: IN v01-1 EACH FILE IS CURRENTLY TREATED AS A SINGLE TEXT (SO NO TEXT/SEGMENT MARKUP IS REQUIRED)
# TO EVALUATE A LARGE COLLECTION OF TEXTS, PUT EACH TEXT INTO A SEPARATE FILE AND AVERAGE THE SCORES
#########################################################
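Averaging over a collection can be done on the accumulated results file. The following Python sketch assumes that each run appended one "MT-TEXT:" headline in the format of the sample output; the function name `average_scores` and the second result line are hypothetical and exist only to illustrate the averaging.

```python
import re
from statistics import mean
from typing import Dict

# One headline per evaluated file, in the format of the sample output.
HEADLINE = re.compile(
    r"^MT-TEXT:(?P<name>[^;]+);"
    r"wnm-RECALL-ADEQUACY:(?P<adequacy>[0-9.]+);"
    r"wnm-FSCORE-FLUENCY:(?P<fluency>[0-9.]+)",
    re.MULTILINE,
)


def average_scores(results: str) -> Dict[str, float]:
    """Average the adequacy and fluency scores over all headlines found."""
    rows = [m.groupdict() for m in HEADLINE.finditer(results)]
    if not rows:
        raise ValueError("no MT-TEXT result lines found")
    return {
        "texts": len(rows),
        "adequacy": mean(float(r["adequacy"]) for r in rows),
        "fluency": mean(float(r["fluency"]) for r in rows),
    }


# First line is the sample output; the second is invented for the example.
results = """MT-TEXT:tw01sysA-base.txt;wnm-RECALL-ADEQUACY:0.2221;wnm-FSCORE-FLUENCY:0.2788
MT-TEXT:hypothetical02.txt;wnm-RECALL-ADEQUACY:0.3000;wnm-FSCORE-FLUENCY:0.3212
"""
averages = average_scores(results)
```

This gives every text equal weight regardless of length, which is one possible reading of "average"; weighting by text length would be a different design choice.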

#########################################################
authors:
	Bogdan Babych <bogdan <at> comp.leeds.ac.uk>
	Tony Hartley <a.hartley <at> leeds.ac.uk>
		Centre for Translation Studies,
		University of Leeds, England, UK

Principle of evaluation:

The tool implements a method of MT evaluation that combines BLEU (Papineni et al., 2002) with weights of statistical salience from the vector space model, such as S-scores (Babych, Hartley, Atwell, 2003), which are similar to TF.IDF scores (Salton, Lesk, 1968).
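The idea of combining BLEU-style n-gram matching with salience weights can be illustrated with a small Python sketch. It is not the toolkit's algorithm: it uses unigrams only and swaps plain IDF weights in for S-scores. The function names, the toy document-frequency table, and the treatment of unseen words are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def idf_weight(word: str, doc_freq: Dict[str, int], n_texts: int) -> float:
    """IDF weight log(N / df); unseen words are given df = 1 (an assumption)."""
    return math.log(n_texts / doc_freq.get(word, 1))


def weighted_unigram_scores(
    mt: str, ref: str, doc_freq: Dict[str, int], n_texts: int
) -> Tuple[float, float, float]:
    """IDF-weighted unigram precision, recall and F against one reference."""
    mt_counts = Counter(mt.lower().split())
    ref_counts = Counter(ref.lower().split())

    def weight(w: str) -> float:
        return idf_weight(w, doc_freq, n_texts)

    # Matches are clipped to the reference count, as in BLEU.
    matched = sum(min(c, ref_counts[w]) * weight(w) for w, c in mt_counts.items())
    mt_total = sum(c * weight(w) for w, c in mt_counts.items())
    ref_total = sum(c * weight(w) for w, c in ref_counts.items())

    precision = matched / mt_total if mt_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Toy statistics: number of texts (out of 10) in which each word occurs.
doc_freq = {"the": 10, "cat": 2, "sat": 5, "mat": 1}
precision, recall, f = weighted_unigram_scores(
    "the cat sat", "the cat sat on the mat", doc_freq, n_texts=10
)
```

In this toy case the unweighted unigram recall would be 3/6, while the weighted recall is lower because the words missing from the MT output ("on", "mat") are rarer in the toy corpus and so carry more weight than the matched ones.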

The method is described in (Babych, 2004), (Babych, Hartley, 2004a) and (Babych, Hartley, 2004b). The paper (Babych, Hartley, 2004a) deals with the relation between the frequency salience weights and legitimate translation variation (LTV).

The method has been tested for correlation with human scores on the DARPA 94 MT evaluation corpus (White et al., 1994) and on a corpus of e-mails and an EU White Paper document (Babych, Hartley, Atwell, 2004).

References

Babych B, Hartley A, Atwell E. 2003. Statistical Modelling of MT output corpora for Information Extraction. In: Proceedings of the Corpus Linguistics 2003 conference, edited by Dawn Archer, Paul Rayson, Andrew Wilson and Tony McEnery. Lancaster University (UK), 28-31 March 2003. pp. 62-70.

Babych B, Hartley A. 2004a. Modelling legitimate translation variation for automatic evaluation of MT quality. LREC 2004 (forthcoming).

Babych B, Hartley A. 2004b. Extending BLEU MT Evaluation Method with Frequency Weighting. ACL 2004 (forthcoming).

Babych B. 2004. Weighted N-gram model for evaluating Machine Translation output. CLUK '04. Proceedings of the 7th Annual Colloquium for the UK Special Interest Group for Computational Linguistics. University of Birmingham, 6-7 January 2004. pp. 15-22.

Papineni K, Roukos S, Ward T, Zhu W-J. 2002. BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002. pp. 311-318.

Salton, G. and M.E. Lesk. 1968. Computer evaluation of indexing and text processing. Journal of the ACM, 15(1), 8-36.

White, J., T. O'Connell and F. O'Mara. 1994. The ARPA MT evaluation methodologies: evolution, lessons and future approaches. Proceedings of the 1st Conference of the Association for Machine Translation in the Americas. Columbia, MD, October 1994. pp. 193-205.


Last update: 17.05.2004
