Source AWK script
Works under GAWK v3.1.3 or higher and under TAWK v5.0.
Warning: produces incorrect results under GAWK v3.0.4.
Syntax:
gawk -f ltv-mt-eval.awk <file with tested MT corpus> <file with corpus of reference human translations>
Both files are expected to use the following markup to delimit individual texts and to align them with each other:
<DOC doc_ID="someUniqueName" sys_ID="usually_file_name">
</DOC>
A text in the second file is matched to a text in the first file by its doc_ID, so corresponding texts must carry identical doc_ID values.
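To illustrate how the DOC markup lets the two corpora be aligned, here is a minimal Python sketch. It is not the logic of the AWK script itself; the helper names `read_docs` and `align` are hypothetical, and it assumes that attribute values are double-quoted as in the example above and that the reference file may hold several translations (different sys_ID values) of the same doc_ID.

```python
import re
from collections import defaultdict

# Matches one <DOC ...> ... </DOC> block and captures its doc_ID value and body.
# Assumption: attribute values are double-quoted, as in the annotation example.
DOC_RE = re.compile(r'<DOC\s[^>]*?doc_ID="([^"]*)"[^>]*>(.*?)</DOC>', re.DOTALL)


def read_docs(text):
    """Hypothetical helper: map each doc_ID to the list of texts carrying it."""
    docs = defaultdict(list)
    for doc_id, body in DOC_RE.findall(text):
        docs[doc_id].append(body.strip())
    return docs


def align(mt_text, ref_text):
    """Hypothetical helper: pair each MT text with all references sharing its doc_ID."""
    refs = read_docs(ref_text)
    pairs = []
    for doc_id, mt_bodies in read_docs(mt_text).items():
        if doc_id not in refs:
            continue  # no reference translation for this document: skip it
        for mt_body in mt_bodies:
            pairs.append((doc_id, mt_body, refs[doc_id]))
    return pairs


mt = '''<DOC doc_ID="d1" sys_ID="mt_a.txt">
the cat sat
</DOC>
<DOC doc_ID="d2" sys_ID="mt_a.txt">
no reference here
</DOC>'''
ref = '''<DOC doc_ID="d1" sys_ID="ref1.txt">
the cat sat down
</DOC>
<DOC doc_ID="d1" sys_ID="ref2.txt">
a cat was sitting
</DOC>'''

for doc_id, hyp, references in align(mt, ref):
    print(doc_id, hyp, references)
# prints: d1 the cat sat ['the cat sat down', 'a cat was sitting']
```

Here document d2 is dropped because the reference file has no text with that doc_ID, which is why the doc_ID values in both files must match exactly.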
The output is a single score for the tested system, computed over the entire corpus.
Tested on the DARPA-94 MT evaluation corpus, which contains 100 news reports of about 350 words each.