Automatic Mapping Among Lexico-Grammatical Annotation Models (AMALGAM)
This standard representation is still crude and is arguably unfair to some schemes, particularly dependency grammar, which has no grammatical classes. It also assumes that the parser produces a single "correct" parse tree: is that fair to parsers (e.g. ANLT) which produce a "forest" of possible parses?
However, it at least allows us to compare parser outputs more directly. It should also make it straightforward to combine or merge syntactic information from different parsers. Our next task is to see how difficult it is to map each of these VSG formats onto an interlingua, the EAGLES baseline "lowest common factor". This will allow us to evaluate parser output against the EAGLES standard.
NLP researchers have not agreed a standard lexico-grammatical annotation model for English. As there is no single standard, the AMALGAM project is developing means to map between rival schemes, and to combine currently incompatible annotated corpora into a single, reusable resource.
We have trained the Brill Tagger on several lexico-grammatical annotation models, so that it can annotate text according to each of these rival schemes. To map from one tagging scheme to another, we first strip the "source" tags and re-tag the text with "target" tags. The "source" tags can then be used to guide a post-processing error-correction phase.
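The strip, re-tag and correct procedure can be illustrated with a minimal sketch. This is not the project's implementation. The target tagger here is a toy lexicon lookup standing in for a trained Brill Tagger, and the tag names, the `COMPATIBLE` table, and the correction rule (fall back to a target tag that the source tag allows) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

TaggedToken = Tuple[str, str]  # (word, tag)


def strip_tags(tagged: List[TaggedToken]) -> List[str]:
    """Step 1: discard the source tags, keeping only the words."""
    return [word for word, _ in tagged]


def map_tagging(
    source_tagged: List[TaggedToken],
    target_tagger: Callable[[List[str]], List[TaggedToken]],
    compatible: Dict[str, Set[str]],
) -> List[TaggedToken]:
    """Re-tag with the target scheme, then use source tags to correct errors."""
    # Step 2: re-tag the bare words with the target-scheme tagger.
    retagged = target_tagger(strip_tags(source_tagged))
    corrected = []
    for (word, src_tag), (_, tgt_tag) in zip(source_tagged, retagged):
        # Step 3: if the source tag rules out the proposed target tag,
        # replace it with an allowed one (hypothetical correction rule).
        allowed = compatible.get(src_tag)
        if allowed and tgt_tag not in allowed:
            tgt_tag = sorted(allowed)[0]
        corrected.append((word, tgt_tag))
    return corrected


# Toy stand-in for a tagger trained on the target scheme (assumption).
TOY_LEXICON = {"the": "DT", "run": "VB", "ended": "VBD"}


def toy_target_tagger(words: List[str]) -> List[TaggedToken]:
    return [(w, TOY_LEXICON.get(w.lower(), "NN")) for w in words]


# Illustrative table: which target tags each source tag is compatible with.
COMPATIBLE = {"AT": {"DT"}, "NN": {"NN"}, "VBD": {"VBD", "VBN"}}

source = [("The", "AT"), ("run", "NN"), ("ended", "VBD")]
mapped = map_tagging(source, toy_target_tagger, COMPATIBLE)
print(mapped)
```

In this toy run the target tagger mislabels "run" as a verb, and the source tag (a noun) is what allows the correction phase to repair it.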
We have been able to extract a set of generative context-free grammar rules from each treebank or parsed corpus. These can be passed to a chart parser, to annotate new text with parse trees in the desired "target" parsing scheme. However, the context-free grammars extracted in this way are too large for our chart parser to use in a realistic time, so we are developing an alternative N-gram-like representation for structural annotations.
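As a rough illustration of rule extraction, the sketch below reads parse trees and counts each parent-to-children expansion as one context-free rule. It assumes trees are stored as Penn-style bracketed strings, which may not match the formats of the corpora actually used, and the helper names `parse_bracketed` and `extract_rules` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple, Union

# A tree is (label, children); each child is a subtree or a word string.
Tree = Tuple[str, List[Union["Tree", str]]]
Rule = Tuple[str, Tuple[str, ...]]


def parse_bracketed(text: str) -> Tree:
    """Parse a bracketed tree such as '(S (NP (DT the)) (VP (VBD sat)))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children: List[Union[Tree, str]] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (terminal)
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return read()


def extract_rules(tree: Tree, rules: Counter) -> None:
    """Record one rule per node: label -> sequence of child labels or words."""
    label, children = tree
    rhs = tuple(
        child[0] if isinstance(child, tuple) else '"%s"' % child
        for child in children
    )
    rules[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            extract_rules(child, rules)


# Two toy parsed sentences standing in for a treebank.
treebank = [
    "(S (NP (DT the) (NN cat)) (VP (VBD sat)))",
    "(S (NP (DT the) (NN dog)) (VP (VBD ran)))",
]

rules: Counter = Counter()
for line in treebank:
    extract_rules(parse_bracketed(line), rules)

for (lhs, rhs), count in rules.most_common():
    print(count, lhs, "->", " ".join(rhs))
```

Even on this toy input the number of distinct rules grows with every new word and structure, which hints at why grammars extracted from a full treebank become large.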
Our main achievements are:
This site is developed and maintained by Eric Atwell (eric@comp.leeds.ac.uk)