Automatic Mapping Among Lexico-Grammatical Annotation Models (AMALGAM)


A neutral representation for grammatical structure

Mapping between full parsing schemes is much harder. Although the parsing schemes used in several treebanks have a familial similarity, ideally we would also like to be able to include the output of robust parsers from outside this ICAME heritage, such as the IPSM MULTITREEBANK (a corpus of sentences each of which is annotated with several rival syntax trees).

Our approach uses Machine Learning of syntax, assuming each annotated Corpus is definitive of the grammatical annotation scheme to be learnt. This suggests our approach to mapping between PARSING SCHEMES, for example to reparse SEC text with the POW parsing scheme.

This requires a parsing-scheme-neutral way of representing rival parse-trees, to simplify comparison of delicacy. We have tried extracting all CONTEXT-FREE RULES from each Treebank, to use in a CHART PARSER. However, this yields upwards of 8,000 context-free grammar rules from each Corpus parsing scheme; our current chart-parsing system cannot cope with such a large grammar in reasonable time. So, we are also experimenting with alternative representations.
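As a rough illustration of what extracting context-free rules from a Treebank involves, the sketch below reads one rule off each non-terminal node of a single tree. It is not the AMALGAM code: the nested-tuple encoding and the names TREE and extract_rules are assumptions made for this example, and words stand in as terminal symbols where a real Treebank would have wordtags. The tree is the one for the example sentence used further down this page.

```python
from collections import Counter

# Hypothetical encoding: (label, child, child, ...); leaves are plain strings.
TREE = ("S",
        ("VP", "select",
         ("NP", "the", "text",
          ("CL",
           ("NP", "you"),
           ("VP", "want", ("VP", "to", "protect"))))),
        ".")


def extract_rules(tree, rules=None):
    """Count one context-free rule (LHS, RHS) per non-terminal node."""
    if rules is None:
        rules = Counter()
    label, children = tree[0], tree[1:]
    # The right-hand side lists each child's label, or the word for a leaf.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            extract_rules(child, rules)
    return rules


if __name__ == "__main__":
    for (lhs, rhs), count in extract_rules(TREE).items():
        print(f"{lhs} -> {' '.join(rhs)}   (seen {count}x)")
```

Run over a whole Treebank, the same counter would accumulate every distinct rule, which is how a grammar of several thousand rules arises.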

For WORDTAGGED corpora, we assume a sequence of word+wordtag pairs; this is amenable to N-gram-like modelling. For PARSE-TREES, an analogous N-gram-like model is used in the VERTICAL STRIP PARSER (VSP): a Vertical Strip Grammar (VSG).
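To make the word+wordtag view concrete, here is a minimal sketch of bigram counting over the tags of one sentence. The wordtags shown (V, DET, N and so on) are invented for illustration and do not belong to any of the tagsets discussed here; the names TAGGED and tag_bigrams are likewise assumptions of this example.

```python
from collections import Counter

# One sentence as word+wordtag pairs. The tags are hypothetical placeholders.
TAGGED = [("select", "V"), ("the", "DET"), ("text", "N"), ("you", "PRON"),
          ("want", "V"), ("to", "TO"), ("protect", "V"), (".", "PUNC")]


def tag_bigrams(sentence):
    """Count adjacent wordtag pairs, with <s> marking the sentence start."""
    tags = ["<s>"] + [tag for _word, tag in sentence]
    return Counter(zip(tags, tags[1:]))


if __name__ == "__main__":
    for (left, right), count in tag_bigrams(TAGGED).items():
        print(left, right, count)
```

Counts of this kind, gathered over a whole corpus, are the raw material for an N-gram-like model of tag sequences.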

For example, take the parse-tree in the EAGLES basic parsing scheme:

[S[VP select [NP the text [CL[NP you NP][VP want [VP to protect
 VP]VP]CL]NP]VP] . S]



                   S
                   | \
                   |  \
                  VP   \
                 / |    \
                /  |     \
               /   NP     \
              /   //|      \
             /   // |       \
            /   //  CL       \
           /   //   | \       \
          /   //    |  \       \
         /   / |   NP  VP       \
        /   /  |   |   | \       \
       /   /   |   |   |  \       \
      /   /    |   |   |   VP      \
     /    |    |   |   |   | \      \
    /     |    |   |   |   |  \      \
  select the text you want to protect .


Another way of drawing the same tree, using only vertical and horizontal lines, is:

     S________________________________
     |                                |
     VP___                            |
     |    |                           |
     |    NP_______                   |
     |    |   |    |                  |
     |    |   |    CL__               |
     |    |   |    |   |              |
     |    |   |    NP  VP___          |
     |    |   |    |   |    |         |
     |    |   |    |   |    VP__      |
     |    |   |    |   |    |   |     |
  select the text you want to protect .

This can be chopped into a series of Vertical Strips, one for each path from root S to each leaf:

     S    S    S    S    S    S    S    S
     |    |    |    |    |    |    |    |
     VP   VP   VP   VP   VP   VP   VP   .
     |    |    |    |    |    |    |
  select  NP   NP   NP   NP   NP   NP
          |    |    |    |    |    |
         the  text  CL   CL   CL   CL
                    |    |    |    |
                    NP   VP   VP   VP
                    |    |    |    |
                   you  want  VP   VP
                              |    |
                              to protect
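The chopping step can be sketched in a few lines, working directly from the bracketed notation. This is an illustrative sketch, not the VSP implementation: the names BRACKETED, TOKEN and vertical_strips are assumptions, and the tokeniser assumes labels are purely alphabetic and that no word is written immediately before a "]".

```python
import re

BRACKETED = ("[S[VP select [NP the text [CL[NP you NP][VP want "
             "[VP to protect VP]VP]CL]NP]VP] . S]")

# Three token types: "[LABEL" opens a constituent, "LABEL]" closes it,
# and anything else is a word (a leaf).
TOKEN = re.compile(r"\[[A-Za-z]+|[A-Za-z]+\]|[^\s\[\]]+")


def vertical_strips(text):
    """Return one strip (root label, ..., word) for each leaf, left to right."""
    path, strips = [], []
    for tok in TOKEN.findall(text):
        if tok.startswith("["):
            path.append(tok[1:])            # descend into a new constituent
        elif tok.endswith("]"):
            if not path or path[-1] != tok[:-1]:
                raise ValueError(f"unbalanced close bracket: {tok}")
            path.pop()                      # climb back out
        else:
            strips.append(tuple(path) + (tok,))
    if path:
        raise ValueError(f"unclosed constituents: {path}")
    return strips


if __name__ == "__main__":
    for strip in vertical_strips(BRACKETED):
        print(" > ".join(strip))
```

Each tuple printed corresponds to one column of the diagram above, read from top to bottom.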

This Vertical Strip representation is highly redundant, as the "top" of each strip shares its path from the root with its predecessor. So, the final VSG representation only records the path to each leaf from the point of divergence from the previous Strip:

     S
     |                                  |
     VP                                 .
     |    |
  select  NP
          |    |    |
         the  text  CL
                    |    |
                    NP   VP
                    |    |    |
                   you  want  VP
                              |    |
                              to protect
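The reduction from full strips to this compact form can be sketched as follows. Again this is only an illustration under stated assumptions: the names STRIPS, compress and expand are hypothetical, and divergence is detected by comparing node labels alone, so two adjacent sibling constituents that carry the same label would wrongly be treated as one node. A fuller version would track node identity rather than labels.

```python
# The eight full strips for the example sentence, one per leaf.
STRIPS = [
    ("S", "VP", "select"),
    ("S", "VP", "NP", "the"),
    ("S", "VP", "NP", "text"),
    ("S", "VP", "NP", "CL", "NP", "you"),
    ("S", "VP", "NP", "CL", "VP", "want"),
    ("S", "VP", "NP", "CL", "VP", "VP", "to"),
    ("S", "VP", "NP", "CL", "VP", "VP", "protect"),
    ("S", "."),
]


def compress(strips):
    """Keep, for each strip, only the part below its divergence point.

    Each entry is (depth shared with the previous strip, remaining path).
    """
    out, prev = [], ()
    for strip in strips:
        # Compare non-leaf nodes only; the final element is always a word.
        limit = min(len(prev), len(strip)) - 1
        shared = 0
        while shared < limit and prev[shared] == strip[shared]:
            shared += 1
        out.append((shared, strip[shared:]))
        prev = strip
    return out


def expand(compact):
    """Rebuild the full strips, showing that nothing was lost."""
    strips, prev = [], ()
    for shared, suffix in compact:
        prev = prev[:shared] + suffix
        strips.append(prev)
    return strips


if __name__ == "__main__":
    for shared, suffix in compress(STRIPS):
        print(shared, " ".join(suffix))
```

Because expand recovers the original strips exactly, the compact form removes the redundancy without discarding any of the tree.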



This site developed and maintained by Eric Atwell (eric@comp.leeds.ac.uk)