Automatic Mapping Among Lexico-Grammatical Annotation Models (AMALGAM)
Most Natural Language Processing applications assume that a first step in processing is GRAMMATICAL ANALYSIS or PARSING of each input sentence. This involves assigning a lexico-grammatical wordclass annotation to each word, and assigning a grammatical structure annotation to the sentence. On the face of it, this might seem to be a simple "preprocessing" step. However, a major problem is that NLP researchers cannot agree exactly how to parse a sentence. The problem is not just which ALGORITHM to use in the parser; we cannot agree on the TARGET, that is, what the PARSING SCHEME should look like. To illustrate the problem, before reading on, try writing down your own analysis of the following English sentence: "Select the text you want to protect." Then see if your parse matches any of the "standards"!
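Grammatical analysis is standardly divided into two levels or phases: assigning a wordclass tag to each word (wordtagging), and assigning a grammatical structure to the whole sentence (parsing).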
Even at the lower level, there is great diversity of annotation schemes or models. Here is the example sentence wordtagged according to several rival tagging schemes, vertically aligned:
Word     Brown  ICE             LLC   LOB  PARTS  POW  SEC  UPenn
~~~~     ~~~~~  ~~~             ~~~   ~~~  ~~~~~  ~~~  ~~~  ~~~~~
select   VB     V(montr,imp)    VA+0  VB   adj    M    VB   VB
the      AT     ART(def)        TA    ATI  art    DD   ATI  DT
text     NN     N(com,sing)     NC    NN   noun   H    NN   NN
you      PPSS   PRON(pers)      RC    PP2  pron   HP   PP2  PRP
want     VB     V(montr,pres)   VA+0  VB   verb   M    VB   VBP
to       TO     PRTCL(to)       PD    TO   verb   I    TO   TO
protect  VB     V(montr,infin)  VA+0  VB   verb   M    VB   VB
.        .      PUNC(per)       .     .    .      .    .    .
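The aligned table can be held as a simple data structure, which makes the correspondences between schemes easy to inspect. The Python sketch below is illustrative only: the names `WORDS`, `TAGS`, `tagged` and `observed_mapping` are assumptions introduced here, not part of any AMALGAM software, and correspondences read off a single sentence are far too sparse to serve as a real mapping.

```python
from collections import defaultdict

# The example sentence, one token per table row.
WORDS = ["select", "the", "text", "you", "want", "to", "protect", "."]

# One list of tags per scheme, copied column by column from the table.
TAGS = {
    "Brown": ["VB", "AT", "NN", "PPSS", "VB", "TO", "VB", "."],
    "ICE": ["V(montr,imp)", "ART(def)", "N(com,sing)", "PRON(pers)",
            "V(montr,pres)", "PRTCL(to)", "V(montr,infin)", "PUNC(per)"],
    "LLC": ["VA+0", "TA", "NC", "RC", "VA+0", "PD", "VA+0", "."],
    "LOB": ["VB", "ATI", "NN", "PP2", "VB", "TO", "VB", "."],
    "PARTS": ["adj", "art", "noun", "pron", "verb", "verb", "verb", "."],
    "POW": ["M", "DD", "H", "HP", "M", "I", "M", "."],
    "SEC": ["VB", "ATI", "NN", "PP2", "VB", "TO", "VB", "."],
    "UPenn": ["VB", "DT", "NN", "PRP", "VBP", "TO", "VB", "."],
}


def tagged(scheme):
    """Return the sentence as (word, tag) pairs under one scheme."""
    return list(zip(WORDS, TAGS[scheme]))


def observed_mapping(source, target):
    """Collect, for each source tag, the target tags it aligns with
    in this one sentence (hypothetical helper, not a general mapping)."""
    mapping = defaultdict(set)
    for src, tgt in zip(TAGS[source], TAGS[target]):
        mapping[src].add(tgt)
    return dict(mapping)


if __name__ == "__main__":
    print(tagged("LOB"))
    for src, targets in observed_mapping("Brown", "UPenn").items():
        print(src, "->", sorted(targets))
```

In this sentence the Brown tag `VB` aligns with both `VB` and `VBP` in the UPenn column, so even one example shows that moving between schemes is not a simple one-to-one relabelling.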
Note the differences in REPRESENTATION and, more crucially, in DELICACY, the level of detail in grammatical classification. Delicacy is a factor in EVALUATION: a delicate analysis is harder to get right, so a 'skeletal parser' that draws fewer distinctions will score higher. An INDELICATE annotation is sufficient for many NLP syntax applications, e.g.:
However, this sort of "parsing" is inappropriate for some applications. In linguistic terms, the Speech Recognition grammar model has insufficient delicacy (or no delicacy at all!).
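Delicacy can be gauged very roughly from the tagged example sentence by counting how many different tags each scheme needs for it. The Python sketch below does this for three of the schemes; the names `COLUMNS`, `VERB_POSITIONS`, `distinct_tags` and `verb_tags` are assumptions made for illustration, and a count over one sentence is only a crude proxy for the delicacy of a whole tagset.

```python
# Tags for "select the text you want to protect ." under three schemes,
# copied from the Brown, UPenn and ICE columns of the aligned example.
COLUMNS = {
    "Brown": ["VB", "AT", "NN", "PPSS", "VB", "TO", "VB", "."],
    "UPenn": ["VB", "DT", "NN", "PRP", "VBP", "TO", "VB", "."],
    "ICE": ["V(montr,imp)", "ART(def)", "N(com,sing)", "PRON(pers)",
            "V(montr,pres)", "PRTCL(to)", "V(montr,infin)", "PUNC(per)"],
}

# Token positions of the three verbs: select, want, protect.
VERB_POSITIONS = (0, 4, 6)


def distinct_tags(tags):
    """Number of different tags used for the sentence (crude delicacy proxy)."""
    return len(set(tags))


def verb_tags(tags):
    """The set of tags a scheme assigns to the three verbs."""
    return {tags[i] for i in VERB_POSITIONS}


if __name__ == "__main__":
    for scheme, tags in COLUMNS.items():
        print(scheme, distinct_tags(tags), sorted(verb_tags(tags)))
```

On this sentence Brown gives all three verbs the same tag, UPenn separates the present-tense "want" from the other two, and ICE gives each verb its own tag, which is the kind of difference in delicacy that makes scores under different schemes hard to compare directly.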
This site developed and maintained by Eric Atwell (eric@comp.leeds.ac.uk)