Automatic Mapping Among Lexico-Grammatical Annotation Models (AMALGAM)




AMALGAM HOMEPAGE | PREVIOUS PAGE | UP A LEVEL | NEXT PAGE

Introduction

Most Natural Language Processing applications assume that the first step in processing is GRAMMATICAL ANALYSIS or PARSING of each input sentence. This involves assigning a lexico-grammatical wordclass annotation to each word, and assigning a grammatical structure annotation to the sentence. On the face of it, this might seem to be a simple "preprocessing" step. However, a major problem is that NLP researchers cannot agree on exactly how to parse a sentence. The problem is not just which ALGORITHM to use in the parser; we cannot agree on the TARGET, that is, what the PARSING SCHEME should look like. To illustrate the problem, before reading on, try writing down your own analysis of the following English sentence. Then see if your parse matches any of the "standards"!

Example sentence for you to parse:


'Select the text you want to protect .'


Lexico-grammatical word classes

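Grammatical analysis is standardly divided into two levels or phases: a lower level, at which each word is assigned a lexico-grammatical wordclass tag, and a higher level, at which the sentence is assigned a grammatical structure.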

Even at the lower level, there is great diversity of annotation schemes or models. Here is the example sentence wordtagged according to several rival tagging schemes, vertically aligned:

         Brown  ICE             LLC   LOB  PARTS  POW  SEC  UPenn
         ~~~~~  ~~~             ~~~   ~~~  ~~~~~  ~~~  ~~~  ~~~~~
select   VB     V(montr,imp)    VA+0  VB   adj    M    VB   VB
the      AT     ART(def)        TA    ATI  art    DD   ATI  DT
text     NN     N(com,sing)     NC    NN   noun   H    NN   NN
you      PPSS   PRON(pers)      RC    PP2  pron   HP   PP2  PRP
want     VB     V(montr,pres)   VA+0  VB   verb   M    VB   VBP
to       TO     PRTCL(to)       PD    TO   verb   I    TO   TO
protect  VB     V(montr,infin)  VA+0  VB   verb   M    VB   VB
.        .      PUNC(per)       .     .    .      .    .    .
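
The alignment above can also be held as data, which makes the differences in tag inventory easy to inspect. The following Python sketch copies the tags from the table verbatim. The names WORDS, TAGS, distinct_tags and observed_mapping are illustrative assumptions, not part of AMALGAM, and a mapping read off a single sentence is far too small to serve as a real mapping between schemes.

```python
# Tags copied from the table above: one list per scheme, in word order.
WORDS = ["select", "the", "text", "you", "want", "to", "protect", "."]

TAGS = {
    "Brown": ["VB", "AT", "NN", "PPSS", "VB", "TO", "VB", "."],
    "ICE": ["V(montr,imp)", "ART(def)", "N(com,sing)", "PRON(pers)",
            "V(montr,pres)", "PRTCL(to)", "V(montr,infin)", "PUNC(per)"],
    "LLC": ["VA+0", "TA", "NC", "RC", "VA+0", "PD", "VA+0", "."],
    "LOB": ["VB", "ATI", "NN", "PP2", "VB", "TO", "VB", "."],
    "PARTS": ["adj", "art", "noun", "pron", "verb", "verb", "verb", "."],
    "POW": ["M", "DD", "H", "HP", "M", "I", "M", "."],
    "SEC": ["VB", "ATI", "NN", "PP2", "VB", "TO", "VB", "."],
    "UPenn": ["VB", "DT", "NN", "PRP", "VBP", "TO", "VB", "."],
}


def distinct_tags(scheme):
    """Count the distinct tags a scheme uses on the example sentence."""
    return len(set(TAGS[scheme]))


def observed_mapping(source, target):
    """Map each source tag to the set of target tags aligned with it.

    This only records what co-occurs in this one sentence; it is a toy
    illustration, not the AMALGAM mapping method.
    """
    mapping = {}
    for src_tag, tgt_tag in zip(TAGS[source], TAGS[target]):
        mapping.setdefault(src_tag, set()).add(tgt_tag)
    return mapping


if __name__ == "__main__":
    for scheme in TAGS:
        print(scheme, distinct_tags(scheme))
    print(observed_mapping("Brown", "UPenn"))
```

On this one sentence, ICE uses eight distinct tags, and the Brown tag VB lines up with two UPenn tags (VB and VBP): a small instance of the one-to-many problem that any mapping between schemes has to face.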

Note the differences in REPRESENTATION and, more crucially, in DELICACY, or level of detail in grammatical classification. Delicacy is a factor in EVALUATION: a delicate analysis is more difficult to get right, so a 'skeletal parser' that makes fewer distinctions will score higher. An INDELICATE annotation is sufficient for many NLP syntax applications, for example the grammar model used in Speech Recognition.

However, this sort of "parsing" is inappropriate for some applications. In linguistic terms, the Speech Recognition grammar model has insufficient delicacy (or no delicacy at all!).
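
The effect of delicacy on evaluation can be shown with a toy calculation. In the Python sketch below, GOLD is the UPenn column of the table above, PREDICTED is an invented output from a hypothetical tagger that misses the VB/VBP distinction on "want", and COARSE is an assumed collapsing of both tags into a single "verb" class. None of these is an AMALGAM result.

```python
# Reference tags: the UPenn column from the table above.
GOLD = ["VB", "DT", "NN", "PRP", "VBP", "TO", "VB", "."]

# Invented output of a hypothetical tagger: it tags "want" as VB, not VBP.
PREDICTED = ["VB", "DT", "NN", "PRP", "VB", "TO", "VB", "."]

# Assumed coarse scheme: both verb tags fall into one class.
COARSE = {"VB": "verb", "VBP": "verb"}


def accuracy(gold, predicted, collapse=None):
    """Fraction of positions where the (optionally collapsed) tags agree."""
    collapse = collapse or {}
    pairs = list(zip(gold, predicted))
    hits = sum(collapse.get(g, g) == collapse.get(p, p) for g, p in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    print("delicate scheme:  ", accuracy(GOLD, PREDICTED))
    print("indelicate scheme:", accuracy(GOLD, PREDICTED, COARSE))
```

The same tagger output scores 7 out of 8 against the delicate scheme but 8 out of 8 once the verb tags are collapsed, which is why scores obtained under schemes of different delicacy are not directly comparable.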



This site developed and maintained by Eric Atwell (eric@comp.leeds.ac.uk)