Automatic Mapping Among Lexico-Grammatical Annotation Models (AMALGAM)

AMALGAM TAGGER - HOW IT WORKS





AMALGAM HOMEPAGE | PREVIOUS PAGE | UP A LEVEL | NEXT PAGE




This program is effectively a wrapper for Eric Brill's rule-based tagger, retrained at Leeds on eight alternative tagging schemes. Brill's tagger reads its lexicon, bigram lists and rules from external files. AMALGAM's tagger redirects it to read alternative versions of these defining files, so that it can annotate text according to any of the following eight schemes:

  1. Brown Corpus
  2. International Corpus of English
  3. London-Lund Corpus
  4. Lancaster-Oslo/Bergen Corpus
  5. UNIX parts
  6. Polytechnic of Wales Corpus
  7. Spoken English Corpus
  8. University of Pennsylvania Corpus

(Please note that the tagger is intended for English text - it will not work for languages other than English.)
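
The redirection described above can be pictured with a minimal Python sketch. It is an illustration only, not AMALGAM's actual code: the scheme short names, the "schemes" directory layout and the file names are all hypothetical, and the argument order (lexicon, corpus, bigrams, lexical rules, contextual rules) is an assumption about how Brill's tagger is invoked that should be checked against the README in his distribution.

```python
from pathlib import Path

# Hypothetical short names for the eight schemes listed above.
# The real names used by AMALGAM are not given on this page.
SCHEMES = ["brown", "ice", "llc", "lob", "parts", "pow", "sec", "upenn"]


def defining_files(scheme, root=Path("schemes")):
    """Return the set of defining files for one tagging scheme.

    The directory layout and file names are assumptions made for
    this sketch.
    """
    if scheme not in SCHEMES:
        raise ValueError(f"unknown tagging scheme: {scheme!r}")
    base = root / scheme
    return {
        "lexicon": base / "LEXICON",
        "bigrams": base / "BIGRAMS",
        "lexical_rules": base / "LEXICALRULEFILE",
        "contextual_rules": base / "CONTEXTUALRULEFILE",
    }


def tagger_command(scheme, corpus, tagger="tagger", root=Path("schemes")):
    """Build (but do not run) a command line for the underlying tagger.

    Only the file paths change from scheme to scheme; the tagger
    program itself stays the same. The argument order is assumed.
    """
    files = defining_files(scheme, root)
    return [
        tagger,
        str(files["lexicon"]),
        str(corpus),
        str(files["bigrams"]),
        str(files["lexical_rules"]),
        str(files["contextual_rules"]),
    ]


if __name__ == "__main__":
    print(tagger_command("lob", "input.txt"))
```

The point of the design is that one tagger binary serves all eight schemes: choosing a scheme amounts to choosing which directory of defining files is passed to it.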

By sending a blank message to amalgam-tagger@comp.leeds.ac.uk with "help" as the subject you will receive a help file instructing you how to use the multi-tagger. We are also expecting to add a web browser version soon. Watch this space!
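
As a rough sketch, the help request described above could be composed with Python's standard email library. The sender address is a placeholder, and the function name build_help_request is invented for this example; only the recipient address and the "help" subject come from this page.

```python
from email.message import EmailMessage

TAGGER_ADDRESS = "amalgam-tagger@comp.leeds.ac.uk"


def build_help_request(sender):
    """Compose a blank message with "help" as the subject.

    The body is deliberately left unset, since the page asks for a
    blank message. `sender` is whatever address the reply should go to.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = TAGGER_ADDRESS
    msg["Subject"] = "help"
    return msg


if __name__ == "__main__":
    # "you@example.org" is a placeholder address.
    request = build_help_request("you@example.org")
    print(request["To"], "|", request["Subject"])
```

The resulting message can be handed to any mail client or to smtplib's send_message; the SMTP host to use depends on your own site and is not covered here.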

The text to be tagged is first passed through a tokeniser which applies various formatting rules to the text. This can be turned off and on when mailing amalgam-tagger. Again, more details are in the help file.
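
The tokeniser's actual formatting rules are documented in the help file, not on this page. Purely as an illustration of the kind of work such a step does, the sketch below separates punctuation from words and offers a switch to turn that behaviour off. The regular expression and the "enabled" parameter are assumptions made for this example and do not describe AMALGAM's real rules.

```python
import re

# A word (optionally with internal apostrophes or hyphens), or any
# single character that is neither a word character nor whitespace.
# This pattern is an assumption for illustration only.
_TOKEN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")


def tokenise(line, enabled=True):
    """Split a line of text into tokens.

    With enabled=False the text is only split on whitespace, which
    mimics switching the tokeniser off for pre-tokenised input.
    """
    if not enabled:
        return line.split()
    return _TOKEN.findall(line)


if __name__ == "__main__":
    print(" ".join(tokenise("The cat sat, didn't it?")))
```

Switching tokenisation off is useful when the input has already been split into tokens by some other tool and should reach the tagger unchanged.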

A description of the training procedure for Brill's tagger that allows it to acquire each new scheme is available.






This site developed and maintained by Eric Atwell (eric@comp.leeds.ac.uk)