Automatic Mapping Among Lexico-Grammatical Annotation Models (AMALGAM)






The London-Lund Corpus Tag-set

Listed alphabetically below are the tags used in an adapted version of one of several tag-sets used to annotate the London-Lund corpus. Each tag has examples of the tokens that were annotated with that tag. The examples are taken directly from AMALGAM's adapted London-Lund corpus. If the list of examples ends with an ellipsis marker then the tag category can be assumed to be an open class.
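The ellipsis convention can be checked mechanically. The following is a minimal sketch, assuming each tag's examples are held as one space-separated string; the function name `is_open_class` is hypothetical and not part of any AMALGAM tool. The punctuation tag `...` has the ellipsis as its only example, so a list consisting of the marker alone is not treated as open.

```python
def is_open_class(examples: str) -> bool:
    """Return True if an example list ends with a trailing ellipsis marker.

    Assumes examples are space-separated, as in the listing on this page.
    A list that consists only of "..." is the example for the ellipsis
    punctuation tag itself, not an open-class marker.
    """
    tokens = examples.split()
    return len(tokens) > 1 and tokens[-1] == "..."


print(is_open_class("absolutely actually administratively ..."))  # True
print(is_open_class("how when where wherever why"))               # False
print(is_open_class("..."))                                       # False
```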

At AMALGAM, we wanted to produce a version of the London-Lund annotation scheme, which was designed for tagging transcribed dialogues, that could be applied to written texts. We took the ten dialogue corpora used by Mats Eeg-Olofsson, removed dialogue annotation such as pause markers, added punctuation (all of the punctuation tags below are AMALGAM's own, as there was no syntactic punctuation in the original corpus) and generally 'cleaned up' the texts so that they resembled a written story rather than transcribed dialogue. We also split all of the combined tags into their parts. We then trained Eric Brill's Transformation-Based Part-of-Speech Tagger on the adapted London-Lund texts to produce a tagger that could be used on written texts. These changes transformed the corpus so much that it is no longer representative of the original transcribed dialogues. This does not matter, however, as the adapted corpus was used only as a source of syntactic information for training the tagger.

As with the Brown corpus, London-Lund uses 'combined tags' for words such as don't (VD+0*AN) and I've (RA*VH+0). All combined tags have the same form: an asterisk separates the tags for the different tokens that make up the complete combined word, which makes the combined tags trivial to split. AMALGAM's version of the London-Lund tagger annotates with combined tags only if the tokeniser is switched off. On the e-mail server, this is done by adding "notoken" to the start of the subject line of the e-mail message. If the tokeniser is used, the combined words are split into their constituent parts and a tag is applied to each part. So don't (VD+0*AN) becomes do (VD+0) plus n't (AN), and I've (RA*VH+0) becomes I (RA) plus 've (VH+0). The structure of the tags allows for some possibilities that are not listed here because they were not seen in the corpus. For example, the tag NP+2+Z would be used for plural, genitive, proper nouns such as Gods'.
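The tag structure described above can be illustrated with a short sketch. It is not AMALGAM's tokeniser: the function names are hypothetical, and the word parts (for example do and n't) are assumed to be supplied already split, since the rules for splitting the words themselves are not given here. The sketch only shows that an asterisk separates the per-token tags and that a plus sign separates a base category from its feature codes.

```python
from typing import List, Tuple


def split_combined_tag(combined_tag: str) -> List[str]:
    """Split a combined tag such as 'VD+0*AN' into one tag per token."""
    return combined_tag.split("*")


def parse_tag(tag: str) -> Tuple[str, List[str]]:
    """Split a single tag such as 'NP+2+Z' into base category and features."""
    base, *features = tag.split("+")
    return base, features


def tag_parts(parts: List[str], combined_tag: str) -> List[Tuple[str, str]]:
    """Pair already-split word parts with the tags from a combined tag.

    The caller supplies the parts; this function does not tokenise words.
    """
    tags = split_combined_tag(combined_tag)
    if len(parts) != len(tags):
        raise ValueError("number of word parts and tags must match")
    return list(zip(parts, tags))


print(tag_parts(["do", "n't"], "VD+0*AN"))  # [('do', 'VD+0'), ("n't", 'AN')]
print(tag_parts(["I", "'ve"], "RA*VH+0"))   # [('I', 'RA'), ("'ve", 'VH+0')]
print(parse_tag("NP+2+Z"))                  # ('NP', ['2', 'Z'])
```

Keeping the split on the asterisk separate from the split on the plus sign mirrors the two levels of the scheme: first the tokens of a combined word, then the features of each token's tag.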

Further information on the London-Lund corpus can be found at the International Computer Archive of Modern English (ICAME) corpus collection.

Further reading:

Eeg-Olofsson, M. 1991. Word-class tagging: Some computational tools. PhD thesis. Department of Linguistics and Phonetics, University of Lund, Sweden.

| Tag | Description | Examples |
|---|---|---|
| ! | exclamation mark | ! |
| " | double quotation mark | " |
| ' | single quotation mark | ' |
| ( | opening parenthesis | ( |
| ) | closing parenthesis | ) |
| , | comma | , |
| - | dash | - |
| . | full stop | . |
| ... | ellipsis | ... |
| : | colon | : |
| ; | semicolon | ; |
| ? | question mark | ? |
| AB | adverb, WH-word | how when where wherever why |
| AB*VB+3 | adverb, WH-word + verb "to be", present tense, 3rd person singular | how's when's where's |
| AB*VM+8 | adverb, WH-word + verb, modal, ability | where'll |
| AC | adverb, closed class | about abroad after afterwards again ahead all almost alone along also altogether always another any anyhow anymore anyway around away back backwards before beforehand besides ... |
| AC*VB+3 | adverb, closed class + verb "to be", present tense, 3rd person singular | here's |
| AC+R | adverb, closed class, comparative | better closer earlier easier further later less longer |
| AC+T | adverb, closed class, superlative | best |
| AE | adverb, postpositional | ago enough |
| AF | adverb, more | less more or |
| AG | adverb, most | most |
| AH | adverb, conjunct so | so |
| AI | adverb, very | very |
| AM | adverb, much | much |
| AN | adverb, not | not n't |
| AP | adverbial particle | about across along around away back behind down forward in off on over past round through to together up |
| AQ | phrasal intensifiers | a bit little lot |
| AR | adverb, no | no |
| AS | adverb, as | as |
| AT | adverb, too | too |
| AW | adverb, open class | absolutely actually administratively apparently artificially badly basically beautifully bitterly briefly casually certainly cleverly closely completely conceivably correctly ... |
| AX | existential there | there |
| AX*VB+3 | existential there + verb "to be", present tense, 3rd person singular | there's |
| AX*VB+4 | existential there + verb "to be", present tense, 2nd person singular or all persons plural | there're |
| AX*VM+8 | existential there + verb, modal, ability | there'll |
| AX*VM+9 | existential there + verb, modal, suggestion | there'd |
| AZ | intensifier | so that |
| CA | conjunction, coordinating, and | and |
| CB | conjunction, coordinating, but | but |
| CC | conjunction, subordinating | after although as because before case cos even except far for if in long once providing since so that though till unless when whereas whether while |
| CD | conjunction, subordinating, that | that |
| CE | correlative | as rather than |
| CF | conjunction, subordinating with that | in order |
| CR | conjunction, coordinating, or | or |
| CS | double conjunction | both either neither |
| DA | discourse, apology | 'scuse pardon me I'm sorry |
| DB | discourse, smooth-over | never mind |
| DC | discourse, hedge | kind sort of thing the as it were |
| DE | discourse, expletive | God bless arse bloody bright cor crumbs dammit damnation damned dear earth enough fair for God's sake good gosh heavens hell knows ooh phew sakes sigh sod you |
| DE*VB+3 | discourse, expletive + verb "to be", present tense, 3rd person singular | hell's |
| DG | discourse, greeting | good afternoon goodbye hello |
| DI | discourse, initiator | actually anyhow anyway now oh |
| DL | discourse, attention | hey look |
| DN | discourse, no | no |
| DO | discourse, order | come on give over go shut up |
| DP | discourse, politeness | please |
| DQ | discourse, question | right eh really |
| DR | discourse, response | I see I'm sure OK absolutely ah aha all right certainly exactly fine good great ha oh ooh quite really that's true uhuh very |
| DS | discourse, softener | I mean you know mark mind see |
| DT | discourse, thanks | thank you thanks |
| DW | discourse, well | well |
| DX | discourse, exemplifier | say |
| DY | discourse, yes | mhm yea yeah yep yes yup |
| EC | pre-determiner, all | all |
| ED | pre-determiner, half | half |
| EE | pre-determiner, both | both |
| EF | pre-determiner, double | double |
| EJ | pre-determiner, many | many |
| EK | pre-determiner, such or what | such what |
| EL | pre-determiner, quite or rather | quite rather |
| GA | pronoun, relative, who or which | which who |
| GA*VB+3 | pronoun, relative, who or which + verb "to be", present tense, 3rd person singular | who's |
| GA*VH+0 | pronoun, relative, who or which + verb "to have", base form | who've |
| GA*VH+3 | pronoun, relative, who or which + verb "to have", present tense, 3rd person singular | who's |
| GA*VH+D | pronoun, relative, who or which + verb "to have", past tense | who'd |
| GB | pronoun, relative, whom | whom |
| GC | pronoun, relative, what | what |
| GC*VB+3 | pronoun, relative, what + verb "to be", present tense, 3rd person singular | what's |
| GD | pronoun, relative, that | that |
| GD*VB+3 | pronoun, relative, that + verb "to be", present tense, 3rd person singular | that's |
| GD*VM+8 | pronoun, relative, that + verb, modal, ability | that'll |
| GD*VM+9 | pronoun, relative, that + verb, modal, suggestion | that'd |
| JA | adjective | Brahmsian Catholic Central Middle Old Primary Venetian abortive absorbed abstract absurd academic acceptable accurate acting additional adjectival administrative adult afraid aged aggressive almighty alternate amazing ... |
| JA+R | adjective, comparative | better bigger broader cheaper easier further happier higher larger latter longer older quicker safer simpler smaller squarer tougher worse younger |
| JA+T | adjective, superlative | best biggest cheapest foggiest nearest nicest oldest safest slightest |
| JB | adjective, catenative | able about |
| JE | adjective, postposed | present elect |
| JM | adjective, quantifying post-determiner | a bit few lot lots many more of own plenty same several such |
| JN | adjective, nationality | Afghan African American Anglo-Saxon Arcadian Athenian Australian British Canadian Crimean Dutch English Estonian European French Greek Irish Latin North Persian Polish Roman Scottish South Yiddish aboriginal non-English |
| JP | adjective as noun phrase head | accused dead doubtful right same schizoid whole |
| JP+T | adjective as noun phrase head, superlative | best |
| JQ | adjective, ordinal | eighteenth eighth fifteenth first fourteenth fourth nineteenth second seventeenth seventh sixth tenth third twentieth twenty-fifth twenty-ninth twenty-third |
| JR | adjective, cardinal | eight eighteen eighty eighty-four fifteen fifty fiftyish fifty-eight five forty forty-five four fourteen hundred nine nought one seven six ten thirteen thirty-three twelve twenty two zero ... |
| JR*VB+3 | adjective, cardinal + verb "to be", present tense, 3rd person singular | one's ten's twenty-three's |
| JS | adjective, post-determiner | last next other |
| NC | noun, common, singular | Admiral Boots Church Custodian Engineering Gold Language Lord Marmite O-level Stick Sub-Committee Test Tower Waiting ability abstract academic acceptability accident accommodation account actress address admin ... |
| NC*VB+3 | noun, common, singular + verb "to be", present tense, 3rd person singular | academic's bedroom's fellow's group's hill's intake's money's morning's painting's paper's print's room's survey's water's |
| NC*VH+3 | noun, common, singular + verb "to have", present tense, 3rd person singular | housekeeper's term's |
| NC+2 | noun, common, plural | Lords Romantics X-rays academics accounts activities adverbs advisers aerials ages aids alterations amusements angles animals answers applicants applications appointments areas arenas arms ... |
| NC+2+Z | noun, common, plural, genitive | birds' boys' butlers' days' examiners' girls' ladies' months' servants' students' years' |
| NC+Z | noun, common, singular, genitive | chemist's day's emu's father's girl's immigrant's lady's lawyer's month's ship's world's |
| NN | noun, nationality, singular | Britisher Dutchman Englishman Irishman Welshman |
| NN+2 | noun, nationality, plural | Americans Australians Biafrans Europeans Londoners Turks |
| NP | noun, proper, singular | A Adams Aeschylus Africa Agamemnon Aldershot Aldo Alec America Andrew Antigone Appleby April Arcadie Ariel Aristophanes Arms Asbestos Ashton Association August Auschwitz Australia Austria Baker Band Bandra Banks Bards ... |
| NP*VB+3 | noun, proper, singular + verb "to be", present tense, 3rd person singular | Australia's Cuckfield's Delaney's Farnham's Hartney Haywards Hodgson's Horsham's Mallet's Mervyn's Millicent's Ponsonby's |
| NP*VH+3 | noun, proper, singular + verb "to have", present tense, 3rd person singular | Frank's Hart's Hogg's Mallet's Marilyn's Neasden's Tim's Tom's |
| NP*VM+8 | noun, proper, singular + verb, modal, ability | Lambert'll |
| NP+2 | noun, proper, plural | Alluysons Authorities Burtons Fortunes Gods Messrs Saturdays Sundays Victorians Vyses |
| NP+Z | noun, proper, singular, genitive | Ajax's Allenby's Annabel's Bard's Bennett's Bill's Churchill's Dan's David's Davis's Dilys's Dvorak's Electra's Elizabeth's King's Knott's Leslie's Mallet's Marina's Mendelssohn's Mervyn's Reith's ... |
| NX | noun, abbreviation, singular | ABC AV BA BBC CSC FC H L LPO LSO MA MG MP NFO NUS PP PS PhD UC |
| NX*VH+0 | noun, abbreviation, singular + verb "to have", base form | RPO've |
| NX+2 | noun, abbreviation, plural | Bs Fs Rs |
| NX+Z | noun, abbreviation, singular, genitive | NUS |
| PA | preposition | a about according across after against along among apart around as because before behind below beside between beyond by down during for from front grounds in inside instead into like ... |
| PD | infinitive marker | to |
| RA | pronoun, personal, nominative | I he she they we |
| RA*VB+1 | pronoun, personal, nominative + verb "to be", present tense, 1st person singular | I'm |
| RA*VB+3 | pronoun, personal, nominative + verb "to be", present tense, 3rd person singular | he's she's |
| RA*VB+4 | pronoun, personal, nominative + verb "to be", present tense, 2nd person singular or all persons plural | they're we're |
| RA*VH+0 | pronoun, personal, nominative + verb "to have", base form | I've they've we've |
| RA*VH+3 | pronoun, personal, nominative + verb "to have", present tense, 3rd person singular | he's she's |
| RA*VH+D | pronoun, personal, nominative + verb "to have", past tense | I'd he'd she'd they'd we'd |
| RA*VM+8 | pronoun, personal, nominative + verb, modal, ability | I'll he'll she'll they'll we'll |
| RA*VM+9 | pronoun, personal, nominative + verb, modal, suggestion | I'd he'd she'd they'd |
| RB | pronoun, personal, accusative | 'em her him me them us |
| RC | pronoun, personal, unmarked for case | it you |
| RC*VB+3 | pronoun, personal, unmarked for case + verb "to be", present tense, 3rd person singular | it's |
| RC*VB+4 | pronoun, personal, unmarked for case + verb "to be", present tense, 2nd person singular or all persons plural | you're |
| RC*VH+0 | pronoun, personal, unmarked for case + verb "to have", base form | you've |
| RC*VH+3 | pronoun, personal, unmarked for case + verb "to have", present tense, 3rd person singular | it's |
| RC*VH+D | pronoun, personal, unmarked for case + verb "to have", past tense | you'd |
| RC*VM+8 | pronoun, personal, unmarked for case + verb, modal, ability | it'll you'll |
| RC*VM+9 | pronoun, personal, unmarked for case + verb, modal, suggestion | it'd you'd |
| RD | pronoun, demonstrative that | that |
| RD*VB+3 | pronoun, demonstrative that + verb "to be", present tense, 3rd person singular | that's |
| RD*VH+3 | pronoun, demonstrative that + verb "to have", present tense, 3rd person singular | that's |
| RD*VM+8 | pronoun, demonstrative that + verb, modal, ability | that'll |
| RD*VM+9 | pronoun, demonstrative that + verb, modal, suggestion | that'd |
| RE | pronoun, possessive | her his mine ours theirs yours |
| RE*VB+3 | pronoun, possessive + verb "to be", present tense, 3rd person singular | mine's |
| RF | pronoun, interrogative | what whatever which who whoever |
| RF*VB+3 | pronoun, interrogative + verb "to be", present tense, 3rd person singular | what's who's |
| RF*VH+3 | pronoun, interrogative + verb "to have", present tense, 3rd person singular | what's |
| RG | pronoun, demonstrative this | this |
| RH | pronoun, demonstrative plural | these those |
| RJ | pronoun, one | one ones |
| RJ*VB+3 | pronoun, one + verb "to be", present tense, 3rd person singular | one's |
| RJ+Z | pronoun, one, genitive | one's |
| RM | pronoun, compound indefinite | any anybody anyone anything everybody everyone everything no nobody one somebody someone something thing |
| RM+Z | pronoun, compound indefinite, genitive | somebody's someone's |
| RN | pronoun, else | else |
| RO | pronoun, reciprocal | each other one another |
| RO+Z | pronoun, reciprocal, genitive | each other's |
| RP | pronoun, quantifying | all another any both certain each enough little many more most much other others some such |
| RP*VB+3 | pronoun, quantifying + verb "to be", present tense, 3rd person singular | other's |
| RQ | pronoun, pro-form | so |
| RR | pronoun, reflexive | herself himself itself myself ourselves themselves yourself |
| TA | article, definite | the |
| TB | determiner/pronoun, possessive | her his its my our their whatever which your |
| TC | determiner, any, enough or some | any enough some |
| TD | determiner, demonstrative, singular | that this |
| TE | determiner, demonstrative, plural | these those |
| TF | article, indefinite | a an the |
| TG | determiner, quantifying | another each either every |
| TH | determiner, much | much |
| VA+0 | verb, lexical, base form | abandon accelerate accord accuse add advise affect afford agree alter amount analyse answer appear apply appoint appreciate argue arrive ask associate assume baby-sit backtrack bear become ... |
| VA+0*RB | verb, lexical, base form + pronoun, personal, accusative | let's |
| VA+3 | verb, lexical, present, 3rd person singular | affronts amounts amuses approaches becomes begins believes breaks causes changes comes contacts contains crops crosses deals descants draws drives echoes enters equals expects expresses feels finishes ... |
| VA+D | verb, lexical, past tense | advertised advised affected agreed amused appealed applied arrived assured astonished attended banded became bent borrowed bought broadcast brought built burnt burst called came captained carried caused ... |
| VA+G | verb, lexical, -ing form | abstracting addressing adopting aiming amusing analysing appealing appointing arranging arriving asking assuming backsliding banking becoming begging bellringing blinding blowing boasting boiling ... |
| VA+G*PD | verb, lexical, -ing form + infinitive marker | gonna |
| VA+N | verb, lexical, past participle | accepted accused acted added advertised agreed allotted allowed annoyed applied arrived asked associated baled become bedevilled bogged boiled bored born borne borrowed bought bound brought built ... |
| VB+0 | verb "to be", infinitive or imperative | be |
| VB+1 | verb "to be", present tense, 1st person singular | am |
| VB+3 | verb "to be", present tense, 3rd person singular | is |
| VB+3*AN | verb "to be", present tense, 3rd person singular + adverb, not | isn't |
| VB+4 | verb "to be", present tense, 2nd person singular or all persons plural | are |
| VB+4*AN | verb "to be", present tense, 2nd person singular or all persons plural + adverb, not | aren't |
| VB+5 | verb "to be", past tense, 1st and 3rd person singular | was |
| VB+5*AN | verb "to be", past tense, 1st and 3rd person singular + adverb, not | wasn't |
| VB+6 | verb "to be", past tense, 2nd person singular or all persons plural | were |
| VB+6*AN | verb "to be", past tense, 2nd person singular or all persons plural + adverb, not | weren't |
| VB+G | verb "to be", -ing form | being |
| VB+N | verb "to be", past participle | been |
| VD+0 | verb "to do", base form | do |
| VD+0*AN | verb "to do", base form + adverb, not | don't |
| VD+0*RC | verb "to do", base form + pronoun, personal, unmarked for case | d'you |
| VD+3 | verb "to do", present tense, 3rd person singular | does |
| VD+3*AN | verb "to do", present tense, 3rd person singular + adverb, not | doesn't |
| VD+D | verb "to do", past tense | did |
| VD+D*AN | verb "to do", past tense + adverb, not | didn't |
| VD+G | verb "to do", -ing form | doing |
| VD+N | verb "to do", past participle | done |
| VH+0 | verb "to have", base form | have |
| VH+0*AN | verb "to have", base form + adverb, not | haven't |
| VH+3 | verb "to have", present tense, 3rd person singular | has |
| VH+3*AN | verb "to have", present tense, 3rd person singular + adverb, not | hasn't |
| VH+D | verb "to have", past tense | had |
| VH+D*AN | verb "to have", past tense + adverb, not | hadn't |
| VH+G | verb "to have", -ing form | having |
| VH+N | verb "to have", past participle | had |
| VM+8 | verb, modal, ability | can may must need ought shall will |
| VM+8*AN | verb, modal, ability + adverb, not | can't cannot daren't mustn't needn't oughtn't shan't won't |
| VM+9 | verb, modal, suggestion | could might should used would |
| VM+9*AN | verb, modal, suggestion + adverb, not | couldn't shouldn't wouldn't |
| VM+9*VH+0 | verb, modal, suggestion + verb "to have", base form | should've |
| XA | metalanguage, cited words | like no very worth worthwhile |
| XX | foreign words, formulae or separate letters | A B C Caglia Cosi D E F Fille G Gardee H I J K L La M Mal N O P R S Tutte W Y Z ad cum fan gauleiters generis hominem ie major variorum |
| XZ | general 'ragbag' | Shmerican |



