Majdi Sawalha @ Leeds

 
DESIGN OF A THEORY-NEUTRAL STANDARD TAG SET EXPOUNDING TRADITIONAL MORPHOLOGICAL FEATURES FOR ARABIC LANGUAGE PART-OF-SPEECH TAGGING

Abstract

The Morphological Features Tag Set captures long-established traditional morphological features of Arabic in a compact yet transparent notation. A tag consists of 22 characters; each position represents a feature, and the letter at that position represents a value or attribute of that morphological feature; the dash "-" marks a feature that is not relevant to a given word. The first character gives the main part of speech: noun, verb, particle, punctuation, or residual; the last two extend the traditional three classes to handle modern texts. Characters 2, 3, and 4 represent subcategories: traditional Arabic grammar recognizes 29 subclasses of noun (position 2), 3 subclasses of verb (position 3), and 13 subclasses of particle (position 4). Residuals and punctuation marks are represented in positions 5 and 6 respectively. The next positions represent traditional morphological features: gender (7), number (8), person (9), morphology (10), case and mood (11), case and mood markers (12), definiteness (13), voice (14), emphasized (15), transitivity (16), humanness (17), and variability and conjugation (18). Finally, four characters represent morphological information that is useful in Arabic text analysis, although not all linguists would count these as traditional features: augmented and unaugmented (19), number of root letters (20), verb internal structure (21), and noun finals (22). The Morphological Features Tag Set is not tied to a specific tagging algorithm or theory, and other tag sets could be mapped onto this standard to simplify and promote comparison and reuse of Arabic taggers and tagged corpora.

Details of the Morphological Features Part-of-Speech Tag Set for the Arabic language

Morphological Features Categories

Position | Morphological Features Categories | Arabic
1 | Main Part-of-Speech | أَقسام الكلام الرئيسيَّة
2 | Part-of-Speech of Noun | أقسام الكلام الفرعيَّة (الاسم)
3 | Part-of-Speech of Verb | أقسام الكلام الفرعيَّة (الفعل)
4 | Part-of-Speech of Particle | أقسام الكلام الفرعيَّة (الحرف)
5 | Residuals | أقسام الكلام الفرعيَّة (أخرى)
6 | Punctuation marks | أقسام الكلام الفرعيَّة (علامات الترقيم)
7 | Gender | الجنس
8 | Number | العدد
9 | Person | الشخص
10 | Morphology | الصَّرف
11 | Case and Mood | الحالة الإعرابية للاسم أو الفعل
12 | Case and Mood marks | علامة الإعراب أو البناء
13 | Definiteness | المَعْرِفَة والنَّكِرَة
14 | Voice | المَبْني لِلمَعْلُوم و المَبْني لِلمَجْهُول
15 | Emphasized | المُؤكَّد وغيرُ المُؤكَّد
16 | Transitivity | اللازم والمتعدي
17 | Humanness | العاقل وغير العاقل
18 | Variability & Conjugation | التَّصريف
19 | Augmented and Unaugmented | المجرَّد والمزيد
20 | Root letters | المجرَّد والمزيد
21 | Verb Internal Structure | بُنية الفعل
22 | Noun finals | أقسام الاسم تبعاً للفظ آخره
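The positional layout listed above can be sketched in code. The following is a minimal illustration, not the author's implementation: the function names (`encode_tag`, `decode_tag`, `relevant_features`) and the feature keys are hypothetical, and the value letters used in the usage note (such as "v" or "m") are placeholders, since the actual value letters for each position are not given here. Only the 22-position order and the "-" convention come from the description above.

```python
from typing import Dict, Optional

# Feature keys in positional order (positions 1 to 22), following the
# table above. The key spellings themselves are illustrative assumptions.
FEATURES = [
    "main_pos", "noun_subclass", "verb_subclass", "particle_subclass",
    "residual", "punctuation", "gender", "number", "person", "morphology",
    "case_mood", "case_mood_mark", "definiteness", "voice", "emphasized",
    "transitivity", "humanness", "variability_conjugation",
    "augmentation", "root_letters", "verb_structure", "noun_finals",
]

TAG_LENGTH = 22        # a tag is always 22 characters long
NOT_APPLICABLE = "-"   # dash: feature not relevant to this word


def encode_tag(values: Dict[str, str]) -> str:
    """Build a 22-character tag; unspecified features become '-'."""
    unknown = set(values) - set(FEATURES)
    if unknown:
        raise KeyError(f"unknown feature(s): {sorted(unknown)}")
    for name, letter in values.items():
        if len(letter) != 1:
            raise ValueError(f"{name}: value must be a single character")
    return "".join(values.get(name, NOT_APPLICABLE) for name in FEATURES)


def decode_tag(tag: str) -> Dict[str, Optional[str]]:
    """Map each position of a tag to its feature key (None for '-')."""
    if len(tag) != TAG_LENGTH:
        raise ValueError(f"expected {TAG_LENGTH} characters, got {len(tag)}")
    return {
        name: (None if letter == NOT_APPLICABLE else letter)
        for name, letter in zip(FEATURES, tag)
    }


def relevant_features(tag: str) -> Dict[str, str]:
    """Keep only the features that apply to the tagged word."""
    return {k: v for k, v in decode_tag(tag).items() if v is not None}
```

For example, `encode_tag({"main_pos": "v", "gender": "m"})` places the placeholder letters at positions 1 and 7 and fills the remaining positions with dashes, and `relevant_features` recovers exactly those two entries. Because every tag has the same fixed length, tags from different taggers can be compared position by position once they are mapped onto this layout.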


 
