ARABIC LANGUAGE COMPUTING RESEARCH
The Language research group in the School of Computing has an ongoing interest in corpus-based research on Arabic. Language research in Computing is also known as Natural Language Processing, Computational Linguistics, or Language Engineering. Central to our research is the computational modelling of language data; a CORPUS is a text dataset representative of the language to be analysed. Latifa Al-Sulaiti has developed a new free-to-download Arabic corpus, the Corpus of Contemporary Arabic; Andy Roberts has developed a free-to-download open-source concordance tool for analysis of Arabic corpus texts, aConCorde; Nora Abbas has developed a Quran "search for a concept" tool and website, Qurany; and Kais Dukes is developing an online annotated linguistic resource which shows the Arabic grammar, syntax and morphology for each word in the Holy Quran, the Quranic Arabic Corpus.
CONTACT:
Eric Atwell, Senior Lecturer.
Research Interests: Corpus Linguistics, Arabic language processing,
technologies for knowledge management applied to the Quran,
making sense of surveillance and intelligence data,
Unsupervised and Supervised Machine Learning from corpora,
chatbots and their applications,
international English,
morphosyntactic and Part-of-Speech tagging, evaluation.
PUBLICATIONS.
Research Student projects in Arabic language computing
| STUDENT | RESEARCH TOPIC |
|---|---|
| Noorhan Abbas | Quran 'Search for a Concept' tool and website |
| Amal Alsaif | An Automatic analyser of Discourse structure for Arabic |
| Kais Dukes | Arabic Language Computing Applied to the Quran |
| Majdi Sawalha | Automatic Part-of-Speech Tagging of Arabic Language Text |
| Abdul-Baquee Sharaf | A Computational Model for Knowledge Representation of the Quran |
Alumni: graduates of the Arabic language computing research group
| GRADUATE | THESIS |
|---|---|
| Bayan Abu Shawar | A Corpus Based Approach to Generalise a Chatbot System |
| Latifa Al-Sulaiti | Designing and Developing a Corpus of Contemporary Arabic |
| Eric Atwell | Corpus Linguistics and Language Learning: Bootstrapping Linguistic Knowledge and Resources from Text |
| Andy Roberts | Grammatical Inference and Corpus linguistics |
Example publications on Arabic language computing
[pdf] Sawalha, Majdi; Atwell, Eric. Comparative evaluation of Arabic language morphological analysers and stemmers in: Proceedings of COLING 2008 22nd International Conference on Computational Linguistics. 2008.
[pdf] Atwell, Eric; Abbas, Noorhan; Abu Shawar, Bayan; Alsaif, Amal; Al-Sulaiti, Latifa; Roberts, Andrew; Sawalha, Majdi. Mapping Middle Eastern and North African diasporas: Arabic corpus linguistics research at the University of Leeds in: Proceedings of BRISMES Conference 2008. 2008.
[pdf] Atwell, Eric. A cross-language methodology for corpus Part-of-Speech tag-set development in: Proceedings of Corpus Linguistics 2007. 2007.
[pdf] Al-Sulaiti, Latifa; Atwell, Eric. The design of a corpus of contemporary Arabic. International Journal of Corpus Linguistics, vol. 11, pp. 135-171. 2006.
[pdf] Roberts, Andrew; Al-Sulaiti, Latifa; Atwell, Eric. aConCorde: Towards an open-source, extendable concordancer for Arabic. Corpora journal, vol. 1, pp. 39-57. 2006.
[pdf] Abu Shawar, Bayan; Atwell, Eric. Using corpora in machine-learning chatbot systems. International Journal of Corpus Linguistics, vol. 10, pp. 489-516. 2005.
[pdf] Al-Sulaiti, Latifa; Roberts, Andrew; Atwell, Eric. The use of corpora and concordance in the teaching of contemporary Arabic in: Proceedings of EuroCALL 2005. 2005.
[pdf] Al-Sulaiti, Latifa; Atwell, Eric. Extending the corpus of contemporary Arabic in: Proceedings of Corpus Linguistics 2005. 2005.
[pdf] Roberts, Andrew; Al-Sulaiti, Latifa; Atwell, Eric. aConCorde: towards a proper concordance of Arabic in: Proceedings of Corpus Linguistics 2005. 2005.
[pdf] Al-Sulaiti, Latifa. The North African Experience. ElSNews: Newsletter of the European Language and Speech Research Network, Vol 13.1, pp.11-12. 2004.
[pdf] Atwell, Eric. Clustering of word types and unification of word tokens into grammatical word-classes in: Bel, B & Marlien, I (editors) Proceedings of TALN04: XI Conference sur le Traitement Automatique des Langues Naturelles, Volume 1, pp. 27-32 ATALA. 2004.
[pdf] Abu Shawar, Bayan; Atwell, Eric. An Arabic chatbot giving answers from the Qur'an in: Bel, B & Marlien, I (editors) Proceedings of TALN04: XI Conference sur le Traitement Automatique des Langues Naturelles, Volume 2, pp. 197-202 ATALA. 2004.
[pdf] Atwell, Eric; Al-Sulaiti, Latifa; Al-Osaimi, Saleh; Abu Shawar, Bayan. A review of Arabic corpus analysis tools in: Bel, B & Marlien, I (editors) Proceedings of TALN04: XI Conference sur le Traitement Automatique des Langues Naturelles, Volume 2, pp. 229-234 ATALA. 2004.
[pdf] Abu Shawar, Bayan; Atwell, Eric. Evaluation of chatbot systems in: Proceedings of Eighth Maghrebian Conference on Software Engineering and Artificial Intelligence. 2004.
[pdf] Al-Sulaiti, Latifa; Atwell, Eric. Designing and developing a corpus of contemporary Arabic in: TALC 2004: Proceedings of the sixth Teaching And Language Corpora conference, pp. 92-93. 2004.
[pdf] Atwell, Eric; Abu Shawar, Bayan; Babych, Bogdan; Elliott, Debbie; Elliott, John; Gent, Paul; Hartley, Anthony; Hu, Xunlei Rose; Medori, Julia; Oba, Toshifumi; Roberts, Andy; Scharoff, Serge; Souter, Clive. Corpus Linguistics, Machine Learning and Evaluation: Views from Leeds. University of Leeds, School of Computing research report 2003.02. 2003.
[pdf] Al-Sulaiti, Latifa; Atwell, Eric. The Design of a Corpus of Contemporary Arabic (CCA). University of Leeds, School of Computing research report 2003.11. 2003.
[pdf] Al-Sulaiti, Latifa. Computer Assisted Language Learning (CALL). ElSNews: Newsletter of the European Language and Speech Research Network, Vol 12.1, pp.1-3. 2003.
[pdf] Al-Sulaiti, Latifa; Knowles, Gerry. A Multimedia Arabic Course. In Proceedings of the International Symposium on the processing of Arabic, University of Manouba, Tunis, Tunisia, pp. 94-105. 2002.
[pdf] Atwell, Eric. The Language Machine. 64pp. The British Council. 1999.
Brockett, A; Atwell, E S; Taylor, O; Page, M. An Arabic text database and glossary system for students in: Proceedings of the Seminar on Bilingual Computing in Arabic and English, pp. 154-162, University of Cambridge. 1989.
Research Facilities at Leeds University
Research facilities in the School of Computing at Leeds University include a dedicated high-speed network infrastructure, a wide range of corpora, and software tools for corpus management, language analysis, machine learning, and software development. Language research staff teach research-led modules in Natural Language Processing, Technologies for Knowledge Management, Language, and Computational Modelling. An invaluable additional resource at Leeds is access to related language research expertise in a wide range of departments across the University: Modern Languages and Cultures, Arabic and Middle Eastern Studies, Translation Studies, Interdisciplinary Gender Studies, English, Education, Linguistics and Phonetics, Colonial and Postcolonial Studies, African Studies, Psychological Sciences, Communications Studies, Disability Studies, The Language Centre, and the Centre for Joint Honours. We welcome applications to join us as PhD research students, or as research sponsors and/or collaborators.
Leeds Corpus Linguistics research seminars
International Conferences
