School of Computing

FACULTY OF ENGINEERING

 

ARABIC LANGUAGE COMPUTING RESEARCH @ Leeds


Video: Abdullah Alfaifi - Arabic Learner Corpus

Video: Kais Dukes - Arabic Language Computing applied to the Quran - ppt, wmv, doc, txt

Video: Abdul-Baquee Muhammad Sharaf - Text Mining the Quran



The Language research group in the School of Computing has an ongoing interest in corpus-based research on Arabic. Central to our research is the computational modelling of language data; a CORPUS is a text dataset representative of the language to be analysed.
  • Latifa Al-Sulaiti developed the first free-to-download Arabic corpus, the Corpus of Contemporary Arabic;
  • Andy Roberts developed the first free-to-download open-source concordance tool for analysis of Arabic corpus texts, aConCorde;
  • Majdi Sawalha developed the first theory-neutral Standard Arabic Linguistics Morphological Analysis tag set expounding traditional fine-grained morphological features, SALMA, with a Gold Standard SALMA-tagged sample corpus;
  • Majdi Sawalha developed an Arabic lexical resource for Arabic root-meaning search;
  • Nora Abbas developed the first Quran "search for a concept" tool and website, Qurany, and a Google-searchable index of Quran and Hadith, e.g. search Google for "Prayers site:http://www.comp.leeds.ac.uk/nora/html/";
  • Kais Dukes developed the first online annotated linguistic resource which shows the Arabic "irab" morphology and grammar for each word and verse in the Holy Quran, the Quranic Arabic Corpus including word-by-word morphology and English gloss, and Ontology of Quranic concepts;
  • Abdul-Baquee Sharaf developed tools and resources for text mining the Quran including verse similarity, lemma concordance and collocation, and text mining the Hadeeth;
  • Serge Sharoff developed web concordance and collocation tools for Querying Arabic Corpora including 170-million-word Arabic Web Corpus, Arabic Wikipedia, Corpus of Contemporary Arabic, and specialised Arabic corpora for news, computer science, and legal texts;
  • Amal Al-Saif developed the first Leeds Arabic Discourse Treebank;
  • Eric Atwell, James Dickins and Majdi Sawalha developed Web-as-Corpus teaching resources for Arabic and Islamic Studies;
  • Eric Atwell, James Dickins, Claire Brierley, Majdi Sawalha and Tajul Islam are researching Natural Language Processing Working Together With Arabic And Islamic Studies.

    CONTACT:

    Eric Atwell, Senior Lecturer. Research Interests: Corpus Linguistics, Arabic language processing, technologies for knowledge management applied to the Quran, making sense of surveillance and intelligence data, Unsupervised and Supervised Machine Learning from corpora, chatbots and their applications, national varieties of Arabic, Arab English, morphosyntactic and Part-of-Speech tagging, evaluation. PUBLICATIONS.

    Vacancies: Web Content Managers, fixed-term 3 months, part-time 20-50%, salary Grade 6.1 point 23 (20-50% of 24,766 p.a. pro rata, paid monthly)

    Research Student projects in Arabic language computing

    STUDENT - RESEARCH TOPIC
    Abdullah Alfaifi - Building an Arabic Learner Corpus (ALC) with Part-of-Speech (POS) Tagging and Error Annotation
    Amal Alsaif - An Automatic analyser of Discourse structure for Arabic
    Kais Dukes - Arabic Language Computing Applied to the Quran
    Majdi Sawalha - Automatic Part-of-Speech Tagging of Arabic Language Text
    Abdul-Baquee Sharaf - A Computational Model for Knowledge Representation of the Quran

    Alumni: graduates of the Arabic language computing research group

    GRADUATE, YEAR. THESIS
    Noorhan Abbas, 2009. Quran 'Search for a Concept' tool and website
    Bayan Abu Shawar, 2005. A Corpus Based Approach to Generalise a Chatbot System
    Latifa Al-Sulaiti, 2004. Designing and Developing a Corpus of Contemporary Arabic
    Eric Atwell, 2008. Corpus Linguistics and Language Learning: Bootstrapping Linguistic Knowledge and Resources from Text
    Andy Roberts, 2008. Grammatical Inference and Corpus linguistics

    Research Facilities at Leeds University

    Research facilities in the School of Computing at Leeds University include a dedicated high-speed network infrastructure, a wide range of corpora (Arabic, English and many other languages), and software tools for corpus analysis, language analysis, machine learning, data mining and software development. Staff teach research-led undergraduate and postgraduate courses, for example Natural Language Processing, Knowledge Management and Adaptive Systems, Language. Leeds University is unique in having a very wide range of language research expertise. Arabic language computing researchers can learn from and collaborate with researchers in many departments across the University:
  • Arabic and Middle Eastern Studies,
  • Translation Studies,
  • Linguistics and Phonetics,
  • Modern Languages and Cultures,
  • English,
  • Education,
  • Theology and Religious Studies,
  • Classics,
  • Interdisciplinary Gender Studies,
  • Colonial and Postcolonial Studies,
  • African Studies,
  • Psychological Sciences,
  • Communications Studies,
  • Disability Studies,
  • The Language Centre,
  • Centre for Joint Honours.
    We welcome applications to join us as PhD research students, or as research sponsors and/or collaborators.

    Our publications in Arabic language computing

    [pdf] Sawalha, M; Atwell, ES Constructing and Using Broad-coverage Lexical Resource for Enhancing Morphological Analysis of Arabic in: Proceedings of LREC'2010 Language Resources and Evaluation Conference. 2010.

    [pdf] Atwell, ES; Dukes, K; Abdul Baquee, S; Habash, N; Louw, B; Abu Shawar, B; McEnery, T; Zaghouani, W; El-Haj, M Understanding the Quran: a new Grand Challenge for Computer Science and Artificial Intelligence. Proceedings of GCCR'2010 Grand Challenges in Computing Research. 2010.

    [pdf] Hassan, H; Daud, N; Atwell, ES Connectives in the World Wide Arabic corpus. Proceedings of IVACS'2010 Inter-Varietal Applied Corpus Studies Conference. 2010.

    [pdf] Sawalha, M; Atwell, ES Fine-Grain Morphological Analyzer and Part-of-Speech Tagger for Arabic Text in: LREC'2010 Language Resources and Evaluation Conference. 2010.

    [pdf] Abu Shawar, B; Atwell, ES Chatbots: Can they serve as Natural Language interfaces to QA corpus? in: Proceedings of ACSE'2010: Sixth IASTED International Conference on Advances in Computer Science and Engineering. 2010.

    [pdf] Dukes, K; Atwell, ES; Abdul Baquee, S Syntactic Annotation Guidelines for the Quranic Arabic Dependency Treebank. LREC'2010 Language Resources and Evaluation Conference. 2010.

    [pdf] Sawalha, M; Atwell, ES Adapting Language Grammar Rules for Building a Morphological Analyzer for Arabic Text (in Arabic). Proceedings of ALECSO Arab League Educational Cultural and Scientific Organization workshop on Arabic morphological analysis. 2009.

    [pdf] Sharaf, A; Atwell, ES A Corpus-based Computational Model for Knowledge Representation of the Quran. Proceedings of CL2009 International Conference on Corpus Linguistics. 2009.

    [pdf] Abu Shawar, B; Atwell, ES Arabic Question-Answering via Instance Based Learning from an FAQ Corpus. Proceedings of CL2009 International Conference on Corpus Linguistics. 2009.

    [pdf] Atwell, ES; Al-Sulaiti, L; Sharoff, S Arabic and Arab English in the Arab World. Proceedings of CL2009 International Conference on Corpus Linguistics. 2009.

    [pdf] Pritchard, J; Atwell, E; Newman, M; Dorling, D; Hall, F Mapping Language: From data to diaspora. Proceedings of Workshop on Research Infrastructure for Linguistic Variation. University of Oslo. 2009.

    [pdf] Sawalha, M; Atwell, ES Linguistically Informed and Corpus Informed Morphological Analysis of Arabic. Proceedings of CL2009 International Conference on Corpus Linguistics. 2009.

    [pdf] Sawalha, Majdi; Atwell, Eric. Comparative evaluation of Arabic language morphological analysers and stemmers. Proceedings of COLING 2008 22nd International Conference on Computational Linguistics. 2008.

    [pdf] Atwell, Eric; Abbas, Noorhan; Abu Shawar, Bayan; Alsaif, Amal; Al-Sulaiti, Latifa; Roberts, Andrew; Sawalha, Majdi. Mapping Middle Eastern and North African diasporas: Arabic corpus linguistics research at the University of Leeds in: Proceedings of BRISMES Conference 2008. 2008.

    [pdf] Atwell, Eric. A cross-language methodology for corpus Part-of-Speech tag-set development in: Proceedings of Corpus Linguistics 2007. 2007.

    [pdf] Al-Sulaiti, Latifa; Atwell, Eric. The design of a corpus of contemporary Arabic. International Journal of Corpus Linguistics, vol. 11, pp. 135-171. 2006.

    [pdf] Roberts, Andrew; Al-Sulaiti, Latifa; Atwell, Eric. aConCorde: Towards an open-source, extendable concordancer for Arabic. Corpora journal, vol. 1, pp. 39-57. 2006.

    [pdf] Abu Shawar, Bayan; Atwell, Eric. Using corpora in machine-learning chatbot systems. International Journal of Corpus Linguistics, vol. 10, pp. 489-516. 2005.

    [pdf] Al-Sulaiti, Latifa; Roberts, Andrew; Atwell, Eric. The use of corpora and concordance in the teaching of contemporary Arabic in: Proceedings of EuroCALL 2005. 2005.

    [pdf] Al-Sulaiti, Latifa; Atwell, Eric. Extending the corpus of contemporary Arabic in: Proceedings of Corpus Linguistics 2005. 2005.

    [pdf] Roberts, Andrew; Al-Sulaiti, Latifa; Atwell, Eric. aConCorde: towards a proper concordance of Arabic in: Proceedings of Corpus Linguistics 2005. 2005.

    [pdf] Al-Sulaiti, Latifa. The North African Experience. ElSNews: Newsletter of the European Language and Speech Research Network, Vol 13.1, pp.11-12. 2004.

    [pdf] Atwell, Eric. Clustering of word types and unification of word tokens into grammatical word-classes in: Bel, B & Marlien, I (editors) Proceedings of TALN04: XI Conference sur le Traitement Automatique des Langues Naturelles, Volume 1, pp. 27-32 ATALA. 2004.

    [pdf] Abu Shawar, Bayan; Atwell, Eric. An Arabic chatbot giving answers from the Qur'an in: Bel, B & Marlien, I (editors) Proceedings of TALN04: XI Conference sur le Traitement Automatique des Langues Naturelles, Volume 2, pp. 197-202 ATALA. 2004.

    [pdf] Atwell, Eric; Al-Sulaiti, Latifa; Al-Osaimi, Saleh; Abu Shawar, Bayan. A review of Arabic corpus analysis tools in: Bel, B & Marlien, I (editors) Proceedings of TALN04: XI Conference sur le Traitement Automatique des Langues Naturelles, Volume 2, pp. 229-234 ATALA. 2004.

    [pdf] Abu Shawar, Bayan; Atwell, Eric. Evaluation of chatbot systems in: Proceedings of Eighth Maghrebian Conference on Software Engineering and Artificial Intelligence. 2004.

    [pdf] Al-Sulaiti, Latifa; Atwell, Eric. Designing and developing a corpus of contemporary Arabic in: TALC 2004: Proceedings of the sixth Teaching And Language Corpora conference, pp. 92-93. 2004.

    [pdf] Atwell, Eric; Abu Shawar, Bayan; Babych, Bogdan; Elliott, Debbie; Elliott, John; Gent, Paul; Hartley, Anthony; Hu, Xunlei Rose; Medori, Julia; Oba, Toshifumi; Roberts, Andy; Sharoff, Serge; Souter, Clive. Corpus Linguistics, Machine Learning and Evaluation: Views from Leeds. University of Leeds, School of Computing research report 2003.02. 2003.

    [pdf] Al-Sulaiti, Latifa; Atwell, Eric. The Design of a Corpus of Contemporary Arabic (CCA). University of Leeds, School of Computing research report 2003.11. 2003.

    [pdf] Al-Sulaiti, Latifa. Computer Assisted Language Learning (CALL). ElSNews: Newsletter of the European Language and Speech Research Network, Vol 12.1, pp.1-3. 2003.

    [pdf] Al-Sulaiti, Latifa; Knowles, Gerry. A Multimedia Arabic Course. In Proceedings of the International Symposium on the processing of Arabic, University of Manouba, Tunis, Tunisia, pp. 94-105. 2002.

    [pdf] Atwell, Eric. The Language Machine. 64pp. The British Council. 1999.

    Brockett, A; Atwell, E S; Taylor, O; Page, M. An Arabic text database and glossary system for students in: Proceedings of the Seminar on Bilingual Computing in Arabic and English, pp. 154-162, University of Cambridge. 1989.