Natural Language Processing Group, School of Computing, Faculty of Engineering, University of Leeds

Job Title: Senior Research Fellow

Research Grants:
EPSRC: Natural Language Processing Working Together with Arabic and Islamic Studies (2013-2015)

Past Projects:
EPSRC/ESRC/CPNI: Detecting Terrorist Activities - Making Sense (2010-2013)
JISC: e-Health GATEway to the Clouds (2012)

PhD (2011):
Prosody Resources and Symbolic Prosodic Features for Automated Phrase Break Prediction. School of Computing, University of Leeds

Publications: Click here to see my Research Publications  
CV: Click here to see my Academic Profile  

EPSRC: Natural Language Processing Working Together with Arabic and Islamic Studies
As Research Co-Investigator and principal author on this project proposal and funded project, I am part of an interdisciplinary team from the University of Leeds and the University of Jordan with a shared interest in Arabic Natural Language Processing (NLP), seeking to build on Leeds' growing reputation as a world leader in the application of NLP to the Qur'an. As a team, we combine expertise and quantitative techniques from Computational Linguistics and Text Analytics (the ICT research stream) with more traditional, introspective approaches to the study of texts fostered in the Humanities research streams of Arabic Language and Literature, and Qur'anic and Islamic Studies. The project delivers new Computational Linguistics tools and Text Analytics techniques for stylistic analysis of "implicit prosody" in text. Implicit prosody denotes our tendency to project our knowledge of the spoken language onto text, treating it as part of the input, even during silent reading (cf. Fodor, J. “Psycholinguistics cannot escape prosody.” Proc. Speech Prosody 2002). Tajwid or Qur'anic recitation is a sub-field and taught module in Islamic Studies programmes at Jordan and Leeds, and our original insight in this project is to view Tajwid mark-up of prosodic phrase boundaries and salient pitch accents (and hence lexis) in the Qur'an as additional sources of text-based data for computational and semantic analysis. Our tools and techniques will be re-usable and transferable, but are here applied to Arabic language data in the Qur'an as an exemplary speech corpus (i.e. a machine-readable and annotated text) and a strategically important societal and cultural domain. Dissemination of outputs will also be multidisciplinary and will target researchers in: NLP and Artificial Intelligence; Arabic Language and Literature; Qur'anic and Islamic Studies; Corpus Linguistics and Digital Humanities; Lexicography; Linguistics and Phonetics; and Psychology.

Making Sense Project: This is a consortium project funded by EPSRC/ESRC/CPNI in the field of Visual Analytics involving nine UK universities, with a remit to create an interactive, visualization-based decision support assistant as an aid to intelligence analysts. Leeds is responsible for the Text Extraction work package, where we make a novel application of the Corpus Linguistics technique of Keyword Extraction to gist or summarise large quantities of non-standard, text-based intelligence data as one of the modalities that needs to be integrated, prior to discovering links in fused data, and visualizing results to support interactive query and search. Statistically significant keywords and phrases thus extracted, and/or their computed weights, are used to profile specific text domains, and as classificatory features for automatic retrieval of similar, conceptually-linked texts from a mass of unseen material.
e-Health GATEway Project: Specification of guidelines, plus algorithm design, for anonymising the free text elements in electronic patient records, as a necessary prerequisite to further text-analytic investigation of those records.
Doctorate: My PhD in the Computing sub-field of Natural Language Processing is entitled: "Prosody Resources and Symbolic Prosodic Features for Automated Phrase Break Prediction." The focus is on: (i) the development of a lexicon and prosody and part-of-speech annotation tool for English; (ii) text mining and significance testing of symbolic/categorical prosodic correlates of phrase breaks or rhythmic juncture; (iii) evaluation of statistically significant features thus discovered via supervised machine learning experiments involving automatic binary classification of lexical items (or alternatively whitespaces) as breaks or non-breaks.
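
To illustrate the keyword extraction idea described under the Making Sense Project above, here is a minimal Python sketch. It is not the project's code: it assumes the log-likelihood (G2) keyness statistic, a common choice in Corpus Linguistics, since the statistic actually used is not stated here. The function names, the whitespace-tokenised input and the 3.84 cut-off (the chi-square critical value for p < 0.05 with one degree of freedom) are all illustrative assumptions.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 keyness for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens (both corpora non-empty)."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_target)
    if b > 0:
        g2 += b * math.log(b / expected_reference)
    return 2 * g2


def extract_keywords(target_tokens, reference_tokens, threshold=3.84):
    """Return (word, G2) pairs for words over-represented in the target
    corpus, sorted from strongest to weakest keyness."""
    target_counts = Counter(target_tokens)
    reference_counts = Counter(reference_tokens)
    c = sum(target_counts.values())
    d = sum(reference_counts.values())
    scored = []
    for word, a in target_counts.items():
        b = reference_counts.get(word, 0)
        # Keep only words relatively more frequent in the target corpus.
        if a / c > b / d:
            g2 = log_likelihood(a, b, c, d)
            if g2 >= threshold:
                scored.append((word, g2))
    return sorted(scored, key=lambda pair: -pair[1])
```

The resulting words and their G2 scores play the role of the "statistically significant keywords" and "computed weights" mentioned above, and could in principle be passed to a classifier as features.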

Research Interests:

  • Prosody and prosody modelling - especially exploration of stress-time, and phonetic and rhythmic correlates of phrase juncture in text
  • Prosodic-syntactic chunking and the machine learning task of automated phrase break prediction
  • Stylometry - interpreted as algorithmic formulation of rhythmic patterns in poetry and prose to characterise genre etc
  • Text Analytics - especially application of keyword extraction to identify features for automatic genre classification etc
  • Corpus Linguistics, Digital Humanities, and Literary and Linguistic Computing
  • Lexicography - especially symbolic representation of phonetic and rhythmic properties of words for pronunciation dictionaries
  • Speech and language technologies for virtual characters in education, therapy and entertainment (e.g. video games and interactive story/drama), and the study of ensuing human-agent interactions
  • Applied Linguistics - TESOL and TAFL

Version: May 2013