JUSTIN WASHTELL'S RESEARCH PAGE @ LEEDS
Emerging Research Interests
This page is due for an update to reflect my present research interests; the text below is from some years ago. In the meantime, you can follow my research closely on my research blog.
Research Interest
My PhD research is presently focused on scale-independent methods in Lexical Semantics, but my motivating interest is perhaps better defined more broadly as reductionist/cognitive language modelling (see below), and broader still as general principles of statistical and computational modelling. The present avenue of exploration began with my BSc Environmental Biogeoscience dissertation, examining scale-independent techniques in that domain, and developed via an MSc Multidisciplinary Informatics dissertation in which I explored the extension of one such technique to applications within computational linguistics.
Granularity in language: order, dimension, scale...
Lexical Semantics is concerned with inferring meaning from symbols, and can be considered a branch of Semiotics. Popularly, it is concerned with attributing meanings to words in text based upon contextual clues in the language. One important such contextual clue is how a word is used with other words. As discourse represents a flow of meaning over time, words which tend to occur together can be considered to be meaningfully associated (syntagmatic association). Extending this principle, words which share common such associations, or appear in similar contexts, can be considered to share similar meanings (paradigmatic association). This latter is the Distributional Hypothesis, expounded by Harris (1954). Extending the same principle further, we can identify words which exhibit similar relationships to similar words, and so on. The distinction each time is one of order: one is built from the next.
The fundamental organizing principle is therefore syntagm. Paradigmatic interpretations of language (such as those which extol the word class and the sense) are limiting. Even the mono-dimensional similarity metrics of state-of-the-art Lexical Semantics belie the fact that it is commonplace for two words, say, to be paradigmatically near-identical in some respects and in some contexts, but quite unrelated in others.
Antonymy, which is notoriously difficult to identify, is a case in point. Psycholinguistic studies have suggested that it may be a characteristically Western, and acquired, perspective that emphasizes paradigm over syntagm (Yoneoka, 1987). Some other studies have challenged this.
If a functional picture of the bare anatomy of language is not yet forthcoming, it is not surprising: we operate in an unparalleled regime where language is simultaneously an expression of our understanding of the world, a tool for developing that understanding, and the phenomenon being studied - a regime where “similarity” and “relatedness” are still used synonymously (sic) in the literature.
Although Lexical Semantics considers the lexeme, the lexeme is not always well defined, and common organizing principles can be observed at many scales. Combinations of words form multi-word units, often with idiomatic meanings, which become reusable currencies exhibiting their own syntagmatic (and paradigmatic) associations. Within words, morphemes interact syntagmatically with items at the lexical and multi-word scales. Beyond well-behaved morphology, phonetic patterns are observable as phonaesthemes (Firth, 1930). The distinction between these phenomena is one of granularity rather than of any demonstrably fundamental nature. It makes sense that this is so: these common behaviours allow for creativity and efficiency in generating and interpreting language at all scales of expression. Consequently an utterance might be considered as much defined by its internal structure as by its external context: the scale at which we are trying to attribute meaning determining which is considered which. Observations which transcend the traditional boundaries of scale and class can have a tendency to be considered abstract, but not necessarily useful.
This is perhaps in part due to conflicts between our academic and innate perspectives of language, and in part due to a lack of sophistication with representational and computational methods - the two being somewhat mutually perpetuating. But the gains of embracing these observations are potentially great; statistical machine translation is a case in point: it presently works very well with words and phrases which have been taught, but falls apart when morphology is used creatively, or idioms reconstructed.
One way to set about integrating the broader insights of linguists such as Harris and Firth with applications-oriented NLP is to explore mathematical representations and measures of language which are appropriately independent of parameters such as scale, dimension and order.
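The distinction between syntagmatic and paradigmatic association described above can be made concrete with a minimal sketch. This is a toy illustration only, not the method pursued in this research: the three-sentence corpus, the window size, and the function names `cooccurrence` and `cosine` are all assumptions made for the example. First-order (syntagmatic) association is approximated by counting co-occurrences within a fixed window; second-order (paradigmatic) association by the cosine similarity of two words' co-occurrence profiles.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence(sentences, window=2):
    """Count, for each word, the words appearing within `window` tokens of it.

    This is a crude first-order (syntagmatic) association measure.
    The fixed `window` is an illustrative assumption.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters).

    Comparing co-occurrence profiles gives a second-order
    (paradigmatic) association measure.
    """
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, invented purely for illustration.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]

counts = cooccurrence(corpus)

# "milk" and "water" never occur together (no syntagmatic association) ...
print(counts["milk"]["water"])
# ... yet they share the context "drinks" (some paradigmatic association).
print(cosine(counts["milk"], counts["water"]))
```

In this toy corpus, "milk" and "water" have a co-occurrence count of zero but a profile similarity of roughly 0.5, because both appear next to "drinks". Note that the fixed `window` is exactly the kind of scale parameter that the text above argues a measure should ideally be independent of; it is kept here only to keep the sketch short.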
Publications
Awards & Prizes
Links
The opinions expressed on this page, and the content of any resources linked-to, are not necessarily shared or endorsed by the University of Leeds. Just so you know!