JUSTIN WASHTELL'S RESEARCH PAGE @ LEEDS

Emerging Research Interests

This page is due for an update to reflect my present research interests; the text below is from some years ago. In the meantime, you can follow my research closely on my research blog.


Research Interest

My PhD research is presently focused on scale-independent methods in Lexical Semantics, but my motivating interest is perhaps better defined more broadly as reductionist/cognitive language modelling (see below), and broader still as general principles of statistical and computational modelling.

The present avenue of exploration began with my BSc Environmental Biogeoscience dissertation, examining scale-independent techniques in that domain, and developed via an MSc Multidisciplinary Informatics dissertation in which I explored the extension of one such technique to applications within computational linguistics.

Granularity in language: order, dimension, scale...

Lexical Semantics is concerned with inferring meaning from symbols, and can be considered a branch of Semiotics. Popularly, it is concerned with attributing meanings to words in text based upon contextual clues in the language. One important such contextual clue is how a word is used with other words.

As discourse represents a flow of meaning over time, words which tend to occur together can be considered to be meaningfully associated (syntagmatic association). Extending this principle, words which share common such associations, or appear in similar contexts, can be considered to share similar meanings (paradigmatic association). The latter is the Distributional Hypothesis, expounded by Harris (1954). Extending the same principle further, we can identify words which exhibit similar relationships to similar words, and so on. The distinction each time is one of order: one is built from the next. The fundamental organizing principle is therefore syntagm.
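
The two orders of association described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the function names, and above all the fixed context window, which is the conventional window-based approach rather than the windowless measures discussed in the publications listed on this page. First-order (syntagmatic) association is approximated by raw co-occurrence counts, and second-order (paradigmatic) association by the cosine similarity of two words' co-occurrence profiles.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_counts(sentences, window=2):
    """Count how often each word occurs within `window` tokens of another.

    The fixed window is an illustrative assumption, not a recommendation.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, tokenized on whitespace.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]

counts = cooccurrence_counts(corpus, window=2)

# First order (syntagmatic): do the two words occur near one another?
syntagmatic = counts["cat"]["drinks"]

# Second order (paradigmatic): do the two words keep similar company?
paradigmatic = cosine(counts["milk"], counts["water"])
```

In this toy corpus "milk" and "water" never occur together, so their syntagmatic count is zero, yet they share a neighbour ("drinks") and so receive a non-zero paradigmatic similarity. Applying `cosine` to vectors that are themselves built from such similarities would give the next order again.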

Paradigmatic interpretations of language (such as those which extol the word class and the sense) are limiting. Even the mono-dimensional similarity metrics of state-of-the-art Lexical Semantics belie the fact that it is commonplace for two words, say, to be paradigmatically near-identical in some respects and in some contexts, but quite unrelated in others. Antonymy, which is notoriously difficult to identify, is a case in point. Psycholinguistic studies have suggested that it may be a characteristically Western, and acquired, perspective that emphasizes paradigm over syntagm (Yoneoka, 1987). Some other studies have challenged this. If a functional picture of the bare anatomy of language is not yet forthcoming, it is not surprising: we operate in an unparalleled regime where language is simultaneously an expression of our understanding of the world, a tool for developing that understanding, and the phenomenon being studied - a regime where “similarity” and “relatedness” are still used synonymously (sic) in the literature.

Although Lexical Semantics considers the lexeme, the lexeme is not always well defined, and common organizing principles can be observed at many scales. Combinations of words form multi-word units, often with idiomatic meanings, which become reusable currencies exhibiting their own syntagmatic (and paradigmatic) associations. Within words, morphemes interact syntagmatically with items at the lexical and multi-word scales. Beyond well-behaved morphology, phonetic patterns are observable as phonaesthemes (Firth, 1930). The distinction between these phenomena is one of granularity rather than of any demonstrably fundamental nature. It makes sense that this is so: these common behaviours allow for creativity and efficiency in generating and interpreting language at all scales of expression. Consequently an utterance might be considered as much defined by its internal structure as by its external context: the scale at which we are trying to attribute meaning determining which is considered which.

Observations which transcend the traditional boundaries of scale and class tend to be considered abstract, but not necessarily useful. This is perhaps in part due to conflicts between our academic and innate perspectives of language, and in part due to a lack of sophistication with representational and computational methods - the two being somewhat mutually perpetuating. But the gains of embracing these observations are potentially great; statistical machine translation is a case in point: it presently works very well with words and phrases which it has been taught, but falls apart when morphology is used creatively, or idioms reconstructed. One way to set about integrating the broader insights of linguists such as Harris and Firth with applications-oriented NLP is to explore mathematical representations and measures of language which are appropriately independent of parameters such as scale, dimension and order.
Justin Washtell
NLP group
School of Computing
University of Leeds

washtell@comp.leeds.ac.uk
+44 7508 049 061

Publications
Justin Washtell (2011) "Compositional Expectation: A Purely Distributional Model of Compositional Semantics", in Proceedings of the 2011 International Conference on Computational Semantics (IWCS'11)
Justin Washtell (2010) "Expectation Vectors: A Semiotics Inspired Approach to Geometric Lexical-Semantic Representation", in Proceedings of the 2010 Workshop on GEometric Models of Semantics (GEMS'10)
Justin Washtell & Katja Markert (2009) "A Comparison of Windowless and Window-Based Computational Association Measures as Predictors of Syntagmatic Human Associations", in Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP'09), pages 628-637
Justin Washtell (2009) "Co-dispersion: A Windowless Approach to Lexical Association", in Proceedings of the 12th Conference of the European Chapter of the ACL (EACL'09), pages 861-869
Justin Washtell, Stephen Carver and Katherine Arrell (2009) "A Viewshed Based Classification of Landscapes Using Geomorphometrics", in Proceedings of Geomorphometry 2009
Lex Comber, Steve Carver, Steffen Fritz, Robert McMorran and Justin Washtell (2008) "Mapping Uncertainty in Perceptions of Landscape 'Wildness'", in Proceedings of the 16th Conference of GIS Research UK, pages 90-93
Eric Atwell, Junaid Arshad, Chien-Ming Lai, Lan Nim, Noushin Rezapour Asheghi, Josiah Wang and Justin Washtell (2007) "Which English Dominates the World Wide Web, British or American?", in Proceedings of the 4th Conference on Corpus Linguistics (CL2007), pages 90-93
Lex Comber, Steve Carver, Steffen Fritz, Robert McMorran, Justin Washtell, P Fisher. "Evaluating alternative mappings of wildness using fuzzy MCE and Dempster-Shafer in support of decision making." Geographical Analysis (to appear)

Awards & Prizes
2011/12 EPSRC Doctoral Prize fellowship (previously "PhD Plus")
2006/07 MSc Multidisciplinary Informatics Prize for best dissertation: "Co-dispersion by Nearest Neighbour: Adapting a Spatial Statistic for the Development of Domain-Independent Language Tools and Metrics"
2005/06 Sally Macgill Memorial Prize for best undergraduate dissertation: "Estimating Habitat Area and Related Ecological Metrics: From Theory Towards Best Practice"

Links
Interactive Fiction
http://jerz.setonhill.edu/if/
An introduction to Conway's Game of Life
http://www.math.com/students/wonders/life/life.html
The Romanesco Cauliflower
http://images.google.co.uk/images?hl=en&um=1&q=romanesco+cauliflower
Dolphin Cognition and Communication
http://www.dolphin-institute.org/our_research/index.htm
The opinions expressed on this page, and the content of any resources linked-to, are not necessarily shared or endorsed by the University of Leeds. Just so you know!