Latifa Al-Sulaiti's Homepage
Arabic Web Concordancing
-Leeds Internet Corpora
These large untagged corpora were developed by Serge Sharoff at the Centre for Translation Studies, University of Leeds. They were built from the Web using automated search-engine queries. The corpora are available through a Web interface, which can be accessed [here].
Some tips on using the Leeds Internet Corpora:
1. For basic queries, type the word in the search box and press Submit Query. For example, كتاب ('book') gives all the occurrences of that word. If you type كتاب.* you get words that begin with it, such as كتابي, كتابك, كتاباتي, etc. If you type .*كتاب.*, you get words that contain it anywhere, such as الكتاب, الكتابه, للكتاب, etc.
2. If the word you are searching for has several spellings, as 'Google' does in Arabic, list the alternatives separated by a vertical bar: (جوجل)|(قوقل)|(غوغل).
3. Concordance lines are selected at random, and the first word listed is the most frequent one, but no statistics are shown for its frequency.
4. Word frequencies appear only in the collocation results.
5. Frequency counts are sometimes inaccurate because some Web pages are duplicated, so users need to check them manually.
6. The Web interface displays results as 'keyword in context' (KWIC) lines, with the option of retrieving the source document by clicking the side arrow or Invert all.
7. There is no option for saving query results; users have to save them manually by copying and pasting.
8. Only single words can be queried.
9. The Web interface provides statistical measures such as the T-score, the Mutual Information (MI) score, and the log-likelihood score. Very useful information on two of these measures of lexical association, the MI score and the T-score, can be found in Biber et al. (1998), pp. 265-268.
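The patterns in tips 1 and 2 behave like regular expressions: `.*` stands for any sequence of characters and `|` separates alternative spellings. The short Python sketch below shows what each pattern would pick out from a handful of words. It assumes that the interface matches a pattern against whole word tokens, as Python's `re.fullmatch` does; the word list and the `matches` helper are hypothetical and are not taken from the corpus or its software.

```python
import re

# Hypothetical sample of word tokens; these are not corpus results.
tokens = ["كتاب", "كتابي", "كتابك", "كتاباتي", "الكتاب", "للكتاب",
          "مكتبة", "جوجل", "قوقل", "غوغل"]


def matches(pattern, words):
    """Return the words that the pattern matches as a whole.

    Whole-token matching (re.fullmatch) is an assumption about how the
    Web interface applies a query.
    """
    regex = re.compile(pattern)
    return [w for w in words if regex.fullmatch(w)]


print(matches("كتاب", tokens))                  # the exact word only
print(matches("كتاب.*", tokens))                # words beginning with كتاب
print(matches(".*كتاب.*", tokens))              # words containing كتاب anywhere
print(matches("(جوجل)|(قوقل)|(غوغل)", tokens))  # alternative spellings of 'Google'
```

Note that مكتبة ('library') is not matched by any of the كتاب patterns: in that word the letters ك ت ب are not followed by ا, and the patterns match character strings, not roots.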
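The three measures listed in tip 9 can be computed from four counts: how often the node word and a candidate collocate occur together (O), how often each word occurs overall, and the corpus size N. The Python sketch below uses common textbook formulations: MI = log2(O/E), T = (O - E)/√O, and the log-likelihood G² = 2 Σ O ln(O/E) over a 2×2 contingency table, where E = f(node) × f(collocate) / N is the co-occurrence frequency expected by chance. It is an assumption that the Leeds interface uses exactly these formulas; details such as the collocation window size (ignored here) will change the actual scores, and the function name and the counts in the usage line are hypothetical.

```python
import math


def association_scores(o11, f_node, f_coll, n):
    """Return (MI, T-score, log-likelihood) for one node/collocate pair.

    o11    -- observed co-occurrences of the node and the collocate (at least 1)
    f_node -- total frequency of the node word in the corpus
    f_coll -- total frequency of the collocate in the corpus
    n      -- corpus size in tokens
    Window-size adjustments used by some concordancers are ignored here.
    """
    # Co-occurrence frequency expected if the two words were independent.
    e11 = f_node * f_coll / n

    # Mutual Information: how many times more often the pair occurs than chance predicts (log2).
    mi = math.log2(o11 / e11)

    # T-score: difference between observed and expected, scaled by sqrt(observed).
    t = (o11 - e11) / math.sqrt(o11)

    # Log-likelihood (G2) over the full 2x2 contingency table.
    observed = [
        o11,                        # node and collocate together
        f_node - o11,               # node without the collocate
        f_coll - o11,               # collocate without the node
        n - f_node - f_coll + o11,  # neither word
    ]
    row_totals = [f_node, f_node, n - f_node, n - f_node]
    col_totals = [f_coll, n - f_coll, f_coll, n - f_coll]
    g2 = 0.0
    for o, r, c in zip(observed, row_totals, col_totals):
        e = r * c / n
        if o > 0:  # a zero cell contributes nothing (0 * ln 0 is taken as 0)
            g2 += o * math.log(o / e)
    return mi, t, 2 * g2


# Hypothetical counts, not taken from the Leeds corpora.
mi, t, ll = association_scores(o11=30, f_node=1000, f_coll=500, n=1_000_000)
print(f"MI = {mi:.2f}, T = {t:.2f}, LL = {ll:.2f}")
```

With these counts E = 0.5, so MI = log2(60), roughly 5.9. MI tends to reward rare pairs, while the T-score, which divides by √O, favours frequent ones, which is one reason tools often report more than one measure.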
This untagged corpus was developed by Dilworth Parkinson. It is a large corpus and can be accessed online. Words can be searched in Arabic or Latin script, and the website provides detailed instructions on searching.
Users need to register before using it, but they do not necessarily have to pay.
To use corpora for research, students and professional researchers have to follow the steps of a corpus-based approach:
Selected Books and Online Articles on Corpora and Language
These resources introduce some of the most common statistical techniques used in corpus-based studies and explain how to report the results. They also present sample studies covering lexicography, grammar, discourse, and register variation.
Biber, D. et al. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge University Press.
Hunston, S. (2002). Corpora in Applied Linguistics (Cambridge Applied Linguistics). Cambridge University Press.
Ghadessy, M. et al. (eds.) (2001). Small Corpus Studies and ELT: Theory and Practice. John Benjamins Publishing Company.
Kennedy, G. (1998). An Introduction to Corpus Linguistics (Studies in Language and Linguistics). Longman.
McEnry, T. et al. (2005). Corpus-Based Language Studies. Routledge.
Oakes, M. (1998). Statistics for Corpus Linguistics. Columbia University Press.
Stubbs, M. (1996). Text and Corpus Analysis: Computer-Assisted Studies of Language and Institutions (Language in Society). Wiley-Blackwell.
Tognini-Bonelli, E. (2001). Corpus Linguistics at Work. Amsterdam: John Benjamins Publishing Company.
Cobb, T. Is there any measurable learning from hands-on concordancing? System 25 (3), 301-315.
Hadley, G. (forthcoming). 'Sensing the Winds of Change: An Introduction to Data-driven Learning'. To appear in Insights 2 (accessed online March 14, 2009).
Kennedy, C. & Miceli, T. (2001). An evaluation of intermediate students' approaches to corpus investigation. Language Learning & Technology. Vol. 5, No. 3, September 2001, pp. 77-90.
Pinna, A. (2002). Corpus techniques at work in the ELT classroom. Annali della Facoltà di Lingue e Letterature Straniere, Vol. 2, pp. 35-59.
Stevens, V. (1995). 'Concordancing with Language Learners: Why? When? What?' CAELL Journal 6(2), pp. 2-10.
Thompson, P. & Tribble, C. (2001). Looking at citations: using corpora in English for academic purposes. Language Learning & Technology. Vol. 5, No. 3, September 2001, pp. 91-105.
Tribble, C. (1997). 'Improvising Corpora for ELT: Quick and Dirty Ways of Developing Corpora for Language Teaching'. In B. Lewandowska-Tomaszczyk and J. Melia (eds.), Proceedings of the First International Conference on Practical Applications in Language Corpora.
Last Modified: March 9, 2009 9:00 AM