Linguistics

108 Sproul Hall
University of California, Davis
One Shields Avenue
Davis, CA 95616

(530) 752-9933 phone
(530) 752-3156 fax

 

Using Electronic Corpora to Profile Levels of Proficiency in Second Language Acquisition: the Cambridge English Profile Project

A joint presentation by John A. Hawkins (UC Davis & Cambridge University), Nick Saville (Cambridge Assessment), Paula Buttery (Cambridge University) and Mike McCarthy (Penn State University & Cambridge University Press). The initial presentation will be followed by a workshop given by Paula Buttery.

What: Special Presentation
When: March 27, 2008, from 2:00 pm to 5:00 pm
Where: CMB Annex Conference Room, 202 Cousteau Place, Suite 250
Contact Name: John Hawkins
Contact Phone: 530-752-0715

Several decades of practical work on language testing and teaching have led to the six proficiency levels of the Common European Framework of Reference for Languages (CEFR). In this talk we ask the question: how much of the grammar, lexicon and usage conventions of English do learners actually know at each of these levels? Attempts to describe the defining characteristics of the levels have so far been rather general, or have been couched in functional terms (the 'can-do' statements) that are compatible with numerous grammatical and lexical possibilities. Greater precision can be achieved through the use of electronic corpora. The collaborative work we report on is based on an empirical examination of the Cambridge Learner Corpus. The corpus has been part-of-speech tagged and parsed, which makes items more accessible and permits searches that go beyond individual words. New codes have been entered into the data that facilitate these searches and that enable us to look for 'criterial features' at each proficiency level and for first language transfer effects.
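To illustrate what a search "beyond individual words" over tagged text can look like, here is a minimal sketch in Python. It is not the project's search tool: the function name `find_tag_sequences`, the simplified tag set and the hand-tagged sentence are all assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); tag set is invented


def find_tag_sequences(tokens: List[Token], pattern: List[str]) -> List[List[str]]:
    """Return every run of consecutive words whose tags match `pattern` exactly."""
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if [tag for _, tag in window] == pattern:
            hits.append([word for word, _ in window])
    return hits


# Invented example sentence, tagged by hand (not Cambridge Learner Corpus data).
sentence = [
    ("I", "PRON"), ("spoke", "VERB"), ("to", "PREP"),
    ("the", "DET"), ("new", "ADJ"), ("president", "NOUN"),
]

# Search for a structural pattern rather than a particular word.
matches = find_tag_sequences(sentence, ["DET", "ADJ", "NOUN"])
```

A query of this kind retrieves every determiner + adjective + noun sequence regardless of the words involved, which a plain word search cannot do.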

A set of research hypotheses was defined at the outset of the project, and illustrative findings to date that test these hypotheses will be presented. For example, missing determiners (*I spoke to President) and determiner choice errors (*Have the nice day) can now be quantified at each proficiency level, giving a determiner usage profile and revealing the impact of different first languages on second language acquisition. Speakers of first languages without definite and indefinite articles make more errors, as one might expect, and errors generally decline or hold steady from one level to the next. What we can now do is quantify this by level and across a range of typologically different first languages (Spanish, German, Chinese, Turkish, Japanese, Korean, Russian, etc.). Similarly, we can quantify morpho-syntactic errors, such as number agreement (*The three birds is singing) and finite verb morphology (*I will must take the bus), by level and first language. Even more significantly, the tagged and parsed learner corpus enables us to measure what learners can actually do grammatically and lexically, in addition to what they cannot do (i.e. errors). We can quantify their increasing exploitation of the resources of English at each level, e.g. the numbers of nouns and verbs used, the words and phrases that nouns and verbs co-occur with, the increasing complexity of a variety of syntactic structures, and so on. By comparing these quantities with the corresponding properties and numbers in the British National Corpus, we can measure learners' expanding command of English against native speaker usage.
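As a rough sketch of how an error type might be quantified by level and first language, the Python below computes an error rate per (level, first language) pair. Everything in it is assumed for illustration: the record layout, the error codes "MD" (missing determiner) and "RD" (wrong determiner choice), and the counts themselves are invented and are not the project's coding scheme or findings.

```python
from collections import defaultdict

# Invented records, one per context that requires a determiner:
# (CEFR level, first language, error code or None if the usage was correct).
records = [
    ("A2", "Russian", "MD"), ("A2", "Russian", None),
    ("A2", "Russian", "MD"), ("A2", "Russian", None),
    ("B2", "Russian", "MD"), ("B2", "Russian", None),
    ("B2", "Russian", None), ("B2", "Russian", None),
    ("A2", "Spanish", None), ("A2", "Spanish", "RD"),
    ("A2", "Spanish", None), ("A2", "Spanish", None),
]


def error_profile(records, error_code):
    """Return {(level, first language): share of contexts with `error_code`}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for level, first_language, code in records:
        key = (level, first_language)
        totals[key] += 1
        if code == error_code:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Missing-determiner rate for each level and first language in the toy data.
profile = error_profile(records, "MD")
```

Dividing by the number of contexts in which a determiner is required, rather than reporting raw error counts, keeps groups with different amounts of data comparable.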

Once criterial features and transfer effects have been identified at the different proficiency levels there are significant practical benefits for English language teaching, testing and publishing, some of which will be outlined.

The initial presentation will be followed by a workshop given by Paula Buttery, which will illustrate the search possibilities, and limitations, of the current corpus. This project is at a relatively early stage and a major current focus is the collection of new data that are of relevance to the research hypotheses we have identified. Those attending the workshop are invited to give feedback and possibly to join the project in a constructive way.

