Standard setting

The European Survey on Language Competences (ESLC) is in many ways a ground-breaking project. The language tests being developed, together with the planned phases of alignment and standard setting, represent the most concerted attempt yet to produce a valid measure of functional language proficiency which is consistent and comparable across the five most-learned languages of Europe.

The Survey and the CEFR

The results of the Survey are to be reported in terms of the Common European Framework of Reference (CEFR).

Since its publication in 2001 the CEFR has become a key instrument of language policy in Europe and beyond, particularly through the use of its levels as the basis for defining learning targets and curricula.

Language testers in Europe are now expected to provide evidence that their exams actually correspond to the CEFR levels claimed for them.

The Council of Europe has produced a Manual for relating language examinations to the CEFR (Council of Europe 2008). However, among the assessment community, debate continues as to what kind of evidence can convincingly link an exam result to a CEFR level, and even about what the levels actually mean.

Linking the Language Tests to the CEFR

Most of the work linking exams to the CEFR has concerned just one language, while the Survey deals with five. This has imposed a logical sequence of steps on the test design, construction and validation process:

  • First, design and construct comparable tests in each language: that is, set out to test the same things in the same way;
  • Second, align the tests across languages, so that a student’s measure in one language represents precisely the same level of functional ability as the same measure in another language;
  • Third, set standards: that is, match specific levels of ability to the four tested CEFR levels A1 to B2.

Let us look briefly at each of these areas.

Achieving comparability

The need to demonstrate comparability across languages has been taken into account from the test design stage.

  • A single test design was adopted. Sets of testable subskills were developed for Reading, Listening and Writing, drawn from the descriptor scales of the CEFR at levels A1 to B2.
  • Each subskill was mapped to a specific task type, e.g. multiple choice tasks.
  • Writing is made directly comparable by using what is essentially the same set of tasks rendered into the different languages: for example, writing a holiday postcard, or a review of a film.
  • For Reading and Listening a subset of tasks has also been adapted across languages. Trials indicate that the adapted tasks are broadly comparable in difficulty.

Aligning the languages to each other

Writing is the easiest skill to align across languages because samples of written performance can be directly compared with each other.

The method SurveyLang will use is based on ranking – that is, asking judges to put samples in order of quality. The ranking data can be analysed using a form of the Rasch model to put all the performances in all languages onto a single ability scale. This technique was successfully trialled by SurveyLang partner CIEP in June 2008 (CIEP 2008, Jones 2009).
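To illustrate how ranking data can place performances on one scale, the following is a minimal sketch, not SurveyLang's actual analysis. It assumes each judge's ranking can be broken down into pairwise "ranked above" outcomes and fits a Bradley-Terry model, which is the paired-comparison form of the Rasch model. The sample identifiers, the judges' rankings and the function names are all hypothetical.

```python
from itertools import combinations
from math import exp, log


def pairwise_wins(rankings):
    """Count how often each sample is ranked above each other sample."""
    wins = {}
    for ranking in rankings:  # each ranking lists sample ids, best first
        for better, worse in combinations(ranking, 2):
            wins[(better, worse)] = wins.get((better, worse), 0) + 1
    return wins


def fit_bradley_terry(rankings, iterations=500):
    """Return a mean-centred logit measure for every ranked sample."""
    wins = pairwise_wins(rankings)
    samples = sorted({s for ranking in rankings for s in ranking})
    won = {s: sum(n for (a, _), n in wins.items() if a == s) for s in samples}
    lost = {s: sum(n for (_, b), n in wins.items() if b == s) for s in samples}
    # Simple guard: a sample that never wins or never loses has no finite estimate.
    for s in samples:
        if won[s] == 0 or lost[s] == 0:
            raise ValueError(f"{s!r} must win and lose at least once")
    strength = {s: 1.0 for s in samples}
    for _ in range(iterations):
        updated = {}
        for i in samples:
            denom = 0.0
            for j in samples:
                if j != i:
                    n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = won[i] / denom  # minorise-maximise update step
        # Rescale so that the logits average to zero.
        centre = exp(sum(log(v) for v in updated.values()) / len(updated))
        strength = {s: v / centre for s, v in updated.items()}
    return {s: log(v) for s, v in strength.items()}


# Hypothetical data: three judges rank two English and two French scripts.
rankings = [
    ["fr_2", "en_2", "fr_1", "en_1"],
    ["en_2", "fr_2", "en_1", "fr_1"],
    ["fr_2", "en_2", "en_1", "fr_1"],
]
logits = fit_bradley_terry(rankings)
for sample in sorted(logits, key=logits.get, reverse=True):
    print(sample, round(logits[sample], 2))
```

Because scripts in both languages appear in the same rankings, every script receives a measure on the same logit scale, which is what makes cross-language comparison possible in this sketch.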

Setting standards against the CEFR

Successful alignment across languages should greatly simplify the final stage of setting standards in terms of CEFR levels: if the alignment is credible, then standards should need to be set only once, for all languages, rather than for each language separately.
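The idea of setting standards only once can be sketched as follows, assuming that aligned abilities are expressed on one common scale. The cut-off values are hypothetical and stand in for whatever the standard-setting phase produces, and the "Pre-A1" label for abilities below the lowest cut-off is also an assumption of this sketch.

```python
from bisect import bisect_right

# Hypothetical cut-offs on the common ability scale, shared by all languages.
LEVELS = ["Pre-A1", "A1", "A2", "B1", "B2"]
CUTS = [-2.0, -0.5, 1.0, 2.5]  # lower bounds of A1, A2, B1 and B2


def cefr_level(ability):
    """Map an ability on the common scale to a reported level."""
    return LEVELS[bisect_right(CUTS, ability)]


# The same measure yields the same level whichever language was tested.
students = [("English", 0.3), ("French", 0.3), ("German", -2.4)]
for language, ability in students:
    print(language, ability, cefr_level(ability))
```

If the alignment were not credible, a separate list of cut-offs would be needed for each language, which is the complication that the alignment step is meant to remove.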

However, given that much of the work involved in developing the Survey either is unique or makes new use of existing methods, it is appropriate that we should apply and cross-validate a variety of procedures.

Again, Writing performance is the easiest to deal with, because both students and tasks can be put on a measurement scale and their features matched against corresponding features in the CEFR’s descriptor scales. We can characterise a student’s level in terms of how well they can address particular Writing tasks.

Reading and Listening will make use of the task-based standard setting procedure described in the CEFR Manual mentioned above (that is, based on raters’ estimates of whether a student at a given CEFR level should respond correctly or incorrectly to test tasks).
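One common way of turning such judgements into cut scores is sketched below. It is an assumption for illustration, not the procedure specified by the Manual or by SurveyLang: each rater's count of "should answer correctly" judgements for a level is treated as that rater's cut score, and the counts are averaged across raters. The raters, the five tasks and the judgements are hypothetical.

```python
from statistics import mean

# Hypothetical judgements for five tasks: True means "a student at this
# CEFR level should answer this task correctly".
judgements = {
    "rater_1": {
        "A1": [True, False, False, False, False],
        "A2": [True, True, False, False, False],
        "B1": [True, True, True, False, False],
        "B2": [True, True, True, True, True],
    },
    "rater_2": {
        "A1": [True, False, False, False, False],
        "A2": [True, True, True, False, False],
        "B1": [True, True, True, True, False],
        "B2": [True, True, True, True, True],
    },
}


def cut_scores(judgements):
    """Average, per level, the number of tasks each rater expects to be answered correctly."""
    levels = list(next(iter(judgements.values())))
    return {
        level: mean([sum(rater[level]) for rater in judgements.values()])
        for level in levels
    }


for level, score in cut_scores(judgements).items():
    print(level, score)
```

A spread between raters at a given level (here A2 and B1) would normally prompt discussion and a further round of judgement before cut scores are fixed.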

This will be further informed by employing 16 self-assessment Can Do statements in the Student Questionnaire. Such self-assessments have been found useful in previous studies on interpreting CEFR-like frameworks (Jones 2002).

Future developments

A set of tests measuring functional proficiency in five languages and authoritatively linked to the CEFR will be an important outcome of the European Survey on Language Competences. You can read about how the results of the Survey will be analysed on the Analysis and results page.

These tests will provide a core set, covering well-known languages, to which other languages can readily be aligned. This should facilitate extension of the scope of the Survey in its next iteration (which in principle could include all the languages of Europe).

Other developments under consideration for the next administration of the Survey are the inclusion of a Speaking test, the addition of further languages, and the extension of the scales to cover all six levels of the CEFR.