Pilot testing and Pretesting

Language Testing Pilot Study and Pretesting

Piloting and pretesting are important stages in the Language Test development process. Each of these stages contributed to the preparation for the Field Trial in February/March 2010.

A Pilot Study was carried out in October 2008 and had the key goals of:

obtaining feedback on the proposed language test task types and test design
testing SurveyLang’s innovative and collaborative item writing and test production processes.

The Pilot Study therefore placed more emphasis on testing the concepts than on the actual test items.

The Pilot Study

To achieve the goals above, a small-scale Pilot Study was carried out with a number of language schools and institutions across Europe. A total of 34 tests across the five languages were constructed and trialled: 13 reading tests, 9 listening tests and 12 writing tests.

The tests covered each of the five languages to be assessed (English, French, German, Italian and Spanish) and followed the proposed format for the survey, each being 30 minutes in length. The Pilot Study also trialled the use of routing tests.

This testing was carried out in parallel with the collection of qualitative feedback on task types from stakeholders such as National Research Coordinators and the European Commission, as well as from teachers in the Pilot Study schools.

Item writing processes

The SurveyLang language testing group worked closely together to develop test items across the five languages, not only to the same timescale but, importantly, also to the same specification and level of difficulty.

SurveyLang developed new collaborative processes to meet these demands and the Pilot Study provided an important opportunity to trial the success of these processes.

An important element in this process was cross-language vetting, in which tasks from each language were vetted by at least two other language partners in addition to the original language partner. Experienced, multilingual item writers vetted tasks from other languages to ensure that the tasks, items and options were operating correctly and were of a comparable level of difficulty to tasks in other languages. Cross-language vetting prompted much useful discussion among the language test development team, was extremely helpful in encouraging the sharing of best practice, and received very positive feedback from stakeholders.
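As a rough illustration of the vetting rule described above (each language's tasks checked by the original partner plus at least two other partners), the following minimal Python sketch builds one possible vetting plan. The rotation scheme and the names `LANGUAGES` and `assign_vetters` are assumptions made for this example only; the source does not say how SurveyLang actually allocated vetters.

```python
# Hypothetical sketch: one way to satisfy "original partner + at least two
# other language partners". The rotation order is an assumption, not the
# project's documented allocation.
LANGUAGES = ["English", "French", "German", "Italian", "Spanish"]


def assign_vetters(languages, extra_vetters=2):
    """Map each source language to the partners that vet its tasks."""
    n = len(languages)
    if extra_vetters >= n:
        raise ValueError("not enough other language partners")
    plan = {}
    for i, source in enumerate(languages):
        # Take the next partners in the list, wrapping around at the end,
        # so every partner vets the same number of other languages.
        others = [languages[(i + k) % n] for k in range(1, extra_vetters + 1)]
        plan[source] = [source] + others
    return plan


plan = assign_vetters(LANGUAGES)
for source, vetters in plan.items():
    print(f"{source} tasks vetted by: {', '.join(vetters)}")
```

A simple rotation of this kind spreads the vetting load evenly, since each partner reviews tasks from exactly two other languages as well as its own.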

Pilot Study outcomes

Analysis of the Pilot Study data contributed to the finalisation of the test specifications and task types. These were subsequently agreed with the Commission, participating countries and other important stakeholders before the main item writing work started in January 2009 in preparation for pretesting.

Confirming the practical viability of the item writing process developed for the project was an important part of the Pilot Study, which also produced other significant findings, including:

adapting tasks across languages is a practical means of ensuring comparability
the proposed task types will work with the target student population
the proposed range and types of topics, texts, graphics, etc. are appropriate for the target students
the level of difficulty of the various tasks used in the Pilot Study appears to be appropriate for the students. There are grounds for confidence that the level of difficulty of all new tasks created, once pretested and Field Trialled, will also be appropriate.

Pretesting

Pretesting focuses on extensive analysis of the level and quality of test tasks and items.

Following the sign-off of test specifications and task types, a lengthy process of item writing, editing and cross-language vetting was undertaken. A total of 145 pretests across the five languages were constructed to test reading, listening and writing. Schools in countries participating in the Survey took the pretests in October 2009, together with other selected educational institutions. Following the pretesting session, the level and quality of the test tasks and items were analysed extensively. The tasks were then edited further, and the best quality tasks were selected for the Field Trial and Main Study.