Pretest Construction
Pretesting
IELTS pretests are very similar to the tests that will be used in live administrations. The tasks are in their final form including task rubrics (instructions) and examples. Listening pretests are professionally recorded to ensure that they are of acceptable quality. Listening and Reading pretests are administered to IELTS candidates at selected centres or to prospective candidates on IELTS preparation courses. The pretests are marked at Cambridge ESOL and statistically analysed. Writing and Speaking pretests are administered to representative samples of candidates to assess the appropriateness of this material for use in live tests, and to establish that the tasks are capable of eliciting an adequate sample of language to allow for the assessment of candidates against the scoring criteria.
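Commentary: Once the initial item writing is complete, pretest papers are constructed and trialled in the field to check whether the items meet the required standard. Pretests closely resemble the tests used in live administrations. The Listening and Reading sections are administered at selected centres or to prospective candidates on IELTS preparation courses, while the Speaking and Writing sections are trialled with representative samples of candidates. Together these provide the data used to analyse how well the items work.
Pretest Review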
The Validation Unit at Cambridge ESOL collates and analyses the pretest material.
Listening and Reading pretests
All candidate responses are analysed to establish the technical measurement characteristics of the material, i.e. to find out how difficult the items are, and how they distinguish between stronger and weaker candidates. Both classical item statistics and latent trait models are used in order to evaluate the effectiveness of the material. Classical item statistics are used to identify the performance of a particular pretest in terms of the facility and discrimination of the items in relation to the sample that was used. Rasch analysis is used to locate items on the IELTS common scale of difficulty. In addition, the comments on the material by the staff at pretest centres and the immediate response of the pretest candidates are taken into account.
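The two classical statistics and the Rasch idea mentioned above can be illustrated with a small sketch. This is not Cambridge ESOL's actual procedure: the response data are invented, the function names are hypothetical, and the choice of an uncorrected item-total (point-biserial) correlation as the discrimination index is an assumption, since the source does not say which index is used.

```python
import math
from statistics import mean, pstdev

# Invented pretest data: one row per candidate, one 0/1 score per item.
RESPONSES = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]


def facility(item_scores):
    """Facility: the proportion of the sample answering the item correctly."""
    return mean(item_scores)


def discrimination(item_scores, total_scores):
    """Point-biserial correlation between item score and total score.

    Assumed index, for illustration only. Higher values mean that stronger
    candidates get the item right more often than weaker candidates.
    """
    item_mean, total_mean = mean(item_scores), mean(total_scores)
    item_sd, total_sd = pstdev(item_scores), pstdev(total_scores)
    if item_sd == 0 or total_sd == 0:
        return 0.0  # no variation, so the item cannot discriminate
    covariance = mean(
        (i - item_mean) * (t - total_mean)
        for i, t in zip(item_scores, total_scores)
    )
    return covariance / (item_sd * total_sd)


def rasch_probability(ability, difficulty):
    """Rasch model: chance of a correct answer given ability and difficulty,
    both expressed on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


totals = [sum(row) for row in RESPONSES]
for index in range(len(RESPONSES[0])):
    column = [row[index] for row in RESPONSES]
    print(index, facility(column), round(discrimination(column, totals), 2))
```

Facility and discrimination depend on the sample that sat the pretest, which is why the text pairs them with Rasch analysis: placing item difficulties on a common scale lets items from different pretests be compared.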
At a pretest review meeting, the statistics, feedback from candidates and teachers and any additional information are reviewed and informed decisions are made on whether texts and items can be accepted for construction into potential live versions. Material is then stored in an item bank to await test construction.
Writing and Speaking pretests
Separate batches of Writing pretest scripts are marked by IELTS Principal Examiners and Assistant Principal Examiners. At least two reports on the task performance and its suitability for inclusion in live versions are produced. On the basis of these reports, tasks may be banked for live use, amended and sent for further pretesting or rejected.
Feedback on the trialling of the Speaking tasks is reviewed by experienced examiners, who deliver the trialling tasks, and members of the item writing team who are present at the trialling sessions. The subsequent reports are then assessed by the paper chair and Cambridge ESOL staff.
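Commentary: Using the data gathered in pretesting, the Validation Unit at Cambridge ESOL collates and analyses the results. All candidate responses are analysed to establish the technical characteristics of the items (that is, their difficulty and how well they discriminate between candidates), drawing on both classical item statistics and Rasch analysis. Items that pass this review are stored in the item bank for use in future live tests.
Banking of Material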
Cambridge ESOL has developed its own item banking software for managing the development of new live tests. Each section or task is banked with statistical information as well as comprehensive content description. This information is used to ensure that the tests that are constructed have the required content coverage and the appropriate level of difficulty.
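Commentary: Cambridge ESOL has its own dedicated item banking software for managing the construction of new test papers. Each section or task is stored in the bank together with a detailed content description and its statistical information, so that the papers constructed meet the requirements for both content coverage and level of difficulty.
Standards Fixing Construction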
Standards fixing ensures that there is a direct link between the standard of established and new versions before they are released for use at test centres around the world.
Different versions of the test all report results on the same underlying scale, but band scores do not always correspond to the same percentage of items correct on every test form. Before any test task is used to make important decisions, we must first establish how many correct answers on each Listening or Reading test equate to each of the nine IELTS bands. This ensures that band scores on each test indicate the same measure of ability.
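A minimal sketch of the idea follows. The cut-off values below are invented for illustration and are not real IELTS conversion tables; the names `FORM_A`, `FORM_B` and `band` are hypothetical, and the sketch ignores half bands.

```python
import bisect

# Invented raw-score cut-offs (out of 40): entry k is the minimum number of
# correct answers needed for band k + 1 on that test form.
FORM_A = [1, 3, 6, 10, 15, 23, 30, 35, 39]
FORM_B = [1, 3, 5, 9, 14, 21, 28, 33, 38]  # assumed to be a slightly harder form


def band(raw_score, cutoffs):
    """Return the band (0 to 9) whose cut-off the raw score has reached."""
    return bisect.bisect_right(cutoffs, raw_score)


print(band(30, FORM_A), band(28, FORM_B), band(28, FORM_A))
```

With these invented tables, 30 correct answers on the easier form and 28 on the harder form lead to the same band, while 28 correct answers on the easier form fall one band lower. That is the sense in which a band score reflects the same ability even though the percentage of items correct differs between forms.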
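Commentary: Before new live versions are released worldwide, standards fixing ensures a direct link between the standard of the new versions and that of versions already in use. Every version reports results on the same nine-band scale, so although the percentage of correct answers that corresponds to a given band may differ from one version to another, the same band score reflects the same level of ability.
Live Test Construction and Grading
Live Test Release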
At regular test construction meetings, Listening and Reading papers are constructed according to established principles. Factors taken into account are:
· the difficulty of complete test versions and the range of difficulty of individual items
· the balance of topic and genre
· the balance of gender and accent in the Listening versions
· the balance of item format (i.e. the relative number of multiple choice and other item-types across versions)
· the range of Listening/Reading skills tested.
The item banking software allows the test constructor to model various test construction scenarios in order to determine which tasks should be combined to create tests that meet the requirements.
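The kind of scenario modelling described here can be sketched as a small search over a bank. This is only an illustration: the bank contents, the field names, the difficulty values and the two constraints (distinct topics and a target mean difficulty) are assumptions, and the real software takes many more factors into account, as the list above shows.

```python
from itertools import combinations

# Invented item bank: each task has a topic and a difficulty on a common scale.
BANK = [
    {"id": "T1", "topic": "science", "difficulty": -0.6},
    {"id": "T2", "topic": "history", "difficulty": 0.1},
    {"id": "T3", "topic": "science", "difficulty": 0.4},
    {"id": "T4", "topic": "society", "difficulty": 0.5},
    {"id": "T5", "topic": "history", "difficulty": -0.2},
]


def candidate_forms(bank, size, target, tolerance):
    """List task combinations with distinct topics whose mean difficulty
    lies within `tolerance` of `target`."""
    forms = []
    for combo in combinations(bank, size):
        topics = {task["topic"] for task in combo}
        mean_difficulty = sum(task["difficulty"] for task in combo) / size
        if len(topics) == size and abs(mean_difficulty - target) <= tolerance:
            forms.append(tuple(task["id"] for task in combo))
    return forms


print(candidate_forms(BANK, size=3, target=0.0, tolerance=0.15))
```

Tightening or loosening `tolerance` is one way to compare scenarios: a stricter value yields fewer but more closely matched candidate papers.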
Data are collected routinely from live administrations and analysed both to confirm the accuracy of the initial grading process and to support additional investigations into quality assurance issues.
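Commentary: Live papers are constructed at regular test construction meetings according to established principles, which cover five factors relating to difficulty, balance and range. The item banking software lets test constructors model different scenarios and decide which tasks to combine.
What does this look inside the official IELTS test development process mean for candidates? In the author's view, it has three implications.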
Point of Interest One:
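The officially published development process shows that new test material is commissioned only once or twice a year. The item bank can therefore be updated at most once or twice a year, and then only in part. This tells us that the pool of IELTS questions is indeed relatively stable. Studying recalled past exam questions (机经) can therefore make candidates reasonably familiar with the questions likely to appear over the next six months to a year, which helps their preparation and can raise their scores.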
Point of Interest Two:
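The development process also explicitly includes a standards fixing stage. The claims now widespread among candidates in China (that IELTS difficulty changes, that the marking standard changes, that tests at certain times of year are harder or easier than others, that a particular sitting was unusually hard or easy, and so on) are therefore needless worry. Because of this stage, even if a given test really is a little harder, the band conversion is set so that a lower percentage of correct answers earns the same final score, which is how the test stays consistent as a measure of ability. The Longre Overseas Examination Research Center therefore advises candidates to focus on what actually matters, namely improving their language ability and becoming familiar with the test format, rather than speculating and adding to their own anxiety.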
Point of Interest Three:
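We can also see that, during IELTS test development, only material that meets IELTS requirements, including the Academic and General Training standards for topic, length, difficulty and genre, is selected and retained, and a set of test items is completed only after a series of rigorous steps. The Longre Overseas Examination Research Center therefore reminds candidates not to choose preparation materials blindly or to grab whatever is at hand out of desperation. Doing so may well cost time and money and still lead you down the wrong path. Our advice is to choose materials written by a capable, professional training provider in line with the official IELTS requirements, and to prepare in the right way.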