Results of pilot studies involving comparative judgement methods for capturing expert judgement for the purpose of standard maintaining.


Improving awarding: 2018/2019 pilots

Ref: Ofqual 19/6575
PDF, 4.65MB, 192 pages


In the move to reformed GCSEs and A levels, our approach to maintaining standards has been for exam boards to prioritise statistics about prior attainment over examiner judgement. Now that the transition is largely complete, we are keen to make sure exam boards are able to detect any changes in student performance over time. In GCSE English and maths, we now have evidence from the National Reference Test, but we have also been researching other ways of detecting changes using comparative judgement and/or rank ordering techniques. In summer 2018 we piloted several different versions of comparative judgement, and in summer 2019 we ran a live pilot in GCSE English language.

Overall, the results of those pilots suggest that comparative judgement methods are very promising for capturing expert judgement for the purpose of standard maintaining. Taken together, the pilots indicate that pooling a sufficiently large number of judgements over most of the mark range can give reliable outcomes and potentially increase the validity of expert judgement in standard maintaining. Further consideration needs to be given to the merits of different designs in operational contexts, and to the relative weight such methods might carry in relation to the statistical indicators used in standard maintaining.

Published 13 December 2019