From education to employment

AI in Assessment: What Ofqual Is Really Signalling

Kavitha Ravindran

Why trust, not technology, is emerging as the defining challenge as AI adoption accelerates across the sector

The debate around artificial intelligence in assessment is often framed in binary terms: either AI will transform marking, or it will undermine the integrity of qualifications. Recent signals from Ofqual suggest a far more nuanced position, one that is less about capability and more about system design.

While Ofqual’s formal guidance on AI in marking was released some time ago, the conversation has not stood still. If anything, it has accelerated. The rapid adoption of generative AI tools by learners, the growing complexity of assessment models, particularly in vocational and apprenticeship pathways, and increasing scrutiny around fairness and integrity have all brought this issue back into sharp focus.

In that context, Ofqual’s position feels more relevant now than when it was first published. It provides regulatory principles for AI that the sector is only just beginning to fully grapple with.

At the centre of Ofqual’s approach is a clear principle: AI cannot be the sole marker in regulated qualifications. This is not a rejection of technology. Rather, it reflects a deeper concern about what makes assessment trustworthy.

Assessment systems do not operate in isolation. They underpin a broader social contract between learners, providers, employers and the public. A qualification is only as valuable as the confidence placed in it. From this perspective, Ofqual’s emphasis on human judgement, transparency and fairness is not a constraint; it is a safeguard.

What is becoming increasingly clear is that the question is no longer whether AI can replicate aspects of marking. In many cases, it already can. The more important question is whether the decisions made within an AI-supported system can be understood, challenged and trusted.

This is where the current tension lies. AI systems can deliver consistency at scale, but they often struggle with explainability: they can produce outcomes without the reasoning behind them expressed in a way that aligns with established marking practices. In a regulated environment, that gap matters, not just technically but institutionally.

It matters because assessment decisions must be:

  • defensible to learners
  • transparent to providers
  • credible to employers
  • accountable to regulators

Without this, even highly accurate systems risk undermining confidence rather than strengthening it.

Ofqual’s position implicitly reframes the role of AI. Rather than acting as a replacement for human examiners, AI is better understood as a support layer within the assessment system. Its strengths lie in areas such as:

  • identifying patterns across large volumes of responses
  • supporting quality assurance and moderation processes
  • providing indicative or formative feedback in low-stakes contexts

This reframing has important implications for awarding organisations and assessment providers. It suggests that the challenge is not simply adopting AI tools, but designing systems where human expertise and AI capability are deliberately integrated.

Human-in-the-loop models are often cited as the solution. However, this concept requires more precision. Simply inserting a human reviewer at the end of an automated process does not address the underlying issue. In some cases, it can create a false sense of assurance, where oversight exists in theory but not in practice.

The more meaningful question is where and how human judgement is applied within the system:

  • At what point are decisions escalated?
  • How are edge cases identified and handled?
  • What level of confidence is required before automation is used?
  • How are disagreements between AI outputs and human judgement resolved?

These are design questions, not purely technical ones. And they require a level of systems thinking that goes beyond individual tools or solutions.

For the further education and skills sector, this distinction is particularly important. Assessment models are evolving. In apprenticeships, for example, there is a clear shift towards more continuous, evidence-based assessment. This brings greater authenticity, but also greater complexity.

As the volume and variety of evidence increase (written responses, observations, portfolios, and potentially multimedia submissions), the pressure on assessment systems grows. Consistency becomes harder to achieve. Quality assurance becomes more resource-intensive. The margin for variability widens.

In this context, AI is not simply an innovation; it is increasingly a necessity. But necessity does not remove the need for careful design. If anything, it heightens it.

Used well, AI can support the system by:

  • reducing variability in how evidence is interpreted
  • surfacing insights that may not be visible at scale
  • enabling more timely and consistent feedback

Used poorly, it risks introducing new forms of opacity, bias or over-reliance on automation.

This is why Ofqual’s stance matters. It anchors the conversation in first principles: validity, fairness and trust. It reminds the sector that assessment is not just a technical process, but a judgement-led system with real-world consequences.

Seen in this light, Ofqual’s position is less a restriction and more an invitation. It invites the sector to move beyond experimentation and towards intentional system design.

The next phase of AI in assessment will not be defined by what the technology can do. It will be defined by the choices the sector makes about how that technology is integrated, where human judgement sits, how decisions are explained, and how trust is maintained.

The future of assessment will not be determined by whether AI is used, but by whether the systems we build continue to deserve the confidence placed in them.

By Kavitha Ravindran, Co-founder & Chief Growth Officer, sAInaptic
