Assessment in a Post-AI Era
Much of the faculty development work around AI currently underway focuses – understandably – on assessment. For many students, assessment is the curriculum and assessment drives learning. And so we have a lot to say about AI and assessment!
As teaching and learning leaders, we know what well-designed and well-supported assessments can do to motivate and support learning. And the unexpected delight in the generational challenge of AI for teaching and learning? AI is motivating the changes to assessment that teaching and learning leaders have long been calling for. Or, as Michael Fullan puts it in this podcast episode, “AI is the accelerator of change, but pedagogy is the driver.”
We approach the discussion of assessment in the post-AI era with the following assumptions and constraints:
- (Re)designing assessments should be, first and foremost, about the student learning experience, and that can also include planning for AI
- There are very few ways (if any) to design an assessment that an AI tool cannot complete
- There is no reliable and fair means of detecting AI-generated content
- Validating and verifying individual students’ achievement of learning outcomes is a core responsibility of the institution
With those constraints and assumptions in mind, let’s turn to describing the problem and possible approaches.
The Problem for Assessment in a Post-AI Era
There are very few ways (if any) to design an assessment that an AI tool cannot complete. Sure, you can try to ‘trick’ a student with white-fonted instructions, or you can try using an AI detector (despite knowing in your heart that these detectors do not work very well against any student interested enough to cheat). You can create assessments that are more challenging for an AI tool to complete (though there are fewer ways to do this as models improve). But the moment for assigning students homework or an assessment and expecting them to complete it without the use of AI has passed.
We have heard anecdotally that “students at [our] institution are not using AI for their assignments.” And of course, of course, not all students are using AI for their assignments. But many of them are, and their use – and the inability to prevent or detect their use – makes the reliability of such assessments at best questionable. That some are using AI and some aren’t undermines the fundamental premise that assessment scores reflect comparable demonstrations of knowledge or skill. Put another way, the uneven playing field distorts the accuracy of our assessment because the results likely reflect disparities in AI access and proficiency rather than authentic differences in student learning. How, then, are we to draw valid conclusions about individual achievement or make fair comparisons across the group?
Given these two constraints, (1) that assessments cannot be designed to be “AI-proof” and (2) that many students are using AI for their assessments, we arrive at a conclusion: unless an assessment is invigilated, educators should assume students will use AI and should plan the assessment accordingly.
So What to Do with Assessment: The Two-Lane Approach
Helpfully, the University of Sydney has developed a framework, the “Sydney Assessment Framework,” which explores this premise and offers an approach. The University of Sydney describes the Framework as follows:
“The new Sydney Assessment Framework categorises each assessment according to (i) their role as assessment of, for or as learning and (ii) how they are delivered and adjustments are applied. As shown in Table 1, it aligns with the ‘two-lane approach’ to assessment in the age of generative AI through the appropriate use of ‘secure’ assessments where the use of AI can be controlled (Lane 1), and the development of disciplinary knowledge, skills, and dispositions alongside AI through ‘open’ assessments (Lane 2). This categorisation aims to cover all assessments at Sydney.
It is designed to ensure that constructive alignment within programs and across units is both possible and reliable, so that we can demonstrate that our graduates genuinely have met the outcomes and have the knowledge, skills, and dispositions that we state that they do, whatever pathway they take through an award and whatever their individual needs are.”
They also offer this table summarizing the different approaches:
|  | Secure (Lane 1) | Open (Lane 2) |
| --- | --- | --- |
| Role of assessment | Assessment of learning | Assessment for and as learning |
| Level of operation | Mainly at program level | Mainly at unit level |
| Assessment security | Secured, in person | ‘Open’ / unsecured |
| Role of generative AI | May or may not be allowed by examiner | As relevant, use of AI scaffolded & supported |
| TEQSA alignment | Principle 2 – forming trustworthy judgements of student learning | Principle 1 – equip students to participate ethically and actively in a society pervaded with AI |
| Assessment categories |  |  |
The two-lane approach to assessment balances validation of student learning with developmental opportunities in an AI-rich educational landscape. Lane 1 focuses on “assessment of learning” through secured, in-person assessments at the program level that verify students have met learning outcomes without prohibited AI assistance. Lane 2 emphasizes “assessment for and as learning” primarily at the course level, with open, unsecured assessments where students can appropriately engage with AI tools as part of their learning process. This framework acknowledges both the need to validate genuine student achievement (Lane 1) and the importance of preparing students to use AI ethically and productively in their future careers (Lane 2).
How to Introduce the Two-Lane Approach at Your Institution
If you’re persuaded that adopting the two-lane approach to assessment is reasonable (at least for now), you might then ask yourself: how do I bring this approach to my institution?
Consider the earlier discussion of existing faculty development initiatives, but when it comes to redesigning assessment at scale, we’d also like to point to a few examples of focused assessment (re)design initiatives that may be worth exploring:
Assessment (re)Design Initiatives
- Host an Assessment Institute or Assessment Intensive – there are a few examples of these from across the country, and the common approach is a multi-day format in which educators work through theoretical ideas around assessment design and then actively redesign their assessments (sometimes with input from peers or student partners), leaving the intensive with their redesigned assessments completed. The University of Guelph, Queen’s University, and the University of Toronto have all developed assessment design institutes focused on AI.
- You could also consider the Assessment Partner, built by McMaster, a tool that lets faculty use AI to support the practical redesign of existing assessments or the creation of new ones.
Plan and Implement Change at the Program Level
Redesigning a single assessment, or redesigning assessments within a selection of courses, will only get us so far. We are interested in seeing how institutions begin to approach whole-program redesign with assessment in mind.
Practically, this might mean using your institutional program review process to have programs consider AI in their program learning outcomes (PLOs) and examine their curriculum maps for where secured, in-person assessments could take place across the curriculum. This might mean identifying one assessment in each course that is strategically linked to a program learning outcome and that will be securely assessed. Or it could mean looking across all required courses in the curriculum and identifying where PLOs could reasonably be securely assessed.
Understandably, the introduction of secured assessments raises questions and concerns about the impact of high-stakes assessment on wellbeing and learning, the relative authenticity of these assessments, and barriers for students with disabilities. While some of these secured assessments will need to be invigilated test experiences, there is scope for flexibility, we hope: the secured assessment could be a choice among a range of options, a cumulative portfolio that is presented and discussed, or a demonstration with observation.
We do have an imperative – to our students, to our funders, to our communities, to our employers – to offer some assurance that graduates of our programs have achieved the program learning outcomes. We cannot rely on honour codes, on the creativity or dedication of individual faculty members in assessment redesign or academic integrity practices, or on another technological development coming to save us.
We need to lead our institutions in the immediate and effortful work of whole-program, secure assessment. This work is challenging, and you won’t get it right immediately. Expect resistance from faculty concerned about increased workload, students worried about high-stakes testing, and administrators questioning resource requirements. Some pilot programs will fail, some assessments will prove impractical, and some faculty will refuse to participate. Plan for iteration and adjustment rather than perfect implementation. And we’d be very grateful to hear examples of where this is beginning to take place.
Why Not Just Better Assessment Design?
Before embracing the two-lane approach, many educators reasonably ask: why not simply design better assessments that emphasize critical thinking, creativity, and authentic application (the kinds of complex tasks that AI tools might struggle with)? This impulse makes sense and reflects what we want to do as educators. The challenge is that as AI models improve, the range of “AI-resistant” tasks continues to shrink. More fundamentally, asking faculty to out-innovate rapidly advancing technology places an unsustainable burden on individual educators and misses the larger pedagogical opportunity and community responsibility.
Rather than viewing AI as an assessment problem to solve, we can reframe it as a catalyst for assessment practices that many teaching and learning leaders have advocated for years: authentic, performance-based evaluation that connects to real-world contexts where students will actually use these tools, and prioritizes collaboration, inquiry, authentic engagement, and reflection.
For just one great example of this, see the recent work out of California: the “Peer & AI Review + Reflection” framework and set of tools.
What Are We Actually Assessing?
The more serious concern about AI and assessment centers on learning itself. If students can use AI to complete assignments, are they actually developing the knowledge and skills we intend? And more urgently, are these the knowledge and skills we want or need them to learn?
The answer depends partly on what we’re trying to measure. If our goal is to assess students’ ability to write a five-paragraph essay from memory, then AI assistance undermines that assessment. But if our goal is to evaluate students’ ability to analyze complex information, construct persuasive arguments, and communicate effectively – skills they’ll need in their careers and, more urgently, skills they need to thrive as active participants in our communities – then attending carefully to what we are trying to have our students learn, and why, has never been more important.
This shift also requires us to closely analyze our scoring criteria and rubrics to determine whether we are grading and providing feedback on what we intend to assess. Although it may be tempting to adjust rubrics to assess higher-order thinking (e.g., evaluation, synthesis), this is only appropriate if the course learning outcomes and assessment design match that level of thinking. The Oregon State Revision to Bloom’s Taxonomy attempts to distinguish human skills from AI capabilities at each level of the taxonomy, and it may be helpful to consult when designing assessments and evaluation criteria.
At What Cost?
Changing our approaches to assessment requires significant institutional investment: faculty development, secure testing infrastructure, program-level curriculum mapping, and ongoing evaluation. In resource-constrained environments, leaders may reasonably ask whether this effort produces better educational outcomes than other possible investments.
We acknowledge this concern and recommend starting with pilot programs that can demonstrate value before scaling. Begin with one or two departments willing to experiment with program-level assessment mapping. Use existing professional development structures rather than creating entirely new ones.
Against these costs, we weigh our students’ stake in what they learn, why they learn it, and how they demonstrate it. And we weigh the trust of our communities in what our graduates can confidently do, know, and value.