Assuring Learning When Students Work With AI
- AI allows students who put in minimal effort to create professional, well-written, comprehensive assignments that are indistinguishable from products submitted by students who did intensive work.
- The University of Newcastle Australia has developed two assessment frameworks that enable faculty to determine how students have used AI to produce work and what they have learned in the process.
- These frameworks provide credible evidence of learning, generate insights schools can use to pursue continuous improvement, and prepare students to succeed in an AI-enabled workplace.
When artificial intelligence (AI) can produce sophisticated work in seconds, traditional assessment fails. The result is what we call the attribution void: a growing gap between what students submit and what we can confidently ascribe to their own learning.
Business schools worldwide are wrestling with how to conduct secure assessment in the age of AI. The solution lies in a framework that captures not just what students produce, but how they produce it and what they learn in the process.
When Products Don’t Tell the Story
Early in 2023, at the University of Newcastle in Australia, we made a decision that seemed radical at the time but now feels inevitable: in all Innovation and Entrepreneurship courses, we made AI use mandatory, not simply permitted or encouraged.
Here’s what happened next. Some students used AI as a sophisticated research assistant, feeding it specific prompts, critically evaluating outputs, and combining insights with their own domain analysis. Others simply asked AI to produce analyses that they turned in with minimal edits. Both types of students submitted professional, well-written, comprehensive products that were difficult to tell apart.
This taught us that traditional assurance of learning (AoL) methods such as essays, reports, and rubrics cannot tell us what students actually learn when they use AI to produce their work. For example:
- Product-only assessment can’t verify the process used or the learning that occurred.
- Proctored exams capture a moment in time, testing mostly recall rather than competence.
- Bans on AI ignore workplace reality, where there will be no separation between AI and non-AI jobs.
- Detection tools create an arms race with significant false-positive risks. In fact, detection misses the point. The question won’t be whether students used AI but whether they learned to use it competently, ethically, and responsibly.
Today’s business schools need a fundamentally different approach, one that can distinguish deep learning from zero learning, even when the products look identical. As Benjamin Stévenin and Björn Kjellander note in a 2025 article, we need verification methods that acknowledge AI’s presence while assuring learning. But how do we assure learning in the age of AI?
Lessons From Medical Education
That question led us to decades-old principles from medical education, where competence cannot be accurately captured by a single score or assessed by a single exam. It needs evidence of development over time.
In their pioneering research on programmatic assessment, medical professors Lambert Schuwirth and Cees P.M. van der Vleuten urge moving away from high-stakes single assessments toward longitudinal, narrative-based evidence collection. As Schuwirth memorably puts it: “You wouldn’t tell a patient they’re 42 percent healthy.”
Business education faces the same challenge. When we can’t distinguish AI-assisted from AI-generated work, we need to gather evidence of competence growth with AI. We need an approach that transforms assessment of learning into assessment for learning. To achieve this transformation, schools can follow two key principles:
- Longitudinality: Faculty can separate evidence collection from decision-making, gather evidence across the semester through formative feedback, and then make decisions based on all accumulated evidence.
- Proportionality: Faculty can match evidence requirements to decision stakes. Low-stakes formative feedback requires quick assessments; high-stakes final decisions require comprehensive portfolios.
AI can fake a single product, but it cannot fake a growth pattern portfolio across a semester. Programmatic evidence is particularly relevant for the AI age because it verifies authentic learning.
Additional Innovations for Assessment
Programmatic design alone doesn’t solve the AI challenge—it’s just one piece of the puzzle. At the University of Newcastle, we added two further innovations.
The first is our Human-Centric AI-First (HCAIF) pedagogical framework, which guides us through four intentional stages of teaching with AI:
- Preparation (teacher + AI). Instructors select competencies, design learning experiences, and devise assessments aligned with outcomes.
- Personalized learning (student + AI). Students engage with AI content, receive 24/7 AI feedback, and develop metacognition through reflection and iteration.
- Classroom engagement (teacher + student + AI). In flipped classrooms, students engage in experiential learning, while teachers coach them through the AI learning process.
- Summative assessment (teacher + AI). Instructors evaluate outputs, document AI use, and provide comprehensive feedback.
The summative assessment relies heavily on our second innovation, the Person, Process, Product (3P) model. Like satellite triangulation, which uses multiple signals to pinpoint location accurately, our 3P model collects evidence from three separate streams:
Person. Who learned? This dimension assesses a student’s development as a learner and a professional. Students submit structured reflections on the work they did, the domain knowledge they acquired, the growth they achieved over time, the “aha” moments and breakthroughs they experienced, and the personal insights they attained. We look for:
- Critical-creative judgment demonstrated in the work.
- Ethical reasoning about AI use and its implications.
- Metacognitive insight (“What? So what? Now what?”).
- Transferability, or the ability to apply learning in new real-life contexts.
Process. How was the work produced? This dimension makes work visible, auditable, and educational. We assess transparency, decision-making, reproducibility, collaboration, problem-solving, and AI governance through four types of documentation:
- Promptbooks, which track key prompts, version history, and iterations with rationales for changes. (We coined the term internally to mean structured interaction logs where students describe their evolving interactions with generative AI.)
- Decision Ledgers, which record how students judged options, made critical choices, and weighed trade-offs. These ledgers separate AI suggestions from student decisions.
- Reproducibility documentation, which describes steps taken, files used, and methods followed. It also compiles evidence and references.
- AI provenance statements, which provide declarations of purpose, descriptions of how AI was used, and discussions of its limits.
Product. What was created? Only after reviewing Person and Process evidence do we evaluate the Product. This timing matters. We assess quality and accuracy, originality and innovation, real-world impact potential, and stakeholder value. A perfect AI-generated report with no documentation and weak reflection fails. An imperfect product showing innovation and stakeholder value, with strong process documentation and demonstrated growth, passes.
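To make the three evidence streams concrete for readers who think in data models, here is a minimal, hypothetical sketch of how a 3P portfolio could be represented. The class and field names are illustrative assumptions only; they are not the schema our school uses.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data model for a 3P evidence portfolio.
# Names and fields are illustrative assumptions, not an official schema.

@dataclass
class PromptbookEntry:
    prompt: str             # the prompt the student gave the AI
    ai_output_summary: str  # brief note on what the AI returned
    rationale: str          # why the student revised, accepted, or rejected it
    version: int            # iteration number, showing change over time

@dataclass
class DecisionRecord:
    ai_suggestion: str      # what the AI proposed
    student_decision: str   # what the student actually chose
    tradeoffs: str          # how options and trade-offs were weighed

@dataclass
class EvidencePortfolio:
    # Person: reflections, "aha" moments, ethical reasoning, growth over time
    reflections: List[str] = field(default_factory=list)
    # Process: documentation that makes the work visible and auditable
    promptbook: List[PromptbookEntry] = field(default_factory=list)
    decision_ledger: List[DecisionRecord] = field(default_factory=list)
    ai_provenance_statement: str = ""
    # Product: evaluated only after Person and Process evidence is reviewed
    product_reference: str = ""
```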
Two Students, Two Journeys
To illustrate how our assessment strategy works in practice, here are two hypothetical students in our entrepreneurship major, both tasked with developing a business model for a sustainable coffee shop venture.
Student A used AI as a research assistant and thought partner during the personalized learning phase. According to her Promptbook, she first asked AI to analyze the coffee market. When that returned only generic statistics, she crafted a new prompt that incorporated demographic data from the region and asked AI to identify underserved segments. She wrote, “That’s when I discovered the gap in specialty coffee for health-conscious professionals.”
In her Decision Ledger, she evaluated AI-generated pricing strategies against local competitor data she gathered herself, leading her to reject AI’s premium pricing recommendation of 7 USD. “My customer interviews revealed our target segment caps discretionary coffee spending at 5.50 USD. I adjusted the model to hit that price point while maintaining margins through operational efficiency.”
During the classroom engagement phase, Student A presented her venture to her peers. When challenged on her sustainability claims, she acknowledged limitations, saying, “I couldn’t verify carbon footprint claims for two suppliers, so I built contingency sourcing into the model.”
Her Person evidence reflection revealed genuine learning. “Week 3, I realized I’d been accepting AI outputs without questioning assumptions. The market-size estimate AI gave me assumed urban density patterns that don’t apply to our regional location, so I had to recalculate from local census data. This taught me that AI provides starting points, not answers.”
By contrast, Student B put in minimal effort and achieved minimal learning. His personalized learning phase consisted of a single prompt in which he asked AI to write a business model for a sustainable coffee shop. His Decision Ledger was sparse and showed no evidence that he questioned AI’s assumptions, evaluated alternatives, or integrated course concepts into real-life situations.
During the classroom engagement phase, he couldn’t explain why he selected a subscription model or how it aligned with the purchasing behavior of his target segment. His Person reflection included generic phrases such as “I learned about the importance of market research and empathy in entrepreneurship.” It lacked genuine insight and revealed no “aha” moments, no real-life struggles, no growth, and no personal voice.
Because both students submitted professional-looking business models, they might have received similar grades under traditional product-only assessment. Under 3P assessment, however, Student A passed and Student B failed.
The framework doesn’t punish AI use. It rewards thoughtful AI co-creation while revealing when AI has substituted for learning, rather than supported it.
The Learner Perspective
So far, 90 percent of students have shown either enthusiastic adoption of our new models or thoughtful ambivalence that itself demonstrates learning. Among the positive comments were “GenAI forced me to validate outputs manually, which deepened my understanding” and “AI became a thinking partner, not a shortcut. It sharpened my judgment.”
A more ambivalent student said, “AI helped me think through valuations, but it also made me lazy at first until I realized I had to interrogate it.” Another added, “Before this course I thought AI was unreliable. Now I see its value, but only if I question it critically.”
Even students who expressed concerns demonstrated metacognitive awareness. Said one, “Sometimes it didn’t feel like my work anymore, just editing machine output.” This recognition alone is evidence of learning.
What Comes Next
Currently, we collect 3P evidence manually, which can be overwhelming. We are developing an AI platform to automate evidence collection, provide formative feedback, measure progress, and generate teaching insights. But, and this is critical, humans remain in control of the process.
The platform captures student interactions, provides real-time feedback, creates Promptbooks and Decision Ledgers, flags issues for instructor attention, and generates analytics showing competency progression. It doesn’t make final judgments about student learning. That remains the irreplaceable role of expert educators.
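As a minimal sketch of how that human-in-the-loop boundary could work, assume the portfolio structure sketched earlier: the platform surfaces flags for instructor attention, while the grading decision never leaves the educator. The thresholds and messages below are hypothetical, not the platform’s actual rules.

```python
# Hypothetical human-in-the-loop flagging step, reusing the EvidencePortfolio
# sketch above. Thresholds and messages are illustrative assumptions only.

def flag_for_instructor(portfolio: "EvidencePortfolio") -> list:
    """Return reasons a portfolio needs instructor attention; never a grade."""
    flags = []
    if len(portfolio.promptbook) < 3:
        flags.append("Sparse Promptbook: little evidence of iteration across the semester")
    if not portfolio.decision_ledger:
        flags.append("Empty Decision Ledger: AI suggestions not separated from student decisions")
    if not portfolio.ai_provenance_statement.strip():
        flags.append("Missing AI provenance statement")
    # The platform stops here: an expert educator reviews the flags,
    # reads the evidence, and makes the final judgment about learning.
    return flags
```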
Additionally, oral defenses still matter, because the HCAIF classroom teacher-student experience is fundamental. Asking students to explain, defend, and extend their work separates genuine understanding from surface-level outputs, and this is something that cannot be outsourced to AI.
From Compliance to Credible Assurance
The traditional AoL routine—collect products, score rubrics, aggregate numbers, write reports—can’t withstand the challenge of AI. But HCAIF and 3P evidence are up to the task. Together, they verify attainment through triangulated, longitudinal evidence. They clarify authorship without banning tools. And they generate actionable insights schools can use to pursue continuous improvement.
Just as important, these assessment strategies prepare students for an AI-enabled workplace where they will need to document processes, justify decisions, and demonstrate learning. Students become exactly what employers need: workers who can combine AI capability with irreplaceable human judgment, ethics, and creativity.
Yes, schools that adopt these strategies will need to train faculty in new kinds of programmatic assessment, orient students to new expectations, acquire the technology to automate evidence collection, and gather the courage to lead. But they’ll gain an intentional programmatic approach that delivers credible AoL for the AI era. And they’ll develop assessments that support learning rather than just measuring it.
I am grateful to the faculty and students at the University of Newcastle Business School Australia, and to my colleagues Timothy Hor and Vishal Rana, who have been instrumental partners in developing and refining this approach.