Department of Finance: AI-assisted Grading

Knowledge Base

To comprehensively understand the current landscape of open-ended questions in educational assessment at the University of Zurich, we designed our research as a multi-method investigation capturing both quantitative insights and qualitative nuances. The research approach consisted of two main components: a survey and qualitative interviews. In addition, we performed market research to identify existing platforms that support AI-assisted grading and classified the functionalities they offer according to our own framework.

Scientific Background

The cornerstone of effective AI-assisted assessment lies in a deep understanding of both its theoretical underpinnings and practical challenges. From cognitive development theories that emphasize higher-order thinking skills to the complexities of grading consistency and potential biases, a wealth of research informs our approach. Explore this foundation to understand the importance of open-ended questions, the difficulties in their assessment, and how AI can offer promising solutions while upholding pedagogical value and human oversight.

Survey

The survey, titled "P8: Correction of open-ended exam questions through automatization", was completed by 45 participants. It covered various aspects, including participant demographics, exam conduct methods, motivations for using open-ended questions, challenges in grading, current use of grading software, and the perceived importance of various software features for grading.

The survey provided several key findings:

  • The main advantage of open-ended questions was better assessment of students' knowledge.
  • The greatest challenge was the considerable workload of grading.
  • 37% of respondents had used software for grading open-ended questions.
  • The most important software feature was the ability to specify consistent correction criteria (rubrics).
  • The survey also included open-ended responses, with some participants expressing skepticism towards automated solutions.

Interviews

Following the survey, five qualitative interviews were conducted with faculty members of the University of Zurich. These interviews focused on specific grading processes, awareness of biases in grading, criteria used when grading exercises, attitudes towards assisted grading tools, perceptions of automated grading features, concerns about using automated features, and desired features for grading software.

The interviews revealed several key findings:

  • All participants worked in teams for grading large exams.
  • Most preferred grading exercise-by-exercise rather than student-by-student.
  • Various biases were mentioned, including contrast effects and the halo effect.
  • Four out of five interviewees would definitely use an assisted grading tool.
  • Participants were open to automated features but emphasized the need for human oversight.

Interviewees also provided insights into their grading processes for different types of questions. For text exercises, some explicitly looked for keywords, while for quantitative exercises, the approach varied, with some first checking the final result and others examining the derivation from top to bottom.
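
As a hedged illustration of these two habits, the sketch below shows how a keyword check for text answers and a tolerance-based check of a final numeric result could look in code; the keywords, point values, and tolerance are invented for illustration and are not taken from any interviewee's rubric.

```python
import math

def score_text_answer(answer: str, keywords: dict[str, float]) -> float:
    """Award points for each expected keyword found in a free-text answer.
    The keyword-to-points mapping is a hypothetical rubric fragment."""
    text = answer.lower()
    return sum(points for term, points in keywords.items() if term.lower() in text)

def score_final_result(submitted: float, expected: float,
                       points: float, rel_tol: float = 1e-3) -> float:
    """Check only the final numeric result, awarding full points if it matches
    the expected value within a relative tolerance (no partial credit here)."""
    return points if math.isclose(submitted, expected, rel_tol=rel_tol) else 0.0

# Example usage with made-up criteria
print(score_text_answer(
    "Diversification reduces unsystematic risk in a portfolio.",
    {"diversification": 1.0, "unsystematic risk": 1.0, "correlation": 0.5},
))  # -> 2.0
print(score_final_result(submitted=104.50, expected=104.47, points=2.0))  # -> 2.0
```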

The digitization of exams was perceived as a challenge, with a suggestion to outsource this task to student assistants. Participants also suggested several features they would like to see in grading software, including adaptive learning, access to student grades, text recognition, visual feedback during grading, statistics, keyword extraction-based features, and automatic plot comparison.

Grading Framework

The grading framework defines three distinct approaches to assessment that represent different levels of automation and human involvement in the grading process. These approaches, Assisted Grading, Semiautomated Grading, and Automated Grading, form a spectrum from minimal to maximal automation while balancing efficiency against control. Assisted Grading focuses on workflow improvements while maintaining full human control; Semiautomated Grading incorporates AI suggestions with human oversight; and Automated Grading is reserved for scenarios with clearly defined criteria. Together, these levels provide a comprehensive structure for understanding different degrees of AI integration in assessment processes.
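
As one way to picture this spectrum in software (a minimal sketch under our own assumptions; neither the naming nor the review rule comes from an existing tool), the three approaches can be modelled as modes that determine how much human review a grading task receives.

```python
from enum import Enum, auto
from dataclasses import dataclass

class GradingMode(Enum):
    """The three approaches of the grading framework."""
    ASSISTED = auto()       # workflow support only; a human assigns every score
    SEMIAUTOMATED = auto()  # AI proposes scores; a human confirms or overrides
    AUTOMATED = auto()      # AI scores directly, for clearly defined criteria

@dataclass
class GradingTask:
    question_id: str
    mode: GradingMode
    ai_suggested_score: float | None = None

def requires_human_review(task: GradingTask) -> bool:
    """Assumed policy: everything except fully automated grading is reviewed,
    and automated grading still falls back to review if no AI score exists."""
    return not (task.mode is GradingMode.AUTOMATED
                and task.ai_suggested_score is not None)

# Example: a semi-automated task always goes to a human grader
task = GradingTask("Q3a", GradingMode.SEMIAUTOMATED, ai_suggested_score=4.5)
print(requires_human_review(task))  # -> True
```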

AI-assisted Assessment Lifecycle

The AI-assisted assessment lifecycle outlines five distinct phases of assessment and highlights opportunities for AI integration to improve efficiency and effectiveness at each stage. From assessment design and exam preparation to exam execution, grading, and review, AI offers tools to support educators while maintaining human oversight. In the assessment design phase, AI can suggest appropriate question formats and validate assessment structure. During exam preparation, AI can generate questions and assist in developing rubrics.
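
One plausible realization of such support, sketched here under the assumption that an LLM API such as OpenAI's Python client is available (the model name and prompt are illustrative, not part of any reviewed tool), is to prompt the model for a draft rubric that the lecturer then reviews and edits.

```python
from openai import OpenAI  # assumes the `openai` package and an API key are available

client = OpenAI()

def draft_rubric(question: str, max_points: int) -> str:
    """Ask an LLM to propose a grading rubric; the result is only a draft
    that a lecturer still reviews and edits before use."""
    prompt = (
        f"Propose a grading rubric with criteria and point allocations "
        f"(total {max_points} points) for this open-ended exam question:\n{question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name, not a recommendation
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage (requires OPENAI_API_KEY to be set)
# print(draft_rubric("Explain why diversification reduces portfolio risk.", 6))
```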

AI can also support exam execution through real-time monitoring and automated technical support. The grading phase benefits from AI through automated scoring, similarity-based answer ordering, mathematical expression parsing, and intelligent workload distribution. Finally, the review phase leverages AI for personalized learning recommendations and identification of common misconceptions. By addressing key areas across the entire assessment lifecycle, AI has the potential to enhance the assessment process.
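
To make similarity-based answer ordering concrete, the following minimal sketch uses TF-IDF vectors and cosine similarity, one plausible choice of similarity measure rather than the method of any particular tool, to order submissions so that similar answers are graded back to back.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def order_by_similarity(answers: list[str]) -> list[int]:
    """Greedily order answer indices so that each next answer is the one most
    similar to the previously graded answer, keeping similar responses adjacent."""
    tfidf = TfidfVectorizer().fit_transform(answers)
    sim = cosine_similarity(tfidf)
    order = [0]
    remaining = set(range(1, len(answers)))
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda i: sim[last, i])
        order.append(nxt)
        remaining.remove(nxt)
    return order

answers = [
    "Diversification reduces unsystematic risk.",
    "The CAPM relates expected return to beta.",
    "Spreading investments lowers unsystematic risk.",
]
print(order_by_similarity(answers))  # -> [0, 2, 1]: the two similar answers end up adjacent
```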

Overview of Features

Our overview of the features we have analyzed in existing applications and evaluated for future development can be found in our knowledge base.

Market Research

Our overview of the tools we have evaluated and the functionalities we identified can be found in our knowledge base.

Practical Considerations on the EU AI Act

The evolving regulatory landscape presents important considerations for the development and deployment of AI in educational assessment. Explore the key implications of the EU AI Act, which classifies AI systems for student evaluation as high-risk. Understand the core requirements surrounding data governance, human oversight, transparency, and risk management that institutions must address to ensure responsible and compliant use of AI in grading. Proactive engagement with these regulatory frameworks is essential for ensuring fairness, accountability, and ethical AI practices in education.

Practical Considerations on Handwriting Recognition

The digitization of handwritten exam responses presents significant technical challenges that impact the effectiveness of AI-assisted grading. While Optical Character Recognition (OCR) and handwriting recognition technologies offer potential solutions, their reliability in exam settings remains limited due to factors like stress-induced poor handwriting, non-linear responses, and complex formatting. Recent developments in Large Language Models show promise, but current limitations have led us to focus on digital assessment formats to ensure optimal system performance. Understanding these constraints helps inform practical decisions about assessment delivery methods and the scope of AI assistance in grading.
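
For context, the sketch below shows the typical OCR pipeline such digitization would rely on, here with Tesseract via the pytesseract package as an example engine and a hypothetical file name; printed text is handled reasonably, while stressed handwriting, crossed-out passages, and formulas are where recognition quality degrades.

```python
from PIL import Image
import pytesseract  # requires a local Tesseract installation

def transcribe_scan(path: str, language: str = "eng") -> str:
    """Run OCR over a scanned exam page and return the raw transcription,
    which still needs human verification before grading can begin."""
    image = Image.open(path)
    return pytesseract.image_to_string(image, lang=language)

# Example usage with a hypothetical scan of a German-language exam page
# print(transcribe_scan("exam_page_03.png", language="deu"))
```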

Conclusion

The combination of the survey and interviews provided both quantitative data on trends and preferences and in-depth qualitative insights into individual experiences and attitudes towards automated grading of open-ended questions. This mixed-method approach allowed for a comprehensive exploration of the topic, offering a nuanced understanding of the current state of open-ended question grading and the potential reception of automated grading tools in the academic setting.

Our survey of 45 participants and interviews with five faculty members revealed that while open-ended questions are crucial for assessment, their grading presents significant workload challenges and consistency concerns. While AI-assisted grading tools exist on the market, our analysis showed severe limitations in their ability to process longer text responses effectively. This gap is particularly problematic as complex open-ended questions are essential for assessing higher-order thinking skills.
