What is it about?
Artificial intelligence tools such as ChatGPT are increasingly being considered for grading student assignments. They can provide feedback quickly and apply assessment criteria consistently. However, consistency does not necessarily mean that the grades are accurate or fair. This study compared grades awarded by ChatGPT with those assigned by human instructors for 61 undergraduate assignments in Political Science and Public Administration. Both ChatGPT and the instructors used the same grading rubrics. The results showed that ChatGPT generally ranked stronger and weaker assignments in a similar order to the instructors. However, it consistently awarded substantially higher marks and produced a narrower range of grades. ChatGPT tended to reward clear structure, fluent writing, good formatting, and well-organised presentation. It was less effective at identifying weaknesses in analytical depth, theoretical engagement, originality, and critical reasoning. Even when its written feedback recognised these limitations, the numerical penalties were often too small. The findings suggest that ChatGPT can be useful for providing formative feedback, helping students improve early drafts, and supporting instructors with preliminary reviews. However, it should not replace academic judgement when final grades are awarded. Universities should use a carefully calibrated human–AI approach in which instructors retain responsibility for summative assessment, fairness, and academic standards.
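The gap between ranking assignments in the same order and awarding the same marks can be illustrated with a short Python sketch. The marks below are invented for illustration and are not data from the study. The helper names (`ranks`, `pearson`, `spearman`) are hypothetical, and Spearman's rank correlation is used here only as one common way to measure ordering agreement; the paper's own statistics may differ.

```python
from statistics import mean, pstdev


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical marks out of 100 for six assignments (NOT data from the study).
instructor = [45, 55, 62, 70, 78, 88]
chatgpt = [72, 75, 78, 81, 84, 90]

print("Spearman rho:", round(spearman(instructor, chatgpt), 3))
print("Mean difference (AI - instructor):", round(mean(chatgpt) - mean(instructor), 2))
print("SD instructor:", round(pstdev(instructor), 2))
print("SD ChatGPT:", round(pstdev(chatgpt), 2))
```

With these invented marks the rank correlation is perfect, yet the AI average is roughly 14 points higher and its standard deviation is less than half the instructor's. This is the same pattern of consistent ordering combined with inflated, compressed grades that the summary describes.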
Why is it important?
Universities are under growing pressure to assess student work efficiently while maintaining fairness and academic standards. Although ChatGPT can review assignments quickly and apply rubrics consistently, this study shows that it may also award marks that are systematically higher than those given by instructors. This creates a risk of grade inflation and may weaken the credibility of academic assessment. The issue is particularly important because clear writing and good formatting are not always evidence of deep understanding, critical thinking, or originality. If AI-generated grades are accepted without human oversight, students may receive marks that do not accurately reflect their academic performance. The study therefore provides a practical message for universities: ChatGPT can support formative feedback and reduce routine workload, but final grading decisions should remain with qualified instructors. A carefully governed human–AI assessment model can improve efficiency without compromising fairness, accountability, or trust in educational qualifications.
Perspectives
AI can support educators, but it should not become the final decision-maker in academic assessment. Our study shows a clear distinction between grading consistently and grading rigorously. ChatGPT can recognise structure, fluency, and presentation effectively, yet it may be overly positive and insufficiently sensitive to deeper intellectual qualities such as critical analysis, originality, and theoretical reasoning. The practical message is not to reject AI, but to use it responsibly. ChatGPT is most valuable as a diagnostic and formative tool: it can provide rapid feedback, highlight areas for improvement, and reduce routine workload. Final grades, particularly for analytical assignments, should remain under the supervision of qualified instructors. A responsible human–AI assessment model can combine efficiency with academic integrity, provided that institutions establish clear policies, regular calibration procedures, transparency, and appropriate safeguards for student data.
Dr Hisham Al Ghunaimi
Read the Original
This page is a summary of: Reliable but not rigorous: Evaluating ChatGPT's reliability, validity, and bias in automated academic grading, Social Sciences & Humanities Open, June 2026, Elsevier.
DOI: 10.1016/j.ssaho.2026.102788.