AI Capabilities in Formative Assessment of Undergraduate Theses: Human Experts vs GPT

Authors

  • Dr. Cristian Velandia-Mesa, Director of Research, Universidad El Bosque (Colombia)
  • Dra. Ruth Stella Chacón Pinilla, Full Professor, Universidad El Bosque (Colombia)
  • Dra. Erika Fernanda Cortés Ibarra, Full Professor, Universidad El Bosque (Colombia)
  • Dr. Carlos Eduardo Rodríguez Muñóz, Research Professor, Universidad El Bosque (Colombia)

DOI:

https://doi.org/10.5281/zenodo.15995749

Keywords:

Artificial Intelligence, Higher Education, Assessment, Educational Technologies, Intelligent Systems, Educational Innovation

Abstract

The accelerated development of artificial intelligence (AI) in educational contexts poses significant challenges to traditional models of formative assessment in Higher Education. This study examined the capacity of Generative Pre-trained Transformer (GPT) models to perform evaluative functions in undergraduate thesis assessment by comparing their judgments with those of expert human evaluators. A non-integrated mixed-methods approach was employed, combining a quasi-experimental time-series design with a control group and a qualitative corpus analysis of feedback content. Sixteen undergraduate theses were purposively selected as case studies and assessed at three successive stages of the formative process. The quantitative component analyzed the evolution, consistency, and alignment of scores, while the qualitative component explored the critical depth, argumentative structure, and pedagogical orientation of the feedback. The findings reveal a progressive convergence between GPT and expert evaluations, with high levels of correlation and agreement in the final stage. GPT-generated feedback also showed sustained improvement in semantic richness, argumentative precision, and adaptive capacity. The study concludes that, under controlled conditions and with clearly defined evaluative criteria, GPT models show significant potential as complementary agents in formative assessment within Higher Education, offering efficiency, consistency, and scalability. However, limitations remain in the personalization of feedback and the promotion of critical reflection, which highlights the need to strengthen the models' pedagogical and metacognitive capabilities.

Published

2025-07-28

How to Cite

Velandia-Mesa, C., Chacón Pinilla, R. S., Cortés Ibarra, E. F., & Rodríguez Muñóz, C. E. (2025). AI Capabilities in Formative Assessment of Undergraduate Theses: Human Experts vs GPT. Comunicar, 33(82), 103–115. https://doi.org/10.5281/zenodo.15995749

Section

Research Article