Dr. Cristian Velandia-Mesa, Director of Research, Universidad El Bosque (Colombia)
Dr. Ruth Stella Chacón Pinilla, Full Professor, Universidad El Bosque (Colombia)
Dr. Erika Fernanda Cortés Ibarra, Full Professor, Universidad El Bosque (Colombia)
Dr. Carlos Eduardo Rodríguez Muñoz, Research Professor, Universidad El Bosque (Colombia)
Keywords
Artificial Intelligence, Higher Education, Assessment, Educational Technologies, Intelligent Systems, Educational Innovation
Abstract
The accelerated development of artificial intelligence (AI) in educational contexts poses significant challenges to traditional models of formative assessment in Higher Education. This study aimed to examine the capacity of Generative Pretrained Transformer (GPT) models to perform evaluative functions in undergraduate thesis assessment, comparing their judgments with those of expert human evaluators. A non-integrated mixed-methods approach was employed, combining a quasi-experimental time-series design with a control group and a qualitative corpus analysis of feedback content. Sixteen undergraduate theses were purposively selected as case studies and assessed at three successive stages of the formative process. The quantitative component analyzed the evolution, consistency, and alignment of scores, while the qualitative analysis explored the critical depth, argumentative structure, and pedagogical orientation of the feedback. The findings reveal a progressive convergence between GPT and expert evaluations, with high levels of correlation and agreement in the final stage. Additionally, GPT-generated feedback showed sustained improvement in semantic richness, argumentative precision, and adaptive capacity. It is concluded that, under controlled conditions and clearly defined evaluative criteria, GPT models exhibit significant potential as complementary agents in formative assessment within Higher Education, offering efficiency, consistency, and scalability. However, limitations remain regarding the personalization of feedback and the promotion of critical reflection, highlighting the need to enhance their pedagogical and metacognitive capabilities.
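The abstract reports correlation and agreement between GPT and expert scores at each assessment stage. As a rough illustration only, and not the authors' actual procedure, the sketch below shows how such alignment indices are commonly computed for ordinal rubric scores; the 1-5 scale and all score values are hypothetical.

```python
# Minimal sketch (hypothetical data): comparing GPT and expert rubric scores
# for 16 theses at one assessment stage.
import numpy as np
from scipy.stats import pearsonr                 # linear alignment of scores
from sklearn.metrics import cohen_kappa_score    # chance-corrected agreement

# Hypothetical rubric scores on a 1-5 scale, one pair per thesis.
expert = np.array([3, 4, 2, 5, 4, 3, 4, 5, 2, 3, 4, 4, 3, 5, 4, 3])
gpt    = np.array([3, 4, 3, 5, 4, 3, 4, 4, 2, 3, 4, 5, 3, 5, 4, 3])

r, p = pearsonr(expert, gpt)                                  # correlation
kappa = cohen_kappa_score(expert, gpt, weights="quadratic")   # weighted kappa

print(f"Pearson r = {r:.2f} (p = {p:.3f}), weighted kappa = {kappa:.2f}")
```

Repeating the same indices at each of the three stages would make the convergence described in the abstract visible as rising r and kappa values over time.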
Fundref
Universidad El Bosque (Colombia). Rectorship and Vice-Rectorship of Research. Faculty of Education. Research Group: Educación e Investigación UNBOSQUE. Ministerio de Ciencia, Tecnología e Innovación: Category A1.
Technical information
Received: 2025-03-20 | Reviewed: 2025-04-04 | Accepted: 2025-04-05 | Online First: 2025-07-21 | Published: 2025-07-24
How to cite
Velandia-Mesa, C., Chacón Pinilla, R. S., Cortés Ibarra, E. F. & Rodríguez Muñoz, C. E. (2025). AI Capabilities in Formative Assessment of Undergraduate Theses: Human Experts vs GPT. Comunicar, 33(82). https://doi.org/10.5281/zenodo.15995749