Evaluation of Educational Documentary using GAI and Prompt Engineering

Authors

  • José Miguel Suárez-Martínez Consellería Educación. Generalitat Valenciana (Spain)
  • Roberto Arnau Roselló Universitat Jaume I (Spain)
  • Rubén Nieto-González Universitat Jaume I (Spain)

DOI:

https://doi.org/10.5281/zenodo.16121533

Keywords:

Artificial Intelligence, Educational Evaluation, ChatGPT, Transmedia Storytelling, Key Indicators, Delphi Method

Abstract

The study explores the use of generative artificial intelligence (GAI) in evaluating educational audiovisual material, focusing on the documentary NOMADS, produced at the Cabo de la Huerta Secondary School in Alicante, Spain, as part of an Erasmus+ project on human rights. A multi-agent system was developed using prompt engineering (PE) techniques in ChatGPT 4.0, aiming for a multidimensional evaluation of the documentary based on key performance indicators. The methodology is grounded in semiotic engineering, a discipline that analyzes human-computer interaction. The process comprises four phases and incorporates a variant of the Delphi method, a structured technique for achieving expert consensus. In the first phase, PE is used to generate evaluation metrics; in the second, expert agent profiles are defined. The third phase models the iteration of prompts to gather evaluations, and the final phase involves the statistical validation of the results. Adaptive interaction factors are introduced as tools to map multidimensional evaluations using PE, integrating insights from five experts in the education and audiovisual fields. The resulting metrics successfully captured elements such as educational impact, transmedia storytelling, and audiovisual quality. The outcome is a methodological framework whose results are internally validated through several statistical coefficients, while highlighting the need for further validation with both quantitative tools and real expert panels.
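
As a concrete illustration of the four-phase workflow summarized above, the following minimal Python sketch shows how the agent-profile, prompt-iteration, and statistical-validation steps could be wired together using the standard OpenAI Chat Completions API. It is not the authors' implementation: the agent role descriptions, the indicator labels, the "gpt-4" model identifier, the rating scale, and the choice of Cronbach's alpha as the internal-consistency coefficient are all illustrative assumptions.

```python
# Minimal sketch of a multi-agent, Delphi-style evaluation loop driven by
# prompt engineering. Profile texts, indicator names, the model identifier
# and the alpha coefficient are illustrative assumptions, not the study's.
import json
import statistics

from openai import OpenAI  # OpenAI Python SDK (>= 1.0)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Phase 2: expert agent profiles (hypothetical examples).
AGENT_PROFILES = {
    "pedagogy_expert": "You are an expert in secondary-school pedagogy and educational evaluation.",
    "transmedia_expert": "You are an expert in transmedia storytelling and participatory media.",
    "av_production_expert": "You are an expert in documentary film and audiovisual production.",
}

# Phase 1 output: key indicators previously generated through prompt engineering
# (hypothetical labels echoing the dimensions named in the abstract).
INDICATORS = ["educational_impact", "transmedia_storytelling", "audiovisual_quality"]


def ask_agent(profile: str, indicator: str, synopsis: str) -> float:
    """Phase 3: one prompt iteration; the agent persona returns a 1-10 score as JSON."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": profile},
            {
                "role": "user",
                "content": (
                    f"Evaluate the educational documentary described below on the "
                    f"indicator '{indicator}'. Reply only with JSON of the form "
                    f'{{"score": <number between 1 and 10>}}.\n\n{synopsis}'
                ),
            },
        ],
    )
    return float(json.loads(response.choices[0].message.content)["score"])


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Phase 4: internal-consistency check, treating each agent as a rater.

    rows -- one list of indicator scores per agent.
    """
    k = len(rows)
    rater_variances = [statistics.pvariance(row) for row in rows]
    indicator_totals = [sum(column) for column in zip(*rows)]
    return k / (k - 1) * (1 - sum(rater_variances) / statistics.pvariance(indicator_totals))


if __name__ == "__main__":
    synopsis = "NOMADS: a student-made documentary on migrations and human rights (Erasmus+)."
    scores = {
        name: [ask_agent(profile, indicator, synopsis) for indicator in INDICATORS]
        for name, profile in AGENT_PROFILES.items()
    }
    print("Scores per agent:", scores)
    print("Cronbach's alpha across agents:", round(cronbach_alpha(list(scores.values())), 3))
```

Aggregating the scores of several simulated expert profiles and checking their internal consistency mirrors the Delphi-style consensus step described in the abstract; a fuller replication would add further agreement coefficients and, ultimately, a panel of real experts to confirm the AI-generated ratings, as the study itself recommends.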

Published

2025-07-28

How to Cite

José Miguel Suárez-Martínez, Roberto Arnau Roselló, & Rubén Nieto-González. (2025). Evaluation of Educational Documentary using GAI and Prompt Engineering. Comunicar, 33(82), 90–102. https://doi.org/10.5281/zenodo.16121533

Issue

Vol. 33 No. 82 (2025)

Section

Research Article