Introduction and state of the art
In recent years, educational software has been renewed through the incorporation of specific designs based on serious games. “Serious game” refers to a game in which “education (in its various forms) is the primary goal, rather than entertainment” (Michael & Chen, 2006: 17), with the focus placed on specific content regardless of the form and structure used (Zagalo, 2010). These initiatives build on only one of the possibilities of serious games, and we should not lose sight of the full range of serious-game modalities, defined by their ludic structure, purpose or scope (Alvarez & Djaouti, 2012; Romero-Rodríguez & Torres-Toukoumidis, 2018). Designing experiences from this perspective, which has multiple parallels with the field of videogames, gives teachers in educational centers a new way of working. Alongside these developments, initiatives that include several gamification-based elements have emerged, placing the focus on motivation (Pérez-Manzano & Almela-Baeza, 2018). Although their implementation usually has positive effects, these effects depend heavily on the context of application (Hamari et al., 2014).
Some studies on the use of serious games in the classroom report improvements in learning (Clark et al., 2016; Wouters et al., 2013) or progress in students’ cognitive capacity (Lamb et al., 2018), findings that have informed the school curriculum (Carvalho et al., 2018). However, other studies, while reporting benefits in student participation and involvement, do not provide conclusive data on learning progress, either in general (Chauhan, 2017; Fisher et al., 2020) or for specific content (Boendermaker et al., 2017; Mellado et al., 2018). In addition, although teachers do not rule out using serious games, classroom experiences with them remain very sporadic (Del-Moral & Fernández, 2015). Kaufman (2013) describes a generational barrier at 35 years of age: teachers older than this lack previous positive experiences with this type of software (Marín-Díaz et al., 2019). This reveals a strong relationship between teachers’ attitudes toward this type of software and its potential use in the classroom (Stieler-Hunt & Jones, 2015).
What the available studies do agree on is that serious games benefit students’ attitudes, with a remarkable impact on motivation (Filella et al., 2017; Gómez-García et al., 2016). This factor should be borne in mind when considering the use of these resources in the classroom, especially in areas of the curriculum deemed critical, such as languages and mathematics. Specifically, arithmetic calculation and its automatization remain among the main challenges at these levels of education (Baroody et al., 2009). This is not just one more content element: continued improvement in these skills throughout compulsory education carries over to schoolwork in general (Duncan et al., 2007) and may be a good predictor of learning at higher levels (Geary, 2011). This area is therefore a promising space for serious games, which may facilitate learning that requires some degree of automatization. Several pieces of software have been designed as serious games specifically for this content, and some of them also allow gamification to be used as a complement, since gamification is a key strategy for fostering motivation (McGonigal, 2011). Gamification may run in parallel, reinforcing motivation in a non-ludic context (Teixes, 2015), and it helps students progress toward a final goal, considerably increasing their interest (Zagal & Altizer, 2014).
At present, serious games designed for educational centers around the world can be accessed online, which breaks the traditional barriers of publishing markets and opens the door to experimenting with initiatives developed in other countries. Using these products in Spain may be challenging, since their curricular sequence sometimes differs from that of the country of origin, as is the case with arithmetic. In the United States, addition and subtraction are commonly introduced together in grades 1 and 2, and multiplication together with division in grade 3, as set out in the “Common Core State Standards” proposed by the U.S. Council of Chief State School Officers (2020), whereas the Spanish curriculum follows a much more linear sequence.
Arithmetic content has been a recurring focus of work on serious games in recent years, with a direct impact on mathematics fluency (Baroody et al., 2013). Mathematics fluency is understood as skill in executing calculation algorithms, based on principles such as efficiency, precision, flexibility and the use of strategies (Kling & Bay-Williams, 2014). Developing fluency maximizes efficiency and precision, and progress in fluency is a protective factor against failure in mathematics and reading (Meiri et al., 2019). Measures of fluency can therefore serve as performance indicators for specific learning interventions.
Recent studies have explored the benefits that this type of software brings to mathematics fluency, and these benefits have become the benchmark for assessing such products (Van-der-Ven et al., 2017). One example is Reflex Math, which is widely used in American classrooms at different educational levels (Cozad, 2019; Cress, 2019; Sarrell, 2014). The software generates a personalized learning route based on response times and error rates, and it supports students through a virtual trainer that offers guidance on strategies for overcoming the difficulties detected in each student’s route. Errors are not treated as penalties but as opportunities for improvement: the system analyzes them to trigger support from the virtual trainer. The software also has a built-in gamification system in which students earn redeemable points, based on the number of activities completed, the frequency of use and the correct completion of tasks, which can be used to improve their avatars and earn diplomas but never to obtain academic benefits. Any rewards external to the software, and their promotion, are a supplementary task to be managed by teachers. Students work independently, and teachers can monitor their progress through dedicated tools based on the data collected by the system. However, the evidence on software of this kind is, again, mixed: some mathematics-focused studies have found learning benefits (Fernández-Robles et al., 2019; Pires et al., 2019), while others suggest the contrary (Hieftje et al., 2017).
Against this background, we designed a study with the general purpose of understanding the impact of serious games on mathematics fluency among primary education students. The research, conducted in real classroom contexts, considered several variables of interest: the students’ grade level, the application of gamification, the classroom group and the teachers’ experience. We also aimed to understand the relationship between the results of this learning initiative and the students’ grades. The findings may be useful to governmental agencies and educational centers in decisions about the provision of resources and the development of innovative classroom methodologies.
Materials and methods
To meet this objective, a pretest-posttest quasi-experimental design with several experimental groups and no control group was used. The study was carried out under entirely ordinary school conditions, with intact classroom groups and no random assignment. Between the two tests, each group worked independently with the educational software Reflex Math. In each case, the software was implemented by the class’s mathematics teacher, and all teachers received the same instructions to use all the features provided by the system. The study was carried out after obtaining informed consent from the school where it was conducted.
The research was carried out in a school with three classes per grade, with the participation of 12 primary classes from grades 1 to 4. It was a privately owned school financed by public funds through an agreement with the State and located in an urban area of Galicia. The sample consisted of 284 students, of whom 54.2% were boys and 45.8% were girls; 24.3% were in the first grade, 25.4% in the second, 25.7% in the third and 24.6% in the fourth. The proposal was first presented to the management team and to the 25 staff members teaching at the primary stage, who were informed about the characteristics of the program, the research proposal and the implications of using the software for their teaching. They responded favorably to participating in the research, particularly the teachers responsible for mathematics, who were expected to be most deeply involved in the process.
Arithmetic learning was assessed with the Basic Math Operations Task (BMOT), developed by Foegen and Deno (2001), to which we gained access through Sarrell’s work (2014). Both the pretest and the posttest had to be fully translated into Spanish. Both tests presented a mix of addition, subtraction, multiplication and division items, which matches the curricular level of the third and fourth grades at this school; for the first and second grades, the test was adapted to the curricular level and included only addition and subtraction items. The score is an individual performance indicator equal to the number of correct answers given within one minute. Because this indicator captures one of the benefits expected to derive directly from using the software, it was considered an appropriate outcome measure.
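The scoring rule just described (one point per correct answer within the one-minute limit) can be sketched in Python as follows. The `Response` record and its fields are hypothetical, assuming each answer is logged with the seconds elapsed since the probe started; neither the BMOT materials nor this study specify such a log format.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One answered item in a timed probe (hypothetical log format)."""
    elapsed_s: float  # seconds since the probe started
    given: int        # the student's answer
    expected: int     # the correct answer


def bmot_score(responses, time_limit_s=60.0):
    """Count correct answers given within the time limit (one minute by default)."""
    return sum(
        1
        for r in responses
        if r.elapsed_s <= time_limit_s and r.given == r.expected
    )


# Usage: the wrong answer at 12 s and the late answer at 61 s do not count.
probe = [
    Response(5.0, 7, 7),
    Response(12.0, 3, 4),
    Response(59.5, 10, 10),
    Response(61.0, 8, 8),
]
print(bmot_score(probe))  # 2
```

Keeping the time limit as a parameter makes the one-minute rule explicit rather than hard-coded.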
The research was conducted in the first term of the 2019-2020 school year: the pretest was applied in September 2019 and the posttest in December 2019. The software was used during mathematics class time in three sessions per week, following the developer’s guidance. Data on students’ academic performance were also collected, namely their end-of-term grades in all the areas evaluated at this stage, gathered when the posttest was applied. For teachers, the number of years of teaching experience was recorded. By accessing the system’s database through a tool intended specifically for teachers, we compiled individual data on the number of days each student used the program, the number of activities completed and the use of gamification strategies.
Data were analyzed with SPSS v. 25. Univariate descriptive analyses based on measures of central tendency and dispersion were carried out. Because parametric assumptions were not met, the Wilcoxon signed-rank test was applied to related samples (pretest versus posttest), with a significance level of 0.05. For independent samples, the Mann–Whitney U test was used to determine whether there were significant differences (p<0.05) between the classes that had applied gamification and those that had not, and between the classes with novice teachers and those with experienced teachers. The statistic r=|z|/√N was calculated as a measure of effect size (Field, 2018; Fritz et al., 2012) and interpreted following the criteria proposed by Cohen (1988), with the extension suggested by Rosenthal (1996). Spearman’s ρ coefficient (rs) was used to establish the degree of association between variables; its interpretation jointly considered statistical significance (p<0.05) and strength of relationship, following the indications in the specialized literature (including, among others, Sánchez-Huete, 2013). Lastly, the coefficient of determination (R2) was calculated to approximate the proportion of variance in academic performance explained by the results of the serious-game-based learning method.
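The tests and effect sizes above can be sketched with SciPy instead of SPSS. This is an assumption made for illustration: the authors used SPSS v. 25, and tie handling and exact versus asymptotic p-values may differ slightly. Here |z| is recovered from the two-sided p-value through the inverse normal distribution, and for the paired Wilcoxon test N is taken as the total number of observations (twice the number of students), following Field’s convention; that reading is inferred from the reported values (for example, 14.291/√568 ≈ 0.60), not stated by the authors.

```python
import math

from scipy import stats


def abs_z_from_p(p):
    """|z| corresponding to a two-sided p-value under the normal approximation."""
    return float(stats.norm.isf(p / 2))


def wilcoxon_with_r(pre, post):
    """Related samples: Wilcoxon signed-rank test and effect size r = |z| / sqrt(N).

    N is taken as the total number of observations (2 * number of pairs).
    """
    result = stats.wilcoxon(pre, post)
    r = abs_z_from_p(result.pvalue) / math.sqrt(2 * len(pre))
    return result.pvalue, r


def mann_whitney_with_r(group_a, group_b):
    """Independent samples: two-sided Mann-Whitney U and r = |z| / sqrt(n_a + n_b)."""
    result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    r = abs_z_from_p(result.pvalue) / math.sqrt(len(group_a) + len(group_b))
    return result.pvalue, r


def spearman_with_r2(x, y):
    """Spearman's rho, its p-value, and rho squared as an approximate R2."""
    rho, p = stats.spearmanr(x, y)
    return float(rho), float(p), float(rho) ** 2


# Usage with small made-up scores (illustrative only, not study data).
pre = [5, 7, 6, 8, 9, 4, 10, 6, 7, 8, 5, 9]
post = [p + d for p, d in zip(pre, range(1, 13))]  # each student gains 1..12 points
print(wilcoxon_with_r(pre, post))
print(mann_whitney_with_r([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]))
print(spearman_with_r2([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))
```

Deriving |z| from the p-value keeps the sketch independent of how a given SciPy version exposes the test statistic; for very small samples the exact p-value makes this z only an approximation.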
Analyses and results
First, we present an overview of the effect of serious games on mathematics fluency and then analyze it by the grade level at which the games were used. Next, the results are analyzed according to whether gamification was applied, and by individual class, distinguishing between novice and experienced teachers under comparable gamification conditions. We finish by examining the relationships between the results of this learning proposal and the students’ grades.
Serious games and improvement in mathematics fluency
The software was used for a mean of 27 days, during which students completed an average of 5,747 activities. After using the software, mathematics fluency showed a statistically significant and large improvement (n=284, Z=-14.291, p<0.001, r=0.60), with the mean score rising from 8.99 (pretest) to 17.79 (posttest). Time of use was strongly related to the number of activities solved (rs=0.82, p<0.001).
At each grade level, posttest scores were associated with two revealing aspects. The relationship with time of use was generally significant and tended to be moderate, except in grade 1, where it was not significant (grade 1, rs=0.20, p=0.108; grade 2, rs=0.36, p=0.002; grade 3, rs=0.39, p=0.001; grade 4, rs=0.40, p=0.001). The relationship with the number of activities solved was significant in every grade and also moderate (grade 1, rs=0.37, p=0.002; grade 2, rs=0.65, p<0.001; grade 3, rs=0.48, p<0.001; grade 4, rs=0.48, p<0.001).
Table 1 contains the main descriptive statistics for each grade. Pretest and posttest scores differ significantly in all grades, with a large effect in every case: grade 1 (Z=-7.225, p<0.001, r=0.62), grade 2 (Z=-7.378, p<0.001, r=0.61), grade 3 (Z=-6.354, p<0.001, r=0.53) and grade 4 (Z=-7.251, p<0.001, r=0.61).
As Figure 1 shows, pretest scores were lower and more concentrated, whereas posttest scores reached higher values and were more dispersed. All grade levels made substantial progress from the baseline to the end of the term.
Serious games and gamification in the classroom
The software included a gamification strategy to be used in the classroom as a complementary element, which some of the teachers employed. The mean pretest score was about four points lower in the classes that used gamification than in those that did not (Table 2), a statistically significant difference of medium size (RYes=123.24, RNo=181.43, Z=-5.628, p<0.001, r=0.33). The variable “grade” must be taken into account here, since the classes without gamification were all in the third and fourth grades, whereas the classes with gamification also included the first and second grades. However, progress was greater in the classes with gamification (a 10-point gain in the mean score) than in the classes without it (6.38 points). This difference in progress was significant, with an effect size close to medium (RYes=158.68, RNo=109.80, Z=-4.729, p<0.001, r=0.28). Indeed, by the posttest there was no longer any significant or notable difference between the classes with and without gamification (RYes=139.59, RNo=148.38, Z=-0.850, p=0.395, r=0.05). In addition, students in the classes with gamification used the program on more days and solved more activities, with statistically significant differences of medium size for days of use (RYes=164.82, RNo=97.39, Z=-6.517, p<0.001, r=0.39) and close to medium size for the number of activities (RYes=157.58, RNo=112.02, Z=-4.399, p<0.001, r=0.26).
Within groups, mathematics fluency showed a statistically significant and large improvement both in the group that used gamification in the classroom (n=190, Z=-11.959, p<0.001, r=0.61) and in the group that did not (n=94, Z=-7.638, p<0.001, r=0.56), although the effect size was larger for the former.
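As a consistency check on the effect sizes reported so far, the following Python sketch recomputes r=|z|/√N from the published Z values. Two assumptions are made: for the paired Wilcoxon comparisons, N is taken as twice the number of students (Field’s convention of counting all observations), and the per-grade sample sizes (69, 72, 73 and 70) are derived from the percentages given in the sample description rather than stated explicitly. Under these assumptions, every recomputed value rounds to the reported r.

```python
import math


def effect_size_r(z, n_obs):
    """Effect size r = |z| / sqrt(N)."""
    return abs(z) / math.sqrt(n_obs)


# (label, reported Z, N used for r, reported r)
# Wilcoxon signed-rank: N = 2 * n students (assumed convention).
# Per-grade n values are derived from the reported percentages of 284 students.
wilcoxon = [
    ("all students, n=284", -14.291, 2 * 284, 0.60),
    ("grade 1, n=69", -7.225, 2 * 69, 0.62),
    ("grade 2, n=72", -7.378, 2 * 72, 0.61),
    ("grade 3, n=73", -6.354, 2 * 73, 0.53),
    ("grade 4, n=70", -7.251, 2 * 70, 0.61),
    ("with gamification, n=190", -11.959, 2 * 190, 0.61),
    ("without gamification, n=94", -7.638, 2 * 94, 0.56),
]
# Mann-Whitney U, gamification (n=190) vs. no gamification (n=94): N = 284.
mann_whitney = [
    ("pretest", -5.628, 284, 0.33),
    ("progress", -4.729, 284, 0.28),
    ("posttest", -0.850, 284, 0.05),
    ("days of use", -6.517, 284, 0.39),
    ("activities", -4.399, 284, 0.26),
]

mismatches = []
for label, z, n_obs, reported in wilcoxon + mann_whitney:
    r = effect_size_r(z, n_obs)
    print(f"{label}: r = {r:.3f} (reported {reported:.2f})")
    if round(r, 2) != reported:
        mismatches.append(label)

print("mismatches:", mismatches)  # mismatches: []
```

Because the sign of Z only reflects the direction of the comparison, the check uses its absolute value.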
At classroom level: Novice teachers versus experienced teachers
All classroom groups made progress in arithmetic fluency after the serious gaming experience (Table 3). However, progress was clearly smaller in some groups, notably A8 and A9, and the same pattern appeared in the effect sizes, which were large in most classes (r=0.62) but lower in A8 (r=0.43) and A9 (r=0.49). Although all classes started from similar conditions, the teachers differed in how they used, appropriated and managed the software.
At this point in the analysis, a possible relationship between these data and the teachers’ professional experience emerges. To compare classes on this criterion, we only compared classes within the same grade, since their curricular requirements were similar, and under the same gamification condition (implemented or not). Under these conditions, both the most senior teachers (more than 30 years of teaching experience) and the novice teachers (between 1 and 5 years of teaching experience) used the software in a similar way during class time.
In some cases, the values were even higher for the more senior teachers, and the students of more experienced teachers obtained better posttest results. In the second grade, where all the teachers used gamification, the classes of the more experienced teachers (A4 and A5) obtained a higher mean posttest score than the class of the novice teacher (A6) (M=21.81 versus M=19.33), used the program more intensively (M=30.6 versus M=27.7 days) and completed more activities (M=7,299.9 versus M=5,726.6). However, none of these differences between novice and experienced teachers was significant, and all effect sizes were small (pretest: Z=-0.665, p=0.506, r=0.08; posttest: Z=-1.005, p=0.315, r=0.12; days of use: Z=-1.580, p=0.114, r=0.19; activities solved: Z=-1.374, p=0.170, r=0.16).
As to grade 3, for the two classes that were compared (A8: novice teacher; A9: experienced teacher), in which gamification had not been used, similar values were obtained in the use of the software and the number of activities solved (Table 3). In fact, there were no significant differences in this regard (Table 4); in contrast, the pretest and posttest scores did show significant differences, with a notable effect size.
Relationship between the posttest scores and the academic performance of the students
In general, posttest scores showed a significant and moderate relationship with first-term mathematics grades, except in the third grade (grade 1: rs=0.62, p<0.001, R2=0.38; grade 2: rs=0.53, p<0.001, R2=0.28; grade 3: rs=0.15, p=0.203, R2=0.02; grade 4: rs=0.43, p<0.001, R2=0.18).
Posttest scores also showed a significant and moderate relationship with overall academic grades, except in the third grade, where the correlation was weaker although still significant (grade 1: rs=0.57, p<0.001, R2=0.32; grade 2: rs=0.63, p<0.001, R2=0.40; grade 3: rs=0.33, p=0.004, R2=0.11; grade 4: rs=0.46, p<0.001, R2=0.21). The coefficient of determination in the second grade stands out: posttest scores account for up to 40% of the variance in overall academic grades.
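The coefficients of determination reported for the grade-level correlations with mathematics and overall grades equal the square of the corresponding Spearman coefficients, which the short Python check below illustrates with the published values. Because rs is itself rounded to two decimals, a tolerance of 0.01 is assumed. Strictly, rs² describes the variance shared by the ranks rather than by the raw scores, which fits the methods’ description of R2 as an approximation.

```python
# Reported (rs, R2) pairs: posttest vs. first-term mathematics grade, then vs. overall grades.
reported = {
    "math, grade 1": (0.62, 0.38),
    "math, grade 2": (0.53, 0.28),
    "math, grade 3": (0.15, 0.02),
    "math, grade 4": (0.43, 0.18),
    "overall, grade 1": (0.57, 0.32),
    "overall, grade 2": (0.63, 0.40),
    "overall, grade 3": (0.33, 0.11),
    "overall, grade 4": (0.46, 0.21),
}

# rs is rounded to two decimals in the text, so allow a small tolerance.
TOLERANCE = 0.01

for label, (rs, r2_reported) in reported.items():
    r2 = rs ** 2
    status = "ok" if abs(r2 - r2_reported) <= TOLERANCE else "CHECK"
    print(f"{label}: rs^2 = {r2:.4f}, reported R2 = {r2_reported:.2f} -> {status}")
```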
At class level (Table 5), there is generally a significant and moderate-to-high relationship between posttest scores and school grades, except in some groups in the third grade (A7, A8 and A9) and the fourth grade (A12). Several coefficients of determination stand out, indicating that more than 50% of the variability in mathematics grades and in overall academic grades can be accounted for by their relationship with posttest scores.
Discussion and conclusions
The purpose of this study was to investigate the impact of serious games on the work carried out at primary education level. Our study provides evidence of improvement in mathematics learning through the use of serious games, consistent with the work of Clark et al. (2016), Carvalho et al. (2018) and Wouters et al. (2013), and reinforces this line of research with evidence of the benefit that serious games can provide in real-life contexts. Specifically, substantial progress in mathematics fluency was observed in all four grades analyzed, in all 12 classes involved and across the participating students. These data are consistent with previous studies that used much smaller samples (Cozad, 2019) and students with learning difficulties (Sarrell, 2014). There is a statistically significant and large difference between the baseline situation and the situation observed after the serious gaming experience. One apparently limiting factor should be borne in mind: the software was designed for a US curriculum. This did not appear to limit its potential. In addition, the classes that used the gamification strategy showed greater progress in mathematics fluency, more intensive use of the program and a higher number of activities solved. This leads us to confirm the impact of these strategies on motivation, clearly in line with previous studies (Filella et al., 2017; McGonigal, 2011; Zagal & Altizer, 2014), and to associate their use with improved student performance (Fernández-Robles et al., 2019; Pires et al., 2019).
A significant and moderate relationship was also found between the scores on the performance test applied after the serious gaming experience and the students’ grades, with shared variance exceeding 50% in some classes. This relationship does not necessarily imply causality; it refers only to the degree of association. These data reveal the explanatory power of this type of curricular content in relation to classroom work, update previous studies on the matter and highlight the potential influence of such proposals on students’ general performance (Duncan et al., 2007). The evidence of improvement shown by these data also calls into question the studies that suggest generational barriers to teachers’ use of this software (Kaufman, 2013) owing to a lack of previous positive experiences (Marín-Díaz et al., 2019). Adopting serious games and implementing them in the classroom requires teachers to manage the process. Rather than being a problem, as suggested by previous literature, age appears here as an asset that complements the work carried out with the software, challenging the image of novice teachers as more receptive and inclined to use these programs. Experienced teachers used them with an intensity similar to that of novice teachers, and under equal conditions, significant differences were even found in favor of experienced teachers. These results seem to indicate that these teachers value serious games and gamification according to their perceived benefits, which leads them to reinforce the proposal by facilitating the conditions for carrying out the suggested tasks, drawing on knowledge and experience that lead to even better results.
This situation opens up the possibility of investigating the elements that these teachers bring into play, and highlights the need to conduct research studies focused on core areas of schoolwork that have a high value for educational centers.
The evidence suggests that contextual elements are decisive in the study of serious games and gamification and may lead to disparate results. The selection of content that is relevant to teachers appears to be a key factor in having an impact on schoolwork, bringing different skills into play, particularly mathematical and digital skills. In addition, students, who were clearly motivated and involved in this proposal, are offered digital gaming experiences that closely resemble those of their leisure time, which helps to bridge the gap between school and social reality and adds value to the role of educational centers. This allows us to conclude that serious games and gamification specifically designed for school environments have potential for improving students’ performance.
Nevertheless, this work has some limitations. It included students and teachers from a single educational center, so any generalization beyond the context in which the study was carried out should be made with caution. In addition, the reasons why teachers appropriated the software in different ways are unknown, especially in the case of the experienced teachers, who, unexpectedly, engaged with the proposal very intensively.
Idea, F.F.V.; Literature review (state of the art), F.F.V.; Methodology, E.V.C., E.M.P.; Data analysis, E.V.C., E.M.P.; Results, E.V.C., F.F.V., E.M.P.; Discussion and conclusions, F.F.V., E.V.C.; Writing (original draft), F.F.V., E.V.C.; Final revisions, E.V.C., E.M.P.; Design project and sponsorship, F.F.V.