Women on YouTube: Representation and participation through the Web Scraping technique


YouTube is the favourite entertainment platform for teens and pre-teens. It is configured as a space for interaction and collaboration that coordinates collective creativity as a generator of meaning. Because of this, the platform now constitutes an enabling environment for subjectivation. Women and men participate by sharing or consuming videos, although the visibility and experiences are different for each gender. The aim of this study is to analyse the presence of women in the new spaces of participation, both from the perspective of producers and consumers of content. An analysis based on Web Scraping of the Instagram profiles of the 50 most successful YouTubers in Spain was carried out. The data obtained was analysed with the statistical software R. The results show a low representation of women among the channels with the highest number of views and subscribers. In addition, there is a lower presence of a female audience. Both roles, women as content creators and as consumers, are mostly associated with stereotypically feminine content such as beauty, draw-my-life and fitness sports. The study shows that YouTube, the platform representing the new online participation spaces, reproduces the gender power structures of traditional media. A critical approach to media education is needed to fight against sexist representations, stereotypes and under-representation of women in public spaces.


Gender, stereotypes, audience, teens, pre-teens, youtubers, YouTube, web-scraping



Introduction and current status of research

The development of digital media and Information and Communication Technologies (ICT), in a complex context characterised by social, economic, political and cultural change, has led to variations in the way we communicate, relate and perceive reality. This rapid socio-technological advance clashes with the almost leisurely pace of the transformation of gender inequalities in and throughout the media (French et al., 2019).

The processes of media convergence, participatory culture and collective intelligence (Ayuste et al., 2012) resulted in the transformation of the perception of media and the way in which we relate to it, whilst also leading to profound implications for those who use it (Jenkins, 2008). Furthermore, these processes forecast a potential democratisation (Jenkins et al., 2015) in terms of access and participation.

These transformations have led to the understanding of the web as a space where participatory culture can also overcome the barriers or imposed constraints linked to corporeality; which would imply the possibility of constructing alternative identities, avoiding those that have been attributed or imposed. Although this idea may be appealing from a gender perspective, it is somewhat naïve, as it postulates that if the same opportunities of access are offered to groups that have not been socialised in the same way, nor have had access to the same goods, actions or rights (Ficoseco, 2016), a democratic and universal participation would be achieved. This would mean that the web is interpreted as a space alien to the social fabric and that gender socialisation is not taken into consideration, as disembodiment is associated with the power to break from gender constructs and the freedom to create alternative identities.

Research on traditional media reveals a scarce presence of women in these spaces when their corporeality becomes visible and they are attributed a gender identity. As the studies promoted by UNESCO (Byerly, 2011) show, women are under-represented, especially in places of power, such as government or management councils (<20% in Spain). Similarly, it is evident that the positions involved in the decisions regarding the content produced in the media are not occupied by them either. As a result, only 24% of the people to whom this content refers are women (Macharia, 2015).

Female representation and representativeness are limited and influenced by stereotypes (Grizzle, 2014). The figures reveal slow progress on this issue, which is further aggravated by the perpetuation of online media trends (Macharia, 2015). In new participation spaces, the results are similar but acquire new expressions. In the case of YouTube, Wotanis and McMillan (2014) determined that, although there are numerous investigations on the presence and participation of women on the platform and some of them manage to obtain favourable results in terms of equality (Rainie et al., 2012), there is a shortage of female referents among the 50 most visited channels.

Digital technologies are part of the social fabric, given that these technologies and society mutually produce and define themselves as contingent and open agents, expressing the social relations in which they are integrated (Haraway, 2006). The absence of women in the technological narrative, or the binary and stereotyped expression of them, is a constitutive part of technology as media and as a social tool with a patriarchal system of imaginaries associated with prestige, reason and power (Ficoseco, 2016). In this context, digital media emerge with a hint of democratisation and decentralisation but do not challenge the patriarchal nature of the media (Macharia, 2015). Technology is neither naïve nor neutral, while the media is structured in a way that impacts gender power relations (French et al., 2019). This structure imposes rules on how people behave and the roles and responsibilities they assume, thus creating pressures and expectations.

From a gender perspective, it is necessary to critically examine how gender power devices appear and what social practices they reproduce, that is, to review what is privileged and what is postponed in the architecture of the Internet (Byerly, 2011).

Based on the current reality described above, this study seeks to assess the presence of women in online participation spaces — specifically, on today’s largest entertainment platform: YouTube (Berzosa, 2017), the leisure space most visited by young people and pre-teens (Haddon & Livingstone, 2012). From a gender standpoint, our purpose is to establish to what extent women are part of this new social framework for entertainment, considering those with producer and prosumer roles, as well as those limited to content consumption.

Gender and identity construction on YouTube

Watching videos is one of the most widespread entertainment habits among young Europeans, with YouTube rising as the queen of online platforms, used by 90% of 12- to 15-year-olds (Haddon & Livingstone, 2012), who mainly consume humorous videos and content related to video games (Mascheroni & Ólafsson, 2014).

Pre-adolescence and adolescence are the stages in which the platform shows greater success, coinciding with a key moment in the development of individuality and relevant decision-making for the future. This makes teens and pre-teens more susceptible to the influence of the environment when it comes to constructing their identity (Aran-Ramspott et al., 2018). YouTube constitutes an enabling environment for subjectivation as it is configured as a space for interaction and collaboration that coordinates collective creativity as a generator of meaning (Sánchez-Olmos & Hidalgo-Marí, 2016).

In this context, the mediating figures branch out into subjectivation and new forms of expression are configured, resulting in the promotion of new opportunities for the construction of the subject, thus placing the referents in these spaces in the focus of interest. At the same time, we must understand these opportunities in a context characterised by the convergence of the media and the socio-economic, political and cultural reality of the young people who participate on the platform, within which they are also subject to a gender socialisation.

Socialisation and education instil an ideal that situates the subject in the symbolic, the language and the available schemes of cultural intelligibility (Lacan, 1977), configuring a norm of conduct (Butler, 2010). Power forms the subject, provides it with the condition of its existence and the trajectory of its desire; in such a way that submission consists precisely of dependence on a discourse that is not chosen but paradoxically initiates and sustains the existence of the individual (Foucault, 1994). Therefore, socialisation and access to goods, actions and rights, among other issues (Ficoseco, 2016), inevitably determine gender power relations, as well as the role that women play on the platform, despite the absence of explicit conditions.

The studies that have analysed the platform from a gender perspective reveal the appropriation of the environment and the unequal presence of identities. The lack of female referents among the figures with the greatest reach on the platform is evident (Wotanis & McMillan, 2014), as is a masculine domination characterised by sexism, visible in the comments received by female YouTubers who do not conform to gender expectations (Döring & Mohseni, 2018). Along with the conditioning of content according to gender, there is an imbalance in the production and reception of videos and a more active participation of men (Molyneaux et al., 2008; Sánchez-Olmos & Hidalgo-Marí, 2016), despite the greater effort that women invest in the quality of interactions (Pierson, 2015).

Gender does not seem to be a conditioning factor for being part of the platform. However, no research has been found that reports equal participation, which reveals the traces of a participatory gap (Jenkins, 2009).

YouTubers: mediating figures in the construction of identity

The ways of dealing with culture have changed mainly due to the interaction with digital media (Dussel, 2017), which is especially evident in the field of YouTubers, a field characterised by a kind of participatory culture (Jenkins, 2009). There is a social connection between YouTubers and the audience, in a space where the former frequently perform informal mentoring, thinking that they can teach something to their followers, who value their contributions. Therefore, today’s culture belongs to the audience, which is often rooted in their experiences and background (Jenkins et al., 2015).

The term YouTuber refers to content creators on YouTube (Berzosa, 2017; Van-Dijck, 2016). They are people, mostly young, who generate large masses of followers who support them over time. A community is formed around this figure, and the YouTuber himself or herself belongs to this community, which creates identity symbols. The interaction in the community is bidirectional: the YouTubers question their audience, who, in turn, provide feedback. This relationship, constituted in a relevant social context (Pérez-Torres et al., 2018), involves elements of collaboration, interaction with other users, learning opportunities, civic commitment and identity construction (Lange, 2014; Lenhart et al., 2015). The YouTuber phenomenon implies a generational and socio-temporal factor, which entails its own way of generating and consuming content (Berzosa, 2017). The attributes that characterise YouTubers are, in turn, those that propitiate their success: identity construction, processes of identification and empathy with the characters, as well as "the ability to improvise, to change, and to surprise", in a way that is far from the logic of traditional media (Aran-Ramspott et al., 2018: 73). These are people who are close and accessible, and who challenge the margins of intimacy and privacy.

The uniqueness of the strategy of the YouTuber on the web is based on the periodicity in the publication of videos in a way that connects the audience and creates expectation (Berzosa, 2017). This results in engagement from the audience that visits the given channel repeatedly. Success relies on the retention of large masses that view their videos, which makes them influencers and a social reference for young people. They are "role models" as followers identify with their discourse and consequently become "subscribers" (Pérez-Torres et al., 2018: 67). This is a tendency shown in the work of Gewerc and Alonso-Ferreiro (2019), where the power of these models is evident: the YouTubers become someone to please by means of subscribing to their channel in response to their demands.

The admiration generated by YouTubers (García-Jiménez et al., 2016) positions them as mediators of media consumption, guiding the preferences of pre-teens and teens and leading them to the videos of other YouTubers through the power of their discourse. In this way, they become mediators in the construction of the individual who socialises on online platforms such as YouTube.

The construction of the subject is produced via the mediation of others (Foucault, 1994), who act: 1) as examples of behaviour; hence the relevance of the visibility of female referents; 2) as figures of empowerment, contributing to the transmission of knowledge and principles; including jargon, values, and other elements identified in communication that contribute to configuring the community and the identity of the group; and 3) in the spaces of interaction, within which their exposure can cause unease. The same schemes that constitute social reality are transferred to these new spaces of online participation. There are different forms of appropriating and occupying spaces, in such a way that the dialogue is conditioned by the individuals who form this space: one group assumes the responsibility of having the voice, while others occupy the periphery without exposure. In this respect, men subscribe at the request of the referent figure (Gewerc & Alonso-Ferreiro, 2019), while women, feeling unrepresented, become passive participants, observers who neither leave comments nor generate content of any kind.

Media education for full and critical participation

In this media setting, where YouTube occupies a prominent place among young people, critical media literacy is necessary (Buckingham, 2007; French et al., 2019; Gutiérrez, 2008; Jenkins, 2009; Jenkins et al., 2015) as an essential competence for the 21st century.

Dussel (2017) points out the vital necessity to tackle what circulates in YouTube and other networks in school, analysing with students how to produce and how to look at images, to contribute to the expansion of their worlds, and to help them become accustomed to other images and production methods. To this end, it is essential to address this learning process from a critical, reflexive and creative approach (Gutiérrez, 2008), breaking and confronting gender stereotypes. It is about preparing young people to read and write media (Buckingham, 2007), addressing the dual role of producers and consumers (Jenkins, 2009), so that they actively participate in digital culture and take advantage of the potential of YouTube as a space for social (Gutiérrez, 2008) and gender equality.

It is necessary to deepen their experiences with media outside the classroom (Buckingham, 2007; Dussel, 2017) and to understand critically and profoundly how forms of media work, how they communicate and intervene in relationships, how they represent the world and how they are produced and used (Buckingham, 2019). In this context, we should consider the participatory gap (Jenkins, 2009; Robles-Morales et al., 2016), which refers to inequality in access to media and opportunities to participate fully. Examining this gap from a gender perspective requires concern for the situation of women in this scenario, as well as consideration of their way of behaving on the web — a behaviour that implies ethical values.

The ethical challenge (Jenkins, 2009) focuses on the importance of the school context and refers to the learning of ethical norms during the online experience. Furthermore, it relates to issues of netiquette and digital identity management, since practices with Web 2.0 technologies, even if they are for fun and entertainment purposes, are important in the construction of one's own identity (Jenkins, 2010). In the education of prosumers, users who participate and give feedback on the web, it is fundamental to instill respect and empathy, topics that are often forgotten although research reveals them as factors related to the low participation of women (Macharia, 2015). These topics must be taken into consideration to create democratic participation spaces.


This project proposes a quantitative study employing a descriptive method that uses social network analytics (Thelwall, 2018) to investigate and illustrate the presence of women in the participation spaces generated on YouTube. For this purpose, the study of content-generating users with a greater reach (in the Spanish context) and of the audiences that subscribe to their channels was addressed from a gender perspective.

Sampling unit: YouTubers’ profiles

For data collection, a list of the 50 largest accounts was created. For this purpose, we selected those accounts that, as of August 1st, 2019, had the largest number of total views on their videos and the highest number of subscribers (Burgess & Green, 2018). This list was delimited in accordance with the situation and age group in which this study is framed: adolescents in the Spanish context. The following criteria were considered: the YouTube channel a) is located in Spain; b) generates content in one of the official languages of the country; c) does not generate children's content; d) shows an Instagram account, the second most used platform in the age group (Mascheroni & Ólafsson, 2014), in the YouTube profile; e) shared its last video in the last 5 months; and f) received its last comment in the last 5 months.

The resulting list was ordered according to a variable generated from the normalisation of the variables "views" and "subscribers", establishing a ranking that considers the reach capacity of the 50 accounts that comprise the list. Furthermore, the gender expressed by the creators and the type of content generated were added. Table 1 includes the ten accounts with the greatest reach. The complete list made it possible to ascertain the presence of women among the main referents, as well as their scope and content.
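The text does not specify how "views" and "subscribers" were normalised or combined. As a minimal sketch, assuming min-max normalisation and an unweighted mean of the two rescaled variables (both are assumptions, as are the channel names and figures), the ranking could be produced as follows. The sketch is written in Python for illustration; the study itself used R.

```python
from typing import Dict, List, Tuple


def min_max(values: List[float]) -> List[float]:
    """Rescale values to the [0, 1] interval; a constant input maps to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def rank_by_reach(channels: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Return (channel, reach score) pairs sorted from highest to lowest reach.

    Each channel maps to (total views, subscribers). The reach score is the
    mean of the two min-max normalised variables (an assumed combination).
    """
    names = list(channels)
    views = min_max([channels[n][0] for n in names])
    subs = min_max([channels[n][1] for n in names])
    scores = {n: (v + s) / 2 for n, v, s in zip(names, views, subs)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures (total views, subscribers), not data from the study.
example = {
    "channel_a": (9_000_000_000, 30_000_000),
    "channel_b": (4_000_000_000, 35_000_000),
    "channel_c": (1_000_000_000, 5_000_000),
}
ranking = rank_by_reach(example)
```

Normalising before combining keeps the much larger scale of the view counts from dominating the subscriber counts.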

Sampling unit: audience profiles

YouTube’s data policy makes it possible to see the channels followed by an account, but not the subscribers of that given channel. Due to this policy, Instagram was the platform chosen for the exploration of the audience. Furthermore, Van-Dijck (2016) highlights the importance of cross-platform interconnection (media convergence) as it allows for greater visibility and maximum presence. In this respect, Instagram is the platform preferred by young people after YouTube (Mascheroni & Ólafsson, 2014), which justifies our methodological decision.

Using the free software R, a loop was created, represented in Figure 1 (https://bit.ly/2OeUF8e), which performs a Web Scraping procedure with each of the accounts that constitute the initial list. This technique allowed us to extract a large amount of information from websites, obtaining public data accessible to anyone.

For this study, a process was programmed to go to the Instagram profiles subject to examination, click on the “followers” tab and extract the names of between 40,000 and 50,000 users through simple random sampling: the program randomly chose the profiles that make up the sample, so that every profile in the audience was equally eligible.
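The sampling step can be illustrated separately from the scraping itself, which is not reproduced here. The following is a minimal sketch in Python (the study used R), assuming the follower handles of one profile are already available as a list; the function name, the handles and the seed are hypothetical.

```python
import random
from typing import List, Optional


def sample_followers(handles: List[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Draw up to k distinct handles uniformly at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    k = min(k, len(handles))   # never ask for more handles than exist
    return rng.sample(handles, k)


# Hypothetical follower list standing in for one scraped profile.
followers = [f"user_{i}" for i in range(200_000)]
sample = sample_followers(followers, k=45_000, seed=2019)
```

Sampling without replacement gives every follower the same probability of selection, which is the property the text relies on.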

According to the proposed objectives, we sought to determine the gender of the people who make up the YouTubers’ audience, their preferences regarding content and the gender of the channel. As Thelwall (2018) points out, this information can be inferred from the names obtained from Instagram with the Web Scraping technique, which can be compared to the Continuous Register Statistics as of January 1, 2017, published by the National Statistics Institute of Spain. This register lists the names recorded in the Spanish context and the prevalence of each name for each gender. If the frequency of a name in one of the groups (feminine or masculine) was one hundred times greater than the frequency associated with the same name in the opposite group, the association of said name with the gender of the first group was accepted. If this was not the case, the name was categorised as neutral and was eliminated because it did not provide relevant information.
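The hundredfold rule described above can be sketched as follows. This is an illustrative Python version (the study used R); the register counts are made up and are not figures from the National Statistics Institute, and how a first name is extracted from an Instagram profile is left out because the text does not describe it.

```python
from typing import Dict, List, Optional, Tuple

# Made-up counts (female bearers, male bearers), for illustration only.
REGISTER: Dict[str, Tuple[int, int]] = {
    "maria": (600_000, 1_200),
    "antonio": (50, 650_000),
    "alex": (3_000, 40_000),
}


def infer_gender(name: str,
                 register: Dict[str, Tuple[int, int]] = REGISTER,
                 ratio: int = 100) -> str:
    """Label a name 'female' or 'male' only if one frequency is more than
    `ratio` times the other; otherwise label it 'neutral'."""
    female, male = register.get(name.strip().lower(), (0, 0))
    if female > ratio * male:
        return "female"
    if male > ratio * female:
        return "male"
    return "neutral"  # ambiguous or unknown names are discarded later


def female_share(names: List[str]) -> Optional[float]:
    """Share of women among the names that could be classified."""
    labels = [infer_gender(n) for n in names]
    women, men = labels.count("female"), labels.count("male")
    return women / (women + men) if women + men else None
```

With these illustrative counts, "alex" is labelled neutral because neither frequency exceeds the other by a factor of one hundred, so it would be dropped from the audience sample.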

After comparing the sample with the Register, an average of 15,000 users classified as female or male was obtained for each profile. This exceeds the sample size of 9,603 required for a confidence level of 95% and an error of 1% for the account with the greatest number of followers. A total sample of n=904,939 analysed names was collected (Table 1).
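The text does not state which formula produced the figure of 9,603. A minimal sketch, assuming the standard sample size formula for a proportion with maximum variance (p = 0.5) and no finite population correction, yields a value between 9,603 and 9,604, in line with the reported figure. The function name and defaults are illustrative.

```python
from statistics import NormalDist


def required_sample_size(confidence: float = 0.95,
                         margin: float = 0.01,
                         p: float = 0.5) -> float:
    """Sample size n = z^2 * p * (1 - p) / margin^2 for estimating a proportion.

    p = 0.5 is the most conservative choice (assumed here), and no finite
    population correction is applied.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p * (1 - p) / margin ** 2


n0 = required_sample_size()
```

Because the formula ignores population size, the same requirement applies to every profile in the list; a finite population correction would only lower it slightly.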


The Web Scraping methodology facilitates access to and management of an enormous amount of data. There is no bias in the selection of profiles, as it is performed through simple random sampling, which provides a comprehensive view of the YouTube audience in terms of gender. This technique, based on the analysis of social networks using the software R, is limited by the impossibility of classifying profiles that are not easily identifiable with feminine or masculine names, a limitation compounded by the assumption of a binary perspective of gender. In addition, we acknowledge a certain bias, as the study resorts to a second social network (Instagram) to recover the subscribers of each YouTube channel.


Female YouTubers: presence of women who create and produce content

Among the 50 YouTube accounts considered to have the largest reach, there are a total of 4 female YouTubers, compared to 41 male YouTubers. Furthermore, there are 2 accounts shared by a woman and a man, and 3 accounts linked to mass media and institutions such as football clubs.

The first five accounts correspond to VEGETTA777, elrubiusOMG, TheWillyrex, Willyrex and AuronPlay, channels whose main content is related to video games or entertainment linked to humour. All of them are managed by men, who appear as the creators and image of their content. In the case of VEGETTA777 and elrubiusOMG, their reach positions them on the international scene (Figure 2: https://bit.ly/2rxQrzO).

The first woman to appear in the ranking is ExpCaseros (ranked 13th), an account shared by a woman and a man who generate entertainment linked to experiments and do-it-yourself (DIY) tutorials. The next woman to appear on the list (ranked 27th) does so under this same condition, together with a man, generating entertainment content and draw-my-life videos (TikTak Draw).


The female content creators who make it into the list of 50 accounts represent 12.2% of the total number of visible individuals. Moreover, none of them reaches the top 10, meaning that women have a notably lower reach. These women create content such as entertainment, sports, beauty or gameplay, while male creators create content that includes entertainment, sports or gameplay.
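The text does not spell out how the 12.2% is obtained. One reading consistent with the counts reported above is to count each shared account as one woman and one man and to leave out the three accounts linked to mass media and institutions:

```latex
\[
\frac{\text{women}}{\text{visible individuals}}
  = \frac{4 + 2}{(41 + 4) + 2 \times 2}
  = \frac{6}{49}
  \approx 12.2\%
\]
```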

It is evident that the contents related to female gender stereotypes are less valued among those that have a greater reach, as they rank starting from the 28th position; while content stereotyped as "masculine" or "neutral" holds the most privileged positions, although these positions are not occupied by the women who generate it.

In the overall list, gameplay is the type of content best valued by the YouTube community, produced by 54% of the accounts. Gameplay is also the best positioned, as it is the primary category of content in 8 of the top 10 accounts with the greatest reach. In contrast, women gamers (2) rank 33rd and 49th.

When it comes to sport-related content, only one female YouTuber enters the list and is the best-positioned woman among the women that generate content alone (28th position). She mainly shares content concerning aerobic exercises, the titles of which frequently include the word “adelgazar” (lose weight). The gender stereotype is also reproduced in the content linked to beauty, with no male YouTubers sharing this type of content and with only one woman entering the list and ranking 48th.

The relationship between gender and the audiences of the most far-reaching YouTube channels

The audience builds its gender identity through mediation with female and male YouTubers. It is noteworthy, therefore, that the sample studied is made up of 74.1% men and 25.9% women. In this regard, Figure 3 (https://bit.ly/35HxrNX) illustrates the breakdown of YouTube channels according to the gender of their audience, taking the content into account. Most YouTubers have a predominantly male audience.

There are few accounts (6) of greater reach that gather an audience of more than 50% women, and, among these, the most valued content is created by an account with sports tutorials focused on losing weight (Gymvirtual) and by an account focused on makeup (LizyP). This means that the content that impacts the female audience to a greater extent is subject to the reproduction and perpetuation of beauty norms. Music content is the next most important content category, as the official account of Adexe and Nau, two Spanish male teenage singers, has an audience of 82% women.

The accounts closest to reaching a gender balance among their audiences are the shared accounts managed by a woman and a man, followed by mass-media channels, whose content is plural and cannot be linked to a stereotype.

As can be seen in Figure 3, most of the YouTube channels examined in this study have a largely male audience that consumes content related to gameplay and entertainment. Female creators of this type of content also have mostly male audiences, but they are lower down the list due to the domination of male creators. Patty Dragona has 14% female subscribers and Sarinha 15%.

Women who create content linked to the female gender stereotype get greater recognition from the female audience. The content and its link to gender stereotypes present a greater influence than the gender expressed by the content creator. However, the channels that obtain the most balanced audiences in terms of women and men are those in which the two genders are expressed by the creators, that is to say, accounts shared between a man and a woman who create neutral content in terms of gender stereotypes (entertainment linked to humour).

Discussion and conclusions

This study, which investigates the role of women in the new media scene, on the main entertainment platform (YouTube), reveals the under-representation of female figures in the public sphere on the network, continuing the trend of what happens in traditional media, as revealed by UNESCO's work on the subject (Byerly, 2011; French et al., 2019; Macharia, 2015). In the case of this social media platform, where the audience is not necessarily passive but has spaces for participation, female representation is linked to gender stereotypes and low participation.

The differentiated appropriation of the environment suggests the transfer of power relations from the physical space to the YouTube platform: a leisure space has been generated on the web, similar to the schoolyard, a recreational space and time not planned by the school organisation. The familiar image illustrates boys and male pre-teens on the football field, a hegemonic sport in the Spanish context and one which is usually situated in a central and privileged space on the playground. This is where the action takes place, where they interact and learn the rules and codes that they will transfer to other play spaces. Girls and female pre-teens, as explained in other studies (Cantó & Ruiz, 2005; Martínez-García, 2018; Castillo-Rodríguez et al., 2018), tend to occupy the periphery of playgrounds, standing around the football field while chatting, playing in small groups or amusing themselves as spectators. In these spaces they acquire very valuable learning of all kinds, the most relevant being the place that they belong to and the place occupied by the knowledge they have acquired in this space.

Those YouTubers at the core of the media scene, at the top of the ranking, are men who share stereotyped content: gameplay, humour, and football. They are also the most recognised by pre-teens, as pointed out in the research of Aran-Ramspott et al. (2018), as they occupy a visible place that makes them referents. While there are women creators of all kinds of content, the female audience assembles around those who publish content linked to gender stereotypes, such as beauty or staying in shape. Women who produce content associated with videogames have male followers, although they are overshadowed by their male counterparts, who dominate the platform (Döring & Mohseni, 2018). This is because pre-teens who follow this type of content prefer a male referent, as proven in research conducted by Gewerc and Alonso-Ferreiro (2019). Their findings show that when given the opportunity to follow Sarinha's profile, from which the pre-teens have learned some tricks, they decide to follow Luh, recommended by the female YouTuber herself in her videos.

The appropriation of the woman's environment, circumscribed in the patriarchal system of imaginaries associated with prestige, reason and power (Ficoseco, 2016), translates into its visible absence as a potential referent and mediator for the construction of the subject. This impacts 1) on the given example, shaped to a greater extent by men; 2) on the transmission of knowledge, content, principles and values, among which women are not represented; and 3) on free, active and critical participation on the platform.

No physical barrier prevents young women from occupying the symbolic central point of this space and, nevertheless, they continue to be relegated to the peripheries. Through socialisation and education, an ideal is fostered that situates the subjects in the available schemes of cultural intelligibility (Lacan, 1977), thus determining the norms that govern their behaviour (Butler, 2010). The evidence highlights a peripheral presence of female participation. Furthermore, the presence of the more successful women is less decisive and influential compared to their male counterparts. Even when the content generated by these women corresponds to their gender stereotypes, they find themselves in less privileged positions, as the knowledge and interests they disseminate are attributed less value.

Whilst the figures illustrate that the majority of the audience is male, among all the accounts in the ranking there is indeed a broad audience made up of women, who represent the majority or the minority depending on the content. In this regard, major educational concerns emerge: the democratic participation of women, not conditioned by stereotypes; the devaluation of content linked to the female stereotype; and the consumption of content created by and for men that mediates the construction of masculinity in their majority audiences, and in the same way mediates the subjectivation of female pre-teens.

Against this backdrop, media education presents itself as an opportunity, and it is, therefore, essential to develop actions to fight against sexist representations and in favour of equal access and participation of women in digital media in which they have an insufficient presence (French et al., 2019). In this context, a challenge and an opportunity emerge, that is, to train in media education in order to create spaces and opportunities for equal participation.