Abstract
Mental illness in media can shape viewers’ beliefs about mental health, help-seeking, and empathic behaviors. The current study sought to investigate how mental health and substance use are depicted in popular media targeted at youth. The visual-verbal video analysis (VVVA) framework was applied to the HBO American drama television series Euphoria to understand how mental illness, substance use, and mental health service use are portrayed, and how characters respond to mental health scenes. Euphoria follows a group of high school students as they navigate adolescence, mental illness, and substance use. The VVVA provides a framework for social science and medical researchers to qualitatively analyze multimodal information (e.g., text, cinematography, music and sounds, body language and facial expressions) in visual content. This commentary briefly describes the VVVA framework, provides an overview of how the framework was applied and adapted to analyze a scene in the television series Euphoria, notes similarities to and differences from the original VVVA framework, and discusses benefits and drawbacks. The VVVA framework was flexible and effective in coding various elements (e.g., body language, camera angles) in a scene in Euphoria.
Mental illness portrayed in media can play a significant role in shaping viewers’ beliefs and behaviors about mental health. Researchers have studied how portrayals of mental health in television shows impact viewers (e.g., youth and young adults) and have found that exposure to mental health content and characters (e.g., therapists, characters struggling with mental health challenges) can influence help-seeking, empathic behaviors towards others, talking about mental health to others, and mental health symptoms (Lauricella et al., 2023; Leaune et al., 2022; Maier et al., 2014). Further, many characters with mental illness in media (e.g., film, video games) are portrayed as aggressive or unable to care for themselves, and are commonly referred to by others with negative labels such as “crazy,” “mad,” and “nutty” (Ferrari et al., 2019; Lawson & Fouts, 2004; Riles et al., 2021). These portrayals perpetuate mental health stigma that can eventually be a barrier to youth accessing mental health services (Ferrie et al., 2020). Given the ubiquity of the media that youth access, our team wanted to further investigate how mental health and substance use are portrayed in popular media targeted at youth. A noteworthy show that has resonated with millions of viewers and heavily portrays mental health and substance use is Euphoria. Reported as HBO’s second-most watched television show behind Game of Thrones (Maas, 2022), Euphoria is an American drama television series that follows a group of high school students as they navigate adolescence.
Given the immense popularity of this show, our research team was interested in conducting a visual discourse analysis of Euphoria to understand: (1) how mental illness and mental health service use are portrayed in Euphoria, and (2) how characters respond to mental health scenes (i.e., a panic attack, accessing mental health and substance use services). We employed a methodology that allowed us to extract and analyze multimodal information (e.g., text, cinematography, music and sounds, body language and facial expression) from Euphoria. For this purpose, we chose to apply the flexible yet structured framework of the visual-verbal video analysis (VVVA) method developed by Fazeli et al. (2023). The purpose of this commentary is to demonstrate the utility of applying this framework to a television show context. We will briefly describe the VVVA framework and then give an overview of how our team applied and adapted the steps of the VVVA to analyze one scene from episode one (E1) of Euphoria. Similarities to and differences from the original framework will be noted throughout to showcase both the strength and the versatility of the original VVVA design.
The VVVA is a method designed for social science and medical researchers (Fazeli et al., 2023). It is based on Visual Grounded Theory (Konecki, 2011) and Multimodality theory (Kress, 2010) and provides a framework for qualitatively analyzing multiple layers of visual content (Fazeli et al., 2023). It can be applied to different types of video data (e.g., video-recorded interviews, teaching or training sessions, digital stories, testimonials, segments of movies or TV shows) to identify relationships between verbal and visual elements of the data. For example, Fazeli et al. (2023) applied this framework to analyze how video clips of mental health testimonies and digital narratives may engender feelings of empathy and compassion and decrease stigma in viewers (Ferrari et al., 2022). They outlined six steps to follow when applying the VVVA: “(1) Collecting, organizing, and reviewing data; (2) Transcribing verbal data; (3) Choosing units of analysis; (4) Extracting and coding data; (5) Organizing, describing and interpreting extracted data; and (6) Reporting findings” (Fazeli et al., 2023, pp. 4–5). To date, we have applied steps 1–5 of the VVVA to one scene in Euphoria. We describe below how we applied and adapted these steps in our project.
Step 1: Collecting, Organizing, and Reviewing Data
Our video data were collected by purchasing episodes of season one of Euphoria on YouTube, where they were also stored. In line with recommendations by Fazeli et al. (2023), we first became familiar with the content by viewing the videos several times. The first and second authors independently watched E1 of Euphoria and chose scenes that met the inclusion and exclusion criteria. Ten scenes met the criteria, and one scene (hereafter referred to as the Panic Attack scene) was chosen as a starting point so the team could become familiar with this method.
Next, three coders watched episode one individually to understand the full context of the Panic Attack scene before coding the data. Although Fazeli et al. (2023) encourage watching the video with a focus on separate elements (e.g., spoken text, character, setting, motions and gestures), we chose a more relaxed approach of absorbing the content of episode one as a whole. Part of the reason was that we were still deciding which individual elements would be of interest to our research questions, and we felt that by ‘zooming out’ at the start and looking at the episode broadly it would be easier to see what captured our attention for further study. Additionally, television shows have multiple elements overlapping in a scene (e.g., music, dialogue, cinematography, facial expressions), and it can be challenging to focus on each individual aspect. By ‘zooming in’ too closely at the beginning, one could miss the collective storylines that all the elements, working together, are trying to convey to the viewer.
Additionally, our team participated in self-reflexivity activities that included journaling and peer debriefing meetings to observe our own reactions specifically to the Panic Attack scene and the larger first episode. Self-reflexivity consisted of reflecting on thoughts, feelings, biases, and judgements that emerged when watching the visual content in order to be aware of one’s influence on the research (Probst, 2015). At this point the three coders were also introduced to analytical memo writing, a valuable qualitative research technique to record thoughts and ideas about what researchers believe is happening when viewing data, in our case our thoughts about what was happening when watching the Panic Attack scene (Birks et al., 2008).
Step 2: Transcribing Verbal Data
In this step, spoken words from the Panic Attack scene were manually transcribed by three coders to build categories (Fazeli et al., 2023), and then the group convened to discuss discrepancies between the transcripts. When deciding whether to use subtitles or to transcribe what was heard, the team chose to manually transcribe the audio to ensure accuracy.
Step 3: Choosing Units of Analysis
In this step, the research team made decisions about the length of video data to analyze. Fazeli et al. (2023) state that video content can be analyzed as one continuous piece or broken down into smaller sections of a few minutes. As discussed in step one, a unique facet of studying a television show versus a 2-minute testimonial is having multiple elements of a scene to comb through. For example, in the Panic Attack scene there are 19 characters that could be individually studied. Since it can be overwhelming to gather detailed information about visual and verbal content from E1 of Euphoria, which runs for 53 minutes, we followed the recommendation to break the episode into individual scenes and used these as our units of analysis instead of analyzing a full episode. This led to more digestible chunks, with the Panic Attack scene having a run time of 35 seconds. Next, the coders divided the scene into further ‘snippets’ that varied in length between 1 and 19 seconds, which helped each coder carefully break down the scene and focus on its significant details.
Step 4: Extracting and Coding Data
In this step, the content from the video data is examined using extraction matrices to guide the coding of various elements of verbal and visual information in the video data (Fazeli et al., 2023). Creating extraction matrices was an evolving process based on the needs of our research questions and on ongoing conversations between our larger research team and the coders. For example, we had the category “general scene characteristics” to describe the scene we were analyzing, but we recognized that a new category called “show characteristics” would be needed to describe details of the show as well (e.g., language of the show, release date, how the show describes the episode, run time of the episodes). The extraction matrices we used were heavily based on those suggested by Fazeli et al. (2023), with adaptations to capture more elements from the show. For example, instead of collecting multimodal characteristics under one category, we split them into separate categories and expanded on them, because the Panic Attack scene had rich, detailed content in each of these areas and we wanted to analyze each area more thoroughly. These separate categories included: spoken and written word characteristics, music and sound characteristics, and expressive non-verbal characteristics. We also elaborated on the visual characteristics category to include more cinematography details about secondary characters and camera movements. Other categories, based on elements of the mental status examination (MSE) (e.g., appearance, behavior, mood and affect), were created and combined with the extraction matrices developed by Fazeli et al. (2023). The MSE is a structured clinical assessment of a person’s behavioral and cognitive functioning (Voss & Das, 2022) and was selected to bring a clinical lens to the analysis of the scene based on our research questions.
A codebook containing definitions of each component of the extraction matrices was created to keep track of the matrices and to assist coders during data extraction.
Coding process: First, three coders independently watched the Panic Attack scene to complete each extraction matrix, coding the components of each category across the 1- to 19-second snippets. Inductive content analysis (Chinn & Kramer, 1991), as suggested by Fazeli et al. (2023), was used to code main themes for each snippet (if applicable). Afterwards, the coders convened to discuss their findings and issues related to self-reflexivity, to edit the matrix for clarity or add elements that were important to the scene, and to create a final consensus document for each extraction matrix. The coders collaboratively reviewed every snippet, and when there were differences, they re-watched the snippet to reach a mutual conclusion. The consensus process allowed the team to combine their insights and interpretations of the scene into one document, capturing details that individual coders may not have recognized on their own.
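As an illustration only, the extraction matrices and codebook described above could be represented as simple data structures. In the Python sketch below, the category names come from the text; the components listed under each category are partly drawn from the examples in the text and partly assumed (for instance "scene description", "dialogue", and "sound effects"), and the function names are hypothetical. The study itself used Word and Excel documents rather than code.

```python
import csv
import io

# Category names are taken from the text. Components are illustrative:
# some echo examples given in the text, others are assumptions.
EXTRACTION_CATEGORIES = {
    "show characteristics": ["language", "release date", "episode description", "run time"],
    "general scene characteristics": ["scene description"],            # assumed component
    "spoken and written word characteristics": ["dialogue"],           # assumed component
    "music and sound characteristics": ["music", "sound effects"],     # partly assumed
    "expressive non-verbal characteristics": ["body language", "facial expression"],
    "visual characteristics": ["camera angle", "camera movement", "secondary characters"],
    "mental status examination": ["appearance", "behavior", "mood and affect"],
}


def blank_matrix(category, n_snippets):
    """Build one empty extraction matrix: one row per snippet, one column per component."""
    columns = ["snippet"] + EXTRACTION_CATEGORIES[category] + ["theme"]
    return [
        {col: (i if col == "snippet" else "") for col in columns}
        for i in range(1, n_snippets + 1)
    ]


def to_csv(rows):
    """Serialize a matrix to CSV text that can be opened in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undefined_components(codebook):
    """List components that still lack a definition in the codebook."""
    return [
        component
        for components in EXTRACTION_CATEGORIES.values()
        for component in components
        if not codebook.get(component)
    ]


matrix = blank_matrix("music and sound characteristics", n_snippets=4)
print(to_csv(matrix).splitlines()[0])  # header row of the blank matrix
```

Keeping the codebook as a lookup from component to definition makes it straightforward to confirm that every column a coder is asked to fill in has an agreed definition.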
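The consensus step could be supported, though not replaced, by a small script that lists the snippets on which the independent coders differ, so that the group knows which snippets to re-watch. The Python sketch below assumes each coder's codes are stored per snippet; the function name and the placeholder codes are hypothetical, and the comparison is a simple case-insensitive text match. In the study, agreement was reached through discussion, not through an automated rule.

```python
def flag_for_discussion(codes_by_coder):
    """Return {snippet_id: {coder: code}} for snippets where the coders' entries differ.

    codes_by_coder maps each coder to a {snippet_id: code} dictionary.
    A missing entry is treated as an empty code, so it is flagged as well.
    """
    snippet_ids = sorted({sid for codes in codes_by_coder.values() for sid in codes})
    flagged = {}
    for sid in snippet_ids:
        entries = {
            coder: codes.get(sid, "").strip().lower()
            for coder, codes in codes_by_coder.items()
        }
        if len(set(entries.values())) > 1:
            flagged[sid] = entries
    return flagged


# Placeholder codes for illustration only (not data from the study).
example_codes = {
    "coder_1": {1: "code A", 2: "code B", 3: "code C"},
    "coder_2": {1: "Code A", 2: "code D", 3: "code C"},
    "coder_3": {1: "code A", 2: "code B"},
}
print(sorted(flag_for_discussion(example_codes)))  # prints [2, 3]
```

Snippet 2 is flagged because one coder entered a different code, and snippet 3 because one coder left it uncoded; both would be candidates for re-watching together.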
Step 5: Organizing, Describing, and Interpreting Extracted Data
After the data are extracted, the information is collated and further themes are created that link the individual strands of analysis together (Fazeli et al., 2023). With the consensus chart completed, our team reviewed the themes that emerged across each category and combined them into one chart. By doing so, we could compare the themes and patterns of information gathered from each layer of visual and verbal video content we analyzed. We brought this chart to our larger team of global researchers to gather their insights about the data we extracted, and we integrated their analysis into our documents.
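One way to picture the combined chart is as a table that records, for each theme, the categories and snippets in which it appeared. The Python sketch below is a hedged illustration under that assumption; the function names, the input layout, and the placeholder themes are hypothetical and do not reproduce the study's chart.

```python
from collections import defaultdict


def combine_themes(consensus):
    """Turn {category: {snippet_id: [themes]}} into {theme: {category: [snippet_ids]}}."""
    chart = defaultdict(lambda: defaultdict(list))
    for category, themes_by_snippet in consensus.items():
        for snippet_id, themes in sorted(themes_by_snippet.items()):
            for theme in themes:
                chart[theme][category].append(snippet_id)
    return {theme: dict(categories) for theme, categories in chart.items()}


def cross_cutting(chart, min_categories=2):
    """Themes that appear in at least `min_categories` different categories."""
    return sorted(
        theme for theme, categories in chart.items() if len(categories) >= min_categories
    )


# Placeholder themes for illustration only (not findings from the study).
example_consensus = {
    "music and sound characteristics": {1: ["theme X"], 2: ["theme Y"]},
    "expressive non-verbal characteristics": {1: ["theme X"], 3: ["theme X", "theme Z"]},
}
chart = combine_themes(example_consensus)
print(cross_cutting(chart))  # prints ['theme X']
```

Themes that recur across several categories are the ones most likely to link the individual strands of analysis together, which is why the sketch surfaces them separately.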
Step 6: Reporting Findings
Our next steps will be to code all ten mental health scenes identified in the first episode of Euphoria, following the format outlined in this commentary, and to write up the results for publication or reports (Fazeli et al., 2023).
Benefits and Drawbacks
The VVVA framework provides a structure to study individual components of visual data in order to comprehensively understand the sum of its parts. The VVVA framework was simple to follow and adaptable to a television show. It provided helpful directions for coding different types of visual data (e.g., body language, camera angles). The method also provided a common language for all team members to use when discussing the video data, and it is a low-cost method, with data extraction completed in Word and Excel documents. One limitation reported by Fazeli et al. (2023) that is consistent with our experience is that it can be time-consuming to capture each detail. Our team also encountered technological issues when we tried to add screenshots of the data to the Excel document. Finally, repeatedly reviewing an emotionally charged scene can be challenging for coders. Having the space to debrief with team members is recommended for graphic content.
Conclusion
This commentary demonstrated the way the VVVA framework can be applied to television shows. The VVVA framework was instrumental in effectively and comprehensively coding various elements in a scene in Euphoria.
Footnotes
Acknowledgements
We would like to thank Dr. Fazeli, Dr. Sabetti, and Dr. Ferrari for their assistance and consultation in applying the VVVA framework to this project.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Michael Smith Health Research BC (grant number 18249).
