Abstract
The increasing accessibility and ease of video recording, coupled with advancements in editing technologies, present significant opportunities for research within the social sciences. This study builds on the established tradition of visual data usage in ethnographic and anthropological research, particularly through the lens of video ethnography, which seeks to analyze the content of video recordings and the underlying meanings they convey. Focusing on the methodological aspects of this approach, we explore its application in researching older adults in institutional settings, specifically in relation to food practices in care homes. We aim to provide methodological insights and inspiration that emerged during our investigation into the significance of food in later life, utilizing video ethnography in these specific environments. Unlike existing comprehensive guides on video ethnography, our paper offers practical examples and inspiration for applying this methodology in institutional settings related to older adults. We examine the challenges and limitations associated with using video data for research purposes, particularly in capturing the everyday practices of clients of institutional care settings in the Czech Republic. Our research question addresses the challenges and dilemmas of employing video recordings to analyze daily practices in such environments. We identified four types of dilemmas that video-based research presents: video record legitimization, video data shooting, post-hoc technical and analytical dilemmas, and the authenticity of video records. These dilemmas fundamentally shape both the quality and content of the collected data and the interpretative possibilities of the analysis. We demonstrate that the video material available for analysis is the outcome of a dynamic process of continuous decision-making: its acquisition is continually legitimized, and it captures a pseudo-real form of common practice. Additionally, we discuss the time demands that video-based research places on the researchers themselves.
Introduction
The widespread availability and ease of video recording, along with the growing possibilities for easy editing, encourage its increased use in research. Social sciences, which focus on interpreting the meaning of social behaviour, traditionally use techniques based on verbal interaction or text. When visuality comes into play, it is primarily the subject of analysis rather than its means (Traue et al., 2019). In other words, it is the object of research rather than a tool, as in methods such as content analysis, frame analysis, or discourse analysis. On the other hand, there is also a tradition of using video recordings in ethnographic and anthropological research, dating back to the very beginnings of cinematography. The video ethnography approach is based on this tradition, in which the goal of the analysis is not the content of the video itself, but rather something beyond it, which the recording is intended to represent and help uncover.
Video recording serves as a tool that extends and deepens data typically obtained through observation. It complements or, in some cases, replaces traditional sources such as observation, field notes, or photographs. The advantages of using video recordings, as well as their underappreciated potential, have been well documented in the literature, along with the procedures associated with their use (Knoblauch & Schnettler, 2012; Nassauer & Legewie, 2021; Pauwels, 2011).
This paper examines the methodological aspects of applying video ethnography in a specific research context—the study of older adults living in institutional care settings. We focus on the challenges, insights, and methodological reflections that emerged from our research on the meanings of food in later life, conducted in the distinctive care home environment. Although this is a particular environment, we believe our findings are applicable more generally.
We discuss the specific insights, inspirations, and challenges that arise when employing video ethnography in an institutional context. Drawing on empirical material from our study of mealtime practices in Czech residential facilities for care-dependent older adults, we examine both the benefits and the limitations of using video recordings in research practice. The central question guiding our analysis is: What methodological challenges and constraints does the use of video recordings bring to the study of everyday practices of older adults in institutional settings?
Our contribution builds on previous work, particularly that of Rostvall & West (2005), Smets et al. (2014), and Wills et al. (2016), by focusing on the methodological challenges that arise when using video data for research purposes in institutional care settings. Specifically, we aim to provide general recommendations and highlight key organisational and ethical issues that significantly influence the final dataset.
Background
Video Ethnography as a Method
Video technology is viewed as a tool for observing and recording participants’ behaviour in their natural environment, providing a more authentic view of their daily lives and interactions (Dowrick & Biggs, 1984). The video camera allows for a much more comprehensive view that is beyond the physical and cognitive capabilities of a researcher without a camera, capturing small moments of interaction and the associated expressions, body movements, spatial arrangements, and other nonverbal signals in vivo (Nassauer & Legewie, 2021; Smets et al., 2014). It thus allows an immense amount of detail to be captured in the data, enabling analysis to focus on the function and meaning of these details within broader interactions (Grimshaw, 1982; Miles, 2006). Such depth would be difficult to achieve without video recording (Nassauer & Legewie, 2021). Thanks to video recording, researchers can repeatedly revisit the data and incorporate the time dimension into the analysis.
Video stimulates discussion of emerging themes and the various nuances of interactions among the researcher, participants, and the environment that might otherwise go unnoticed during observation (Grimshaw, 1982; Wills et al., 2016). Recording and watching the video together can also involve the research subjects in interpreting and discussing emerging findings, i.e. it can be used for participatory research methods (Blomberg & Karasti, 2012). Moreover, video recording reduces researchers’ dependence on field notes and memory (Smets et al., 2014) and can be effectively used in research involving individuals with reduced communication abilities (Schneider et al., 2019).
However, video recordings and photographs are interpretations in themselves, specific versions of events arising from the joint efforts of the creator, the actors depicted, and the viewer (Heath et al., 2010). Video is a constructed image of a given situation, and a naive idea of depicting reality is not fruitful here. It is essential to remember that video recordings capture only a tiny piece of reality (Caldwell & Atwal, 2005). Creating an image involves selecting shots, choosing the timing, and working with light, sound, and visuals, all of which contribute to a particular segment of the situation. The composition of this situation is then formed by the interaction between technology, the crew, and the participants, creating a “videoactive context” together (Shrum & Scott, 2017). Therefore, video analysis is associated with the risk of misunderstanding when context is analysed without sufficient cultural insight (Smets et al., 2014).
The meaning of such material is continuously shaped and re-evaluated through multiple layers of interpretation by the various actors involved, as well as by other viewers — much like other qualitative data sources (Gibson, 2005). Consequently, visual material becomes research data only once it is subjected to this layered process of interpretation and analytical examination (Meah & Jackson, 2013).
Grimshaw (1982) emphasises that decisions about what to record must be based on theoretical principles and an understanding of the information relevant for future analysis. In other words, for records to be valuable and rich for research purposes, decisions about their content must be carefully considered and focused. This approach also shifts the focus of representativeness: rather than adhering to classical notions of sample and population, qualitative research prioritises capturing multiple facets of the phenomenon, moving away from positivist claims of objectivity (Heath et al., 2023).
However, video and audio recordings differ from the mediation of social phenomena through verbal interactions, as they often exhibit a looser relationship between the recorded material and its supposed meaning. In an interview, for instance, we generally assume that participants’ words accurately convey their intended meanings regarding the topics under discussion. While interviews are interactional and allow for various forms of stylisation, they remain a relatively direct form of communication. Video recordings, by contrast, frequently capture actions without accompanying communication, leaving meaning interpretation primarily in the researcher’s hands. This characteristic can be both an advantage and a limitation. On the one hand, technologically assisted observation enables researchers to uncover patterns that actors may not be aware of, may not consider significant, or may not wish to discuss. On the other hand, it carries the risk of misinterpreting behaviour. Nevertheless, this risk can be substantially mitigated by combining multiple research methods (Cipriani & Del Re, 2012).
By its very nature, visual data analysis fulfils critical criteria for scientific research: it increases the reliability and validity of findings, as the same data can be analysed by multiple researchers (Caldwell & Atwal, 2005; Miles, 2006; Nassauer & Legewie, 2021). It allows not only retrospective monitoring and revision of the findings, but also ongoing modification and refinement. Once a set of image material has been created, it remains available for further analysis or comparison, which opens the door to new results and alternative interpretations. These may even contradict the original conclusions and offer a new perspective from researchers who approach the material from different angles (Cipriani & Del Re, 2012). The possibility of repeated viewing increases the reliability of the results, as the recording can be viewed by more researchers, including those who were not present at the time (Nassauer & Legewie, 2021; Smets et al., 2014) and, in principle, opens up possibilities for secondary analysis. Both methods, i.e., repeated viewing of the same footage by the same person and viewing of the particular footage by different people, are forms of data triangulation (Rosenstein, 2002).
By integrating video data with other methods and adopting a mixed or multi-method approach, researchers can achieve more robust and reliable findings (Cipriani & Del Re, 2012; Denzin, 1978). In practice, video recordings are often complemented by extensive fieldwork, which allows the researcher to develop an in-depth understanding of the environment, including aspects that cannot be captured solely on video (Ban’kovskaya, 2016). The verbal and nonverbal expressions of participants are closely linked to material elements of the environment, such as objects and artefacts, which complement interactions and actively participate in the course of the situation (ibid.).
Video recordings appear very useful for researchers, especially when combined with other data types (Ban’kovskaya, 2016; Cipriani & Del Re, 2012). Still, their use is associated with numerous challenges, and necessary, expected, and unexpected decisions (Luff & Heath, 2012; Pink, 2007). It is precisely these challenges that we encountered during the collection and analysis of video data that we address in this paper.
Video Ethnography in the Context of Food
In this paper, we use food as an illustrative example of a situation particularly suited to video ethnography, as it represents an everyday routine activity that is closely intertwined with numerous other institutional practices and carries a variety of social and cultural meanings. Food is a complex issue involving practical decisions (what to eat and when) and emotional and sensory experiences linked to context and memory (Ashley et al., 2004). Eating habits are firmly rooted in social relationships and processes, which makes it challenging to study them through reflection or description in a unimodal form (Power, 2003). People often perform everyday activities automatically and unconsciously, based on “practical orientation” rather than a specific, consciously describable plan or strategy (Bourdieu, 1977; Goffman, 1961; Wiersma & Dupuis, 2010).
Given this unconscious, intuitive behaviour, traditional methods of interviews and observation may be less effective in capturing these aspects of life (Sweetman, 2009). In this regard, visual research methods such as photography or video offer new possibilities: they can capture the sensory, material, and often irrational aspects of everyday life, providing deeper insight into the role that food plays in society (Power, 2003). By employing video ethnography, we can capture the specific rituals and procedures associated with serving and consuming meals in an institutional setting.
Observing these practices on video allows us to see how they contribute to, or undermine, a sense of community within the institution, as well as how they impact experiences of loneliness, autonomy, dignity, and personal integrity. Beyond the immediate act of eating, the visual record reveals the multi-layered and often subtle dynamics (Knoblauch & Schnettler, 2012; Wills et al., 2016) of relationships between staff and residents, which in turn can influence not only the quality of meals but also the overall satisfaction and well-being of older adults in care settings. While dining spaces are, except for private rooms, technically semi-public, the act of eating remains a deeply personal and sensitive experience.
Methodology
Our paper is based on experience with video analysis gained during data collection for our research project, “Meanings of food intake in old age” (Czech Science Foundation Grant No. GA23-06348S). As part of this project, we aimed to document eating practices in care homes and memory care units in the Czech Republic, which serve as primary residential facilities for care-dependent older adults. We sought to capture the dynamics of the relationship between the facilities’ practices and the clients’ responses to them. In 2024, we therefore visited six different facilities in the Czech Republic.
These facilities were selected to capture the diversity of institutional settings in which care-dependent older adults live. The selection included facilities located in both urban and rural areas, differing in size, organisational structure, and the level of dependency among residents. The aim was to enable a comparative, multi-case perspective that would reveal both shared and contrasting features of mealtime organisation and interactional dynamics across different contexts.
We interviewed clients and staff in each facility and spent several hours observing the dining rooms and their surroundings. Additionally, we recorded 65 hours of video footage and took over 190 photographs. We filmed in three types of spaces, as agreed with the clients and the care home’s management. The central part of the filming took place in the communal dining room. The second data source was the small dining rooms on wards, typically intended for clients with a higher degree of disability and the associated greater need for assistance with eating. The third location for filming was the clients’ rooms at mealtimes. This data constitutes the smallest part of our records.
We focused on the entire dining process, from preparing the dining room and tables (including cleaning and setting the tables) to the arrival of clients, the serving and consumption of food, the clients’ departure from the dining room, and the subsequent cleaning of the tables. Typically, we recorded over several days, capturing each daily meal multiple times. Because recordings could fail, and in order to obtain detailed footage, some situations were captured using multiple cameras, which significantly increased the volume of generated data.
We edited the recordings using Wondershare Filmora 14 software to create coherent sequences from the individual videos, while removing redundant passages that contained no relevant information and protecting the rights of persons who refused to be filmed. The videos were then analysed in ATLAS.ti, which allows analytical codes to be assigned to particular sections of the videos. Along with coding the videos, we also coded and analysed the interviews we conducted. The knowledge we gained from the initial coding of footage and interviews helped us prepare for our visit to the next care home.
The analytical process followed an inductive approach. Using open coding, we identified recurring patterns and categories emerging directly from the visual material, focusing on the organisation of meals, staff–resident interactions, and residents’ expressions of autonomy or dependence. As the analysis progressed, these initial codes were refined, merged, or divided based on repeated viewing and comparison across different situations and facilities. To ensure analytical rigour, selected video sequences were independently reviewed by multiple researchers, who discussed their interpretations and reached consensus on key themes. This iterative process helped us identify both common and contrasting features across the cases. All these parts of the research process involved a series of decisions that influenced the final data and its analysis to varying degrees. In the following section, we will focus on the dilemmas we faced during this part of the research process.
Our research aimed to capture not only the act of eating itself but also the interactions and moments of commensality that occur between bites and courses, during pauses, and immediately before and after meals. We did not focus solely on the moment when a person puts food in their mouth, but also on how they prepare their food, how and what cutlery they use (whether they eat with cutlery, only a spoon, a fork, etc.) and how they adapt the served food to their tastes, needs, and physical limitations, such as poor dental health, shaky hands, or swallowing difficulties. Our research also included etiquette and aesthetics associated with dining. Without video, examining all these aspects carefully and in detail would not have been possible. The recording allows us to notice details that significantly impact the dining experience, details that are also evident in overall well-being and in the relationship to the institution as a place of dignified living.
Video recordings enable detailed observation of routines and interactions, yet they are limited in capturing residents’ and staff’s subjective experiences, motivations, and interpretations. To address this limitation, we complemented our video ethnography with semi-structured, qualitative interviews with both residents and facility staff, providing participants the opportunity to reflect on and explain their practices, preferences, and experiences. These interviews were conducted without video recording to avoid intrusiveness and to minimise any influence on participants’ responses. By combining these different types of data, we aimed to develop a more comprehensive understanding of everyday practices, particularly those whose significance could not be fully discerned from the video recordings alone.
Food Provision in Czech Institutional Care
To better understand the methodological challenges discussed in this article, it is necessary to situate our video ethnography within the institutional context in which it was conducted. In the Czech Republic, care homes and homes with memory units are the primary providers of residential social care for older adults who are care-dependent. Meals are legally defined as an integral part of the care provided to residents and are therefore strictly organised within an institutional framework. The organisation, composition, and serving of meals are not regulated by law, although there are price limits and total volumes for specific food groups. Additionally, strict hygiene regulations govern food handling in these facilities.
The daily meal schedule is fixed, with food typically served five or six times a day at regular intervals. Most facilities operate a central kitchen that prepares meals for all residents according to a pre-set menu, while others rely on external catering services. Menus often include various diet types, such as a diabetic diet. The degree of choice varies across institutions: some offer residents a selection between two main courses for lunch or dinner, while others provide a single fixed option. Residents eat in a central dining hall, in smaller dining areas on wards, or in their own rooms, depending on the facility’s layout. A wide range of staff members are involved in providing meals, from cooks and dieticians to care workers who assist some residents during eating.
Results
Dilemmas of Video Record Legitimisation
The use of video techniques in research offers many advantages but also poses significant challenges, particularly regarding ethical issues and legitimising the study to potential participants. In this context, Grimshaw (1982) discusses the difficulties in obtaining preliminary consent to filming and subsequent permission to use the material. Many participants may be (often justifiably) cautious, suspicious, or dismissive, leading them to refuse permission for video recording or to decline the use of recorded footage. Refusal to participate poses the risk that researchers may not be able to utilise analytically valuable data; therefore, these factors must be thoroughly considered when planning research. For more information on ethical issues in conducting research with a “vulnerable” population, including the fact that such research involves caring not only for vulnerable care home clients but also for the researchers themselves and the university they represent, see von Benzon & van Blerk (2017).
The ethical aspects of our project were discussed and approved by the Research Ethics Committee at Masaryk University (approval number: EKV-2022-014). Given that our participants belong to a vulnerable population, we decided that no parts of the video material would be published or presented; all recordings are used exclusively for analytical purposes by research team members. We have not created any documentary films or other visual outputs from the research, nor have we published any information that could identify specific participating facilities. Individual participants were filmed only with their explicit consent. We fully respected refusals to participate and resolved each situation according to the participant’s wishes, primarily by placing and positioning cameras so that they did not record individuals who had not given their consent and, where necessary, by stopping the recording or deleting part of it.
Explaining the objectives of our research and justifying the recording to potential participants was a considerable challenge, especially since many considered eating to be too mundane an activity to document. Some clients associated cameras with television broadcasting or with control and surveillance systems. They feared we would use the video recording for television broadcasts or show it to the home’s management, who would then punish individuals who had “done something wrong”. On the other hand, for many participants, our research was a welcome change from their daily routine, and the awareness that they were contributing to scientific knowledge gave them a sense of purpose and usefulness.
In addition to the research participants, other actors appear in the videos: those whose roles are essential and whose inclusion is desirable (such as care home staff), as well as individuals who happened to be present during filming. We did not primarily focus on filming the staff, but they appear in the video recordings and participate to varying degrees. While we, as researchers, negotiated the clients’ consent to participate in the research, the facility’s management arranged participation with the staff. During our first visit to the field, we explained the research’s purpose and the course of our stay at the facility in detail to the staff. The staff were generally cooperative and supportive of our research, though some expressed concern that the facility’s management could misuse our recordings to evaluate their work. For this reason, we repeatedly assured them that only our research team would have access to the data and that our goal was not to evaluate their work. Similarly, we repeatedly assured them that they could refuse to participate. We explicitly sought the consent of individuals who happened to enter the frame and, if necessary, adjusted the camera position or blurred their identity in the recording.
One challenge that cannot be predicted in advance is the appearance of unexpected people in the shot. These included both wider staff and family members. Their information about the research was often fragmentary, and we did not always have the opportunity to address their appearance in the shot or explain the situation to them.
Entering the field with a camera is more challenging to legitimise to participants than mere observation or interviews. The camera significantly disrupts the environment being studied (Heacock et al., 1996; Satyshur, 2016). Its presence raises doubts about what will happen with the recorded material and whether it will be misused to control staff or discredit clients. These concerns were shared by clients and staff, even though most facilities’ interiors are routinely equipped with CCTV cameras.
Dilemmas of Video Data Shooting
When documenting everyday practices, methodological decisions about recording methods are crucial. Their choice has a fundamental influence on the quality and nature of the data obtained. This section provides a detailed description of decision-making processes, including the selection of recording type and recording frequency, camera activation and deactivation, and the scope of shots. These decisions were guided not only by methodological and technical considerations but also by the desire to minimise encroachment on the facility’s usual routine and to respect the ideas and needs of the home’s management and their clients.
We made several decisions that affected the nature of the data during preparation and recording. It was necessary to decide what type of recording we wanted, how many times we would record the same event (once or repeatedly), when to turn the cameras on and off, and how wide a shot we wanted. The goal was to set up the recording process so that it would disrupt the usual routine as little as possible – see section “Dilemmas of Video Records’ Authenticity” for more details. As a result of this effort and initial coding of recorded material, we captured each case study slightly differently in terms of the number of repeated recordings of the same event (e.g., lunch in the central dining room) and the length of our stay at the facility. We always negotiated our particular approach with the care home’s management and sought to respect its ideas and needs. We also tried to balance the recording form with the number of recordings we would produce for our analyses. All these decisions influenced the form of the obtained data.
Video recording seems to save time and increase the researchers’ efficiency during data collection. It is possible to turn on more cameras and thus “observe” events in several places at once, or from different parts of the room to capture various details. This multiplication of “eyes” significantly changes the nature of ethnographically oriented research. It enables concentrating the data collection period into a shorter time frame. In our case, this involved parallel filming in several dining rooms of a residential facility. Using multiple cameras did not incur additional financial costs, as we borrowed the cameras and all the necessary equipment from our workplace.
However, this initial time-saving produces a large amount of footage whose content and quality are largely unknown to team members immediately after filming. This lack of knowledge was also due to our desire not to influence events in the dining hall by our presence. In some cases, team members switched on and positioned the video camera but were not present for the duration of the recording. Indeed, the dual presence of cameras and observing researchers was distracting and harder to defend to clients. Thus, the time-saving is only apparent: its cost is the researchers’ limited familiarity with the data immediately after obtaining it, which necessitates a review before further filming.
At the beginning of the data collection process, we also had to decide whether to film residents from a greater or lesser distance. In our research context, it proved most meaningful to alternate between different shot types depending on the recording’s purpose. Medium close-ups, typically capturing part of a table with one or two residents, allowed us to observe individual actions, gestures, and interactions in more detail. In contrast, full and wide shots — showing several tables or the entire dining room — provided a broader perspective, helping us capture the overall flow of the meal, the rhythm of service, and the atmosphere in the room. Importantly, these wider shots were also less intrusive, as they maintained the residents’ privacy: facial details were often indistinct, conversations were unintelligible, and speakers could not be identified.
Over time, it became clear that combining multiple cameras and varying types of footage offered the most suitable setup for our research aims. While some cameras captured the overall dynamics of the dining room, including serving routines, clients’ movements, and general interactions, others focused on specific tables or individuals and their eating habits. This combination revealed how institutional and individual practices intersect and are negotiated in everyday interactions. The broader framing of the scene also conveyed the institutional atmosphere — for example, the recurrent regimes of haste noted by Hradcová et al. (2020).
We encountered unexpected difficulties finding a suitable location for the cameras to capture medium to close-up shots. The camera had to be positioned to minimise the risk of tripping over the tripod or otherwise obstructing the passage. However, items placed on the set table, such as decorations, menus, glasses, and the clients’ personal belongings, significantly limited the view and, from some angles, made it impossible to capture the desired details (plate, hands, etc.) with the camera. We did not interfere with the placement of these items because we wanted to disturb and influence the environment as little as possible. Finding a suitable camera position was therefore difficult in some situations, and we did not always capture the details as planned or expected.
To minimise participants’ awareness of being filmed and preserve the authenticity of the recorded interactions, several techniques can be employed. For example, the researcher can remain in the environment for at least 10 minutes before the recording begins, limit their movements, and wear clothing similar to that of the people observed (Satyshur, 2016). Additionally, they can place the cameras next to a larger piece of furniture or a pole, where they blend in better with the environment. Setting up the camera in advance and then moving away from the filming location also reduces participants’ reactions to the researcher’s presence (Heacock et al., 1996).
In addition to visual considerations, capturing sound presented its own set of challenges. Individuals outside the film industry often fail to recognise that sound recording is considerably more difficult than image recording. Video production is typically perceived as predominantly visual, with sound frequently taken for granted. Filming older adults in institutional settings is particularly challenging, as much of the scene takes place in high-traffic environments, while conversations are typically low in volume, infrequent, and sometimes poorly articulated. While recording the sound environment can be informative, capturing clear conversations often proves too complex relative to its potential benefits. In our study, we therefore eventually abandoned this component. We recommend carefully considering the role of sound recording before and during filming.
Standard camera microphones are often insufficient; one possible solution is to connect an external microphone to the camera. Without going into the technical details, which can be found elsewhere (Margolis & Pauwels, 2011; Paulus et al., 2014; Shrum & Scott, 2017), it is worth mentioning the organisational implications of the technical options. Two approaches are available: (1) capturing specific sound sources from a greater distance with narrowly directional microphones, or (2) using local microphones placed as close as possible to the sound source (e.g., wireless lavalier microphones, microphones placed on clothing or on a table, or even a voice recorder). The first approach is less conspicuous because the equipment does not directly enter the scene. However, it can still record a relatively large number of disturbances that render conversations unintelligible (especially given the requirements of machine transcription). From the perspective of intelligibility (and for transcription requirements), lavalier microphones are preferable. Still, because the microphone is clipped to the lapel, they can leave participants feeling “bugged”. A compromise is a microphone placed on a table or within a metre of the speaker, but even here, the device remains visible at all times.
The challenges of audio do not end with recording. The same scene is typically shot with multiple cameras and microphones, and the result is a multi-track video and audio recording that needs to be synchronised in time; the relative levels of the tracks and other adjustments (equalisation, compression, etc.) must also be decided. On this topic, one can refer to the practical sections of video ethnography manuals (Shrum & Scott, 2017) or, for details of multi-track video and audio processing, to Kern (2008), Rose (2015), and Steiglitz (2001).
A considerable amount of time must be devoted to the rigorous and systematic labelling and organisation of the recordings so that it is clear exactly which event each recording captures. The first prerequisite is to set the correct date and time on all devices and to check these settings regularly. Another is the systematic marking of all cameras and the consistent recording of their location. It is worthwhile to draw diagrams of the rooms and camera locations. Although these steps may seem trivial, when using multiple cameras to film several shots and scenes, organising the recordings is highly complex. At the same time, a multi-camera setup is desirable because it extends the usability of video ethnography beyond what a single camera offers.
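To illustrate what time synchronisation of two tracks involves, the following is a minimal sketch rather than the procedure we used: it estimates the offset between two recordings of the same scene by finding the lag at which their loudness envelopes overlap best (cross-correlation). The function name estimate_lag, the toy signals, and the sample rate are hypothetical and serve only the illustration.

```python
# Hypothetical helper, for illustration only: estimate how many samples
# track_b lags behind track_a by maximising their cross-correlation.
def estimate_lag(track_a, track_b, max_lag):
    """Return the lag (in samples) at which the two tracks overlap best.

    A positive result means the shared event appears later in track_b.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(track_a):
            j = i + lag
            if 0 <= j < len(track_b):
                score += a * track_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy loudness envelopes (assumed data): a clap-like pulse at sample 3
# in camera A and at sample 5 in camera B.
camera_a = [0, 0, 0, 1.0, 0.5, 0, 0, 0, 0, 0]
camera_b = [0, 0, 0, 0, 0, 1.0, 0.5, 0, 0, 0]

lag_samples = estimate_lag(camera_a, camera_b, max_lag=4)
sample_rate_hz = 2  # hypothetical envelope rate for this toy example
offset_seconds = lag_samples / sample_rate_hz
```

In practice, a sharp shared sound at the start of each session (such as a hand clap) makes this kind of alignment considerably more reliable, whether it is done by code or by eye in an editing program.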
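The labelling steps described above can be sketched as a small recording manifest. This is an assumed scheme, not the one used in our project: the Recording class, its field names, the file-name pattern, and the example dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Recording:
    camera_id: str        # label physically marked on the camera, e.g. "CAM2"
    location: str         # camera position as noted on the room diagram
    started_at: datetime  # taken from the regularly checked camera clock
    event: str            # which meal or situation the clip captures

    def file_stem(self) -> str:
        """Build a sortable, self-describing file name (without extension)."""
        stamp = self.started_at.strftime("%Y%m%d-%H%M%S")
        return f"{stamp}_{self.camera_id}_{self.location}_{self.event}"


def group_by_event(recordings):
    """Map each event to its recordings, ordered by start time."""
    groups = {}
    for rec in sorted(recordings, key=lambda r: r.started_at):
        groups.setdefault(rec.event, []).append(rec)
    return groups


# Hypothetical example entries (dates and labels are invented).
recordings = [
    Recording("CAM2", "table3", datetime(2023, 5, 4, 11, 32, 10), "lunch"),
    Recording("CAM1", "overview", datetime(2023, 5, 4, 11, 30, 0), "lunch"),
]
stems = [r.file_stem() for r in group_by_event(recordings)["lunch"]]
```

Putting the timestamp first means that an ordinary alphabetical file listing is also chronological, which only works if the camera clocks were set correctly in the first place.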
Post-Hoc Technical and Analytical Dilemmas
All three of the most widely used platforms for qualitative data analysis (NVivo, MAXQDA, and ATLAS.ti) process video by default; that is, they allow you to load multimedia files, play them back, and code individual time segments, much as you would with text. At the same time, the ability of the packages mentioned above to produce usable transcriptions of spoken words has been improving with the development of AI.
However, the brief information provided in the product documentation proved insufficient in practice, and several limitations became apparent during the video analysis. Video coding was only available in the desktop version of the software, making real-time collaboration between researchers highly complicated or impossible. The reasons were twofold: first, as of 2025, desktop versions did not support real-time online cooperation; second, the video files were so large that online access caused significant delays. Consequently, the project had to be created and edited on the desktop version and shared as a bundle containing the media files (or linking to them), with team members editing the project sequentially. The large video files also required significant computing capacity. Some software did not handle them transparently; for example, ATLAS.ti copied them in an encrypted format to a hidden working directory. The requirements for confidentiality and non-alienability of the data conflicted with the ability to manage disk space, as ATLAS.ti did not delete these files even after the project was cancelled, resulting in tens of gigabytes of “dead space” on the disk.
There are also specialised tools for video annotation that allow for elegant, straightforward tagging of specific segments. However, these tools generally did not support advancing to higher levels of analysis, such as coding, categorisation, or establishing relationships between codes. Notable examples included ELAN (https://archive.mpi.nl/tla/elan) and Transana (https://www.transana.com). From our experience, it was also essential to view synchronised footage from multiple cameras simultaneously (multi-track video editing), but no analytical software available at the time provided this functionality.
The analysis could not proceed without specialised software that allowed cutting, taking notes, coding, or otherwise marking specific segments of the recordings. This need required an intermediate technical step to prepare the footage for analysis, such as editing, improving sound quality, and combining individual clips. Suitable software had to be selected and mastered; in our case, we used Wondershare Filmora. This technical stage enabled us to reduce the recordings to only those segments containing relevant information, optimising both storage capacity and the research team’s time. It was also essential to anonymise the recordings within the software, for example, by blurring or removing certain parts of the footage. Our data included individuals who either did not consent to be filmed or were unable to provide consent, necessitating careful management of the recordings in accordance with ethical principles and agreed-upon data management procedures.
Overall, we collected a substantial amount of visual data, amounting to 65 hours of video footage after editing and before analysis. Understanding the practices captured in the recordings required repeated viewing, which enabled us to develop multiple layers of codes for analysis. We approached coding in a manner similar to that used for textual data, such as interviews. The recordings were reviewed and coded sequentially, beginning with broader, more general codes and followed by a more detailed, refined layer. From these two layers of coding, we identified four key areas that guided our subsequent analysis. Once these areas were established, it was no longer necessary to examine all the footage in full, as only the segments corresponding to the selected codes were required for further processing.
However, the initial coding proved to be highly time-consuming. Analytical work with video data required significantly more effort than with text-based sources, due to the need to edit and prepare recordings before uploading them to the analytical software. Moving from a purely descriptive level to a deeper analytical level demanded a greater time investment than typical interview analysis. Consequently, the analytical process was always preceded by the resolution of numerous technical issues, which consumed both the team’s time and cognitive resources. This requirement also highlighted the need for appropriate technical expertise within the research group. In addition, these technical challenges created a time lag between the recordings’ acquisition and their subsequent analysis.
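The two-layer coding of time segments described above, and the later restriction to segments carrying selected codes, can be sketched as follows. This is an illustrative data structure only, not an export format of any analysis package; the Segment class, the code labels, and the timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    recording: str   # hypothetical recording label
    start_s: float   # segment start, in seconds from the start of the clip
    end_s: float     # segment end, in seconds
    broad_code: str  # first, more general coding layer
    fine_code: str   # second, more detailed coding layer


def select_segments(segments, broad_codes):
    """Keep only segments whose broad code is among the selected ones."""
    return [s for s in segments if s.broad_code in broad_codes]


def total_duration_s(segments):
    """Sum the lengths of the given segments in seconds."""
    return sum(s.end_s - s.start_s for s in segments)


# Invented example segments and codes.
segments = [
    Segment("lunch_cam1", 0.0, 90.0, "serving", "plate handed over"),
    Segment("lunch_cam1", 90.0, 150.0, "waiting", "empty table"),
    Segment("lunch_cam2", 30.0, 75.0, "serving", "napkin arranged"),
]
selected = select_segments(segments, {"serving"})
review_time_s = total_duration_s(selected)
```

Summing the selected segments gives a rough estimate of how much footage still has to be reviewed in detail once the key areas are fixed.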
Dilemmas of Video Records’ Authenticity
A lunch in a dining room with cameras and seated researchers is inevitably different from a typical lunch. However, the key question is not how these situations differ, but rather what can be learned from the recorded situation about the broader phenomenon under investigation. Participants naturally style their behaviour in anticipation of expectations and to address various psychological needs, such as preserving dignity, “saving face”, or presenting themselves positively. While the degree of such stylisation is difficult to estimate and cannot be entirely avoided, in the long term, conscious stylisation tends to diminish, as our experience also showed.
We attempted to minimise the cameras’ intrusiveness, and participants generally became accustomed to filming over time. However, the stylisation of both clients and staff remained considerable, particularly at the beginning of the recordings. Participants naturally adjusted their behaviour in ways they considered socially desirable, performing their social roles in line with their expectations of what mealtime should look like. For staff, this often involved enacting their concept of ‘proper’ care, for example, through careful handling of napkins and frequent inquiries about residents’ satisfaction.
In particular, employees repeatedly reminded each other that the cameras were on, and some clients did the same. Many initially thought the footage might be shown on television, so throughout our time at the institution we consistently explained the research’s purpose and how the recordings would be used. To ensure their ongoing consent, we repeatedly checked with all participants whether they were still comfortable being filmed, even though written consent had been obtained in advance. A common joke among participants was that, had they known about the filming in advance, they would have gone to the hairdresser’s or dressed differently. Although these remarks were made in jest, they highlight that participants sometimes struggled to remember study details, which is why we repeated this information and reconfirmed consent as needed.
These patterns were also strongly shaped by cultural norms and expectations. In the Czech context, older adults often demonstrate a high level of respect for authority figures, including researchers, which can sometimes make them hesitant to refuse to participate even when they are uncertain about their involvement. Norms surrounding privacy and social propriety influenced participants’ awareness of being observed, prompting them to adjust their clothing, posture, table manners, and interactions with staff in ways they deemed appropriate. Jokes about appearances or being filmed reflect not only humour but also concern for social presentation and maintaining dignity in a semi-public institutional space.
We also noted that care home management shaped both who we could approach and what we could film. In memory care units, where many residents experience significant cognitive impairments, management helped us select suitable individuals who could participate in interviews with us. In other cases, however, the pre-selection of the people we would film and interview seemed likely to have cost us some information. Another specific disruption we observed was changes in seating arrangements. Clients who did not agree to participate in the research were seated so they would not be visible, or staff allowed them to eat in their rooms, even though they were accustomed to eating in the dining room. These changes may have disrupted usual interactions between clients.
It is therefore essential to note that, like interview-based methods, video ethnography cannot rely on the naive assumption that the quality of a research tool lies in presenting reality as directly as possible without distortion. On the contrary, it is necessary to recognise that the data available through intervention methods are always the result of an interaction and, thus, are produced to some extent in response to a given situation. However, this does not mean that they are worthless: the results of the interaction represent the mental processes in which the social structures we want to investigate are realised. Specifically, the stylised form of the lunch provides us with a myriad of information about how staff want to demonstrate appropriate care and how clients want to be seen. Our goal was not to evaluate the objective correctness of these practices, but to understand how participants themselves frame the meaning of their actions, that is, the discourses underlying everyday practices. By observing how staff and clients consciously or unconsciously shape their behaviour, we gain insight into the interpretive frameworks that are central to understanding the significance and negotiation of routines within institutional and cultural contexts. The quality of the research tool should therefore be understood more in terms of its ability to spark fruitful interactions that provide accessible, relatively easy-to-interpret material.
At the same time, all the actors who were filmed gradually dropped out of the roles they had initially set, much as Erving Goffman (2021) describes. Moments of forgetting the camera revealed snippets of a more authentic performance of care and everyday life in the facility, producing relatively heterogeneous material with hermeneutic layers of varying depth. Long-term filming further supported participants’ habituation to being observed, similar to the value of extended participant observation in ethnographic research. Digital recording made this possible without incurring additional financial costs for longer filming sessions.
Nevertheless, our study encountered two significant limitations to this long-term perspective. The first was the sheer volume of footage, which required extensive processing and careful sorting. Even a preliminary review at a higher speed to make sorting decisions demanded many hours of work, and a substantial portion of the material remained for detailed analysis. These practical challenges underscore that, while long-term filming can enhance authenticity and depth, it also entails a considerable workload for the research team, both in terms of time and cognitive effort.
The second limitation concerns the impact of prolonged filming on staff and clients. In our experience, after a few days, our presence with the cameras began to feel undesirable and increasingly difficult to justify. While one or two days of filming were generally accepted, extending the recording period proved more challenging, as the topic of mealtime practices seemed too trivial for repeated observation. Initially, our presence was perceived as a pleasant enrichment of the clients’ routine; however, over time, it became tiresome and somewhat intrusive.
As a result, the multi-day approach did not yield the anticipated benefits. We assessed that extended filming increased the demands of the research, not only for us as researchers but particularly for those being filmed. To address this, we gradually reduced the recording schedule, limiting filming to twice per dish, while increasing the number of cameras to capture both the overall dynamics of the dining room and detailed interactions at selected tables. This adjustment enabled us to strike a balance between data richness and participants’ comfort and well-being.
Conclusion
Visual data analysis, including video, has a long tradition in the social sciences. The decline in production costs has led to an increased use of video recording for research purposes. Using video to capture more than a single moment can significantly benefit social researchers across various fields (Nassauer & Legewie, 2021), as visuals can be as expressive as participants’ words (Shrum et al., 2005). However, because video-based studies remain relatively rare, their methodological issues are addressed less often than those of, for example, interviews. At the same time, it is essential not to overestimate the potential of such research and to consider the epistemological, ethical and practical constraints that may arise (Harrison, 2002). Based on our ethnographic research in a residential care home, we identified four areas of dilemmas that video-based research presents: (1) video record legitimisation, (2) choices connected to video data shooting, (3) post-hoc technical and analytical dilemmas, and (4) the authenticity of video records. These dilemmas fundamentally shape both the quality of the collected data and the interpretative possibilities of the analysis. Videos as data for analysis are thus the result of a dynamic process involving many interconnected challenges and decisions made by researchers and research participants.
Regarding the legitimisation of video recording, a key insight from our study is that obtaining consent and maintaining legitimacy are not discrete events but continuous processes. Researchers must be prepared to renegotiate consent and to repeatedly explain the research purpose to ensure participants understand the study’s objectives. This ongoing dialogue is essential for maintaining participants’ trust and comfort, particularly in institutional environments where daily routines and power asymmetries shape the research experience.
Many circumstances related to the legitimacy of video research stem from the fact that while interviews are a widespread and accepted cultural phenomenon, appearing in a video is a less familiar experience (Gobo & Mauceri, 2014). Respondents are used to conducting dialogues; they are familiar with the interview format from the media, and they often have previous experience with being interviewed, but they only gain a clearer idea of what video involves during filming (e.g., what will be in the shot, how long the filming will take, how far the microphone will reach).
Another area of dilemmas associated with the recording process concerns the delicate balance between the need for detailed data and the need to minimise intrusion. Recording both moving images and sound multiplies the burden of intrusion. This is clearly and inevitably one of the most intrusive research designs and must be understood as such, as it has significant ethical and epistemological implications. An important challenge was determining the optimal placement of the cameras, as a less-intrusive approach at a greater distance was insufficient to capture the required details. Due to the physical layout of the environment, it was not always possible to capture the desired shot. The presence of cameras, crew, and equipment can have a twofold dynamic. Although long-term filming can, in theory, promote habituation to cameras (Heath et al., 2023), our experience suggests that prolonged presence can instead become burdensome for participants and difficult to justify to staff. Adjusting the number of filming days and using more cameras proved to be a more sustainable compromise, preserving both the richness of the visual data and participants’ well-being (cf. Cipriani & Del Re, 2012).
Complex technical aspects, video editing and modifications, as well as extensive opportunities for combining with other fieldwork and recordings, essentially structure the meaning of the data. Technical and methodological decisions have a much greater impact on the possibilities of interpretation than we are used to, for example, in interviews. These post-hoc technical and analytical dilemmas have a fundamental effect on the interpretation of results. Therefore, from the outset of the research, it is necessary to reflect not only on the filming process but also on the context, and to use planned strategies to combine video data with other sources. This is not because video is an insufficient source in itself, but because video processing is inherently an intersection of various sources (the interaction of images with our knowledge of camera placement, the time frame of filming, the role of sound, the proportion of the population under study out of the reach of the camera, etc.).
The issues of authenticity or reactivity constitute one of the central dilemmas of video-based research (Nassauer & Legewie, 2021; Pauwels, 2011). Trying to purify the records of layers of stylisation interpretively is not a fruitful strategy; instead, it pays to accept that what we see is the result of complex, multi-layered interactions among those being filmed, those filming, other actors, circumstances, and equipment. This creates material that is rich not only in fragments of everyday life but also in its depiction of the processes of conscious reenactment, interpretation, or concealment. The stylised behaviour of participants in front of the camera should not be viewed solely as a distortion of reality, but rather as an expression of how individuals construct and communicate social meanings through their actions. These performances can reveal how care, order, and everyday routines are imagined, negotiated, and displayed within institutional contexts. The researcher’s presence is inevitably part of this dynamic, shaping the field and the interactions captured on camera. A reflexive awareness of this position is therefore essential, as video-based research amplifies the visibility and mutual influence between the observer and the observed.
Video ethnography, when employed sensitively and reflexively, provides a powerful methodological lens for examining the social dynamics of everyday life within institutions. It also conveys the experience of time and space—categories that are rather implicit in other types of research. Yet its strength lies not only in the visual record itself but in the interpretative work that follows—the careful negotiation between what is seen, what is said, and what remains unseen and unspoken. Visual materials can reveal unexpected information due to their nature (Harrison, 2002). Furthermore, it is impossible to have a priori knowledge of what or who will appear in the recording (Wills et al., 2016). Therefore, complex epistemological reflexivity and adherence to ethical principles, combined with effective safety management, are crucial.
Although we agree with Grimshaw (1982) that decisions about what to record should be based on theoretical principles, our experience shows that the environment itself actively influences the filming process, both through the research participants and the physical environment, and also affects the researchers’ decisions. The resulting video data thus captures a pseudo-reality, a specific version of events created by the dynamic interaction of all actors (Heath et al., 2010).
From a broader perspective, our methodological reflections point to the potential of video ethnography beyond care homes. The dilemmas and practices discussed here are relevant to studying everyday routines and embodied interactions in other institutional settings, such as hospitals, schools, or shelters. Video methods can capture the material, spatial, temporal, and sensory dimensions of institutional life that are often overlooked in interviews or textual analysis. At the same time, they invite researchers to engage with questions of power, privacy, and ethics in concrete and situated ways.
Footnotes
Ethical Considerations
Data collection and analysis were conducted in accordance with ethical guidelines and were approved by the Research Ethics Committee at Masaryk University (approval number: EKV-2022-014).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research was supported by the Czech Science Foundation Grant No. GA23-06348S (Meanings of food intake in old age).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
