Abstract
Objective
To analyse how the patient's use of handheld technology in video consultations with their general practitioner affects communication and the possibilities for the delivery of quality healthcare. Focusing on the visually communicated aspects of the video consultation, we present three episodes from our recordings of eight video consultations between Danish general practitioners and patients.
Methods
Using a multimodal social semiotic framework to conduct a micro-level analysis, we present episodes from our data in which the hardware's affordance of mobility gave rise to salient events in the interactions of patients who used handheld devices to carry out their video consultations.
Results
Patients’ use of technology plays a significant role in the interactions between general practitioner and patient and is thus an important factor to consider in how practice is shaped when using handheld video consultation technology.
Conclusions
Our findings demonstrate that the mobility of handheld devices (smartphone, tablet) can be used to augment sensing and embodiment and enhance the delivery of healthcare in video consultations. However, mobility may also disrupt the interaction. As a result, possibilities for the delivery of quality healthcare lie quite literally in the patients’ hands.
Keywords
Introduction
Video consultations (VCs) are a form of telemedicine that enable health professionals and patients to communicate with each other in real time from different locations. Although there is no single, commonly accepted definition of telemedicine, it can be described as the use of technology to deliver healthcare services and information at a distance, improving access, quality and cost via electronic communications. 1 VCs are a type of video-mediated interaction, which is understood as interaction conducted in and through a specific type of technology (e.g. Skype, Teams or Zoom) that enables synchronous (real-time) communication via a video link. Although telemedicine can also be delivered asynchronously (store and forward) or by remote monitoring, 2 in this paper we focus on synchronous VCs used in general practice in Denmark. In this context, VCs are accessed via a free app, Min Læge (My Doctor), developed by MedCom, a nonprofit organisation financed and owned by the Ministry of Health, Danish Regions and Local Government Denmark. 3
Although the synchronous nature of VCs is also a feature of telephone consultations, the added dimension of video means that participants in VCs can see each other during the interaction. Human interaction involves more than just speech, and a number of non-verbal resources that contribute to communicative meaning, such as gaze, body position and distance between participants, are available in VCs but not in telephone consultations, because these resources can only be accessed visually. 4 As such, the visual dimension of VCs provides a wealth of interactional information that is not present in the other technologically mediated consultation forms currently used for the delivery of healthcare (email, telephone). 5 Furthermore, the use of handheld technology, such as smartphones and tablets, to carry out VCs allows patients to move the device's camera around during the interaction, affecting both what patients choose to represent and how they choose to represent it, for instance, by moving the camera closer to or further away from themselves. This creates new potential for how interpersonal meaning, which we understand as the way in which social interactions are enacted as social relations, 6 is made.
Before the outbreak of COVID-19 in Denmark, uptake of VCs was relatively low. A pilot trial implemented by MedCom reported only 503 VCs undertaken by 40 practices across Denmark in the period June 2019–January 2020; on average, each active practice in the pilot trial conducted 1.6 VCs per month. 7 However, sparked by the global outbreak of COVID-19 in 2020, the implementation of telehealth solutions has since rapidly gained pace worldwide. 8 At the same time, mobile technology has become so ubiquitous and habitually used in citizens’ everyday lives that, as Boeriis (2021) claims, the smartphone camera may even be considered part of the body (p. 10). Moreover, the affordability and portability of mobile technology mean that it is available to the majority of citizens. 9 In Denmark specifically, the public sector ranks among the five most digitised in OECD countries, 10 and 90% of Danish households own a smartphone. 11 In light of this, and not least in the midst of a pandemic, VCs appear to offer a practical solution for going to the doctor as well as a potential opportunity to improve healthcare delivery, for instance, by facilitating access.9,12,13
Correspondingly, in response to the outbreak of COVID-19, VCs were expeditiously implemented in Denmark through the already established My Doctor app, enabling citizens to access VCs with their general practitioner (GP) via their smartphone or tablet and thus to receive the necessary treatment and guidance from health professionals without fear of contracting the coronavirus. 14 Consequently, in the period from March to December 2020, when the majority of our data were recorded, 394,864 VCs were carried out in general practice. 15 However, although this is a significant increase compared with the pilot trial, there is still considerable potential for further implementation of the VC service in Danish general practice, and such expansion appears likely following the recent agreement regarding GPs’ working conditions. 16
In VCs, adding technology to human interaction introduces both restrictions and new components. 17 Every technology has affordances, or potentials for use and meaning making (including interpersonal meaning), 6 which offer both material and social possibilities and constraints. 18 By the same token, different individuals access potential affordances in different ways, depending on their familiarity with or skill in using the technology: in VCs, for instance, patients may display creative adaptation to the available affordances. 19 In this way, patients demonstrate that they are not simply passive recipients of care but also active participants in the processes of health care work. 20 Thus, such episodes of ‘creativity’ could be said to represent ‘non-strategic’, or spontaneous, incidents of patient empowerment.
Nevertheless, it is still not clear whether the use of handheld VC technology improves or undermines doctor–patient communication, 9 and research in this area is as yet in its infancy.17,21 To better understand communication mediated by this technology, it is crucial to examine how the technology is integrated into the conversation in embodied ways. 17
Doctor–patient communication is said to be the cornerstone of medical practice,9,21 and face-to-face consultations are often seen as the ‘gold standard’.12,22 However, very little is known about the implementation of handheld VC technology in primary care or the impact VCs may have on communication practices and their outcomes in a general practice setting.12,22–24 This is significant as it has been predicted that if telemedicine affects health outcomes, it is likely to do so through changes in the way doctors and patients communicate with one another. 4 Nevertheless, there is a paucity of evidence in this field, reflecting low levels of VC usage in primary care to date.5,23,25
In this article, we address this knowledge deficit by answering the following research question: How does the patient's use of handheld VC technology affect communication in the doctor–patient interaction and the possibilities for the delivery of quality healthcare in VCs? To do this, we draw on our unique data corpus of eight authentic VCs, amounting to approximately one hour in total. We follow a multimodal social semiotic framework to conduct a micro-level analysis of patient–GP communication in salient episodes from our data where patients draw on the mobility of their handheld VC technology during the interaction. Specifically, we address (1) how the patient's use of the device's mobility affects the way in which interpersonal meaning is made and (2) the role the patient's use of technology plays in the delivery of healthcare. As we will show, VCs are not a ‘one-size-fits-all’ solution, and the delivery of healthcare via telemedicine involves both benefits and challenges. 8
Theoretical framework
We focus on the visually communicated aspects of VCs. To do this, we examine how patients use handheld consultation technology with their GP. Our interest lies in how the patients choose to make use (or not) of the potential mobility of the VC hardware when communicating with their GP. Our analysis is underpinned by multimodal social semiotic theory as pioneered by Hodge, Kress and van Leeuwen,6,26,27 whose work provides a grammar of visual communication and of multimodal communication as a whole. As such, this work provides a theoretical framework for analysing not only verbal communication but also the meaning potential of other semiotic resources, such as those drawn upon in interactions using VC technology. This approach lends itself to fine-grained analyses and allows us to uncover significant insights into the visually communicated, interpersonal aspects of the VC interactions. Specifically, in this article, we examine how patients use the affordance of mobility in their handheld VC technology. Within this theoretical framework, ‘affordances’ are understood as potentials for meaning. For example, ‘sound’ (loud or soft) is an affordance of speech; ‘bold’, ‘italic’ and ‘underline’ are affordances of writing. These affordances signify intensity and can be used, for instance, to realise the meaning of emphasis. By adhering closely to the multimodal social semiotic framework, we depart from most telemedicine research, which does not overtly refer to theory. 28
In our analysis, we examine mobility as a semiotic resource the patients draw on to make interpersonal meaning: that is, the way they move their handheld devices has meaning for how social relations are enacted during the interaction. 6 To unpack how they do this while remaining within the multimodal framework, our approach is guided by concepts of proxemics. Briefly, proxemics is a term coined by Hall 29 following his studies of space as ‘one of the basic, underlying organisational systems for all living things – particularly for people’ (p. xu). Proxemics concerns ‘the ordering of bodies in physical space and the relationships between persons in social space. … [which] forms the basis for a system of transparent signs that is fundamental to the organization of social life’ (Hodge & Kress, 26 p. 52). In other words, the nearness, or proximity, of one body to another says something about the social relations of those involved, namely how close (or not) it is appropriate for another person to come. In this way, proximity can contribute to interpersonal meaning. Likewise, Kress and van Leeuwen's 27 visual system of size of frame is derived from physically experienced fields of vision, which correspond with the proxemic distances identified by Hall. Just as physical distance from another person can indicate social relations in face-to-face interaction, the size of frame can also suggest social relations in visual representations: that is, relations between the viewer of the image and the participant(s) represented in the image. 27
For instance, the very close shot, which shows less than the head and shoulders of the subject, equates with Hall's intimate distance and is reserved for people who have an intimate relation with each other. 27 These relations may also be indicated by non-verbal signs such as direction of gaze and, in technologically mediated visual representations such as VCs, by the perspective and angle from which the image is produced. Moreover, signs derive their meanings from social life; they are therefore socially situated and context specific, and they concern the ‘relations of power and solidarity which constitute every social formation’ (Hodge & Kress, 26 p. 52). This approach provides a framework for us to examine the interpersonal communication between the visually represented participants in the VC interactions.
In addition, we build on Boeriis’ (2021) concept of the digital eye in the hand by proposing the notion of the augmented eye to describe how the patient, by drawing on the affordance of mobility, influences the GP's sensory perception (looking). The augmented eye shows how the GP's looking activities and proximity to the patient are largely determined by the patient and how they use the technology; the concept thus combines Kress and van Leeuwen's 27 grammar of visually represented relations with Hall's 29 physical spatial relations. Using the augmented eye as our point of departure, we present detailed examples of how interpersonal meaning is made in the visually represented interactions in VCs through the way patients move their devices during the interaction.
Methods and data
Our data collection methods were discussed and devised by all authors during the first lockdown in Denmark, in March 2020. At this time, physical attendance at GPs’ surgeries was not permitted, and VCs were being conducted using a variety of platforms, including My Doctor, Confrere, Skype and FaceTime. This made it challenging to find screen recording software that was compatible with such a wide variety of platforms as well as with the extensive range of devices patients were using to carry out VCs. Although not all Danish GPs use My Doctor, the app is recommended by the Danish Organisation of General Practitioners (PLO) 30 and was used by the GPs participating in this study. The My Doctor app was launched in 2019 with other functionalities, such as email consultations; the VC function was not added to the app until the pandemic. To collect our data, we selected rec.vc, as this software is compatible with the My Doctor app and allowed the GPs to make the VC recordings themselves from their own computers. This meant that the data could be recorded when it was convenient for the GPs and did not require extra equipment or the presence of a researcher. The recordings were automatically stored remotely in a secure cloud. None of the participants (including the GPs) had access to the recordings at any time.
All participants provided informed consent: the GPs provided informed written consent, and the patients provided informed oral consent, which is documented in our recordings. Furthermore, as we intended to use still images from the recordings in our published work, all participants were given the option of consenting to undisguised still images from the recordings being used for this purpose. Where consent was not given to publish undisguised still images, we have concealed the faces of the participants using mosaic patches. Both the forms of consent and the study itself were approved (Journal No. 11.052) by the institutional review board of the University of Southern Denmark, the Research and Innovation Organisation (RIO), in accordance with the GDPR and Declaration of Helsinki.
The recordings were immediately transferred from rec.vc and stored on a secure server at the University of Southern Denmark. Permission to store and analyse the recordings at this location was also granted by RIO. Transcriptions of the VC recordings were made by the first author using the CAQDAS programme ELAN 5.9. All transcriptions were stored on OneDrive at the University of Southern Denmark. Non-verbal modes (gaze direction, gesture, facial expression, head and body movement) were also noted and annotated with time stamps using ELAN 5.9. Speech was transcribed verbatim using GAT 2 transcription conventions for minimal transcripts. 31 All translations of speech from Danish to English included in this article were made by the first author.
All authors discussed and agreed upon which episodes from the data were to be included in the article. The first author carried out a multimodal social semiotic analysis of the selected episodes from the data. All authors contributed to and discussed the findings.
Our data consist of eight screen recordings of authentic VCs, ranging in length from 3.19 to 20.01 min, amounting to 59.19 min in total. The recordings were made by three GPs (one female and two male) from two different practices in the same Danish city. Eight adult patients participated in the VCs: three men and five women. Two of the women spoke on behalf of their respective children (one male toddler and one pre-teen female), who were also present during the VCs.
Analysis
Following a multimodal approach, the first author sampled the video recordings to select instances (episodes) for detailed multimodal analysis. 32 The selection of these episodes was informed by the theoretical concept of salience, 27 which pointed to instances in the interactions where ‘“order” [was] disturbed or where a convention [was] broken’. 32
The episodes featured in this article stood out in terms of how mobility was used as a resource for making interpersonal meaning compared with other instances in our data in which patients also used handheld VC technology. In other VCs from our data, two patients conducted the VC with their device placed on a surface in front of them, so mobility did not affect the framing. Although the remaining patients held their devices in their hands, the framing was stable throughout the consultation, and mobility did not play a significant role in the interactions. For this article, we selected episodes in which the framing was not stable in order to study the effect of mobility on the interactions.
Below, we present three episodes from our data in which the hardware's affordance of mobility gave rise to salient events in the interactions of patients who used handheld devices to carry out their VCs. This affordance was available to the patients because they used handheld devices; we focused on their use of these devices because the GPs used desktop computers for the VCs, so the affordance of mobility was not available to them.
Imposed intimacy
In all three episodes, the device's affordance of mobility allows each patient to direct the gaze, and thus the looking activity, of the GP simply by moving their hardware; based on their one-handed use of the technology, the patients in our data are most likely using smartphones. In this way, the patients display a degree of control over the video interaction. For instance, in all of our featured cases, the patients begin their VCs by framing themselves in very close shots in which their heads, and at times the top part of their shoulders, are visible and occupy a large portion of the frame; this framing is likely due to the position of the patient's arm when holding the device comfortably to carry out the consultation. As a result, the patients are at what Hall 29 terms intimate distance – far phase (p. 117), that is, approximately 15–45 cm from the handheld device. The patient is therefore represented on the GP's screen at this intimate distance, which places the GP in a position of ‘imposed intimacy’. The GPs, on the other hand, were seated further away from their computers, at around 75–120 cm, and were thus visible from the mid-torso up, with their hands free and often visible to the patient, which allowed opportunities for gesturing. Moreover, although the GPs were placed centrally in their frames, they took up less space on the screen due to their distance from the computer's camera. Consequently, the GPs were represented on the patients’ screens at what Hall 29 terms personal distance – far phase (p. 120), at least two stages of intimacy further away than the representations of the patients. In this way, the GPs had access to detailed visual information, such as the patient's facial expressions and direction of gaze, whereas these details were less available from the patient's perspective.
As a result, the mediated representations of the patients and the GPs displayed an asymmetry of interpersonal distance, as the patient is displayed closer to the GP than vice versa. However, this asymmetry is in keeping with the more general asymmetry of the doctor–patient relationship, 20 in which the patient is expected to reveal intimate details and potentially offer their body up for inspection, whereas the GP does not reciprocate. In terms of proxemics, this social relation is realised through the positioning of bodies and how close the other person may come: the GP may approach and closely examine the patient, but the patient may not do so to the GP. Thus, the imposed intimacy and asymmetrical framing in the VCs mirror the physical organisation and conventions of face-to-face medical consultations.
Building on the above, we now present the three episodes from our data showing what we describe as ‘disrupted attention’, ‘directed looking’ and ‘augmented embodiment’, resulting from the patients’ use of handheld VC technology in their VCs.
Episode 1: Disrupted attention
In Episode 1, the patient frames herself in a very close shot, 27 but in such a way that her image occupies only a very small portion of the lower part of the frame, and only the upper part of her face is visible (Figure 1). This is despite the inset frame displayed within the VC window, which is present throughout the VC and shows the participant as they appear to their interlocutor, thus allowing the potential for self-monitoring during the interaction. However, as the VC proceeds, the patient does not adjust her position to improve the composition of her appearance in the frame. As a result, very few non-verbal communication cues are available to the GP. Nevertheless, although it is not possible to determine exactly what people are looking at in video-mediated interaction, the direction of gaze can indicate where attention is being directed. 33 In this episode, the patient's eyes are visible within the frame and oriented towards the screen, which signals her engagement in the interaction as well as providing other interactional information.

Figure 1. Video consultation (VC) opening.
Although the represented image of the patient does not provide an ideal basis for the GP's looking activities, for instance, for carrying out an examination, the GP makes no mention of it, and the interaction proceeds. However, 1 min and 45 s into the VC, an incoming call from the patient's daughter interrupts the patient's explanation of her symptoms. As the first vibration occurs, the patient breaks off on the word ‘tablets’ (1:45.165), leaving her sentence unfinished. As the patient attempts to deal with the call, the VC is disrupted to the extent that talk stops for both participants, and the interaction momentarily breaks down.
The incoming call draws attention to the fact that the patient's hardware is a convergent mobile device. 18 That is to say, the device supports the video call function that the patient is using to carry out the VC, and this function is then interrupted by the telephone function. In this way, the video and telephone functions are simultaneously active in the same device, and the patient must exercise agency to navigate and choose between the options. The patient appears to be distracted by the sudden shift in functionality and by being unexpectedly confronted with the telephone function overlapping the ongoing VC. The device's multifunctionality is thus a material constraint, which requires the patient's skill in balancing the use of the device in line with her personal needs. 18 In this way, the use of one functionality is disrupted by another, and as such ‘it is the user's interest which establishes [the] priority among uses’ (Adami & Kress, 18 p. 193). In this episode, the patient directs her skills towards managing the incoming call, and in doing so she does not appear able to continue with the VC.
As a result, the patient switches her attention from interacting with the GP (Figure 1) to attempting to deal with the incoming call (Figures 2 to 4). Curiously, the patient does not draw on her device's affordance of mobility here; rather than moving the device, she moves her body away in response to the changing functionality and her own looking activity. At the same time, the patient's gaze and body position shift in line with her switch of focus from talking to the GP to attending to her device. In this way, she appears to be looking ‘at’ the screen rather than ‘through’ it at the GP. The patient's change of gaze and body position indicates her reprioritising of the higher-level actions 33 in which she is engaged: she prioritises dealing with the incoming call from her daughter over continuing the VC with her GP. Although it is a function of the technology that interrupts the interaction, it is the patient who ceases to interact with the GP, first by ceasing talk and then by shifting her attention.

Figure 2. The patient's device begins to vibrate, and in response she shifts her body position and gaze.

Figure 3. The continued vibrating of the patient's device prompts her to take action, and she moves her left hand towards the screen.

Figure 4. The patient's hand takes up approximately half of the frame, obscuring her face.
As the patient brings her hand towards the screen, she partially covers the GP's augmented eye and renders herself visually unavailable to his gaze. Moreover, given the intimate distance between the patient's image and the GP's augmented eye, the patient's hand ‘inappropriately’ (Hall, 29 p. 118) enters the intimate sphere, as expressed by a ‘distortion of the visual system’ (p. 118). In physical face-to-face interactions, such closeness causes sharp focus to be lost and vision to become blurred, resulting in muscular discomfort. 29 In the VC, however, it is the smartphone's ‘eye’ (camera) that loses focus, not the GP himself. This action excludes the GP from the patient's zone of intimacy. At the same time, the patient's speech ceases (1:45.915) as she attends to the incoming call from her daughter. In parallel, as the patient becomes interactionally unavailable, the GP also withdraws from the interaction, ceasing talk and redirecting his own looking activities away from the patient. This reflects a shift in focus for both participants.
At no time during the interruption does the patient verbally refer to the incoming call or explain to the GP the reason for the disruption. The only reference to the incoming call is made by the GP, when on the fourth vibration of the patient's smartphone he asks: ‘Who is it that's calling?’ (1:53.765), to which the patient replies, ‘Yeah, yeah. It's my daughter’ (1:55.255). The question is a significant utterance, as it prompts a response following an awkward silence and could be seen as an attempt to repair the interaction. However, following this exchange the patient still does not resume the interaction. Instead, she returns her attention to her device, moving it into her left hand and switching to her right hand to interact with the device's touchscreen. This may be due to convenience: possibly the patient is right-handed and finds it easier to operate the touchscreen with her dominant hand. Nevertheless, this action again briefly covers the GP's augmented eye (Figure 5) and disrupts his participation in the interaction once more.

Figure 5. The patient uses her right hand to interact with the touchscreen.
After the patient has dealt with the disruption of the incoming call, she signals her readiness to resume the VC by once again orienting her gaze towards and ‘through’ the device's screen, thus returning her attention to the GP (Figure 6). However, at this moment, the GP's attention is focused elsewhere, and he does not immediately register the patient's display of recipiency. This illustrates one way in which video-mediated gaze differs from gaze in face-to-face interaction: ‘[p]eople find they cannot gain others’ attention through simply gazing at them in digitally mediated interaction, meaning that gaze is not as effective in gaining attention and does not function the same for monitoring the behavior of others’.34,35 In this example, the GP resumes the VC interaction only after both participants have realigned their gazes to their screens.

Figure 6. The patient reorients her gaze towards the general practitioner (GP).
In this episode, the patient's use of the VC technology is negatively affected by her limited digital competency. Her framing choices do not actively assist the GP's participation in the interaction and in fact hinder the communicative exchange. Furthermore, the patient offers no oral meta-communication about her management of the disruption to the interaction. Although her attempts to silence the incoming call are for the benefit of the VC, the abrupt cessation of talk and the awkward silence that ensues breach both talk and consultation conventions: a simple explanation or apology would have been appropriate.
Paradoxically, it is through the patient's use of the handheld device and non-verbal communicative resources, gaze direction in particular, that the disruption is both signalled and repaired. However, the patient does not use these resources to best effect to support the interaction. As a result, the interpersonal relation comes under strain, and it is the GP who ultimately takes responsibility for repairing and resuming the interaction. This illustrates a shifting balance of power during this episode: the patient's framing choices neither display her body adequately for examination nor keep the GP's view clear during the interrupting phone call, yet these choices go unchallenged, and the patient's use of technology downgrades the GP's participation to the extent that the interaction is disrupted. Nevertheless, it is the GP who restores the interaction following the disruption. This could be viewed as an example of disruptive patient empowerment to the detriment of healthcare delivery.
Episode 2: Directed looking
In Episode 2, a mother speaks on behalf of her toddler son, who has had an allergic reaction. The VC is conducted in Danish, but it is clear from the mother's accent and lexico-grammatical choices that she is not a native speaker. The mother begins the VC by framing her toddler and herself in a close shot, 27 with the toddler positioned centrally in the frame and the mother to the left (Figure 7). Both the mother's and the toddler's heads and shoulders are visible. The toddler's body is turned away from the viewer and his gaze is directed downwards towards an activity he is engaged in, which indicates his detachment from the interaction. 27 The mother's body and gaze are oriented towards the screen and the VC interaction.

Figure 7. Video consultation (VC) opening.
After describing the toddler's symptoms to the GP, the mother explains that she has been treating her child with eye drops she obtained from the pharmacist. As she speaks, her gaze is directed away from the screen and to her left. She appears to be looking at something outside of the frame, in a part of her environment that is not available to the GP. The mother seems to be slightly unsure about the name of the medicine: ‘We got … Benaliv, or whatever it's called’ (00:43.310). As she continues her explanation, she returns her gaze towards the screen before the GP interrupts, requesting clarification. At this point (00:51.240), the mother brings the medicine into the centre of the frame in a showing sequence (Figure 8), which visually confirms the name of the medicine. This is an object-centred 2 sequence in which the mother brings an object from her environment to the foreground of the frame, thus establishing it as a relevant concern in the interaction. Simultaneously, the movement of the medicine bottle is accompanied by movements in the mother's gaze: she glances at the medicine outside of the frame, then returns her gaze towards the GP as she introduces the medicine into the frame. The alignment of the mother's gesture and gaze indicates that the medicine bottle is the object she had been gazing at and which she now shows explicitly to the GP.

Figure 8. The mother brings the medicine into the frame.
By foregrounding the medicine in the frame and thus making it the most salient element, 27 the mother simultaneously backgrounds herself and the toddler. As well as being central in the frame, the medicine bottle receives the most light and is closest to the GP's augmented eye, which gives it the greatest visual weight of any element in the frame. 27 The human participants, by contrast, are in the background: the toddler's face is partially obscured by the medicine, and the mother is poorly lit and only partially visible on the left of the frame. The salience of the medicine bottle thus places it at the top of the hierarchy of importance among the elements represented in the frame. 27
Nevertheless, despite the salience of the medicine bottle during the showing sequence, the mother keeps her gaze directed towards the GP. The vector created by her eyeline 27 sustains her interactional connection with the GP and helps to keep the interaction going. In this example, the mother does not use the device's mobility to move it towards the medicine and offer the GP a close-up view of the label, although doing so might have been easier for her, as it would have required only one hand. Instead, she prioritises putting the face of the current speaker on-screen (Licoppe & Morel, 36 p. 426) and keeps the camera directed towards herself and the toddler as she continues to talk with the GP. Her showing gesture thus serves to enhance the GP's understanding of her talk while still orienting to the talking heads arrangement, in which both participants are on-screen and facing the camera. 34 By bringing the medicine into the frame rather than moving the camera towards the medicine and away from the human participants, the mother's actions highlight the tension between using video to show participants and using it to show relevant features of the environment. 34 By keeping the camera directed towards herself, the mother sustains the interaction with the GP while using a showing gesture to support her verbal explanation of the measures she has taken to treat the child's symptoms leading up to the VC.
In this episode, the mother's use of technology demonstrates that she deems the medicine bottle relevant to the interaction but not of greater importance than sustaining the ‘face-to-face’ interaction with the GP. Although her device's mobility would have allowed her to direct the GP's looking towards the medicine bottle by moving her device away from herself, she chose instead to bring the medicine into the field of vision encompassed by the GP's augmented eye. This indicates that the mother prioritises her interpersonal interaction with the GP while also ensuring that he understands her talk.
Episode 3: Augmented embodiment
Episode 3 features a VC with a mother speaking on behalf of her pre-teen daughter, who is complaining of a sore throat. Although the daughter is the subject of the consultation and is of an age where she is capable of speaking for herself, she does not directly interact verbally with the GP during the VC. She can, however, be seen verbally interacting with her mother at various points during the VC, although her speech is not audible in the recording.
The VC opens with the mother in a close shot, positioned slightly to the left of centre in the frame (Figure 9). She appears to be holding the device in her hand, as the image is rather unsteady and reflects her bodily movements. The daughter is glimpsed only partially and intermittently on the right of the frame.

Figure 9. The video consultation (VC) opening.
During the initial history-taking phase of the consultation, 19 the GP asks whether the mother has checked for any pain or swelling on her daughter's neck (1:12.216). Using both hands, the GP simultaneously demonstrates on his own body where and how the mother should check her daughter for symptoms. Although the GP's verbal utterance is not a direct request for action, his gesturing adds semiotic intensity, 35 with the result that the mother replicates the GP's gesture on her daughter. However, the mother uses only one hand (her left) to examine her daughter as her right hand is being used to hold the device (Figure 10).

Figure 10. The mother replicates the general practitioner's (GP's) gesture.
As the mother examines her daughter on behalf of the GP, she draws on the affordance of mobility to turn the camera so that the daughter is now, for the first time, central in the frame. As she does so, the mother physically moves towards her daughter, entering personal distance, close phase – a distance of one and a half to two and a half feet, at which touching the other person is possible; 29 as her hand in the frame shows, the mother is close enough to touch her daughter. This is an adapting-frame-to-body 19 showing sequence in which the camera is moved towards the participant.
During this sequence, the daughter averts her gaze, and no eye contact is made with the device's camera (the GP's augmented eye). Instead, the daughter adopts the characteristic pose of detachment found in face-to-face examinations, in which the patient looks to one side and into the middle distance but at no particular object in her environment. 36 As such, this is an offer image in which the GP is free to look at the patient impersonally as an item of information or object of contemplation. 27
At this point, the mother switches roles. In the opening of the VC, although the mother aligns with her daughter by speaking on her behalf, she does not share the frame equally with her daughter and in fact visually sidelines her. However, by turning the camera towards her daughter, the mother now aligns herself with the GP by sharing her field of vision with him; she directs his looking by bringing his augmented eye closer to her daughter (and turning it away from herself). In this way, the mother represents her daughter at close personal distance to the GP, which reflects her own physical distance from the girl. Moreover, by means of her gesture, the mother enables the GP to vicariously ‘physically’ examine the patient. Hence, the mother's body becomes an extension of the GP's body: the examination is carried out by and through the mother. Via his augmented eye, the GP can see the mother's hand touching the patient from his own perspective, as if it were his own hand. This illustrates how VC technology can be used to augment sensing and embodiment.
However, by turning the device's camera towards her daughter, the mother also turns the screen away from herself. She is therefore no longer able to see the screen or monitor the inset frame, and so cannot determine what the GP's augmented eye can see. This may explain the marked change in the perspective from which the daughter, now the represented participant, is viewed: during this sequence, she is seen from a low angle. Following Kress and van Leeuwen's discussion of power and the vertical angle of the frame, the daughter, viewed from below, is depicted as having power over the viewer of the image, 27 in this case the GP, a depiction at odds with her role in the examination. Kress and van Leeuwen 27 argue that the angle of frame is a resource for making interpersonal meaning, but in this instance the angle is arguably not a deliberate choice; it simply results from the fact that the mother cannot see the screen and is consequently unable to monitor the represented image. Thus, the unusual moment in this consultation when the daughter is represented as having power is perhaps due only to the mother not having complete control over what the camera is showing.
Following the mother's examination of her daughter's throat, she once again directs the camera towards herself, although the daughter is now close enough to be visible on the right of the frame. Possibly as a result of the mother's direction of his augmented eye, the GP now begins to use pronouns in a way that indicates his own shifting alignment in the interaction: ‘we’ signals his alignment with the mother, while he now addresses the patient as ‘you’ instead of referring to her as ‘she/her’ as previously. Appearing to need more information about the daughter's symptoms, the GP addresses the patient directly: ‘We [your mother and I] need to try and look in your throat … we [your mother and I] can try and do it with the camera, if that’ll work’ (1:31.047, emphasis added).
As the GP continues talking, the mother ‘flips’ the device's camera, and she can now see on her own screen what appears on the GP's, thus realigning her gaze with his. As a result, she is better able to assist his examination, since she can move the camera and monitor the screen at the same time, seeing what the GP sees. This demonstrates the mother's spontaneous adaptation to the technology: she makes use of the device's camera-flip function rather than physically turning the device around, as she did in the previous sequence. Although the mother's actions contradict the GP's verbal instruction that she need not flip the camera (1:45.414), her use of the technology supports his examination of the daughter.
In contrast to the previous sequence, the patient is now represented from above, and ‘the represented participant [the patient] is seen from the point of view of power’ (Kress & van Leeuwen, 27 p. 140), a position shared by both the GP and the mother. In this way, the patient's body is presented appropriately for examination. 36 In addition, the camera has been moved extremely close to the patient, so that only her open mouth is visible in the frame. The patient is thus depersonalised through the absence of contextual visual information, and her mouth becomes the sole object in the frame, offered to the GP as the object of inspection (Figure 11).

Figure 11. Extreme close-up of the patient's mouth.
However, this representation does not appear satisfactory to the GP, and he asks the mother to withdraw the camera and switch on the telephone's light. In doing so, the GP now addresses the mother as ‘you’ rather than ‘we’ as previously. The mother replies: ‘We’ve just tried that, and we’ve also switched on [patient's] phone’ (2:10.049, emphasis added). Here, the mother aligns with her daughter through her use of the pronoun ‘we’, although she keeps the camera directed towards her daughter and does not herself appear in the frame. The mother's reply also indicates that she has spontaneously drawn on another affordance of the technology, the light, and has introduced extra hardware, her daughter's device, to support the examination. Following this statement, the mother once again moves the camera towards her daughter's throat. Again, the daughter's gaze is directed to the side and into the middle distance, signalling her detachment during the examination.
The daughter's whole face is now visible in the frame (Figure 12), and although the GP's augmented eye is further away from the patient, the image is sufficiently clear for the GP to ‘see [that] she is red’ (2:18.292, emphasis added), enabling him to make a diagnosis and prescribe appropriate treatment for the patient. Once the examination is concluded, the mother again directs the camera towards herself, as at the beginning of the consultation. The GP returns to referring to the daughter as ‘she’ and now directly addresses the mother with the plural ‘you’ (in Danish, ‘I’). By contrast, the GP now uses ‘we’ to indicate his alignment with other health professionals.

Figure 12. Close-up of patient with light.
This example clearly shows how technology may be used to enhance the delivery of healthcare in VCs. The mother's spontaneous and adaptive use of mobility, as well as of other affordances of the technology, supports her collaboration with the GP, thus helping to ensure the best possible treatment outcome for her daughter. Throughout the VC, the mother negotiates different roles, switching between aligning with her daughter and aligning with the GP. These role switches can be seen in how she draws on her device's mobility during the interaction, for instance, using the technology to become an extension of the GP's physical body in order to facilitate the examination of her daughter. Her shifting alignment is also often reflected in the pronouns used during the interactional talk. Moreover, the mother's changes of role often result from verbal meta-communication between herself and the GP, which explicitly influences how she directs the GP's augmented eye. By drawing on the affordance of mobility, the mother makes choices about what she represents in the frame and how she does so, for instance in terms of proximity and angle, and these choices have consequences for the interpersonal meanings she makes.
Discussion
In our data, the patients' personal interests and level of digital competency determine how they use VC technology in relation to their commitment to the interaction. For instance, in Episode 1, the patient's use of technology overrides her interaction with her GP, likely due to her digital competency, and in doing so her actions represent a disruptive element of patient empowerment – she does not play by ‘the rules of the game’. 37 On the other hand, the mother in Episode 2 prioritises her interaction with the GP and uses the VC technology in a way that supports the GP's understanding of the verbal interaction. In Episode 3, the mother strengthens her collaboration with the GP through her embodiment of the technology in order to ensure the best treatment for her daughter. However, the mother's close collaboration with the GP risks excluding the daughter – who is, after all, the patient – from the interaction: for instance, by speaking on her daughter's behalf and excluding her from the frame. Furthermore, the collaborative actions of the mother in Episode 3 also raise the issue of responsibility. Although this episode demonstrates that a third party can helpfully assist a GP to assess a patient in VCs, their actions as a layperson could also lead to an error in diagnosing and treating the patient.
Our analysis demonstrated that the patient's use of technology was a significant factor influencing doctor–patient communication, one that could help or hinder the delivery of healthcare. For instance, Episode 1 showed that gaining another's attention is potentially more difficult in VCs. Gaze is a significant non-verbal factor in organising face-to-face interaction, yet eye contact is in fact impossible in video interactions. 38 However, because handheld devices are compact, the camera and screen are positioned very close to each other, which enhances the impression that the patient is looking directly at the GP. By contrast, the greater distance between the GPs’ full-size computer screens and their externally fitted cameras was a material constraint that made it more difficult for the GPs to create the illusion of meeting the patient's gaze; this could only be achieved by looking directly into the camera and away from the screen. Lack of eye contact from a healthcare provider can result in patients feeling overlooked or neglected. 39 In VCs, where access to non-verbal communication is limited, for instance by camera view and framing choices, meta-communication becomes increasingly important.12,39,40
Although participants in video interactions tend to orient to the face of the current speaker, as the introduction of the medicine bottle in Episode 2 showed, other elements can overtake the visual weight of the human participants within the frame and potentially deflect attention from the interaction. In Episode 1, the patient waited silently for the GP to return his attention to her after the disruption, and it was only when the gazes of both participants were oriented towards their screens that the GP took responsibility for resuming the disrupted communication. Consequently, it may be helpful to establish in advance who is responsible for repairing the interaction in the event of disruption.5,12 In our data, we found that before the VCs began, the GPs informed patients about how to proceed in the event of technical disruption. Similar agreements could make the GP responsible for taking the lead in the event of interactional disruption.
As we have seen in our data, the variation in the ways patients use handheld/mobile technology in VCs raises the question of which situations and conditions lend themselves to this particular consultation form. In our study, it was the GPs who proposed VC as the mode of interaction, in accordance with official recommendations regarding COVID-19 at the time. However, the patient's use of technology in Episode 1 may cast doubt on whether VC was the appropriate consultation form for her in this instance; because of her framing choices, the visual aspect did not appear to contribute to the interaction in any significant way, and a telephone consultation may have sufficed. On the other hand, the patients in Episodes 2 and 3 used the visual aspect of VCs to advantage, demonstrating the potential benefit of VC over telephone consultation in some cases. It may be difficult to assess in advance which form of consultation is most appropriate for which patients,12,40 but taken as a whole these examples illustrate the different experiences patients, and GPs for that matter, may have using VCs, and some consideration should be given to the suitability of this consultation form for individual patients and/or cases. 23
In light of our findings, we suggest that previously established patterns of medical communication, such as those set out in the Calgary-Cambridge Guide to the Medical Interview, 41 may not be directly transferable to VC situations. As we have demonstrated, current technologies of communication unsettle former patterns of communication, 18 not least through the spontaneous ways users draw on the technology's affordances, and communication patterns will have to be adapted in response to the changes brought about by the COVID-19 pandemic. 42 For instance, in our data, we found very little meta-communication between participants about how the VC technology was being used during the interactions, which may be a result of uncertainty about how to negotiate this comparatively unfamiliar terrain. 43 However, the exceptional case of Episode 3, which featured a great deal of meta-communication, made it apparent that even simple directions can improve the interactions and the delivery of healthcare.
Strengths and weaknesses
Although our study was based on a small data sample, our adherence to the multimodal social semiotic framework in our micro-level analysis yielded nuanced and detailed interactional information that indicates a need for greater understanding of how VC technology is used in the delivery of healthcare. Our findings also suggest that a technologically savvy patient population could contribute to the improved delivery of healthcare in VCs with GPs.
With the exception of one VC, our data were recorded during the early stages of VC use, within the context of the first lockdown in Denmark in March 2020. These were, of course, exceptional conditions, and the VCs in our recordings represent first-user experiences. As participants gain more experience with VCs, their use of the technology will likely evolve.
Given the circumstances under which this study was conducted, there are some limitations to our findings. First, at the time of our data collection, lockdown restrictions prevented us from being physically present in the field, and at the same time, GPs were undergoing considerable reorganisation of their practice workflows within a very short time frame. 40 Perhaps as a consequence, we faced some difficulty recruiting GPs to participate in the study. Although many of the GPs we approached were positive about the study itself, they declined to participate due to lack of time. As a result, our data sample is smaller than we had intended. Nevertheless, our data are unique, and to the best of our knowledge, this is the only study of its kind based on recordings of VCs in general practice, at least in the Danish context.
Second, the data in this study were collected over a fairly brief period (3 months). Future research based on longitudinal studies could trace the development of how VC technology is used in general practice, which would contribute knowledge that moves on from first-user experiences. Furthermore, studies that combine micro-level observations of VC recordings with interviews with the participating GPs and patients could also yield rich insights into this consultation form. For instance, it could be fruitful to explore how the GP and patients perceive the researcher's observations. In this way, more far-reaching relational consequences of this particular form of communication and interaction could be brought to light.
Conclusion
In conclusion, this study demonstrates how video communication has consequences for doctor–patient interaction and thus potentially affects the quality of treatment patients receive in VCs. In light of this, it is important to understand how technology is used when studying alternative consultation forms: as we have shown, technology plays a significant role in the interactions between GP and patient and is thus an important factor to consider in how practice is shaped when using VCs. It is, therefore, crucial to gain a deeper understanding of the affordances of technology and how their use is integrated into VC interactions.
Our findings provide specific insights into patients’ spontaneous and adaptive uses of handheld technology in VCs and how the patient's use of technology plays into the delivery of healthcare. For instance, in Episode 1 the patient's framing choices and her abrupt abandonment of her explanation of her symptoms in order to deal with the incoming call impair the GP's access to her, both visually and verbally, thus affecting the delivery of healthcare. On the other hand, in Episode 3, the mother's creative use of handheld technology greatly enhances the GP's potential to examine the patient and deliver the appropriate treatment. These findings may be of benefit to GPs and health professionals who use VCs in their daily practice by increasing awareness of how patients’ use of technology may enhance or obstruct the delivery of healthcare. Similarly, our findings may point to considerations for further research into the development of formalised communication guides and training programmes for health professionals who intend to continue using VCs in the future.
Footnotes
Acknowledgements
The authors wish to express appreciation to all study GPs and patients who shared their experiences.
Contributorship
CJ researched literature and conceived the manuscript design. CJ wrote the first draft of the manuscript. AG contributed iteratively with adjustments and supplements to the manuscript. All authors discussed the theoretical approach and the findings, and they reviewed and approved the final version of the manuscript. CJ proofread the manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
The study was approved (approval number 11.052) by the institutional board of the University of Southern Denmark, the Research and Innovation Organisation (RIO).
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
Guarantor
AG.
Informed consent
All participants provided informed consent: the GPs provided informed written consent, and the patients provided informed oral consent which is documented in our recordings.
Trial registration
Not applicable, because this article does not contain any clinical trials.
