Abstract
For humanoid robots, gender-ambiguous presentation is implemented as a potential way to avoid gender-stereotypical design. Using conversation analysis, we look at video-recorded user interaction in the presence of a designedly gender-ambiguous robot, showing how this design choice is actually dealt with within a social context. Robot gender becomes relevant initially when a user refers to the robot with a dual-gendered package (‘young lady young gentleman’), with another user proposing ‘her’ for the robot, and the talk then evolving to the pursuit of agreement on the robot’s proposed femininity. Robot gender attribution is treated by these users as a collaborative endeavor rather than an individual choice. It includes displays of accountability and orientations to the delicacy of gender attribution. By implication, the analysis shows that ambiguous design shifts the burden of gender attribution from robot designers to users.
Introduction
Gender is often overlooked in the design of humanoid robots, which raises ethical issues of representation, inclusion, and potentially stereotypical design (Seaborn and Frank, 2022). One approach to avoid gender stereotypes in robots is gender-ambiguous design, such as a gender-neutral voice for Voice User Interfaces (GenderLess Voice, 2021), ‘gender-free’ robot voices (Yu et al., 2022), or gender-ambiguous robot appearance (e.g. Pepper, Jibo, ASIMO). Nonetheless, users are found to still attribute gender to robots designed to be visually genderless (Seaborn and Frank, 2022; Søraa, 2017), which some developers seem to consider a privilege in the hands of users. For example, Aldebaran-Softbank (2019) states that users may refer to Pepper in a way that ‘makes most sense to [them]’ (para. B) and Honda claims that ASIMO’s gender is ‘anything a human wants it to be. . .’ (McCormick, 2012).
Through conversation analysis of a single stretch of interaction we show how robot gender is oriented to in the presence of a Pepper robot – a robot that officially has ‘no gender’ (Aldebaran-Softbank, 2019). The case study provides a first exploration of how robot gender is talked into being, which nonetheless has immediate and actionable consequences for robot design. Through disentangling one particular case meticulously, we provide insights into how human-robot interaction is situated in social structures including (and prominently) gender. Robot gender becomes relevant in the interaction when referencing the robot for the first time and reflects human gender practices more generally.
Referencing, gender attribution, and categorization
When referencing anything, speakers have to select a term from a wide array of options (Enfield, 2012). This selection is sensitive to recipient design (cf. Sacks et al., 1974), for example what the recipient is taken to know, who the participants are to each other, and to the communicative goal/project at hand (Enfield, 2012; Schegloff, 1972). Many options for referring make relevant features of the referent (compare Katherine/my daughter/birthday girl, Enfield, 2012) and also descriptions can be used to refer (‘the fella I have for Linguistics’, Schegloff, 2007). Thus, selecting one word over others involves categorizing, the relevance of which the speaker may work to avoid in producing the reference (Schegloff, 1996). In other words, speakers can use a reference term in such a way to only do referring, or they can use a term to do referring and something else, such as categorizing (Schegloff, 1996, 2007). For example, in the referencing done in ‘there is a woman in my class who is a nurse’, the first part (woman in my class) is doing referencing while the second part (who is a nurse) is doing describing by doing categorizing (Schegloff, 2007).
When referring to persons, people often attribute gender (Klein, 2011). Such referring thus involves gender categorization: when referring to someone as a ‘fella’, maleness is attributed by the speaker. This categorization work is grounded in common-sense knowledge and a result of (consciously or unconsciously) considering an individual’s physical and behavioral aspects (Kessler and McKenna, 1978), with gender oriented to as both perceptually available and knowable (Jayyusi, 1984; West and Zimmerman, 1987). Most (Western) society members furthermore operate with a gender system of two mutually exclusive categories, ‘male’ and ‘female’ (Crawley, 2022; Garfinkel, 1967; Kessler and McKenna, 1978). A fundamental example of how gender is treated as perceivable, knowable, and binary is the assigning of gender at/before birth as either a boy or a girl, which is usually based on reviewing the body (via ultrasound imagery) and telling what it is. However, these features are not available in everyday interaction, meaning gender is continually achieved as a perceptually available matter (Jayyusi, 1984) and gender presentation is generally taken as indicative of physical features (West and Zimmerman, 1987). Gender is thus a relational, negotiated activity that is constantly in production (West and Zimmerman, 2009).
As a general rule, people apply the (gender) category that seems appropriate, except in case of discrepant features that would rule out the category (if-can test, Sacks, 1972). In meeting someone whose gender cannot be gleaned from appearance or presentation, a person can conceal that they perceive the other’s gender as ambiguous to avoid embarrassment on either side (cf. West and Zimmerman, 1987), showing that unsuccessful categorization as either male or female has social import. In recent years, a third category (or rather ‘un-category’) of non-binary has found increased footing, but the binary is still a medical and legal reality (Crawley, 2022) and cisnormativity is still dominant in medical care (e.g. cf. Lampe, 2023). The predominance of the gender binary is also reflected in interactional data outside of a legal or medical framework: when faced with gender ambiguity in a picture-describing task, participants seek clarification from the moderator, treat the attribution process as morally accountable, and quickly re-establish their belief in and commitment to dichotomous gender as an objective fact (Speer, 2005). In short, gender is performed, perceived and interpreted continually, and its performance and successful perception is accountable (West and Zimmerman, 1987).
The pervasiveness of gender as a social category can be seen in its ubiquity in the Dutch and English language systems. Gender categorization is normatively included in the minimum basic forms of reference to a non-present person who is unknown to the recipient (Klein, 2011). The omnipresence of gender in the language system means that gender attribution is often a crucial step in talking about others, and talking about others thus involves gender categorizing them.
However, gendered reference terms are not necessarily treated by speaker or recipient as making gender relevant (cf. doing referencing vs. doing categorizing, Schegloff, 2007). Person references can be linguistically gendered (e.g. ‘gentleman’, ‘woman’, ‘him’) or non-gendered (‘child’, ‘anybody’, ‘you’), but interactionally, gendered person categories can be treated as non-gendered, and conversely, linguistically non-gendered categories can be employed as gendered in interaction (Stockill and Kitzinger, 2007). For example, in one stretch of talk, ‘man’ was found to not make gender interactionally relevant, while ‘people’ was (Stockill and Kitzinger, 2007). Lastly, speakers can repair the selected reference (e.g. ‘Just ignore- this. . . lad- this girl’) or use multiple references as alternatives through the use of ‘or’ (e.g. ‘then the woman. . . or the girl that lives there’) to indicate they do not know (exactly) which is the (most) appropriate (Stokoe, 2011).
Hence, on top of general concerns such as recipient design (cf. Sacks et al., 1974), person reference commonly involves gender attribution, which in turn involves the language system and cultural views on gender. When the gender of the referent is unclear to the speaker, the perceived ambiguity is not necessarily made public, or gender is attributed and grounded from a binary perspective.
Data and method
The central case is taken from a dataset of 30 video-recorded interactions between a nurse, a participant taking the role of interviewee as part of an experimental trial, and a Pepper robot programmed to perform health surveys (cf. Boumans et al., 2019). For each participant, the nurse provided instructions (see Figure 1), followed by some practice questions with the robot, after which the nurse left, and the participant was interviewed by the robot. The interaction we focus on here is from the instruction phase, in which robot gender came up recurrently.

Figure 1. Pre-interview setup.
The robot was present throughout the interaction but programmed to start its script only after a wake phrase (‘Hello Pepper’), which was to be produced after the instructions, meaning the robot was not verbally responsive to talk yet during the instruction phase. It gazed around the room or at registered faces and nodded semi-randomly, namely when speech was picked up, regardless of the input.
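The robot behavior described above can be summarized in a minimal sketch. It is an illustration only and not the software that ran on Pepper: the class name, the returned action labels, and the nod probability are hypothetical, since the text specifies only that the script starts after the wake phrase and that nods are triggered semi-randomly by any detected speech, regardless of its content.

```python
import random
from dataclasses import dataclass, field

WAKE_PHRASE = "hello pepper"  # wake phrase reported in the text


@dataclass
class InstructionPhaseRobot:
    """Hypothetical model of the described behavior, not Pepper's actual code."""

    # Assumed value: the source only says nodding was 'semi-random'.
    nod_probability: float = 0.5
    rng: random.Random = field(default_factory=random.Random)
    script_started: bool = False

    def on_speech(self, utterance: str) -> str:
        """Return an action label for one detected stretch of speech."""
        if self.script_started:
            # Assumption: once started, the interview script handles all input.
            return "run_script"
        if WAKE_PHRASE in utterance.lower():
            self.script_started = True
            return "start_script"
        # Before the wake phrase the content of the talk is ignored:
        # any detected speech may trigger a nod, whatever was said.
        if self.rng.random() < self.nod_probability:
            return "nod"
        return "idle"
```

Under these assumptions, a nod following a turn such as ‘we still don’t know what it is right’ carries no information about what was said, which is consistent with treating the nods as semi-random rather than responsive.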
We further explored the circa 2.5 hours of data from the instruction phase. We observed that the robot was referenced through linguistically gendered and gender-neutral terms. Varying gendered referencing was recurrent both intrapersonally (e.g. one person using she in one turn and him in the next) and interpersonally (e.g. nurse using he and she and participant only using he/him). Gender-neutral person reference terms (such as ‘Pepper’ or die, rough transl. ‘that/who’) were never consistently used throughout the interactions, but seemed common. The genderless object reference ‘it’ was only used when gender was problematized (see Excerpt 1, L1).
In nearly half of the interactions, reference to the robot was problematized by the participants in some way. Excerpt 1 is an example of a reference term problematized through adding alternatives.
In some cases (5 out of 30), talk about robot gender extended (far) beyond a single turn. The interaction analyzed in this paper is one of these cases and the only one in which Pepper itself is addressed about its gender. For that reason it offers a rich case for analyzing how humans genderize a gender-ambiguous robot.
Analysis
In Excerpt 2, the reference to Pepper by the nurse (Nur) is problematized by the participant (Par). The excerpt is taken from the beginning of the interaction, where Nur starts her instruction by referencing Pepper with ‘this young lady young gentleman’ (L1–2). Par interrupts the announcement, making Pepper’s gender relevant: ‘We still don’t know what it is right’ (L3–4), initiating gender talk.
Through Nur selecting a person reference (‘young lady young gentleman’, L1–2) and not an object reference (e.g. ‘robot’), gender becomes linguistically and thus potentially interactionally relevant (cf. Klein, 2011). Her way of referencing is marked in two ways. First, it violates the preference for minimization (cf. Stivers et al., 2007) by using two reference terms in the same position. The intonation is flat throughout the full referential phrase, so ‘young gentleman’ is neither hearable as a repair of ‘young lady’ through cut-offs or repair initiators, nor produced as an alternative with ‘or’ (cf. Stokoe, 2011): the dual reference is produced as a package. Second, the two categories evoked (lady, gentleman) are generally regarded as mutually exclusive (Crawley, 2022; Kessler and McKenna, 1978; West and Zimmerman, 1987). The dual-gendered package enables reference without prioritizing one gender-marked reference term over another (while restricting gender to a binary), but simultaneously foregrounds Pepper’s gender ambiguity.
It is Par who makes gender interactionally relevant, interrupting Nur’s instruction to topicalize the reference to Pepper, overlapping with ‘yeah’ immediately following the reference (L3) and explicating the ambiguity of Pepper’s gender as a shared problem (‘we still don’t know what it is right’, L4). Nur aligns with Par (‘No huh’, L5), but then suggests that ‘it will still remain secret’ (L5–6), orienting to closing the topic and proceeding with the instructions for the interview task. Both Par and Nur treat Pepper’s gender as knowable, but unknown to them (‘don’t know what it is’, ‘remain secret’). Gender ambiguity is therefore not treated as a potentially permanent state, but as a temporary lack of knowing to which category the robot belongs.
Whereas Nur introduces and accepts gender ambiguity for the robot, Par does not accept the proposed gender secrecy. Instead, he quotes one of the robot’s caretakers (pseudonym Jan in the transcript), arguably treating Jan as an authority on this matter (‘well Jan says like due to the round shapes I stick to her’, L7–8). However, even Jan is not quoted as knowing Pepper’s gender, the quote containing ‘stick to’ and an account for attributing female gender to Pepper, namely its ‘round shapes’. The account invokes the if-can test (Sacks, 1972), suggesting the robot’s round shapes are sufficient to categorize it as ‘her’. With this, not only the gender attribution of Pepper is at stake, but also which specific (physical) features are relevant to belong to the category.
Nur aligns reluctantly: a 1.2s silence, skewed slow nod, and phrasing agreement in a roundabout way (‘there is a point to that’, L9). Nur’s following assessment of Pepper’s appearance (‘looks rather friendly like this right’) is ambiguous in stance, since friendly could be seen as culturally associated with women and thus supporting the claim, but it could also be taken to be non-gendered and thus as ‘undoing’ gender. When Par does not immediately respond (0.7s, L11), Nur adds ‘less edgy’ (L12), hearable as a reformulation of, and thus in support of, the ‘round shapes’ account articulated by Par.
In the following turns, Par pursues Nur’s agreement on the proposed gender disambiguation. After a silence Par strongly aligns (‘exactly, yeah’, L14), hence Nur and Par seem to have reached agreement on Pepper’s gender. However, when Nur launches a return to the task instructions (‘uhm’ in L15, bodily orientation towards the instruction sheet) Par interrupts Nur again, this time with what could be heard as an upshot of ‘less edgy’: ‘a bit more feminine therefore’ (L16). Hence, Par treats Nur’s prior turn in L9–12 as insufficiently agreeing with the gender attribution and/or the reasoning that accounted for this attribution, amplified by the contrast marker toch (‘it is’, L16). Par treats robot gender categorization as something he cannot decide on regardless of Nur’s position, pointing to the need for agreement.
Additionally, ‘a bit more feminine therefore’ (L16) adds another dimension to the pursued categorization work. In effect, Par links friendliness to femininity, bolstering the earlier attribution of ‘her’ by adding a link between round shapes and femininity. Par also downgrades the earlier proposed attribution by phrasing gender as gradual rather than categorical (round shapes are ‘more feminine’ rather than indicative of the category ‘her’). This switch towards graduality pries open the gender binary somewhat by invoking a spectrum rather than strict categories, while it still relies on the binary categories.
Nur responds with a delay (0.7s, L17) signaling a slightly disaligning response is forthcoming (cf. Pomerantz, 1984), then an affirmative ‘yeah’, and nou ja (‘well yeah’), a Dutch phrase signaling reluctant willingness to abandon a dispute (Mazeland, 2016), and after another 1s pause: ‘fine’ (L18). Then, Nur includes Pepper in the gender talk (‘huh is it also okay with you’, L20). This retroactively treats Pepper’s nodding throughout the sequence as responsive to the talk (although note that the nods were semi-randomly produced, e.g. occurring after ‘today uhm’, L1, and ‘well yeah’, L18). Asking for Pepper’s agreement is thus hearable as a confirmation check in response to multiple nods. Although Pepper has not shown itself interactionally competent enough to engage with the talk on its gender, including the robot in the gender talk alludes to gender as something one knows of themselves, drawing on the norm that others are to be treated as having privileged access to their own experiences and specific rights to narrate them (Pomerantz, 1984; Sacks, 1984). The norm is invoked through Par and Nur’s consistent orientation to the robot’s gender as knowable throughout the interaction.
Lastly, the sensitivity of gender attribution surfaces explicitly when Par distances himself from the proposed categorization of the robot. While Nur phrased her turn towards Pepper as ‘is it also okay with you’ (L20), thus implying agreement between her and Par, Par responds to Nur’s dispreferredly designed response in L18 saying: ‘but that was his suggestion’ (L21) and ‘is not my fault’ (L25). Here, with mitigating laughter (cf. Jefferson, 1984), he blames Jan for gendering the robot female, thus distancing himself from the categorization and safeguarding himself against anticipated blame (cf. resisting the identity category of sexist, Stokoe and Smithson, 2001). Hence, gender attribution is oriented to as particularly accountable and sensitive. Nur responds minimally by briefly mirroring Par’s laughter (L23), then returns to the main activity (L24 and L26) which closes the gender talk, the distancing done by Par thus seemingly sufficient to absolve the need for agreement.
Discussion
Our analysis shows that robot gender is more than just a design issue: it comes up, is of situated relevance to, and has to be managed by participants in the presence of a social robot. First, the ways in which the robot is referenced reflect the robot’s perceived status and the work done by users to keep the robot ‘alive’, selecting person reference over object reference. With person reference treated as relevant, the issue of referencing the robot collides with the language system in which gender is commonly the minimum information included (cf. Klein, 2011), evidenced by users problematizing each other’s or their own gendered reference terms in nearly half of the interactions. By implication, linguistically gendered terms were also often not interactionally oriented towards as making gender relevant (Stockill and Kitzinger, 2007), even when varying terms were used inter- and intrapersonally.
Our case study showed how referencing can trigger categorizing, and, as soon as gender was topicalized, how human gender categorization practices surfaced: like human gender, robot gender was treated by both interactants as knowable, with gender ambiguity treated as a (temporary) state of not knowing (cf. Speer, 2005). This triggered gender attribution, which involved pointing to specific attributes of the robot, in this case its appearance, and thus echoes everyday gender attribution on the basis of perceptual cues (Jayyusi, 1984; West and Zimmerman, 2009).
However, whereas gender attributions in everyday life are often done in a split second (seen but ‘unnoticed’, Garfinkel, 1967, p. 118), in the presence of a gender-ambiguous robot it may involve a lengthy stretch of talk, halting progression of the ongoing activity (cf. Speer, 2005). This may be amplified by questions of gender territory: who may decide when there is no agreement, and neither the robot nor designer provides an answer?
Furthermore, attributing a gender category may involve defining gender categories in potentially stereotypical ways, for example, tying femininity to round shapes and friendliness (cf. Seaborn and Frank, 2022). This may lead to socially delicate situations in multiparty interaction with/surrounding the robot. We see in such situations that robot gender is not just attributed by each user individually by following what ‘makes most sense to [them]’ (Aldebaran-Softbank, 2019, para. B), but that accounts are given for the proposed categorization and that agreement is sought. Our analysis shows that robot gender categorization too is not an individual act, but a relational, negotiated activity (cf. Crawley, 2022).
The case study, in the context of the general finding that reference to the robot was commonly problematized, indicates that gender categorization and the gender binary are still pervasive (Jayyusi, 1984; Kessler and McKenna, 1978). Even when reference to the robot was designedly ambiguous (‘young lady young gentleman’) it invoked the gender binary. Although the binary was somewhat opened up in the ensuing talk, conceding gradualness rather than strict categories, gender ambiguity was still explained in binary terms. Gender-ambiguous design thus does not circumvent gender’s pervasiveness in our culture and language, but rather makes it visible.
This single case analysis indicates that while leaving Pepper’s gender up to the user may seem empowering, potential robot gender issues are not solved with ambiguous design. Instead, gender-ambiguous design displaces the question of robot gender from the developer to the user, who has much less time for contemplating gender and avoiding unwanted implications. Additionally, referring to the robot may spark gender talk, which interrupts the ongoing course of action, in this case the instruction. Gender-sensitive design therefore requires an understanding and careful consideration of gender in language systems, of gender in social interaction, and of humans’ gender ascribing practices.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
