Abstract
In this paper we explore human communicative behaviour in unsolicited commercial telephone calls between human telemarketers and ‘bots’ that exhibit human characteristics. Drawing on a corpus of recorded telephone conversations between telemarketers and a spam-interception service, we examine some of the communicative dimensions through which telemarketers make sense of their interactions with this technology as trust, or rather the illusion of it, is established, severed and restored. The analysis shows how trust is established early in the calls through an authentic human voice, the illusion of progressivity and purported intersubjectivity, including ‘doing-being-human’ excuses. In cases where telemarketers realise they have not been talking to a human, they direct verbal abuse towards the bot or express surprise and embarrassment oriented to their professional face, and the call is used as a training opportunity to identify bots. The article contributes to understanding some of the technology-enabled contemporary communicative practices human beings engage in as part of their everyday lives. It raises questions about how humans negotiate trust and validate authenticity in an increasingly automated and technologically driven world.
Introduction
Unsolicited phone calls from legitimate and illegitimate organisations are a significant global issue. The prevalence of spam calls has greatly increased in recent years due to technological advancements such as VoIP, which enables access to telephone networks through the internet (Saunders & Frascella 2022), making it easier and more cost-effective to make huge volumes of calls.
Efforts to counter nuisance calls include national ‘do not call’ registers that are available in some countries (e.g., Telephone Preference Service in the UK). Individuals can register their number to opt out of unwanted calls, and in order to try and protect consumers from fraud, many network providers have spam blocking measures in place. Regulatory bodies are introducing new rules for telephone providers that require them to identify and subsequently block spam numbers (Ofcom 2022, Povich 2022). However, fraudulent companies are using more sophisticated tactics to avoid detection, such as ‘spoofing’ caller ID in order to impersonate a trusted organisation, for instance, a bank or insurance company (Ofcom 2022).
There are now a number of companies which offer a service that intercepts spam calls received by landlines or mobile phones. In this paper we examine a sample of calls from one of these companies. For an annual subscription fee, the Respond and Protect Telephone Company (R&P) offers ten different ‘robots’ that subscribers can choose to intercept any spam/scam calls they receive. These ‘bots’ are in fact recordings that have been created based on analyses of telemarketing calls. Their purpose is to give the illusion that the caller is conversing with another human and to hook potential spammers or scammers into rounds of interaction from which they do not exit, so as to waste as much of their time as possible. The bots are, however, elementary. There is no speech recognition or AI to modify the bots’ answers; the same set of utterances is used at the same time and in the same order, akin to the chatbot ‘Lenny’ in Relieu et al (2019). In contrast to human communication, which rests on mutual understanding, the bots are incapable of producing a (non-answer) response (Stivers & Robinson 2006) or parroting back. Their utterances are not adjusted progressively or incrementally. Despite their basic technology, the formulaic nature of the telemarketers’ (TMs) scripts (Lockwood et al 2009) is a key factor which enables the bots’ utterances to be effective and keep the TM on the line.
Any form of social interaction involves a level of implicit or explicit trust, expressed by information that is relayed through a communication channel between individuals (Hancock et al 2023). Trust is “the extent to which a person is confident in and willing to act on the basis of the words, actions, and decisions of another” (McAllister 1995: 25). In the calls we examine, from the TMs’ perspective, trust is plausible given that there is an authentic human voice. It is contingent on action and grounded in the fact that throughout the calls each of the TMs’ utterances receives a reaction. Even if the responses are ill-fitting, they are often ignored by the TMs in the hope that the called party (i.e., the bot) will focus on the business in hand and some value (i.e., a sale/scam) will arise from the trust invested by the TM.
Our study aims to provide a first exploration into human communicative behaviour in unsolicited and often fraudulent commercial telephone calls enabled by bots. We focus our attention on some of the communicative dimensions through which humans make sense of their interactions with these technologies as trust, or rather the illusion of it, is interactionally established, severed and restored. In so doing, we identify the conversational mechanisms through which recordings perform a human persona and examine the TMs’ reactions to the realisation they had not been interacting with a human.
In the next section we discuss the interactional principles through which trust is established. In section 3, the data and methods are introduced. Section 4 presents detailed analyses of the interactions followed by the conclusion.
(Re)establishing trust in pursuit of prospect engagement
Recent studies on nuisance and scam calls have explored methods to detect and prevent abusive and misleading calls in human-human interactions (e.g., Javed et al 2021, Wood et al 2023). They have also identified some of the mechanisms through which digital deception and cybercrime are enacted (e.g., Dynel & Ross 2020, Rowe 2009). The rise in cyber fraud has led to organised online vigilantism or scambaiting (Loveluck 2020, Button & Cross 2017) whereby the scam-baiter engages with the scammer and exposes their fraudulent activity. Whilst there are similarities between R&P’s project and scambaiting, for example, attempting to frustrate, prolong interactions and waste time (Chia 2020, Smallridge et al 2016), an important distinction is that scambaiting involves online human-human interaction, whereas R&P’s service is based on human-bot interaction; the bots are designed to intercept all unsolicited calls but not to commit any sort of (cyber)crime.
Telemarketing calls, legitimate or otherwise, follow a detailed organisational script through which businesses strive to direct and maintain the sequential organisation of the calls and their direction (e.g., Jagodzinski & Archer 2018, Tovar 2020, Márquez Reiter 2011) with a view to achieving efficiencies and increasing sales. The calls we examine in this paper are characterised by the fact that their organisation is disrupted, with the TMs losing the direction of the call and displaying negative emotional reactions towards the bots or embarrassment upon realising they had not been talking to a human being. Even though the interactional projects of the TM and the R&P bots are related in the sense that they both seek to maintain interactional engagement, the TM’s objective is to sell an alleged service to the ‘prospect’ and the bot is designed to waste the TM’s time.
For the bots to waste as much of the TMs’ time as possible, they need to project personhood, that is, they need to display some self-awareness, capacity to reason, capacity to communicate, and some self-motivated activity. It is the very idea or illusion of personhood and the convenient communicative setting that help to establish trust, at least for the TMs who think they are talking to a human.
Trust has received a great deal of attention in the humanities and social sciences. It is generally considered to be a precondition for social interaction (e.g., Weber & Carter 2003 on the social construction of trust in everyday life) and cooperation (e.g., McCabe, Rigdon & Smith 2003 on trust and reciprocity in games). In human-human interaction, conversational participants assume others to be cooperative (e.g., Grice 1975), honest (e.g., Bellucci & Park 2020), able to fulfil their promises (e.g., Searle 1975) and a relatively trustworthy source of information (e.g., Sheard 2015).
Ullman and Malle’s (2018) analysis of the semantic space of trust showed that the human-human interaction literature often conceives trust as an agent’s acceptance of vulnerability in an interaction or relationship; that is, the belief that the other will not exploit their vulnerability. By contrast, when humans interact with machines, trust appears to be grounded in the machine’s capacity to do what its label says. Based on this, Ullman and Malle distinguished between “relational” trust and “capacity” trust. Most measures of trust, especially in human-robot interaction, focus on the person’s belief that the robot can complete a given task, while most measures of trust in human-human interaction focus on integrity, loyalty, and other social-moral constructs. The results of their study suggest that trust has a multidimensional structure with four distinct dimensions: Being Capable, Ethical, Sincere, and Reliable. We shall return to these in the analysis.
In the calls examined here, the bot’s utterances, in the initial stages of the call, broadly index a degree of conversational cooperation (Grice 1975). This is because every contribution by the TM receives a reaction from the bot. At first sight, these reactions suggest some level of informativeness and truthfulness, and they appear to be relatively relevant and clear, especially as far as the call’s progressivity is concerned.
As the call progresses, however, this transpires not to be the case. The premise of R&P’s service is to waste the TMs’ time and keep them on the call as long as possible. Trust is vital to the continuity of the call; if the TMs suspect they are not interacting with a human, they will end the call and move on to the next number on their list.
Trust is thus implicit early in the conversation given the bots’ authentic human voice, willingness to engage and the display of purported intersubjectivity, and it is sustained by the TMs’ need to maintain the prospect on the line and progress the call with a view to pursuing their goal (i.e., to make a sale or defraud).
Intersubjectivity, or the state of mutual understanding between interactants, is achieved through the sequential organisation of talk (Schegloff 2007) whereby each new turn continuously updates interactants’ understanding of the previous turn (Heritage & Atkinson 1984: 11). The principle of progressivity refers to the advancement of conversation within turns and sequences (Schegloff 2007). In interaction, there is a preference for progressivity (Heritage 2007, Stivers & Robinson 2006), and shared understandings are designed to achieve some action, such as agreement or response (Schegloff 1992). Intersubjectivity and progressivity are, therefore, two of the principles through which conversational cooperation gets practically done as “actions are co-created by the concerted effort of the participants” (Mey 2010:2887). One way in which participants enact cooperation is by supporting the progress of interaction (Stivers & Robinson 2006), understood as showing alignment, e.g., responding to a question for information with an answer, rather than with arbitrary tokens, such as ‘mmm hmm’ (Stivers 2008). Thus, progressivity is cooperation at the structural level of the interaction (Duranti & La Mattina 2022) and participants typically adjust their actions to achieve intersubjective attunement (Rommetveit 1988).
The calls we analyse illustrate how the TMs hear the level of intersubjectivity that has been achieved as ‘good enough’ (Garfinkel 1967:8) to move the conversation forward, and how the bots hold themselves accountable for the suspension of progressivity. They do this by requesting the repetition of a previous sequence based on excuses (Scott & Lyman 1968) for temporary lapses in understanding. The excuses are grounded in human traits and the TMs readily orient to this portrayal of personhood.
Data and methods
The data for this article draw on a corpus of 140 telephone conversations recorded by the R&P Telephone Company and uploaded to YouTube between 2015 and 2022. Forty-seven calls were randomly selected from the corpus; all recordings were listened to and poor-quality calls, very short examples or extremely offensive calls were discarded. Calls were transcribed using Jefferson transcription conventions (Jefferson 2004). We present here an analysis of a sample of these calls depicting recurrent patterns of communicative phenomena observed across the dataset to give a flavour as to how they unfold and the TMs’ verbal reactions to realising they had not been interacting with a human. The data are analysed from an interactional pragmatics approach (Márquez Reiter 2009, 2019, cf. Chang & Haugh 2011, Haugh & Culpeper 2018). The analysis draws on concepts from conversation analysis (Schegloff 2007) to identify the stages in the calls where marked sociopragmatic phenomena unfold, such as third-party interventions, issues of professional face and emotional responses when realising the called party is not human, and to show how these are interactionally constructed by the humans. This is coupled with analytic interpretations which consider the larger economic context in which the pragmatic phenomena are embedded: telephone service provision continues to be a prevalent commercial practice with agents working under pressure and surveillance (Brophy 2017; Tovar 2020).
Whilst the recordings are publicly available on YouTube, we acknowledge the ethical dilemmas of using publicly accessible online data (boyd & Crawford 2012). It was not possible to obtain informed consent as details of the companies and individuals were not available. In view of this, all names and any other identifying information have been removed from the dataset and, where appropriate, replaced with pseudonyms (franzke et al 2020).
Analysis
The first part of the analysis focuses on how the TMs deal with a series of ill-fitting reactions in the first two stages of the calls: the opening and the middle of the exchange (Zimmerman 1992). We then turn to the TMs’ emotional reactions once they realise something is not quite right with the interaction.
Establishment and restoration of trust
Opening sequence
In this section we analyse the techniques used to establish trust. In the dataset, these are recurrent practices that occur in the opening sequence of the call and at the beginning of the business exchange. The first examples illustrate the opening of a typical telemarketing call. Excerpt 1 is from a call between a newspaper company and the R&P bot ‘Susan’. The TM is trying to get the ‘prospect’ to sign up to a year’s newspaper subscription.
The opening sequence follows the trajectory of a typical marketing call. The TM first provides self and organisational identification (Zimmerman 1992, Márquez Reiter 2011) followed by the first part of a how-are-you exchange, suggesting that the call is not business as usual (Tracy & Agne 2002) by way of the synthetic personalisation (Fairclough 1989) it indicates. The 1.6s gap and the prospect’s non-fitting response (Thompson et al 2015) (Y:es) in line 6 to the TM’s ‘how are you’ alert the TM to a potential connection problem, confirmed by the prospect’s request in line 10. The TM then introduces an extended reason for the call in the anchor position (Schegloff 1986) in lines 12-23. The prospect’s overlapping go-ahead token in line 21 (Sure) is trust-implicative, for it can be understood to signal compliance with the reason for the call and helps to underline her identity as the right ‘person’ to talk to (see also line 16). In view of this, in lines 22-23, the TM tries to ascertain the prospect’s preference for the newspaper delivery. The first sign of trouble comes with the prospect’s response in line 25, preceded by a gap of 0.8s in line 24. The sharp rising intonation in the token ‘Okay?’ is prosodically and lexically non-fitting. However, it is heard by the TM as indicating potential interest, evidenced by her okay-prefaced response (l.27) that functions as a bridge (Merritt 1978) between the different stages of the attempted sale – the initial explanation and the pricing of the offer. The characteristic opening sequence of sales calls is further demonstrated in the following excerpt from a car insurance company.
The same opening sequence seen in Excerpt 1 is also evident here – summons-answer, exchange of greetings, identification of the caller and reason for the call (Schegloff 1986). Nevertheless, there are early signs of trouble: for example, the prospect responds with an elongated ‘Ye:p’ (l.34) to the TM’s greeting and again following the question ‘how are you doing today’ (l.35). The TM’s nervous laughter (Glenn & Holt 2013) (£Oka:y£ – l.39) and hesitation marker (er) could be a sign that he considers this an unusual response; however, he continues with the reason for the call.
These examples show how the opening sequence of the interactions mostly follows the normativities of a telemarketing call (Lockwood et al 2009, Márquez Reiter 2011, Jagodzinski & Archer 2018). Although some of the acknowledgement tokens interspersed throughout the calls mark compliance, agreement or confirmation with the TMs’ sales itinerary, others are prosodically and lexically non-fitting and often produced after significant gaps. These inconsistencies seem to go largely unnoticed or are ignored by the TMs in their intent to progress the call.
Middle of the business exchange
We now turn to the middle of the business exchange. This part of the call is where the main negotiation takes place and the TMs attempt to maintain progressivity to achieve their objective of a successful sale or scam. In Excerpt 3, the TM introduces the newspaper subscription price.
The TM informs the prospect about the price of the offer in line 59 with an extreme case formulation (‘entire’) (Pomerantz 1986) and sharper intonation rise in ‘↑year’ with which she intensifies the alleged good value of the offer. The details of the offer are finalised here with falling intonation (‘.’) that signals a transition relevance place and completeness. At this point, an assessment of some sort would be expected from the prospect; however, the offer is met with silence in l.60 and a repeated non-fitting token (ok:ay? – this time prolonged). It is this and the silence in l.62 that lead the TM to check the prospect’s interest with a yes/no interrogative at l.63. The lax token (Uh-huh) and subsequent silence in lines 65-66 are taken to be ‘good enough’ (Garfinkel 1967) by the TM to continue her pursuit. Similarly, the TM disattends the rather weak overlapping acknowledgement ‘Mmm hmm’ in line 71 with a specifying question (Fox & Thompson 2010) in an attempt to close the deal and secure payment. Therefore, the TM ignores repeated signs of interactional trouble (lines 62, 65, 66, 70, 71) in her efforts to hook the prospect and progress the call. In Excerpt 4, we return to the air duct cleaning call where the TM attempts to secure an appointment for a technician’s visit.
In line 88, the TM uses the declarative form as a question to confirm the prospect’s address to which the prospect reacts with an expected affirmative reply (Yeah – l.89). The TM then poses an alternative wh-question (morning or afternoon) at lines 91-93. The prospect’s non-fitting response (Oka:y) leads the TM to repeat the question (l.95) as a way of testing the reliable dimension of trust and checking for intersubjectivity; the emphasis on the ‘do’ indicates that the response received was not entirely clear. The prospect then produces another non-fitting response (Right – l.96). The lengthy silence that follows alerts the TM to a potential problem and he checks that the prospect is still on the line (l.98) and again verifies that there are no connection problems (Can you hear↑me – l.102). Whilst this time the prospect’s response to this polar interrogative is type-conforming, there is another 1.2s gap before the TM repeats his initial address inquiry as he endeavours to restore progressivity.
In addition to the non-fitting reactions, which suspend the call’s progressivity because they suggest that mutual understanding has not been fully achieved, prospects attempt to maintain trust by displaying stereotypical human behaviour and reflexively accounting for presumed inattention.
Accounting for lapsed intersubjectivity – traditional gender stereotypes as credible excuses
All the R&P recordings contain distractions that are intended to waste time and derail the TM’s project. One way in which the bots do so is by accounting for their lack of attention. Excerpt 5 from the newspaper subscription call directly follows on from Excerpt 3 when the TM asks, ‘what would b::e better for you’ (see Ex. 3, line 72).
The Excerpt shows the bot’s request for repetition couched as an apology. The apology is articulated in a polite and expected manner, with a self-repaired intensified apologetic formula followed by an explanation (e.g., Márquez Reiter 2000). In so doing, Susan accounts for her behaviour in line 108 (I I totally distracted). The last turn construction unit of that contribution, ‘what were you calling about?’ should have provided the TM with a clear sign that something was very wrong. However, she agrees to repeat the reason for the call (not shown). In Excerpt 6, the prospect accounts for her behaviour thus far.
The scripted side dialogue between the prospect and her daughter draws on traditional gender stereotypes of women and mothers and their capacity for multitasking (Lui et al 2021). The prospect’s metapragmatic articulation of her behaviour and her role as a busy mum (lines 115-117) adds authenticity and helps to restore trust. This is done by producing an account in the form of an excuse (Scott & Lyman 1968) for the distraction. The excuse underlines the prospect’s human qualities, and the TM reacts as if she is interacting with a real person. Each time a side dialogue is introduced, the progressivity of the course of action is impeded. Nonetheless, perhaps convinced that she has succeeded in securing a new subscriber, the TM persists, constantly pursuing a (specific and fitting) response (Pomerantz 1986), as she makes four attempts to confirm the prospect’s last name, here in line 110 and on three more occasions (not shown). These contingent questions (Zimmerman 1992) are essential to progress the transaction, hence the TM’s insistence. The distracted mum stereotype (Odenweller & Rittenour 2017) is also found in Excerpt 7, taken from a call between a credit card scammer and the R&P recording ‘Emma’ where the TM is attempting to fraudulently elicit Emma’s credit card number. The call also includes a conversation between ‘Emma’ and her ‘teenage daughter’.
This Excerpt is from early in the conversation following the TM’s extended reason for the call (not shown). This is derailed by an interruption from Emma’s teenage daughter demanding to know how long the call will go on for (l.124). However, in pursuit of a response, the TM is undeterred by the interruption and attempts to maintain progressivity (l. 125 and 129). In line 132, Emma’s ‘go on go on’ is a request for action. It implies that her attention has been diverted by the apparently legitimate interruption and invites the TM to continue; the recognisable human conduct accounts for the distraction, which appears to convince the TM to continue. Finally, in line 134, the TM is able to state his offer to incentivise the prospect and the continuer ‘Ye::ah’ (l.136) can be heard as acknowledgement of interest in the offer.
A final example of traditional gender stereotypes can be seen in Excerpt 8, taken from a call between a pyramid scheme scam and the bot ‘Bob’, who is presented as an older person. Earlier in the call, Bob implied that he had trouble moving (it’s ↑not
Bob’s failure to reply with a type-conforming response to the TM’s yes/no interrogative (l.137) is accounted for as Bob gets distracted by a hockey game on the television (l.139-140), endorsing another male-watching-sport stereotype (e.g., Burstyn 1999). The acknowledgement tokens in lines 143, 145, 148 and 155 indicate compliance with Bob’s imperative requests to wait (Thompson et al 2015). As shown in Excerpt 5, the last turn construction unit in line 161 (.h So er what did you call me about) should have been heard as a sign of interactional trouble. However, Bob’s account of the hockey game, together with the accounts earlier in the call, combine to bring to the fore the prospect’s human identity (older man, distracted by sport on the television). The placement of these side dialogues, whilst not deliberate, often occurs when the TMs are trying to secure some agreement or information; that is, towards the end of the middle of the exchange.
So far we have shown how the TMs’ implicit expectation of trust is maintained through the interactional principles of intersubjectivity and progressivity. This was done through the arbitrary placement of particles that typically function as acknowledgement tokens and continuers. These were found to occur in approximately appropriate interactional spaces but to display some sort of ill-fitting format, whether in their prosody or lexis, or through a delay in response. They allowed for the creation of trust and progressivity, which was then temporarily suspended by derailments. Derailments were principally effected by integrating side dialogues into the recorded script. These drew on traditional biased gender stereotypes through which the prospects made themselves metapragmatically accountable for their distraction with a view to keeping the TM on the line. The enactment of human authenticity helped to rebuild lost trust, and the TM was then invited to repeat a previous sequence for progressivity to be regained. In the next section we examine TMs’ reactions upon realising their interaction was not with a human.
TMs’ reactions on realising they have not been interacting with a human
Across the calls, three meaningful reactions were identified: using the call as a training opportunity, perceived professional face threats, and verbal abuse.
Training opportunity
The job of telemarketers entails making hundreds of unsolicited telephone calls every day (Woodcock 2017) to uninterested prospects. This explains why they may become unfazed by rejection. Alongside this, the neoliberal conditions of their work (Heller & Duchêne 2012) and the need to secure a sale may explain why the TMs fail to unpack non-fitting responses. Moreover, the productivity of TMs is controlled by management surveillance, and their performance, which includes the number of calls they make and their interaction with customers, is policed (Brophy 2017). Excerpt 9 below is taken from a 14-minute call from an illegitimate holiday company and illustrates the supervisor’s intervention.
SUP: SUPERVISOR
The supervisor joins the call in the ninth minute and encourages the TM to end the conversation through a complaint-implicative statement regarding the TM’s performance thus far (l.163). The statement is constructed in the plural (We/our) and produced at a slower pace. This minimises its potential threat to the TM’s professional face by constructing the effort as collective. The TM reacts immediately (see latch in line 165), displaying agreement and a sense of frustration (i.e., slowed pace and continuing intonation). Following the silence that ensues, the supervisor questions the length of time the TM has been on the phone (line 167) and the TM politely tries to end the conversation. In the dataset, interactions are often lengthy as the TMs persist in trying to reach the objective of their call. At some point, non-fitting responses and lack of progressivity lead to the realisation that something is not quite right. Excerpt 10, from the newspaper subscription call, shows another supervisor intervention towards the end of the seven-minute call, when the realisation that the prospect is not human occurs.
In line 172, the supervisor asks the prospect a specifying question with a request for information, namely whether she can supply her email address. After a gap where a response would have been expected, the supervisor checks whether he is talking with a real person (l.174). Another 1.3s gap follows and the prospect’s non-fitting acknowledgement token ‘Sure’ in line 176 is ignored as the supervisor answers his own question with ‘I don’t think so’ (l. 177), thereby asserting his epistemic authority over the TM with regard to recorded answering services. In Excerpt 11, we join the interaction a few turns later.
At the end of the call, while the prospect continues to produce arbitrary acknowledgement tokens, the supervisor uses the incident as a training opportunity for the TM (lines 178-180, 183). The elongated vowel and slight rising intonation of the response particle ‘Rea:lly,’ (l. 182) express an element of disbelief. The TM’s response is also a news mark (Heritage 1984). These tokens are understood to make a response relevant and here invite the supervisor to elaborate with further explanation (l.183) (Thompson et al 2015). The supervisor’s response at line 187 signals understanding of how the TM assumed the prospect was human and in this way reduces the threat to the TM’s professional face.
Perceived professional face threats
The supervisor’s intervention, together with the realisation that they have been interacting with a non-human party, constitutes a threat to TMs’ professional face – “the ‘professional persona’ on loan to the agent” (Márquez Reiter 2011: 3863, Márquez Reiter 2009).
Whilst the TM’s response in line 189 ‘She’s answered all my questions’ is a tacit disagreement with this assertion, it is also an attempt to save face (Goffman 1967) and demonstrate that she has done her job properly. The supervisor’s reply in line 190, in an agree+disagree format (Pomerantz 1984), acknowledges the TM’s response as a remedial exchange (Goffman 1971), but also restates his judgement of the trouble following the conjunction but. The TM prefaces her next turn with the particle ‘Oh’ (l.191). Heritage (2002:196) attests that oh-prefaced turns can be used to ‘convey what might be termed ‘ownership’ of knowledge’. In this instance, the TM attempts to justify her performance thus far, based on her experiential knowledge of her interaction with the prospect. In other words, the response ‘Oh I’ve been talking to her’ serves to implicitly question the supervisor’s observation of the prospect as non-human and justify the amount of time spent on the call. Excerpt 13 is a continuation of the call a few turns later.
In line 193, the TM reacts to the supervisor’s assertion with a change-of-state token of ritualised disbelief (Heritage 1984:339) ‘Really?’ with rising intonation. The supervisor then confirms that the TM has been speaking with a recording (‘yeah she just keeps saying uh-huh’); however, he produces a positive assessment (Pomerantz 1984) ‘That’s a good one though’ as a way of explaining how the TM may have been misled and mitigating any potential professional face threat. Consequently, the TM finally accepts the prospect is not human and responds with a surprise token and apology in line 198, ‘O::h my god I’m so sorry’, with prosodic elongation of the ‘O::h’ particle and suppressed laughter displaying a state of embarrassment and attempting to save her professional face.
Previously in Excerpt 9, we showed the supervisor intervention from the holiday company. The company is trying to persuade the prospect ‘John’ that he has unused travel credits that will expire unless action is taken. There have been a number of derailments earlier in the call. The call continues with further random distractions until, a few turns later, the TM seems to display a metapragmatic awareness that something is not right with the call and asks John to hold the line. Following a pause, Excerpt 14 begins with the supervisor taking the phone.
The supervisor’s intervention confirms his suspicions that John is in fact a bot. As in Excerpt 12, in lines 217 and 219 the TM manifests resistance to the supervisor and counters his assessment based on her own knowledge of the prospect during the call. However, in contrast to Excerpt 12, where the supervisor addresses any potential face threat with a remedial exchange, here the supervisor dismisses the TM’s claims (l.220). The TM responds with a but-prefaced response (l.222) (Jackson & Jones 2013). But-prefaced components are often employed as a defence to make a point (Schiffrin 1987) or to convey the accuracy of an assertion that has been questioned by the previous speaker (Bolden 2010). With her response, the TM negates the supervisor’s comments and maintains her claim that she has been talking to a real person. The TM continues to reject his assessment insisting ‘He’s talking to someone else, he’s not talking to me’ (l.231), perhaps as a way of justifying the amount of time invested in the call. To further underline her defiance, and perhaps to save her professional face, she returns to the call to say goodbye and confirms she will call John the following week.
Verbal abuse as an emotional response
While the TMs in the above calls were concerned about saving their professional face, in other calls we found two contrasting emotional reactions when they realised they had been interacting with a non-human prospect: expressions of abuse towards the ‘prospect’ through swearing, and expressions of anger through vulgar lexemes, which could be considered a form of ranting (Thorson & Baker 2019). In the following examples, the TMs make their own assessment of the prospects’ odd behaviour:
In Excerpt 15, the TM begins to ask a wh- question and switches to a polar interrogative to which the prospect responds with a type-conforming affirmative (l.247). However, when the TM repeats the specifying question in line 249, the subsequent affirmative response is non-fitting. Following a pause, the TM accounts for the prospect’s behaviour by questioning their mental capacity, referring to them in the third person with the declarative at line 252, as if they were not present in the interaction (see Rehm 2020) or perhaps addressing a co-worker in the call centre. Excerpt 16 shows a similar response. In this recording, the ‘high school’ narrative has already been introduced earlier in the call. When the same distraction occurs again, the TM queries whether the prospect is ‘with it’ (l.256). We cannot know whether at this point in the calls the TMs think they are talking with a recording or whether they actually believe the prospects have impaired cognitive capabilities; nonetheless the TMs appear to be questioning the prospects’ competence. The TMs may realise that there is no value in the time they have invested in the call and their confidence in making a sale is low, which leads to the abusive language (cf. De Angeli & Brahman 2008).
More sustained verbal abuse is found in Excerpt 17, taken from the first credit card scam call, which we present in detail. The whole interaction lasts for nearly 13 minutes and during this time the TM calls back three times. The Excerpt begins approximately 3:30 minutes into the call. Up until this point, there have been three side dialogues between Emma and her daughter (see Excerpt 7). This is the fifth time the TM has requested information about Emma’s credit card and he is beginning to show signs of frustration.
Following a non-fitting response (Hm mm? – l.260) to the specifying question, we can see a breakdown in trust as the TM proceeds to mimic Emma in line 262 and, in this way, also questions the capability and sincerity dimensions of trust. The TM uses message enforcers, constituting preliminaries to the ensuing threat, in lines 268, 270 and 271 (e.g., Uh-
This Excerpt follows another derailment where Emma explains her difficulties trying to record a TV show. The TM once again starts to mimic Emma (lines 289, 294 and 296). Emma’s affirmative acknowledgement tokens appear to further incite the TM and his aggression begins to escalate. He produces personalised negative vocatives in lines 291 and 303 and a third person negative reference (Culpeper 2011) in lines 301/303 with the strong intensifier (fucking foo::l). The following few minutes of the interaction consist of the TM threatening to harass Emma with multiple phone calls. The call has now been going on for a long time, progressivity has never really been achieved but the TM persists. Operators in call centres are not allowed to terminate calls and must attempt to close a sale unless the prospect ends the call (Woodcock 2017). This may explain the TM’s persistence as approximately 8:30m into the call he tries to get Emma to hang up (you can hang up the call=I will call you back
Their reactions are akin to human reactions when technology is not working, though admittedly at a different scale of abusive language. As pointed out by Ullman and Malle (2018), trust is multidimensional; it involves relational (assuming the bot was human, e.g., sincere) and capacity (the bot’s elementary abilities, e.g., task accomplishment) dimensions, and it appears to be allowable for humans to abuse tools in ways that would be unacceptable if the tools were human (Parasumaran & Riley 1997).
Conclusion
In this article we provided a first analysis of human communicative behaviour in outbound marketing calls intercepted by bots. The calls in question are unsolicited and the products or services being sold are not necessarily wanted or indeed real. The bots perform a safeguarding function: detection and protection from all too pervasive nuisance calls. They are aimed at protecting humans from time-wasters or potential criminal activity at the hands of other humans by taking revenge on them.
Whilst the telephone company may have uploaded the most successful examples of calls to promote and sell their service, our analysis of a sample of calls between human telemarketers and bots has identified the main interactional principles and resources that bots ‘used’ to project personhood to give TMs the illusion that they were interacting with another human. We have contended that the communicative environment itself – telephone conversations where the audible rather than visual takes pre-eminence and distractions or deviations from established conversational rules are common and tolerated – and the working conditions of TMs offer a convenient setting in which doing being a human can successfully develop.
The formulaic nature of the TMs’ scripts coupled with the conditions of an industry which requires telephone operators to make hundreds of calls per day, be patient, listen to prospects and avoid putting the telephone down so as not to lose potential customers, enables the bots to be relatively effective despite the general ill-fittedness of their contributions.
We have noted delays in the bots’ utterances, and lexical and prosodic arbitrariness of tokens, which are nevertheless heard as acknowledgement or continuers by the TM. These were, however, produced at approximately appropriate times, well into the opening and middle stages of the calls, allowing the bots to project enough ‘intersubjectivity’ for the TM to progress the call.
The bots’ intermittent commitment in the calls, as illustrated by ill-formatted and ill-placed tokens, was nonetheless accounted for. “Doing-being-human” and engaging in typical human activities (e.g., laundering clothes, watching sports) were brought forward as excuses for lapsed intersubjectivity and purported lapsed attention was used to prevent the TMs from advancing their agenda and the bots to maintain (relational) trust. The calls comprise a combination of interactional contributions in the way of utterances and the annexing of side dialogues through which bots reflected on their behaviour thus far in the calls. The calls never progress, despite engagement with alleged conversation.
The illusion of personhood was further conveyed by authentic human voices, which unambiguously distinguished between genders and age. These were accompanied by traditional stereotypes which were relationally oriented, both in terms of their content (e.g., the distracted mum and their capacity for multitasking) and the way in which they were constructed (e.g., inviting TM engagement). The integration of bots’ accounts invited the TMs to repeat a previous sequence for progressivity to be regained and the whole charade started again.
The TMs all display some form of metapragmatic awareness that there is something strange about the interactions. This awareness is expressed in a number of ways: through embarrassment, anger or frustration. Given the preference for progressivity and the pressure to keep telephone calls short in call centres (Woodcock 2017), these prolonged interactions do not conform with prescribed call centre interactions. It could be argued that while the TMs may consider some of the continuous disruptions in the calls rather odd, the derailments successfully account for this behaviour.
We have seen how intersubjectivity and progressivity are woven together in pursuit of the establishment and restoration of trust and how the working conditions of telemarketing agents responsible for making unsolicited calls to sell a given product (legitimate or not) provide fertile ground to ignore potential troubles in the calls or try to resolve them. Despite the TMs’ intent, ultimately communication failed because no intersubjective attunement or togetherness could be achieved between the TM and a non-responsive recording.
Since collecting the data, R&P has incorporated text-based ChatGPT-4 and other speech recognition and voice cloning technology into their system. The new service is an embryonic example of a proper bot. There are only two example calls available on their website; however, the original recordings were so well honed to the formulaic nature of the telemarketing script that they did a better job of doing being a human than this first version of a bot. This highlights a current contextual limitation of AI, as this new bot is unable to recognise the context of a telemarketing call or relate to its situated interactivity. AI would need to evolve before it could be used effectively in this type of call.
Services offered by companies such as R&P can be useful to protect vulnerable members of the public (e.g., the elderly) from nuisance and scam calls. Notwithstanding this, there are clear ethical and moral implications in using AI which does not identify as such to target humans. Humans’ implicit assumption of trust by the very fact that the bots engage in interaction, their authentic human voice, legitimate backstories, reactions to the TMs’ every utterance, coupled with the TMs’ constraint of being unable to end the call, sheds new light on further ethical dilemmas that emerge in a communicative arena characterised by (socioeconomic) inequalities. This is especially poignant in the case of telemarketing agents who often work under slave-like conditions with clear implications for their livelihoods.
Footnotes
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
