Abstract
Over the past decades, the notion of voice in the theorizing and teaching of academic writing has been the subject of much debate and conceptual change, especially concerning its relation to writer identity. Many recent accounts of voice and identity in academic writing draw on Bakhtin’s dialogical concept of voice. However, some theoretical and methodological inconsistencies have surfaced in adaptations of the concept. Working from a refinement of the dialogical notion of voice based on the concepts of polyphony and interiorization, this article presents a methodological approach for analyzing voice(s) in writing. The article presents material on the evolution of an early-career researcher’s dissertation synopsis. The material is multilayered, including the writer’s text, transcripts of an interdisciplinary peer-feedback conversation with two colleagues, and a video-stimulated interview with the writer. Excerpts of the material were analyzed to trace the polyphony of interiorized voices that influenced the writing. This focus revealed the multivoicedness of academic texts as an effect of their history of coming into being. This article contributes to the question of voice and identity in academic writing from a dialogical psycholinguistic perspective by presenting a de-reifying notion of voice grounded in an understanding of writing as a polyphonic activity, which also feeds into the formation of a writer’s self.
Introduction: Concepts of Voice in Academic Writing
The notion of voice has been central to the study and teaching of academic writing for decades. Far from being undisputed, the notion has been the subject of much debate and conceptual change, especially concerning the relation of voice to writer identity (Matsuda, 2015). Reviewing current perspectives on voice in academic writing, Stock and Eik-Nes (2016) note that authors exhibit heterogeneous and sometimes unclear theoretical understandings of the term, resulting in contradictory claims about voice and contradictory approaches to studying it empirically. My aim is to present a psycholinguistic refinement of the notion of voice based on a dialogical theory of language and, in particular, on the concepts of polyphony and interiorization. I will present a methodological approach based on this theory, designed to trace the polyphony of the writer’s own and interiorized voices that feed into the formation of a writer’s textual voice. This approach is illustrated with a fine-grained analysis of excerpts from an interdisciplinary conversation and a video-stimulated interview, taken from a case study of an early-career researcher and writer who received feedback on her dissertation synopsis from two colleagues from other fields.
Reviews of the concept of voice in academic writing tend to group existing studies and approaches into three perspectives, based on the characteristic understandings of voice and identity they exhibit: (a) an individualistic perspective, (b) a sociodeterministic perspective, and (c) an alternative to the first two, often termed “dialogic” (e.g., Matsuda, 2015; Prior, 2001; Tardy, 2012). Historically, the three perspectives emerged in successive phases, but views related to the two older perspectives are still prevalent today. In the following overview, I focus on the third perspective and, specifically, on the question of what value a dialogical theory of voice with a psycholinguistic focus can have for the study of voice(s) and identity in writing.
In the first perspective, voice is understood as “a writer’s unique and recognizable imprint,” as Tardy (2012, p. 37) puts it in her critical review of the literature. In this individualistic or expressivist view, voice is associated with a writer’s “true” identity and authorial presence. The style and tone of a piece of writing are thought to emerge “naturally” from the writer’s individuality if the writer finds a connection to their authentic voice and self. This view characterizes many early treatments of voice, most prominently Elbow’s well-known descriptions of voice as “the self revealed in words” (Elbow, 1968, p. 119) or as words carrying “the sound of a person” (Elbow, 1968, p. 120). An expressivist view still characterizes many current approaches to teaching writing and improving a writer’s style across countries and educational contexts, where voice is considered a key aim of learning to write and of developing as a writer. This understanding of voice is part of a broader epistemological stance that assumes identity is fixed and can be expressed in language.
The second perspective is a sociological or sociolinguistic one that comes with a more or less deterministic interpretation of language, namely, that language is a set of socially available abstract forms and rules that speakers put to work. This view stresses the importance of social norms for writing. Voice, in this perspective, does not reveal a writer’s personal identity but is a means for writers to construct their social identity from linguistic features typical of a certain discourse community. Historically, this perspective was formulated as a critique of the expressivist first perspective, both in the field of L1 writing (e.g., Harris, 2012) and in L2 writing (e.g., Ramanathan & Atkinson, 1999). Many critics of the first perspective stress that the image of a writer finding their authentic voice is informed by Western individualism and puts considerable pressure on students less familiar with typical academic genres (Bowden, 1999). Current sociodeterministic accounts focus on genre-specific linguistic resources for constructing voice in this sense (see, e.g., Hyland, 2005, 2008), on disciplinary styles of writing, and on the social identities that writers take on by using these socially available features.
The third, alternative perspective on voice tends toward a social-constructivist or social-interactionist focus. This perspective highlights the socially constructed and contextually situated nature of language and puts the reader at the center of interpreting voice (prominently: Matsuda, 2001, 2015; Matsuda & Tardy, 2007; Nelson & Castelló, 2012; Tardy, 2012; Tardy & Matsuda, 2009). Matsuda (2015) stresses that, contrary to the sociodeterministic perspective, this perspective focuses “not only on how social norms arise and become stabilized but also on how individuals shape the form and meaning by using the tools provided by the norms—or socially available discursive repertoire” (p. 149). Many of the more recent alternative understandings of voice and identity in the third perspective associate their approach with a dialogical tradition of voice (e.g., Dysthe, 1996, 2012; Ivanič, 1998; Matsuda, 2015; Prior, 2001; Tardy, 2012). The dialogical concept of voice dates back to the first half of the 20th century, when it was developed by Mikhail M. Bakhtin (1895–1975) and Valentin N. Vološinov (1895–1936) in the context of their broader dialogical understanding of language as activity.¹ A core tenet of their theory of voice is that utterances are polyphonic or multivoiced: others’ voices are a genuine and nonoptional part of every utterance.
Even though the dialogical tradition has influenced many social-constructivist understandings of voice and identity in writing, some ideas characteristic of the third perspective run the risk of reproducing theoretical and methodological inconsistencies present in the first and second perspectives. Prior (2001) points out that “traditional” accounts of voice in writing, the expressivist and the sociodeterministic, rest implicitly on a Saussurean distinction between langue (language as an abstract set of linguistic rules and forms) and parole (the putting-to-work of these rules and forms by speakers). In the third perspective, many authors speak of social “repertoires” of voice features (e.g., Ivanič, 1998, pp. 52–53; Matsuda, 2001, p. 40; Tardy, 2012, pp. 37–38) from which writers deliberately or unconsciously choose in writing. Such a view highlights the social resources for identity construction, what Ivanič (1998) calls “possibilities of selfhood,” and the use writers make of them (see also Castelló & Iñesta, 2012; Hyland, 2005, 2008). With regard to the reader, voice is understood as “the amalgamative effect of the use of discursive and non-discursive features that language users choose” (Matsuda, 2001, p. 40, italics added). This definition can easily give rise to a two-phase model of interaction between writers and readers, in which the textual features somehow “stand between” the writer and the reader. For example, Nelson and Castelló argue that “voice is not somehow in the language of the text. Instead, a writer’s voice is inferred from textual cues by those who read the text; it is manifested in the social relation between writer and reader” (Nelson & Castelló, 2012, p. 34; see also Matsuda, 2015, p. 153).
This view suggests a constructive process on the side of the writer, which is social insofar as the resources from which the writer chooses stem from a social repertoire, but which is still characterized by “individual agency in appropriating, resisting, or negotiating . . . conventions” (Matsuda, 2015, p. 148). It also suggests an equally socially influenced but individually perceptive-constructive process on the side of the reader, which results in a notion of voice as “perceived identity” (Matsuda, 2015, p. 152).² These ideas are typical of an interactionist conception of language; they highlight the power of individual actors to influence and reconstruct social norms and thus resonate well with the dialogical idea that the social is personalized by individual acts (cf. Prior, 2001). Nevertheless, they do not necessarily encompass the broad notion of dialogicality that comes with a Bakhtinian and Vološinovian understanding of discourse. This notion is closely linked to the idea that human psychology is fundamentally social, an idea worked out in detail by Vološinov’s and Bakhtin’s contemporary Lev S. Vygotsky (1896–1934) through his concept of interiorization. Unless the writing and reading process itself is explicitly conceptualized as dialogical and voiced, the metaphor of the repertoire might suggest fixed and depersonalized “items of language” that can be stored and then put to work again. In other words, the image of the repertoire risks echoing a Saussurean understanding of voice features in terms of langue and might conceal the dialogical, psycholinguistic, and sociohistorical nature of “production” and “reception,” as well as the identity formation happening through these processes. A social-constructivist understanding of voice can thus benefit from including two core ideas of the dialogical paradigm: the polyphony of utterances and the interiorization of others’ voices.
These two ideas can help to grasp, both theoretically and analytically, the dialogicality and sociality of the “interaction” (Matsuda, 2015, p. 153) or “relation” (Nelson & Castelló, 2012, p. 34) between writer and reader that lies at the heart of the alternative, dialogic perspectives on voice.
Toward a Dialogical Understanding of Voice
Polyphony and the Interiorization of Others’ Voices
The first dialogical concept, polyphony, is based on the assumption that all utterances are necessarily voiced because they express someone’s position in discourse. Every utterance is unique because it is produced from a specific position in a concrete social situation, at a particular point in sociohistorical time and place (Bakhtin, 1984). Vološinov’s focus on intonation conceptually links voice with its literal, body-related, and sociocultural implications (Bertau, 2007). Voice as a bodily phenomenon depends on the sociophysical characteristics and doings of specific speakers. Thus, in the dialogical paradigm, a voice is conceptualized not as a purely intellectual phenomenon but as a formal (in the sense of experienceable, bodily, and multimodal) phenomenon. My analysis below will show how one writer’s expression of a scholarly argument became the starting point of an interdisciplinary debate, which focused not so much on her argument “in itself” as on the way she expressed it (the length of her sentences, the number of abstract terms she used, etc.) and on the meanings to which this expression gave rise.
A core assumption of a dialogical theory of voice is that language is not a neutral system that exists beyond concretely realized, voiced utterances: “When a member of a speaking collective comes upon a word, it is not as a neutral word of language. . . . No, he receives the word from another’s voice and filled with that other voice” (Bakhtin, 1984, p. 202). The voice of a speaker is always met by other past, present, and future voices; utterances form dialogical chains. For writing, this means that every piece of text is met with a reader’s (and the author’s own) active, voiced, responsive understanding, while the text at the same time preserves influential past voices. In Bakhtin’s and Vološinov’s theory of the utterance, this idea goes beyond a two-phase model of interaction between speaker and listener, or writer and reader: “Dialogic relationships can permeate inside the utterance, even inside the individual word, as long as two voices collide within it dialogically” (Bakhtin, 1984, p. 184, italics added). This passage speaks of a dialogical relation of voices (ways of speaking, expressing positions, and making meaning) within an utterance. Over time, collective voices emerge out of the many social interactions of a collective of speakers, as typified and mutually recognizable ways of speaking and being spoken to, and of positioning oneself and being positioned in the respective community (Bertau, 2021; Prior, 2001). It is the response, the “counterword” (Vološinov, 1973, p. 102), that makes a word meaningful, and in this way, every utterance is “half someone else’s word” (Bakhtin, 1981, p. 293).
The analysis that follows will show how individual sentences in the focal writer’s dissertation synopsis are met by different “counterwords”: those of the writer herself, including the imagined responses of her intended readerships, and those of her two colleagues from other disciplinary backgrounds, who bring her text into dialogue with a number of new voices, for example, the collective voices of their own disciplines with their respective expectations and evaluations.
When the personal or collective voices to which a speaker or writer responds are from another time and place, the speaker or writer remembers, anticipates, or imagines them in the form of a re-enactment or “reenvoicing” (Prior, 2001). This is particularly pertinent when others are not physically present, although such re-enactments are not exclusively tied to the written mode (Dysthe, 1996). Voices can “pass” or “wander” through modes of expression, as in written academic texts that evolve across a variety of moments “as a confluence of many streams of activity: reading, talking, observing, acting, making, thinking, and feeling as well as transcribing words on paper” (Prior, 1998, p. xi).
The second dialogical concept, interiorization, relates to a psycholinguistic focus that Vološinov adds to the dialogical concept of voice. He introduces the notion of “inner dialogue” (Vološinov, 1973), which resonates with the concepts of the interiorization of others’ words and of “inner speech” developed at the same time by Vygotsky (1987; Vygotsky & Rieber, 1997) in the context of his interest in sociogenesis, the development of the person by and within social relations (Bertau, 2011; Bertau & Karsten, 2018; Wertsch, 1991). In connecting polyphony and interiorization, I follow previous work at the intersection of language and psychology (e.g., Bertau, 2011, 2014; Bertau & Karsten, 2018; Prior, 2001; Wertsch, 1991).
Interiorization, a twin concept to the idea of reenvoicing introduced above, is the developmental process by which a speaker gradually takes on the words of another person. Language activity, following Vygotsky, can take two basic forms: with others (interpersonal) and without others (intrapersonal) (Bertau, 2011). Whereas in interpersonal speech the language activity is distributed among speakers, in intrapersonal speech the single speaker reenacts the activity of formerly co-present others more or less overtly, very often in the form of an inner dialogue, which is abbreviated and semantically condensed yet still bears the dialogical characteristics of social discourse (Vygotsky, 1987). In other words, the personal or collective voice(s) performed by others in interpersonal discourse become reenvoiced in inner speech. Ultimately, in this process, the voice of the other can become entirely “hidden,” as in Bakhtin’s idea of “hidden dialogicality.” Imagining a dialogue between two persons in which the words of one speaker are omitted, Bakhtin (1984) observes: “[The second speaker’s] words are not there, but deep traces left by these words have a determining influence on all the present and visible words of the first speaker” (p. 197).
The process of interiorization is most apparent in the early development of children’s speech, where the imitation of others’ voices is a central developmental practice. As a developmental mechanism, imitating a voice allows a child to “slip into” the perspective and social position of another person (Bertau, 2008, p. 103).³ Because qualities of the other’s way of speaking are retained in this process, taking on an alien voice widens the child’s possibilities for speaking, thinking, and acting. It allows the child to participate in and carry out practices in which the child could not yet have participated otherwise. In this process, individual consciousness and self are shaped as “social contact to oneself” (Bertau, 2008).
Taking on others’ voices, whether explicitly in the form of imitating or citing or more implicitly in the form of hidden dialogicality, is observable not only in ontogenesis but also in microgenesis, that is, in moments of learning and development that occur later in life (e.g., Prior, 1998). Again, it is important that voices can be taken on and responded to across modes (e.g., oral, inner, and written, combined with other meaningful multimodal expressions, including gestures, postures, voicings, and more). In the case analyzed below, voices from the feedback interaction between the writer and her two colleagues from other fields influenced not only the writer’s textual voice but also her sense of self as a writer and disciplinary scholar. The concept of interiorization, conceived as a re-enactment of others’ personal and collective voices with a potential for development and formation of the self, ultimately lays the conceptual foundation for the claim that writing is multivoiced, both “within” a text and across all stages of “text production.” In other words, written texts have a complex, socially grounded history of coming-into-being (Karsten & Bertau, 2019). This history is central to a writer’s self not as a momentary act of identity construction but as a process of ongoing social formation. Polyphony in writing can be studied by attending to the “deep traces left” (Bakhtin, 1984, p. 197) by others’ voices over the course of this history.
Four Principles for Analyzing Voice(s)
Building on the outlined theoretical approach, Karsten and Bertau (in press) have formulated what we have called “Basic Principles to a Voice Methodology.”⁴ The four principles are the methodological consequence of a dialogical theory of voice in which the concepts of polyphony and interiorization are central. These concepts will guide my analysis, not as top-down analytic categories, but as a way of characterizing how I identify and “pin down” voices to make interiorizing and polyphonic processes visible. According to our schema, voices are (1) transpersonal, (2) historical, (3) positioned, and (4) formed.
Transpersonality means that no voice can be fully grasped by looking only at its singularity and at the individual performing it. Since voices echo and project other personal and collective voices, they are related to others’ utterances through the dialogical processes of interiorization and polyphony. In my analysis, I focus on traces of this transpersonal character, ranging from explicit and more implicit citations and reenvoicings to collective voices. Evidence comes from, among other things, pronouns (e.g., “one,” “I,” “they”), normative statements (e.g., “must,” “should”), discourse markers (e.g., “well,” “like”), verba dicendi, and formal resemblances between historically related utterances (e.g., when a participant takes up something said earlier by someone else in a similar way of speaking).
Historicity means that any voice, through interiorization and projection, transcends the present speech event. This results in an analytical focus on developmental processes with regard to voice: Where does a voice come from, how does it develop, and what functions does it fulfill at which moment? In the analysis, I trace voices through successive situations (e.g., a text-feedback situation and the feedback receiver’s reflections, in a retrospective interview, on their subsequent text revision), focusing on the meaning and function of such “traveling” voices along their trajectories.
Positionedness highlights that performing a voice is an evaluative move, laden with “evaluative accents” (Vološinov, 1973). Analytically, the principle of positionedness leads to a focus on the exact addressivity constellation of an utterance (Bühler, 1990): Who speaks to whom (as whom) about what and in what way? I trace how voice performances shape such constellations in a very subtle manner, beyond the apparent constellation of who is actually present in the communicative situation. Voice performances often insinuate nonpresent past, future, and typified addressees as well as the anchoring, yet ever-changing, collectives or communities, who need to be taken into consideration analytically to grasp the evaluative meaning of an utterance.
Formedness refers to the idea that, in order to be recognizable as a voice, the phonological, lexico-syntactic, intonational, bodily, spatial-visual, and other perceptible characteristics of a speech event need to form a pattern that is intersubjectively associated with meaning: a symbolic gestalt. Analytically, this means that the “amalgamation” (a term Matsuda, 2001, has coined with regard to voice) is central. I attend to evidence in the material such as genre-typical forms, semantic fields, syntactic constructions, intonational, rhythmic, gestural, and visual patterns, and pauses, and I remain attentive to emic sense-making, that is, to which voice gestalts are meaningful to participants as “insiders” of a community or collective (Lillis, 2008).
Putting a Dialogical Understanding of Voices to Work: Material and Method
Participants and Procedure
The material analyzed below involves three participants, Johanna, Anne, and Rebecca, who took part in a university teacher-training seminar on academic writing. I focus here on Johanna, a university teacher and PhD student in the humanities. Participants in the teacher-training seminar were asked to write synopses of their qualification projects in order to obtain interdisciplinary text-feedback from two other early-career researchers. In Johanna’s case, these were Anne and Rebecca from the department of natural sciences. The interdisciplinary text-feedback was immediately followed by an individual text-revision phase. Both the text-feedback conversations and the subsequent individual text revisions were videotaped as a basis for later video-confrontation interviews between each participant and myself. I also collected Johanna’s text before and after peer feedback and revision. The variety of Johanna’s material is described in the next section. Participation in the study was voluntary, and informed written consent was obtained from all participants. All participants’ names are pseudonyms, and information that could identify them (discipline, title, and topic of the dissertation synopsis) has been redacted.
Layered Material
The material, which I draw on in the following illustration from Johanna’s case, involves five successive moments: (a) Johanna’s writing of her dissertation synopsis in its first-draft form as it became the object of peer feedback, (b) the feedback by Anne and Rebecca on the synopsis (Figure 1), (c) Johanna’s immediate revision of her text and, as a result, the synopsis in its revised form, (d) a video-stimulated interview with Johanna on the peer-feedback interaction (Figure 2), itself videotaped and transcribed, and (e) a second part of the same interview (also Figure 2), focusing on Johanna’s text revision, also videotaped and transcribed. There are three video products resulting from these moments: a 32-minute video of the peer-feedback interaction, a 15-minute video of Johanna’s revision phase, and a 123-minute video of Johanna’s and my video-stimulated interview based on the two former videos. Four textual products are involved: the first and the second draft of Johanna’s synopsis (Tables 1 and 2) as well as one transcript of the peer-feedback conversation (Tables 3 and 4) and one transcript of the video-stimulated interview (Tables 5 and 6). The video-stimulated interviews were performed according to the video-confrontation method I have developed for the dialogical study of writing processes (Karsten, 2014a, 2014b), which is methodologically inspired by the simple self-confrontation method from French sociocultural work psychology (e.g., Kostulski & Kloetzer, 2014). All interactions took place in German. The parts of the material used for this article are rendered in German and in translation, preserving as much of the original lexico-syntactic and intonational characteristics as possible. Transcripts follow the conventions of the GAT2 system (Selting et al., 2009).⁵

Figure 1. Interdisciplinary feedback between Anne, Rebecca, and Johanna (from left to right).

Figure 2. Video-confrontation (Karsten, 2014a, 2014b) between Johanna and Andrea (from left to right).
Table 1. Sentence A Before and After Feedback.
Table 2. Sentence B Before and After Feedback.
Table 3. Transcript 1: Interdisciplinary Feedback I.
Table 4. Transcript 2: Interdisciplinary Feedback II.
Table 5. Transcript 3: Video-Confrontation Interview I.
Table 6. Transcript 4: Video-Confrontation Interview II.
Such a layering of moments gives methodological weight to the discourses surrounding, preceding, and following an academic text like Johanna’s and, through new dialogical situations, allows the interiorized and projected voices involved in the polyphonic becoming of the text to surface. I thus follow researchers who have used the dialogical connections between utterances in different speaker- or writer-audience constellations in a similar way to reveal voices that feed into the becoming of written texts, for example, Ivanič’s (1998) case studies with mature students; the two ethnographic studies reported in Lillis (2008); Olinger’s (2011, 2016, 2020) interview- and video-based ethnographic studies, in which she examined, among other things, how visual embodied actions are essential to understanding writers’ experiences with particular writing styles; and Prior’s varied work in this area (e.g., Prior, 1998, 2004).
Illustration: Johanna’s Case
Textual Changes
The following analysis concentrates on two sentences from Johanna’s synopsis. These two sentences became the object of the interdisciplinary feedback between the participants and the subsequent video-confrontation interview with me. The two sentences belong to the same paragraph but are separated by a short passage (marked with [. . .]), where Johanna formulates her research question. The parts where Johanna introduces changes to her text after the interdisciplinary feedback from Anne and Rebecca are underlined in the pre-feedback and post-feedback examples.
At first sight, the changes are minor and do not go beyond a superficial stylistic level. Johanna splits Sentence A at the beginning of the subordinate clause, replacing “whereby” with “Thereby.” In Sentence B, Johanna replaces the verb “exemplifies” with “illustrates.” An analysis of textual voice alone would reveal only that Johanna tries to express her ideas in a syntactically and lexically less complex way. However, the following micro-level voice analysis reveals that Johanna enters into several dialogues at once and that her stylistic changes have a complex dialogical history that involves interiorized voices of different kinds and thus projects a polyphony of values and expectations that characterizes Johanna’s writing.
“I Wonder if All Sentences Should Be That Long.”
Transcript 1 (Table 3) presents an extract from the interdisciplinary feedback between Johanna, Anne, and Rebecca, which took place immediately before Johanna’s text revision. This extract reveals voices that feed into Anne’s impression of Johanna’s text.
Anne’s first comment addresses the length of Johanna’s sentences (Lines 01–02). In expressing her doubt about the adequacy of Johanna’s long sentences, Anne invokes both a personal voice (“but i also wonder,” Line 01) and a collective normative voice. This normative voice is apparent from the use of “should” (Line 01) and intermingles with Anne’s personal voice. It makes generalizations about Johanna’s text (“all sentences,” Line 01; “ALL in all, they are yeah (7.0),” Line 06) and conveys widely familiar stereotypes about writing styles in the humanities. The normative voice in Anne’s first comment might also echo a powerful collective voice that, as Lillis (2008) finds, judges “good writing” in terms of sentence length when non-native writers of English submit manuscripts to high-status Anglophone journals. Perhaps the fact that Anne comes from a natural sciences discipline, where publishing in English is more common than in Johanna’s humanities discipline, influenced Anne’s impression of Johanna’s text.
Anne’s second comment characterizes Johanna’s sentences as having “so many [. . .] PENdants” (orig. “anHÄNGsel”; another translation could be “attachments” or “appendages”) (Line 05). The German term is a derogatory derivative of the verb “anhängen” (to attach), implying that the appended thing is small, superfluous, or of low quality. Anne specifies that it is the “juxtaposition[s] of affect and reason and [. . .] of will and concrete realization” (Lines 16–17) that informed her impression. Although Anne concedes (“okay,” Line 22) that she “can understand the sentence now,” on second reading (Line 24), she still repeats her position that “such kind of ((unintelligible)) ((are)) strenuous to READ” (Line 25). Her turning toward Johanna and Rebecca while making this comment marks this turn as expressing a more definite and “official,” probably collective, position than the rather self-directed concession in Lines 22 and 24, which can be associated with Anne’s personal reader’s voice.
Next, Anne raises the question of whether Johanna’s lengthy and complex style is determined by disciplinary norms or by personal skill: “but i don’t know °hh to what extent there are also (–) uh ***logists that rather (1.4) [. . .] write more clearly so to speak?” (Lines 27–31). With this, Anne tentatively positions Johanna as someone who might lack the skill to write in a less complicated style. This evaluation is envoiced cautiously from a subjective stance and marked by relativizing and distancing constructions (“but i don’t know,” Line 27; “rather,” Line 29; “so to speak,” Line 31), an indicator that Anne is well aware of the delicacy of her implication. But the utterance also evokes a more objective voice. The constructions “there are” and “to what extent” echo the supposedly objective and neutral collective voice of empirical observation. This voice of objectivity calls into question the stereotype of non-English and humanities writing style as a disciplinary fact. It refers back to transdisciplinary norms of good writing (“should,” Line 01), but it takes a more neutral, bird’s-eye stance on how ***logists really write, asking “to what extent” an outside observer would find ***logists who write less complexly than Johanna does. In a polyphonic intermingling of personal voices (Anne as an attentive reader and as a polite feedback-giver) and collective voices (the voice of critique of a wordy humanities writing style, the voice of neutral empirical objectivity), Anne’s perspective is constructed: Being a ***logist should not be an excuse for Johanna to produce long and complex sentences.
“A Little Light Flashes.”
Transcript 2 (Table 4) directly follows the interaction in Transcript 1. The extract centers on Johanna’s reaction to Anne’s feedback.
Johanna’s reaction to Anne’s assessment of her text is twofold. First, she affirms that the text in its current state is still a draft that needs to be revised (Line 35) before it is given to any reader in the discipline: “of course the text now is not (-) uh (-) not what i would hand in right?” (Line 34). She positions herself as a fellow writer, speaking with the personally engaging voice of a peer writer and calling for understanding and affirmation from the others (“of course,” “right?,” Line 34). Second, Johanna enacts another voice that is part of her disciplinary identity. She explains that the way she expresses her ideas in the text is in fact purposefully directed at other ***logists, because the seemingly excessive abstract nouns “affect,” “reason,” “will,” and “concrete realization” are discursively laden terms that invoke strands of scholarly traditions and disciplinary discourses (Lines 37–42); they address powerful collective disciplinary voices. Note Johanna’s switch from marking this fact as “natural_” (Line 37) to explicating this hidden dialogicality for Anne and Rebecca as disciplinary outsiders, explaining that what she writes is addressed to her community, “for ***logists” (Line 37).
The exact nature of the abstract terms becomes the object of a brief conversational negotiation. Johanna first explains her intended readers’ expected reaction metaphorically: “for ***logists (.) a little light FLASHes when you hear things like affect and reason will and realization right?” (Lines 37–38). As Anne misunderstands this as “an alarm signal?” (Line 40), Johanna explains further: “They are like/such discourses” (Line 39). Rebecca offers another understanding: “they are CONcepts” (Line 41). Finally, Johanna adds: “there are strands of tradition attached” (Line 42).
In this second reaction to Anne’s feedback, Johanna speaks with the voice of a disciplinary scholar who affirms that, by “pressing the right button,” she purposefully designed her text so that disciplinary readers would recognize her mastery of discursive strands of tradition. This voice does a twofold job in the conversation: It gives a reason why Johanna’s writing style is not a personal flaw but makes sense from a disciplinary perspective, and it explains to Anne and Rebecca, as disciplinary outsiders, that the complexly juxtaposed terms are not only disciplinary concepts but also refer to strands of scholarly discourses, like orientation lights pointing the way for disciplinary readers.
Looking only at the feedback interaction, one might conclude that Johanna’s disciplinary voice, which addresses the collective voice(s) of her community, is strong and provides a valid rationale for her textual voice. However, Anne’s feedback seems to have had a great impact on Johanna’s text revision. The reason might be that Anne’s critical utterances exhibit a polyphony that includes collective voices with high degrees of social power: voices that (re-)enact widely familiar judgments of writing in the humanities and the ideal of objectivity in academia. The extracts from Johanna’s text before and after feedback given above (Tables 1 and 2) indicate that Johanna must have interiorized and responded to these powerful collective voices enacted by Anne.
“Now That Makes It Easier of Course.”
In this section, Transcript 3 (Table 5), which is an extract from the video-confrontation on the text revision, sheds further light on which voices fed into Johanna’s revision process.
Line 43 is a typical instance of video-confrontation in that the writer-and-interviewee, Johanna, laughs while watching the video of her earlier writing process. Very often such short laughs (as well as other paraverbal or nonverbal expressions) indicate a dialogical reaction to the activity on the video. In the interview, such reactions are useful entry points to the mostly tacit inner dialogues that accompany and guide the writing process (Karsten, 2014a). Accordingly, I respond to Johanna’s brief laugh by offering a new instance of co-analysis: “then this is the sentence that they read out to you earlier?” (Lines 44–45). As becomes clear, the sentence Johanna reacts to in the video-confrontation is the one that was the object of the feedback conversation rendered in Transcripts 1 and 2 (Sentence A). Some moments later, after more laughter from Johanna, I stop the video (Line 49) to facilitate a dialogue about what had just happened tacitly during the writing process captured on the video. Johanna jumps in with a direct statement of the observation that caused her reaction to her earlier writing activity: “exemplified [. . .] replaced by ILLustrated” (Line 50; cf. Sentence B). She quite directly adds her present evaluation of this revision move: “interesting” (Line 51). My laughter (Line 52) signals my assumption that Johanna’s choice of “interesting” as a qualifier for her revision move has an ironic quality and points toward some doubt about whether replacing “exemplifies” with “illustrates” was appropriate. Consequently, we conversationally establish that this change of wording, and of textual voice, is both meaningful and questionable. In the remainder of the extracts from our video-confrontation given here, Johanna and I unfold the dialogical meaning and purpose of this seemingly small lexical change in Sentence B, as well as the issues attached to it.
Johanna elaborates on her doubts about the change of wording, again in an ironic voice: “of course that makes it easier now?” (Line 53). Her laughter as well as the rising intonation at the end of the turn are signs that her current evaluation of her actions is at odds with the goal she seems to have pursued during revision. Note that Johanna formally uses a question intonation, but the utterance is not directed toward me as a conversation partner; that is, I am not being asked to judge whether “illustrated” is more straightforward than “exemplifies.” Judging from its form and its place in the conversational flow, the utterance is instead a critique directed at Johanna’s past self during revision. With it, she performs a “cross-chronotopic [. . .] infringement” (Karsten, 2014b, p. 497). Cross-chronotopic infringements are critical developmental moments in which interviewees in video-confrontation cross the boundaries of the here-and-now and speak to their past selves in the time-and-space shown on the video. Johanna’s ironic comment makes clear that in her current time and place, she would have chosen a different textual voice. Her turn also shows that making her text “easier” (Line 53) to read is the overall objective of her text revision, pointing to a hidden dialogical response to the critical voices performed by Anne in the feedback interaction.
Even though Johanna’s commentary is not primarily addressed to me, I take a contrary position on her doubt about whether replacing “exemplified” with “illustrated” really makes her text easier: “well, maybe a LITTle bit it does. it is still a loanword but” (Lines 55–56). My turn echoes collective voices about “good writing style”: It makes one of the possible meanings of “easier” (Line 53) explicit, namely, the restrained use of loanwords. My point is that “illustrated” is a more widely used loanword that is also present in the German everyday register, whereas “exemplified” is a scholarly term reserved almost exclusively for the academic sphere. With my objection and the argumentative markers “still” and “but” (Line 56), I reenact and respond to the discordant personal and collective voices in the feedback interaction rendered in Transcripts 1 and 2.
Johanna elaborates on one of her positions within this controversy. She does so in a prosodically demarcated way, starting with a turn-initial discourse marker that introduces an objection (“well,” orig. “n(a)ja,” Line 58). This supports the interpretation that what follows makes one voice in the controversy audible. This voice comments on, or elaborates in response to, my challenging echo of the collective conventions of “good writing style” (“well the thing is,” Line 59). With the help of a disciplinary voice, Johanna positions herself as an expert in her field of ***logy who can judge the appropriate choice of loanwords beyond merely stylistic considerations and can explain this to a disciplinary outsider like me (now) or like Anne and Rebecca (before). As this expert, she clarifies: “exemplified there is um of course exemplar involved, and especially in the field of {mythological figure} research exemplar tradition is um an important °h topos?” (Lines 60–61). This disciplinary voice backs up her decision as a writer: “that is why i first used exemplified, because {historical author} really uses her {=mythological figure} as an exemplum?” (Lines 62–64), granting her textual voice a particular polyphonic history situated strongly in the discourses of her field. In Johanna’s explanation, several more “CONcepts” (Rebecca, Line 41) that belong to the hidden dialogicality of “discourses” (Johanna, Line 39) in her text come to the fore. All are terminologically related to the formulation in question, “exemplifies”: “exemplar” (Line 60) and “exemplar tradition” (Line 61) as well as “exemplum” (Line 64). Johanna’s explanation is uttered in a voice that enacts an expert and knowledgeable position (“of course,” Line 60), and it shows two things. First, choosing an “easier” (Line 53) wording in response to the everyday collective voice of “good writing style” would be at odds with disciplinary meaning.
Second, disciplinary meaning depends on linguistic form, such as using the word stem “exempl-” rather than another supposedly synonymous form like “illustrat-.” Only through form, and thus through a certain collectively recognizable textual voice, are authors like Johanna able to make “a little light FLASH[. . .]” (Line 37) for other members of the community. The fact that Johanna changed her textual voice during revision, using the “easier” form (“illustrated”) instead of the complex but discursively more appropriate textual voice of “exemplified,” signals that her writing and revision process is polyphonic and that her intended addressees shifted. Her writing responds to and elicits both the collective and personal voices related to expertise in her field (i.e., Johanna’s voice as a disciplinary scholar responding to the disciplinary discourses “behind” terminological markers) and the critical personal and collective voices interiorized from Anne’s feedback, identified in Transcript 1 and partly echoed by me in the video-confrontation dialogue.
“On Top of Everything.”
Transcript 4 (Table 6) renders the continuing dialogue during the video-confrontation, where Johanna and I reflect further on the choice of voice in Johanna’s text and her related evaluative positioning.
The excerpt is a typical example of the new as well as refracted polyphonies that co-analysis in video-confrontation can create (cf. Karsten, 2014b). It starts with some conversational work, where Johanna and I secure our mutual understanding—though only a partial understanding in my case as a disciplinary outsider—of the difference between “exemplum” (Line 64) and “illustrative example” (orig. “Beispiel,” Line 66).
Unlike English “example,” German “Beispiel” (Line 66) does not share the Latin word stem “exempl-/exampl-,” which is characteristic of the disciplinarily appropriate way of expressing the idea in question. In Line 75, Johanna sets out to explain her motives during revision (“and therefore °h i have then”), but after a pause she concludes her turn with a finalizing “yes.” Such a “yes” with falling intonation is often used in interpersonal dialogue to mark the end of an argument without reaching a consensus, thus suppressing one’s own or the dialogue partner’s objection. This response hints at an inner struggle of voices and conflicting evaluative stances during Johanna’s revision phase, which is partly reflected in her explanations of her motives in the video-confrontation.
Johanna resumes her self-interrupted explanation of Line 75 in the next turn and now uses extensive prosodic and nonverbal indexicals to characterize her positioning as a strong here-and-now opinion: “I am now dissatisfied, ((points at herself with pen)), that i have THEN picked ILLustrated?” (Lines 76–78). The “I” in “I now” (Line 76) is prosodically stressed and thus opposed to “i (. . .) THEN” in Line 78. For Johanna, who generally gestures with restraint, pointing to herself with a pen (Line 77) is a strong nonverbal localizer of the origin of the present voice expressing overt dissatisfaction and disagreement. She backs up this “counterword” (Vološinov, 1973, p. 102) with a variant of her explanation of the difference between “example” (orig. “Beispiel”) and “exemplum” (Lines 80–81), speaking with a disciplinary voice. This time, the explanation involves a critical assessment of her changed wording: “that uh does not show it quite right” (Line 80). Thus, her countervoice is enacted from the position of a critical reader in scholarly text critique. The symbolic gestalt of Johanna’s critical assessment can be read as polyphonic: In addition to Johanna’s taking the stance of a disciplinary expert, it recalls voices typical of supervisors and “the difficulties that supervisors face when they try to turn their procedural or practical knowledge of disciplinary writing into declarative or teachable knowledge” (Paré, 2011, p. 60). In line with Paré’s observation about supervisors’ feedback, Johanna’s critical assessment of her own textual choice remains rather vague.
In Lines 82–84, Johanna again introduces conclusion markers (“but (. . .) fine.”), as if to soothe the pending conflict between voices. However, after a longer pause (Line 85), a new voice, or a new quality of her critical voice, enters the scene in Line 86. It can be described as lighter than the critical voice in Lines 76–78 and 80–82, characterized formally by laughter and semantically by a more complaisant stance: “whatever i was [. . .] thinking” (Line 86). This voice resembles Johanna’s ironic voice identified in Transcript 3. Our mutual laughter in Line 87 resolves the tension that the conflict of voices seems to have created. As before (Line 55), I offer one possible explanation of what Johanna could have been thinking (“maybe for the others, or,” Lines 88–89), referring to Anne’s and Rebecca’s needs as readers and disciplinary outsiders. Johanna considers this option, as shown by the particle “hm” in Line 90 together with the pauses in Lines 91 and 93. She then utters a number of concessive response markers with falling intonation, as if to close an argument or controversy: “yes, uh, all right.” (Lines 94–96). Johanna probably conducts a more or less explicit inner dialogue, of which only the end is voiced, in response to my tentative explanation.
She then gives an overt, other-addressed answer, accompanied by a change of gaze toward me (Line 99): “yes well probably i wanted to avoid, having to explain now on top of everything what an exemplum is” (Lines 97–98). Even though Johanna addresses this explanation of why she chose “illustrated” instead of “exemplified” to me and uses the past tense (Line 97), the construction “now on top of everything” (orig. “jetzt noch”) (Line 98) gives a glimpse of the quality of her responding voice during revision. Indirectly, this voice coaddresses Anne and Rebecca as peer-feedback givers and nondisciplinary readers. This concluding turn of the sequence produces the subtle impression that, while revising her text, Johanna only reluctantly gave in to the interiorized comments by Anne and to the choir of personal and collective voices that call for a less strenuous textual voice in her text as well as in texts in the humanities in general. Throughout the revision process, Johanna’s disciplinary voice as an expert in her field, as well as the collective voices of her fellow ***logists, for whom she wants to make “a little light FLASH[. . .]” (Line 37) with her choice of textual voice, have been present, too. As a result, Johanna’s revised text responds to the interiorized personal voice of Anne and is in a polyphonic dialogue with a greater number of personal voices (probably other disciplinary feedback-givers and supervisors) and collective voices (the discipline, the wider academic community calling for clear writing), some of which are more pronounced in the writing process or the text, while others are more hidden, leaving traces via Johanna’s responses to them.
Synopsis: An Array of Voices
Throughout the various moments of peer-feedback, text revision, and video-confrontation, a number of personal and collective voices can be identified. They all feed into the formation of Johanna’s textual voice—or rather, the polyphony of her writing—and its further trajectory in video-confrontation. The following is a list of the most important of these voices that surface in the analysis:
Transcript 1 (Table 3):
Anne’s personal voice as an interdisciplinary reader and peer feedback giver, directed at Johanna-as-peer
A collective voice advancing stereotypic judgments of the wordy writing style in the humanities, enacted by Anne-as-natural-scientist, witnessed by Rebecca-as-natural-scientist
A collective voice of empirical objectivity in academia, enacted by Anne, directed at Johanna
Transcript 2 (Table 4):
Johanna’s personal voice as writer and fellow early-career researcher, directed at Anne and Rebecca as her peers
Johanna’s personal voice as a disciplinary expert, directed at Anne and Rebecca as disciplinary outsiders
Collective voices in the discipline of ***ology, addressed by Johanna through hidden dialogicality
Transcript 3 (Table 5):
Johanna’s here-and-now personal voice as a critical reader of her own text, ironically directed at her past self, witnessed by Andrea
Johanna’s personal voice as a disciplinary expert, directed at Andrea as a disciplinary outsider
A collective voice of “good writing style” in German academic discourse, echoed by Andrea
Transcript 4 (Table 6):
Johanna’s here-and-now personal voice in video-confrontation, expressing dissatisfaction with her own text, directed at her past self, echoing typical supervisors’ voices in academia
Johanna’s soothing or appeasing voice, calming down a refracted inner dialogue of conflicting voices during writing
Johanna’s here-and-now lighter and more complaisant voice as a critical reader of her own text, witnessed by Andrea
Johanna’s past personal voice as a peer writer and fellow early-career researcher, as enacted by Johanna before (Transcript 2) and proposed by Andrea now (Transcript 4)
Johanna’s refracted inner voice as a disciplinary writer, co-addressing Anne and Rebecca as nondisciplinary readers and peer-feedback givers
This overview of the various personal and collective voices that are involved in Johanna’s writing gives insights into the transpersonal history of Johanna’s textual voice. Johanna’s writing involves, among others, the personal and collective voices representing other disciplinary values and norms as performed by Anne. These voices reappear later in an interiorized and self-addressing form in Johanna’s revision process as her guiding objective to make the text “easier” for interdisciplinary readers. In this process, these voices are countered by disciplinary voices from the field of ***ology, including Johanna’s personal voice as a disciplinary expert, but also the interiorized and projected voices of her fellow ***ologists, critical readers, and supervisors, as well as the collective voices of scholarly discourses in her field. These voices, too, introduced a range of evaluative stances to the negotiation of meaning and style associated with Johanna’s textual voice(s).
The video-confrontation interview makes it possible to understand how Johanna’s stance toward her own text oscillates, through time and across constellations of addressivity, among all of the voices involved in her writing process. Her conflicting ideas of the “right” textual voice, and of herself as a writer, PhD student, and disciplinary scholar, develop gradually, not so much out of the complex relationships between her own and others’ personal and collective voices as in the form of these very relationships.
Conclusion
In the overview of perspectives on voice given above, I have argued that even though Bakhtinian conceptualizations have become more common in the last two decades, some theoretical and methodological inconsistencies remain in current dialogical studies of voice in academic writing. Some of these can be addressed by closer attention to the two core concepts of polyphony and interiorization and to the methodological implications these concepts suggest.
In the dialogical view of language that I have adopted in this article, writing and reading are not considered to be isolated constructive acts; they are seen as multivoiced events of address-and-response. Consequently, voice is not a thing set apart from such an event or its outcome, but it is its specific semiotic form. What a dialogical understanding of discourse thus has to offer to current conceptions of voice and identity in academic writing is a de-reifying notion of voice grounded in the principles of transpersonality, historicity, positionedness, and formedness. In this vein, it is true that voice is not “somehow in the language of the text” (Nelson & Castelló, 2012, p. 34). Voice is the language of the text in the sense that “language itself is understood as these wide-reaching dialogical relationships, echoing, questioning, re-taking, and altering each other in each specific moment of being uttered” (Karsten & Bertau, 2019, p. 13).
The Bakhtinian and Vološinovian idea that utterances form chains and “conserve” others’ voices leads to the conceptualization of all utterances not only as voiced but also as polyphonic. Moreover, typical ways of speaking in a community “transport” powerful collective evaluative stances in the form of collective voices. The notion of the collective voice (not in the sense of instantiating such a typical way of speaking, but in the sense of coproducing and coconstituting the communal; Prior, 2001, p. 72) differs from the idea that writers make use of depersonalized social repertoires of language features to construct their textual voice (which, in turn, can be reconstructed and inferred by readers). This notion implies that every act of speaking/writing and listening/reading involves social acts of identification, attribution, and evaluation. The Vygotskian notion of interiorization, understood as the ongoing becoming of persons in and through social voice practices, reintegrates the psychological, which was a main focus of the individualistic or expressivist tradition, into a theory of voice in writing, but without romanticizing it in terms of the writer’s true individuality. Such a view results in an understanding of the self as a process of ongoing social formation that reaches beyond situational acts of identity construction (Bertau, 2021).
Acknowledging interiorization in terms of the polyphonic history of every act of writing implies that not only texts but also the writing process itself is multivoiced. Methodologically, therefore, taking polyphony in writing seriously means attending to the social and psycholinguistic processes of addressing and being addressed, replying and being replied to, positioning-as and being positioned-as. Dialogical methodologies like video-confrontation can help to unfold the history of voices as they are performed, interiorized, and revealed in their forms across various constellations of speaker/writers and listeners/readers. Such methodological approaches open up a wide field of possibilities for further research on the polyphonic meaning of a text and its author’s identity. Future studies can take up the trend to methodologically “propel” activities with a significant intrapersonal polyphony, like writing and reading, into new dialogical situations and develop and refine methodologies to study the dialogical refractions and reconstructions that happen there (recently, e.g., Olinger, 2020; Ware, 2022). My own analysis of Johanna’s material illustrates how, in the ongoing social dialogue, the voices of others become interiorized and how, as “social contact to oneself,” they become intrinsic to the writer’s academic writing practice and her sense of self in a social field. As a polyphonic experience, the writing self has a rich dialogical history extending across the social, interactional, and psycholinguistic domains. It is this complexity that endows the writing self with a formative potential both for the social and for the social-personal.
Acknowledgements
I would like to thank the editors Chad Wickman and Dylan Dryer for their helpful feedback and guidance and the two anonymous reviewers for their appreciative and careful responses both to the theoretical and the analytical arguments of this article.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Part of this research was funded by a starting grant from Paderborn University and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): TRR 318/1 2021 – 438445824.
