Abstract
Images and texts, their qualities as empirical data, and their entanglements, have been the object of methodological and theoretical debates in social scientific research for a few decades now. Scholars have argued that images and texts are fundamentally different to the point of being incommensurable. Contrary to this line of reasoning, this paper suggests that images and texts are similar enough to be studied within the same framework, that of semiotics, understood as a non-reductive approach to the socio-cultural making of aesthetics, performances, and meanings in multi-media arrangements. Unfolding this argument, I do not suggest that ‘images are like texts,’ but instead propose that ‘texts are like images.’ Doing so, I discuss a few of the most frequently assumed properties of images, and how these may apply to texts. As such, this article sets the theoretical perimeters for more specific methodical approaches in social scientific research.
Introduction
Images and texts have long been the object of methodological and theoretical debates in the humanities and the study of cultures and societies in the broader sense (e.g., Boehm, 2004; Bräunlein, 2004; Breckner, 2010; Hartinger, 2007). When it comes to images and texts as primary research data or when their epistemological status is discussed (“What are images?”, “What are texts?”), scholars have often emphasized that images and texts are fundamentally different to the point of being incommensurable (e.g., Morgan, 2020, p. 2). In this methodological paper, I do not deny that images and texts are different kinds of data. But I do suggest that they are similar enough to be studied within the same conceptual and methodological framework. Unfolding this argument, I do not propose that “pictures are like texts” – because this often (and sometimes justifiably) triggers the criticism of ‘textualism.’ Instead, I argue that “texts are like pictures” – at least they are more similar to pictures than is often assumed.
In order to continue this methodological debate in a constructive manner, it is necessary to make a pivotal but, to my knowledge, often neglected distinction: the distinction between texts and images as historical or contemporary primary research data in the study of cultures and societies on the one hand, and “text” and “image” as theoretical concepts and epistemological metaphors on the other. This difference is not always clearly marked in these debates: For example, when I state that “images should not be read like texts,” “images” are the empirical or historical data under study, i.e., the object of research. In the same phrase, the term “texts” can refer either to the textual data (in which case one compares two data genres) or to the theoretical concept of “text” (in whatever meaning precisely). Conversely, the term “image” in the compound “linguistic image,” for example, does not refer to empirical images but to a theoretical (and metaphorical) concept regarding a certain function of language. “Images” are often described as things in the empirical world (past or present) to which certain properties are assigned. The term “text,” on the other hand, can sometimes refer to primary research data, and sometimes to a more or less extensively elaborated theoretical concept (as, for example, in Barthes’s essay “From Work to Text”, 1979). If these different uses of the terms “image” and “text” are not made explicit, there is a danger of confusing an empirical object of study with its theoretical representation and vice versa.
In order to prepare the argument that “texts are like images,” or at least more similar to images than is often assumed, I first summarize some frequently formulated methodological positions on images and texts as primary research data. Summarizing these positions demonstrates that in the methodologically contested borderline between the data genres of image and text, scholars, perhaps unconsciously and unintentionally, reproduce dichotomies that they have long believed outdated: for instance, nature versus culture, body versus mind, feeling versus cognition. It also shows that the image is often associated with the first side of the distinction (nature, body, feeling).
Moreover, the question of what images are is discussed in the methodological debate against the implicit or explicit comparative foil of the text. Often, scholars assume that it is more or less clear what textual data are (e.g., Banks, 2014; Boehm, 2004; Imdahl, 1996). Thus, provocatively said, approaches to studying images that explicitly distance themselves from a ‘textualism’ of other approaches may be subject to an implicit textualism themselves: what an image is and how it should be studied is often formulated from the perspective of the text and in the medium of the text. For example, Boehm writes: “Pictures possess their own logic, which belongs only to them. By logic we mean: the consistent generation of meaning from genuinely pictorial means. And by way of explanation I add: this logic is not predicative, that is, it is not formed according to the pattern of the sentence or other forms of language. It is not spoken, it is realized perceptually” (Boehm, 2004, pp. 28-29).
Apart from the fact that Boehm uses a cognitive textual metaphor (“logic” from Greek λογικὴ τέχνη or λογικός = concerning the word, reason, or thought) to describe images, he also seeks to transfer a “non-predicative” logic of the image into the scientific medium of the text. Like other scholars in the field, he argues that the meaning of images is “not spoken” but “realized perceptually.” Nonetheless, the basic endeavor of this approach is to understand and, eventually, speak and write about the meaning of images extensively and precisely. This is challenging: How could something that is essentially non-predicative (the logic of the image, according to Boehm) be expressed in the realm of language? I agree with Boehm and others that images have something that is non-linguistic, a sort of meaning that we grasp “perceptually,” but I have doubts that this can be the object of social scientific research (phenomenological research tries to do just that). Eventually, and most important of all, texts have this non-predicative, non-spoken aspect as well, a feeling that comes over us when we read a text. Texts, just like images, sometimes have the capacity to affect readers immediately, without any predicative logic involved. But this affective involvement, although it may be framed by socio-cultural conditions, remains in the sphere of the non-social as long as it does not become part of a communicative interaction (Krech, 2021, p. 99). And the non-social cannot be the object of social-scientific research (it can be the object of other disciplines, such as psychology, or phenomenology, to be sure).1, 2
In contrast to approaches like this one, I propose to find an overarching perspective that understands images and texts as data that are different but similar enough to be studied within the framework of a common methodological paradigm, that is: as semiotic arrangements, as elements that offer and realize meaning in their specific empirical and historical configurations. In this line of thinking, Edmund Hermsen suggested already in 2003 that “by means of the idea that iconic and aniconic signs form only different modes of representation […] one can ‘unexcitedly’ resolve the competition between textual and pictorial codes” (Hermsen, 2003, p. 118).
As indicated, there are a number of influential positions that assume that images follow a fundamentally different logic than texts (and language) and therefore cannot be studied with methods developed for textual analyses. This includes Boehm’s approach, already cited, according to which images possess a “logic of their own, belonging only to them” (Boehm, 2004, pp. 28-29). Similarly, Hans Belting urges us to “no longer explain images with texts, but to distinguish them from texts” (2001, p. 15). This approach, also advocated by William Mitchell (1994), is received by authors in the study of cultures and societies and, at the same time, is treated as a counter-proposal to semiotic approaches: “Mitchell leaves behind the semiotic approach, which is characterized by the effort to establish a common basis of text and image via a theory of signs, which in turn is based on the model of ‘language’” (Bräunlein, 2004, p. 205). I support this statement that the semiotic approach seeks to formulate a common ground for the analysis of text and image based on semiotic theories, but I think there is a need for a discussion about how such a theory of signs could be further developed and how it uses “language” – and what kind of language? – as a “model.” If one leaves behind a narrow sense of semiotics, referring only to written and spoken language, semiotics can also serve as a theoretical framework for the study of images and other non-textual media. For example, as early as the 1970s Umberto Eco emphasized that a semiotic perspective on culture does not necessarily reduce culture as a whole to text and language: “[…] to reduce the whole of culture to semiotics does not mean that one has to reduce the whole of material life to pure mental events. To look at the whole of culture sub specie semiotica is not to say that culture is only communication and signification but that it can be understood more thoroughly if it is seen from the semiotic point of view. And that objects, behavior and relationships of production and value function as such socially precisely because they obey semiotic laws” (Eco, 1976, p. 27).
There is a lot going on in the empirical world that is non-semantic, but, as soon as these phenomena become part of the social sphere (and, thus, communication), they can be analyzed in terms of semiotics. One last point should be made before I turn to the ‘typical’ qualities of images (as opposed to texts): Both texts and images are far from being homogeneous kinds of empirical and historical data. Quite the contrary, they are heterogeneous to such a degree that it is sometimes difficult to distinguish them at all (graphic signs, abstract images, pictorial languages, etc.). Thus, it might be an idle effort to discuss “images” and “texts” in such a general manner. On the other hand, this rather general and broad discussion is only meant as a framework for much more detailed methodological considerations which should then be conducted with reference to specific sets of textual and pictorial data. Another point that justifies a general discussion about “texts” and “images” as data genres is that the respective literature has made this distinction many times and for several decades, perhaps even centuries now. The “text/image-divide” runs through quite a number of disciplines and, as such, can only be unsettled from a rather generalized point of view.
Semiotics
As semiotics as a general theory for the study of cultures and societies plays a central role in the argument of this paper, this section gives a very brief and selective overview, summarizing some of the central themes and premises of this approach. The literature about semiotics is broad and abundant, and therefore it is neither possible nor desired to attempt anything close to a comprehensive overview (for general introductions, see, e.g., Eco, 1976; Nöth, 2000; Volli, 2002; Kjørup, 2009; Yelle, 2011). Semiotics, broadly speaking, is the study of signs and meanings. As such, it has its roots in ancient philosophy and epistemology. It is closely connected to linguistics because spoken and written words have often been studied as prime examples of signs, their relationships (syntax), meanings (semantics), and actual use (pragmatics). With the so-called semiotic turn, some basic principles of the analysis of signs have been transferred to the study of cultures and societies (Hall, 2003, p. 6).
A sign, following Charles W. Morris and Charles Peirce, is “everything that, on the grounds of a previously established social convention, can be taken as something standing for something else” (Eco, 1976, p. 16). This formulation entails the three basic elements of a sign: That which stands for something else (the sign proper, or signifier), that which it refers to (its object), and that which establishes the relationship between sign and object (the “social convention,” or interpretant) (Krech, 2012, p. 52).
The semiotic approach to culture studies not only signs and symbols; neither does it consider all of culture to be ‘just’ signs. Instead, cultural processes are considered as elements that in and by their mutual relations constitute meaning and do so by processing media, semantic references, and contextual knowledge (Eco, 1976, p. 7). Semiotics, in Umberto Eco’s view, is concerned not just with language but with “ordinary objects insofar (and only insofar) as they participate in semiosis” (Eco, 1976, p. 16).
This approach entails a shift of focus: While human beings can, in their everyday perspective, use and understand signs, an analysis of sociocultural reality should focus on situations and communications, and their implicit patterns, without primarily seeking to understand individual human intentions and behavior. This also leads scholars to direct their efforts to the question of how signs work (instead of what they mean) (Malafouris, 2013, p. 18).
A semiotic analysis can be structured by three aspects: syntax, semantics, and pragmatics. Generally speaking, syntax refers to the ways elements can be related to each other, semantics refers to the potential and situational meanings of the elements, and pragmatics refers to the specific use of signs and their arrangements in social situations. From such a general perspective, it does not matter much whether the medium of communication is textual, visual, acoustic, or ritual. Indeed, virtually all social situations proceed in many different media at once, and their elements constitute relationships beyond and across specific media. This rather general understanding of semiotics is also at the core of Clifford Geertz’s understanding of culture as a web of meaning (Geertz, 1997, p. 9). It offers a fruitful starting point for understanding cultural processes in terms of their meaning without claiming that there is an essential and unchangeable meaning to cultural practices. Quite the contrary, cultural (i.e., textual and non-textual) communication can better be described as the constant process of producing and developing meaningful contexts by means of different media, including, but not limited to, images.
Images and Their Properties
Synchronicity versus Diachronicity
Images, according to frequently expressed assumptions, are synchronous and non-linear (e.g., Lengwiler, 2011, pp. 133–136) – in contrast to texts, which are supposedly diachronic and linear. Instead of “synchronous,” some authors speak of “simultaneous” or of the “simultaneous structure” of the image (Imdahl, 1996, p. 23). In other words: In the image, everything is present simultaneously and holistically, whereas in the text, the elements follow each other consecutively. Ralf Bohnsack defines “simultaneity” as the coincidence of non-related events. “Synchronicity,” on the other hand, refers to the dimension of time: the assumption that everything is present at the same time in the image, and not one after the other as is supposedly the case with texts (Bohnsack, 2013, p. 88).
This understanding of images has analytical consequences: a “textual approach to the image is primarily characterized by its sequentiality, by an order oriented towards temporal succession” (Bohnsack, 2013, p. 82). Referring to Max Imdahl (1996), Bohnsack criticizes scholars who approach an image with linguistic models and look at it as a continuous step-by-step succession (Bohnsack, 2013, p. 82). According to Bohnsack and others, in the image everything is present simultaneously and synchronously, “in total presence” (Bohnsack, 2013, p. 82). On this view, sequential methods of interpretation (such as sequence analysis as proposed by Oevermann et al., 1987) would not be adequate for the analysis of images.
However, Bohnsack admits to an aspect of the linearity of images that only comes into play in the methodological procedure – when scholars have to figure out an “entry-point to wholeness by way of simultaneity […]. The path to a simultaneity that grasps the totality of the image can certainly proceed successively” (Bohnsack, 2013, p. 95). His essential methodological approach, thus, is that images must always be analyzed in their entirety due to the simultaneity of their semantics; the examination of just a fraction of the image (analogous to a selected passage from a text or interview protocol) is not sufficient to understand the image as a whole.
The aspect of simultaneity is also emphasized by Gottfried Boehm (e.g., 1987). From the perspective of the reception of an image, he emphasizes that images are not perceived as a number of isolated, consecutive elements but as a simultaneous and holistic sensorial and affective impression. Johannes Grave, however, concedes with reference to Boehm that “images are experienced both successively and simultaneously” (2014, pp. 59-60). He thus assumes a “temporality of the image in its aesthetic reception” (Grave, 2014, pp. 59-60). 3 For my argument, this is a hint that images, not only in their methodological reconstruction (Bohnsack, 2013, p. 95) but also in their perception by viewers (Grave, 2014, pp. 59-60), have a linear, diachronic aspect.
One could now try to argue that pictorial data are linear and diachronic (e.g., because they cannot be produced, viewed, and cognitively processed otherwise). But it seems to me more interesting to show that textual data have synchronous, simultaneous and nonlinear properties, i.e., all those properties that are usually attributed to images.
A simple everyday example of the simultaneity of texts is the genre of jokes. A joke is only a joke if the beginning is understood from the end, the punch line. Therefore, a joke, even if it is told diachronously, has to be grasped holistically in order to be understood (and, eventually, to be analyzed in a way that is adequate to the object under study). Of course, it does not make much sense to literally read a joke from the end to the beginning, but you can only understand it by judging its entire structure, including its beginning, from the end. More generally, this also applies to other textual types and parts of texts: Sentences, paragraphs, chapters, and books are only complete because the beginning has an end. Even if every part of the text may contain the entire structure of the whole text, the text as an empirical case is only complete once the beginning has been grasped from the end and the end from the beginning. Therefore, textual data have a recursive and simultaneous, synchronous aspect.
As for the reception of the text – understood from the perspective of the psychology of perception – psychological studies of reading show that the human eye does not follow the text letter by letter, syllable by syllable, or word by word in a strictly linear, diachronic fashion but simultaneously grasps larger units that are cognitively processed as such (an early study of this kind was done by Just & Carpenter, 1980).
Finally, if we think of the medial aspects of a text, that is, the way it is recorded on paper, leather, papyrus, canvas, etc., and the way it is shown (on posters, in books, in magazines, in digital formats, etc.), it becomes clear that texts can be synchronous and simultaneous in their medial constitution. The showing of the Bible as part of the liturgy is an example of this: The text as a whole is presented simultaneously and synchronously.
In short, the distinction between images as synchronous/simultaneous and texts as diachronic/linear is a simplistic one that does not do justice to the data we find empirically. Both kinds of data have synchronous as well as diachronic properties and the methods of their evaluation must do justice to these properties.
High Information Density versus Low Information Density
Scholars often write about images that they accommodate disproportionately more information in a relatively small space in comparison with texts (e.g., Lengwiler, 2011, pp. 133–136). Karis summarizes the prevailing opinion in recent literature precisely when he writes: “A picture contains more information than could be reproduced in a text, no matter how detailed the text. Pictures could be described down to the details of color, shading, and form, and still a transformation of this descriptive text into a pictorial representation would produce highly different results. Thus, while an image produces an infinite number of image descriptions that can never be exhaustive, an image description can produce any number of images that are all exhaustive” (Karis, 2013, p. 141).
In these approaches, the type of data called “image” is described from the perspective of the text (i.e., implicit textualism). A large amount of information is attributed to the image which is so rich that it can (supposedly) never be completely transferred into textual form. When Marcus Banks tries to justify why image analyses are still rare in the social sciences, he writes, quite strikingly: “[V]isual information, visual data, visual results, are simply too messy, too rich, too particular to be reduced to abstraction and linear theorization” (Banks, 2014, p. 405). The dichotomies of emotion versus reason and irrationality versus rationality already alluded to resonate here, but the amount of information (“rich”) is also emphasized, making it (allegedly) impossible to develop an abstract and “linear theory” on the basis of pictorial data.
Still, the thesis about the informational content of images, which is so plausible and intuitive at first sight, can and should be questioned. On the one hand, we need to discuss what exactly is meant by “information.” If one follows a concept of communication based on systems theory, information is the result of a selection within the communicative process (e.g., Luhmann & Baecker, 2006, p. 128). In other words, information is not simply there, for example in a picture, waiting to be retrieved, understood, or transformed into text. It is only constituted as part of a communicative process when a selection of information takes place.
For images, this can mean that they contain a large potential of possible pieces of information but are not already and ontologically full of information per se (this is also true for texts). A particular shade or brushstroke in a painting is thus not information per se, but it can be made information if it is addressed in a communicative context that deals with the painter’s technique. A landscape painting rich in detail can be translated exhaustively into text if the respective communicative context is concerned with, for example, distinguishing landscape painting from portrait painting, because, in such a communicative setting, many details in the picture do not matter at all. Conversely, in texts, features such as font size, sentence length, or punctuation will only become pieces of information if the communicative context requires it. A text contains an ‘infinite’ potential of information down to the etymology of single words, their grammar, semantic associations, and phonetic properties, of which only the relevant part is realized and communicated in specific communication processes. Also, what is “relevant” is not decided by the researchers, but by the social embedding. 4
Finally, the thesis of the abundant information content of images stands in striking contrast to the often vehemently argued thesis that images are first and foremost non-linguistically and affectively effective (see below) and not primarily semantic (and thus potentially full of information).
In short: the assumption of different “information densities” of pictorial and textual data may be questioned considering the concept of information and because numerous texts can be cited as examples which are not “orderly” or “unambiguous” or could be “exhaustively” translated into a picture or even another text. The strands of traditions within and between religions often called mysticism bear eloquent witness to this (think, e.g., of the Jewish Kabbala [e.g., Scholem, 2001], or Christian mysticism [e.g., Heiler, 1934]). Apart from the fact that a good theory need not be “linear” (not to mention what this could mean), all the properties mentioned by Banks (2014, p. 405), i.e., “messy,” “rich,” “particular,” are as valid for texts as they are for images.
Ambiguity versus Unambiguity
Closely related to the assumption of a high information density of images, some authors assume that images, in contrast to texts, are ambiguous. The ambiguity of images – here of television images – is addressed by Karis who, with reference to Hans-Georg Soeffner (2000), assumes that images are generally more ambiguous than texts or language (Karis, 2013, p. 140). Bohnsack also emphasizes the challenge of doing methodological justice to the “ambiguity of the image” (Bohnsack, 2013, p. 87). This assumption is formulated even more strongly from the perspective of “visual linguistics,” for example when Roman Opilowski writes that images have an “open meaning” that must be “concretized” by text (Opilowski, 2011, p. 207).
It may be true for certain kinds of texts that they appear mostly unambiguous in the pragmatic context of their use (for example, news texts, or well-written instructions for use and operation of household devices), but the overwhelming majority of texts and text genres are anything but unambiguous – even if their ambiguity has to be temporarily ignored in social practice (otherwise, social processes would not even begin to work). Scholars who deal with ancient and/or foreign-language texts are regularly confronted with the problem that texts can mean many things and that supposed unambiguity can only be established through what is called “context,” i.e., through pragmatic communicative embeddings, which in turn have to be laboriously (and often incompletely) reconstructed. Methodological approaches such as sequence analysis also show that even simple, at first glance unambiguous text passages from supposedly well-known contexts are fundamentally polysemous and only become temporarily unambiguous through their immediate pragmatic connections which are established in specific social situations.
To summarize, images and texts are equally ambiguous and unambiguous. No sign, and no arrangement of signs, whether pictorial or textual, is unambiguous per se, but must be temporarily ‘defined’ in the specific context of use, and be it only in a vague manner, sufficient for social processes to continue. Sometimes ambiguity is also empirically addressed and discussed and then a sign remains ‘ambiguous’ in social practice: The ambiguity of texts – for instance in the history of religions – often becomes an explicit part of the historical and empirical communication about these texts. For centuries, there have been disputes within religious traditions about the correct interpretation of their textual traditions, even leading to schisms and reformations.
Immediate Affection versus Mediated Cognition
It has already been mentioned that in the bildwissenschaftliche literature, and in methodologies oriented toward it, images are understood as objects in the world that are more immediate, direct, and suggestive than texts (e.g., Lengwiler, 2011, pp. 133–136). The previously cited passage in Banks’ article formulates this aspect, too (images are “messy,” “rich,” “particular” [2014, p. 405]). Bräunlein also notes, with reference to Hans G. Kippenberg and Susanne Langer, that non-linguistic sensory experience is autonomous and that images therefore have primarily emotional effects (Bräunlein, 2004, p. 210). Conversely, on this view, texts would have a cognitive effect and would argue in a factual, rather neutral and descriptive manner.
On the one hand, this thesis reproduces dichotomies of mind versus body, cognition versus emotion, culture versus nature, or rationality versus irrationality that scholars have hoped to overcome (e.g., Droogers, 1996, p. 47) and places the image on the side of the irrational and suggestive. On the other hand – and more important for my argument – this assumption fails to recognize the empirical diversity of texts and images. There are images that tend to offer factual-rational information (e.g., in scientific or medical contexts), but there are also texts that tend to be suggestive and affective (e.g., political speeches or poetry). Texts are thus again much closer to images than is often assumed: they can have an immediate affective effect and they can be suggestive, just like some images. But they can also offer rational, rather unemotional content. What texts (and images) are in this regard – emotional or rational – cannot be answered conclusively but always depends on the specific social situation in which the image or text occurs.
Non-linguism versus Linguism
It is largely common sense – both in the everyday world and in the Bildwissenschaften – that images are non-linguistic (e.g., Lengwiler, 2011, pp. 133–136). Still, authors do not always thoroughly elaborate what they mean by “language” and “linguistic.” Instead, many authors argue that the so-called linguistic turn in the social sciences and humanities has contributed to marginalizing images as data. Bohnsack assumes that due to a “paradigm of text interpretation” the investigation of images “must appear less valid from the outset,” because images, “if they are to become scientifically relevant, they must first of all be reformulated into observation sentences or texts” (Bohnsack, 2013, pp. 63-64). In other words, while it might be thinkable to communicate about images in the medium of images, the scientific system usually uses written language (texts) to communicate about images, and therefore the methodologically guided transfer of images into texts is unavoidable (even the bildwissenschaftliche literature now fills thousands of pages with text). That this makes image analyses less valid, however, seems like an unfounded assumption that moreover reproduces the aforementioned dichotomies of cognition versus emotion, mind versus body, culture versus nature. A textual analysis that interprets its data, i.e., written words and sentences, in an uncontrolled and nontransparent manner is just as invalid as a methodically unclear image analysis.
To explain the difference between language and image, literature states that images have no alphabet, no lexicon, no grammar, no “distinct basic units” such as letters, syllables, or words, and thus no “stable meaning-bearing and meaning-distinguishing units” that can be recognized and understood as signs (Grave, 2014, pp. 63-64). “What is identified as motif, object, or figure in the context of a pictorial representation,” Johannes Grave continues, “must rather first emerge from the continuum of visual phenomena in the image and be delineated by the viewer” (Grave, 2014, pp. 63-64). This is true, but it also applies to written and spoken language. Saussure already demonstrated that a (linguistic) sign becomes a sign only because it differs from others and not because it exists a priori as a sign: “Language […] contains only conceptual and phonetic differences that result from the system. What a sign contains in terms of concept or phonetics is less important than what is around it in the form of other signs” (Saussure, 1967 [1916], pp. 143-144; italics MR). In other words, it is not self-evident that a set of black lines and dots on paper represents a letter corresponding to a certain phonetic sound that, in combination with other letters, allows for semantic associations. What is identified as a letter, syllable, or word in the context of a written (or oral) representation must first emerge from the continuum of visual (or auditory) elements in the text or speech and be delineated by the viewer – as anyone who has ever learned to decipher an unfamiliar script knows.
From the assumption that images are not linguistic, many authors conclude that semiotic approaches to images are not fruitful (e.g., Bräunlein, 2009, p. 773). Karis summarizes a central aspect of Visual Culture Studies as follows: “The notion of images as signs or as phenomena built up from individual distinct signs, each of which could be named, is discarded in the perspective of Visual Culture Studies as a relic of early structuralism and replaced by a dynamic understanding of sign-systems. The meaning of an image is therefore not given per se, but results from the temporal and spatial-cultural context of its appearance” (Karis, 2013, p. 138).
Against the background of what I have just outlined, one would have to add that the same applies to written and spoken language: The meaning of a linguistic sign or a set of linguistic signs is not given per se but arises only in an arrangement of signs. It is not for nothing that dictionaries usually give several equivalents for a foreign word and leave it to the translators to choose the ‘appropriate’ one depending on the ‘context’ (i.e., the arrangement) of the passage to be translated.
Karis further summarizes: “Images are not […], as semiotics inspired by structuralism believed, to be understood as arbitrary signs in an image system only through their relation to other signs. Rather, images can only be understood in the context of their use” (Karis, 2013, p. 140; see also Bräunlein, 2009, p. 777). One can resolve this supposed contradiction (relation to other signs in a pictorial system vs. use of pictures), I suggest, because a specific relation of signs to each other can only be established in use, that is, in pragmatic social processes.
Those who argue that images are not ‘semiotic’ often reject the so-called “similarity theory” too. This approach – as Bräunlein, for example, points out (2009, p. 776) with reference to Nelson Goodman (1976) – assumes that the meaning of pictures is based on their similarity with the depicted. This rejection, however, is based on a narrow understanding of semiotics. Following Peirce’s semiotics, one cannot argue that the aspects of signs (indexical, iconic, symbolic) are applicable to different types of signs, i.e., that there are “iconic signs” (such as pictures) and “symbolic signs” (such as words). Rather, all signs have these aspects or properties to varying degrees (Krech, 2012, p. 52). According to this, pictures often function iconically, because they refer to the depicted (object, referent or signified) in a manner of similarity, but they are at the same time symbolic, i.e., arbitrary and conventional, and indexical, because they point to what they depict. Such an approach to semiotics with a foundation in systems theory does not conceive of all sign types as “analogue to language” (Bräunlein, 2009, p. 772) but detaches itself from language and thus avoids the supposed danger of linguism.
The discussion about the dominance of the linguistic turn also fails to recognize what constitutes a turn: The linguistic turn does not demand that only language be investigated, just as the performative turn does not demand that only performances be investigated or the spatial turn that only spaces be investigated. Nor does the linguistic turn demand that all empirical data should be studied as if they were language. On the contrary, an epistemological reorientation of scientific fields calls for the development of a theoretical perspective and a conceptual as well as a methodological toolkit from the study of specific objects (e.g., language, spaces, rituals, or images): “Thus, for example, ‘ritual,’ ‘translation’ or ‘space’ as research objects become categories of analysis with which phenomena can then be studied that originally do not belong to the traditional object area in the narrow sense” (Bachmann-Medick, 2007, p. 26; italics MR). The semiotic turn, understood in this way, does not examine only texts or textual signs, nor does it demand that all empirically and historically observable objects be examined as if they were texts or signs in a narrow sense. It requires transferring the knowledge gained from linguistics and semiotics to socio-cultural phenomena as a whole. This means that we are discussing the reconstruction and explanation of socio-cultural meaning processes. For this, concepts and explanatory approaches from semiotic theories are particularly helpful.
The criticism of a semiotic approach to images may be justified if one takes as a basis a narrow conception of semiotics that claims that there are a priori identifiable signs in written and spoken language to which one can assign a precise meaning. Bildwissenschaftler Martin Schulz (2005, p. 77) formulates this supposed “linguism” of semiotic approaches to the image very clearly: “Is there at all a system of the image comparable to the abstract and strongly coded sign systems of writing, a syntactic structure that could be similarly broken down into discrete units? […] Does an image always have to be a meaningful sign in order to be recognized as an image?”
Schulz continues: “[S]emiotic analyses [seem] to tend […] to read images primarily as stored, decodable, context-dependent messages that carry a certain measurable value, as if they were disembodied signs for an equally disembodiedly conceived pure cognition” (quoted in Bräunlein, 2009, p. 773).
This could be answered by a series of similar questions seeking to break the dichotomy of images and texts stated by Schulz. What are the discrete units of written language – letters, phonemes, syllables, words, or combinations thereof? Are there such things as the smallest semantic elements in spoken and written language? Theoretically, they exist in the form of so-called morphemes (e.g., Volli, 2002, p. 59). However, in the specific context of use, for example in a sentence, the lexical meaning of such a smallest semantic element of language often does not help much, as is well known from translation practice.
Does a text have to be a “meaningful sign” to be recognized as a text? Not only the history of religion shows that texts have an aesthetic dimension (they stand for themselves in the iconic sense) in addition to their semantic dimension (they stand for something else): here one could think of calligraphy, initials and capitals in manuscripts and hand-colored prints, of decorated and ornamented books that are shown, revered, and processed as material objects. Islamic calligraphy is a prime example of the ‘picture-less’ aesthetics of writing. In other cases, texts only ‘pretend’ to be texts, for example text-like talismans of early Daoism: These are “peculiar and pictorial compositions of Chinese lines of writing” that are to be understood as a “fusion of the medium of writing with the medium of the image,” without “representing anything pictorially in the strict sense or writing anything down textually” (Di Giacinto, 2019, p. 59).
Finally, I am not aware of any semiotic analysis that wants to “measure” something – as Schulz implies in the above quote. In this critique of semiotic approaches, the aforementioned dichotomy of cognition versus emotion is reproduced once more: Texts and text analyses are supposedly rational while images and image analyses are supposedly emotional. A semiotic approach in the sense of Peirce cannot claim – as Schulz seems to assume – that signs “carry a certain measurable value […], as if they were disembodied signs for an equally disembodied thought of pure cognition” (Schulz quoted in Bräunlein, 2009, p. 773). The semiotic approach knows that signs are not disembodied: Every sign needs a material carrier (i.e., medium or representamen), be it paper and ink, the human voice and hearing apparatus, or canvas and paint. The material presence of signs has, without a doubt, a somatic-affective effect on perceiving human beings. However, this takes place within the individual, situation-bound experience and only becomes a social fact – and, as such, a phenomenon investigable through social sciences – when it is communicated (verbally or non-verbally, in writing or non-writing), i.e., when it generates meaningful connections.
Closely related to the idea that images are non-linguistic is the idea that they are a case of “presentational symbolization” (following Langer, 1957) and thus about “non-linguistic sensory experience”: “Not conceptual symbolism, not language-guided introspection is at the beginning of all knowledge, but the emotional experience opens the gate to the outside world” (Bräunlein, 2004, p. 210). This observation can be agreed with from the perspective of developmental psychology or even that of everyday life. However, the “emotional experience” only becomes a social fact, and thus the object of a social science-oriented study of religion when it is communicated (Krech, 2021, pp. 29-30) – in whatever medium. On the other hand, this “emotional experience” is always already socially and culturally conditioned and framed.
Communication as a Semiotic Process (One that Does Not Only Operate Linguistically)
The concept of communication on which my suggestions are based requires further explanation: Communication, following Niklas Luhmann’s systems theory, is the elementary operation of social systems. These systems, and thus religion as a functionally differentiated subsystem of society, consist exclusively of communications. They do not consist of actions or of people, which are instead understood as units constituted in and through communicative attribution (Luhmann, 1985, pp. 192-193). This does not mean that systems theory denies the existence of people and actions. It instead theoretically reconstructs them in terms of the interaction between organic and mental systems and thus belonging to the environment of social systems (Krech, 2021, p. 59).
From a systems theoretical point of view, the colloquial understanding of the term “communication,” in which someone (‘sender’) transmits something (a ‘message’) to someone else (‘receiver’), must first be revised completely. Neither ‘sender’ (ego) nor ‘receiver’ (alter) are constitutive elements of communicative processes, according to systems theory. They are communicatively produced addressees of communication. Luhmann therefore turns around ego and alter and conceives communication from the ‘receiving’ end (thus: ego), which is not a person in the common sense but rather an abstract communication-processing unit (Krech, 2012, p. 50). Communication as a self-running (autopoietic) process in social systems is then described as a process of three selections (Luhmann, 1985, p. 194): the selections of message, information, and understanding. From the horizon of possibilities, something (and not something else) is selected by alter to be communicated and provided with certain information (and no other). This information is understood by ego in a certain respect (and in no other respect), whereby “understanding” is not to be conceived in the psychological sense but in the sense that communication can continue (Luhmann, 1985, p. 196).
Whether the content of a communication process is rejected or agreed upon is irrelevant for the continuation of social systems because it is only a matter of offering and realizing connections (Krech, 2021, p. 60; Luhmann, 1985, p. 212). This understanding of communication is not bound to specific, intentionally acting persons who want to transfer something, a ‘message,’ to other human persons. It reconstructs communication as a social fact and semiotic process.
Communication requires media to proceed in the social world. These media include first and foremost spoken or written language that has been extremely successful in evolutionary terms (Luhmann, 1985, p. 220). Communication, however, does not process exclusively in the medium of language. There is “speechless communication,” Luhmann emphasizes, before continuing that communication is not possible entirely without language: “The fundamental medium of communication that guarantees the regular autopoiesis of society […] is language” (Luhmann, 1997, p. 205). While Luhmann describes “speechless communication” mainly in terms of gestures or facial expressions, i.e., the classical non-verbal aspects of social interaction, one can also understand built environments, architectures, or artifacts as “gestures” in a broader sense (on the gestural character of architecture, albeit with a phenomenological basis, see, e.g., Meisenheimer, 2008, p. 25). This means that communication, when it takes place in non-linguistic media, requires linguistic connections at some point – at the latest when researchers seek an understanding of its communicative properties: “In non-verbal or object-centered communication, bodily signs such as a certain posture (e.g., kneeling down or folding hands), a facial expression (e.g., rapture), and the bodily interaction with physical objects or technical instruments must also be perceived as communication and understood in their informational value in order to produce meaningful connections in socio-cultural reality. The same applies to pictures, buildings, utensils, archaeological finds and other objects, whose socio-cultural meaning arises qua attribution and has to be deciphered, i.e., to be read” (Krech, 2021, pp. 60-61).
Linking these remarks to the matter discussed here, we can assume that communication encompasses everything that affords semantic connections, including misunderstanding and rejection. Because communication proceeds in empirically tangible media, it can always be recorded and transformed into a protocol. The concept of “protocol,” as introduced by Rudolf Carnap (1969, p. 235), has been adopted by Ulrich Oevermann for “objective hermeneutics” (2013, p. 73). In this sense, even images and architecture may be regarded as protocols of social interaction (for architecture, e.g., Egger, 2017, p. 65). Media of communication can include non-linguistic expressions such as clothing, images, built and natural environments, gestures, sounds, smells, and tastes. All of these are, in the sense just outlined, non-linguistic media of social processes and may be analyzed as such.
Example: What This Means for an Analytical Perspective on Images
Although the more specific methodic application of the considerations outlined above requires more work, and future publications, a brief example shall illustrate some of the arguments. Above, I have tried to demonstrate in which regards textual data are more like images than has often been assumed. These suggestions have implications for the analysis of images as well. Therefore, I use an example of the pictorial kind of data in this section and hint at some of its features based on the sections above (Figure 1).
Figure 1. Holy card depicting Jesus as the good shepherd (public domain, https://commons.wikimedia.org/wiki/File:A033_Gesù_Bambino_buon_pastore.jpg).
I have suggested in section 2.1 that textual data have synchronous, simultaneous, and nonlinear properties, but this in turn means that images may have diachronic, linear features, both in their production and in their analysis. In the above picture, for instance, one could reconstruct the sequential process of visual perception by human beings. They would probably start with the woman’s head, followed by the boy’s head, and then turn to the sheep and the act of feeding them. Of course, to claim this with some validity, technical measures (eye tracking) would be necessary.
I have also argued in section 2.2 that textual data can be messy, rich, and full of information. Yet it is worthwhile to consider how images can have a low information density – a feature that has often been attributed to texts. Based on the notion of “information” elaborated above, information is produced in the social and communicative consequences and is not something that is already within the image and would just have to be extracted with the appropriate means. Thus, if the above picture were an item in an economic transaction (it could be offered in an online shop for collectors, for instance), the seller and buyer would at some point agree on a price, and in that moment the image would be completely and exhaustively translated into an economic piece of information which, in that specific situation, is all the information necessary about this image. In another situation, it may be sufficient to distinguish this image from others in terms of its main character (Jesus), and in yet others, it may be characterized fully by its style and genre.
I have shown in section 2.3 how textual data can be ambiguous and polysemic, features that are often ascribed to images, but I also want to show how an image can be unambiguous. The above image is a good case in point. For someone who is familiar with the Christian tradition and iconography, and possibly a believer as well, this image does not leave much to doubt. Mary, indicated by the blue dress, is teaching Jesus to herd the sheep, thus alluding to a very common metaphor of the “Lord Shepherd,” which is also referenced in the caption “L’enfant Jésus Bon Pasteur.” For other viewers, those less familiar with Christian iconography, the image may be less clear, and there may be other contexts of use in which ambiguous meanings prevail. This is why the argument is to consider both images and texts as equally ambiguous and unambiguous, and always to pay due attention to the specific context of use.
In section 2.4, I have suggested that the effects of textual data can be suggestive, emotional, and affective (effects that are often attributed to images), but it is also important to ask how images can have a cognitive and rational effect, depending on the context of use. While for a Christian believer, this image may be an emotional reference point in times of despair, its effect on a professional collector of holy cards may be more rational and cognitive, estimating its time of production, its economic value, or its shape and quality.
I have suggested in section 2.5 that textual data do not have clearly demarcated semantic units, or, at least, that it may be a huge effort to extract these from the material: What is identified as a unit must first emerge from the continuum of textual elements. But I also want to show how images can have these structural linguistic features and can be based on a more or less clear pictorial code that allows viewers to identify semantic units. Sometimes pictures function iconically because they refer to their object in a manner of similarity. At the same time, they may be symbolic, i.e., arbitrary and conventional. This is true for the above picture as well: There is an iconic similarity between the depicted persons, sheep, trees, fence, etc. and what they depict, and there are conventional but arbitrary elements, such as the halos around Jesus’ and Mary’s heads.
Much more could be said about this image, for instance regarding its time and place of origin, painter, dissemination, or technique and style, but these remarks should suffice here to illustrate how the arguments presented in this paper may shape an analytical perspective on images (and texts). Images are such a heterogeneous kind of data that it is virtually impossible to choose an example that represents all variations and possibilities. Therefore, while much may be lacking from this brief analysis that may be important in other analytical contexts, this example may illustrate the points made in this paper.
Conclusions
The above-mentioned approaches in visual studies (Bildwissenschaften) often seek to distinguish themselves from semiotics. In doing so, they, unintentionally perhaps, may sometimes uphold the fundamental dichotomies of mind versus body, reason versus feeling, culture versus nature, language versus image, associating the image with body, feeling, and nature. They are also sometimes based on a potential misunderstanding of the linguistic or semiotic turn. Many times, the linguistic turn seems to be understood as putting language, more precisely, written texts, first as an object of research. The semiotic turn is often cited as claiming that social reality functions ‘just like language.’
But what is a “turn”? Doris Bachmann-Medick explains that a turn begins with the ‘discovery’ of a new subject area (e.g., language, ritual, space), which then sparks interdisciplinary interest (Bachmann-Medick, 2007, p. 26). But one can only speak of a turn in the true sense “when the new research focus ‘shifts’ from the object level of novel fields of investigation to the level of categories of analysis and concepts, i.e., when it no longer merely identifies new objects of knowledge, but itself becomes a means and medium of knowledge” (Bachmann-Medick, 2007, p. 26). In this respect, the semiotic turn does not urge researchers to deal only with obviously sign-like objects such as language and texts but to develop categories of analysis “with which phenomena can be grasped that do not originally belong to the traditional subject area in the narrow sense” (Bachmann-Medick, 2007, p. 26). I also advocate this approach for semiotics as a whole, which thus detaches itself from its primary object, language, and can be transferred to other subject areas without taking them to be language-analogous. As we are focusing on social systems that operate exclusively communicatively, communication must be the process that is empirically investigated, while, as emphasized above, taking into consideration all media of communication, including images and texts.
David Morgan formulates the criticism of a semiotically based analysis of images in a particularly pointed way. He writes that text and image often occur together empirically and historically but cannot be transformed into each other without loss: “So the idea that images might be treated as visual texts, as signs of meaning that are properly textual in nature, is a presumption” (Morgan, 2020, p. 2) – an assumption that is not held by the semiotic approach suggested in this paper. Morgan, on the other hand, argues that one cannot separate the medial and performative dimension of language, but also of images, from their meaning: “Entanglement means inextricability such that to change the configuration is to change the meaning” (Morgan, 2020, p. 2). The idea that meaning is assigned to media through and in their use in specific social contexts, in turn, is a generally shared assumption of semiotics, which points to the interplay of semantics, syntax and pragmatics (e.g., Krech, 2021, p. 77). Thus, Morgan’s argument may be rephrased in semiotic terms in a broader sense – despite his explicit rejection of a semiotic approach.
A final point, which is not always reflected in detail in image studies, is that images – just like texts – are not a homogeneous genre of data. Images are present in the historical and contemporary lifeworld in an abundance of forms and variations. Often, they are directly linked with textual signs (e.g., capitals, lithographs, advertising posters), so that even the empirical data in past and present times do not suggest a simple distinction between texts and images. Images can be abstract line drawings, photographs, paintings, sketches, or pictograms – they can stand alone or be connected in narratives (e.g., biblia pauperum, comics, films). Ultimately, the contribution of this paper to qualitative methods in general is that it may enrich the discussion about texts versus images, as they are – like other media (e.g., sound and music, rituals, clothing, and architecture, to name just a few) – possibilities of socio-cultural meaning production, all of which need to be studied in a way that is adequate to their specific properties and, at the same time, attentive to their shared features.
For a second paper, I plan to propose a methodic approach that outlines how the theoretical and methodological ideas presented in this article can be applied to the analysis of images in the study of cultures and societies. In the current paper, I have tried to set the theoretical perimeters, challenging researchers to pay due attention to the variety of textual and non-textual data in their specifications while, at the same time, focusing on their generic and, thus, similar elementary features, which allow them to be studied and analyzed within the same theoretical framework: that of semiotics, understood as a non-reductive approach to understanding the socio-cultural making of aesthetics, performances, and meanings in multi-media arrangements.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is supported by Deutsche Forschungsgemeinschaft (441126958).
