Abstract

The notion that the world is becoming increasingly multimodal is neither novel nor especially recent. Yet just 10 years ago, quantitative approaches to multimodality were still considered to be in their ‘infancy’ (Knight, 2011), and researchers across disciplines continue even now to call for new approaches to large-scale multimodal analysis (Caple, 2018).
It would be going too far to suggest that Pflaeging et al.’s Empirical Multimodality Research answers this call in full, but it does take a few smart steps in the right direction and provides much-needed guidance to others attempting to bridge the gap between quantity and quality in multimodality. In this sense, the volume is aimed both at researchers interested in more data-driven approaches to multimodality and at those with a background in larger-scale research methods such as corpus linguistics and digital humanities. It also builds upon ongoing work in multimodality (e.g. Stöckl et al., 2020) as well as related work in corpus linguistics.
The edited volume has a three-part structure: a short and to-the-point introductory section by the editors is followed by the contributions in parts II and III, which take up most of the space. This gives the contributing authors space and time for both theoretical–methodological reflection in part II and ‘richer’ (p. 24), more visually dense case-study work in part III.
As the title suggests, the keyword here is ‘empiricism’, and Pflaeging et al. provide it right from the beginning: their opening section goes beyond mere theory by visualizing the recurrence of empirical multimodal articles in three key journals, Social Semiotics, Visual Communication and Multimodal Communication (pp. 12–15). The picture is a positive one, with the figures clearly showing a field that has seen increasing interest and growth. Having highlighted this growth, the authors bring their initial discussion of multimodal empiricism to a conclusion by providing a framework for ‘good empirical multimodality research’ (p. 19) as dependent on a set of five ‘good’ criteria inspired by other empirically inclined scientific disciplines. These criteria (feedback loops, objectivity, reliability, validity and tentativeness of results) form the basic thematic framing for the contributing authors’ work.
While the editor-led contributions are all good, it is arguably in the contributing work that the volume really comes into its own. An example of this in the theory/methodology section is Ewerth et al.’s Computational Approaches for the Interpretation of Image–Text Relations (pp. 109–138), which contributes to ongoing efforts in automated multimodal analysis (see Christiansen et al., 2020; O’Halloran et al., 2021). In a discipline like machine learning, which often lacks a coherent theoretical and methodological focus, Ewerth et al. manage the balancing act of combining theory and methods from semiotic analysis with semantic image–text classification and computer vision. The data presented, around 220,000 image/texts, is also the clearest case of a large-scale study in the volume and serves as an excellent example of combining methods from computer science with classically qualitative checks such as inter-annotator testing. While the findings are still somewhat limited, it is in this balance that the text exemplifies the volume’s goal of ‘appropriate and productive contact with data’ (p. 5).
This creativity and innovation carries through most of the volume’s third part, which presents case studies of an even wider spread, from practical uses of biometric facial expression analysis in analysing framing of TV single-visuals as memes on social media (Rothenhöfer, pp. 141–158) to manual coding procedures in the exceedingly challenging act of annotating open-world video games (Stamenković and Wildfeuer, pp. 259–578). The former utilizes the complexities of a single, relatively innocuous moment in political communication and shows how, by instantiating the visuals of the moment as a series of facial landmark positions, researchers can more precisely identify and chart the rapid process of re-contextualizing political communication as memetic materials. The latter ends the volume by systematically dissecting 80 missions from one of the most popular video games in the world and highlights not only the potential brought by drawing on multimodal corpus methods in the field of game theory, but also the difficulties faced by researchers who are trying to innovate. This speaks to a consistent willingness within the volume to press beyond what can be considered methodologically ‘safe’ and to instead accept the setbacks and limitations that are to be expected in cutting-edge research.
If there is room for improvement, it is arguably in what the editors refer to as ‘large n’ studies. Only a couple of studies have more than a few hundred pieces of data, a limitation which likely speaks to what the editors identify as a general ‘hesitation to scale-up’ (p. 4). While none of the contributions lack anything in academic rigour, the limited scale of the data employed does at times bring into question the necessity of corpus linguistic or advanced computational approaches when the ability to perform pattern recognition is limited and visual analysis alone might have done the trick. Future iterations of the volume may be able to alleviate this limitation as more researchers start to adopt an empirical, large-scale approach to multimodality.
As it stands, this volume offers a solid contribution to, and to some degree a foundation for, the growing field of large-scale multimodality. What sets the volume apart is a sense of patience with a field that is still finding its footing. In this way, it is helpful both as a guide for experienced quantitative researchers and as an introductory piece for those who are currently focused on one piece of data at a time. In a still burgeoning field full of enthusiasm and innovation, the volume offers a welcome sense of direction.
