Abstract
The paper proposes a quantitative approach to analyzing a “strong” form of multimodality, based on the co-emergence of a key connecting element in different modes of data. It provides a methodological template for the simultaneous analysis of visual and verbal data, which is applied to the domain of creativity. We expand the possibilities for analyzing creativity as a process by drawing on the “Janusian” tradition, using opposition as the key connecting element between the verbal and visual modes. The research context is the creative process of Vincent Van Gogh (during the period 1881–1890), as reflected in his letters and paintings. The visual analysis attests to the growing presence of contrast of complementary colors, while the textual analysis provides evidence for increasing affective, cognitive, and behavioral ambivalence. The results corroborate the validity of the method by demonstrating the co-emergence of visual and verbal opposition over time. The method can be used for purposes of exploration or validation. It has broad applications in organizational scholarship.
Introduction
Scholarship is increasingly moving “beyond words” by adopting multimodal approaches to organizational processes (Langley, Bell, Bliese, LeBaron, & Gruber, 2023). Such approaches have been applied to the analysis of institutionalization and legitimation (Meyer, Höllerer, Jancsary, & Van Leeuwen, 2013; Meyer, Jancsary, Höllerer, & Boxenbaum, 2018), to organizational identities (Bullinger, 2017), and to internal and external communication (Barberá-Tomás, Castelló, De Bakker, & Zietsma, 2019). But despite growing recognition of the limitations of monomodal research (Boxenbaum, Jones, Meyer, & Svejenova, 2018; Höllerer et al., 2019; Quattrone, Ronzani, Jancsary, & Höllerer, 2021), and the embrace of qualitative multimodal designs (e.g., De Rond, Holeman, & Howard-Grenville, 2019; Heizmann & Liu, 2022), there is still little guidance for the application of quantitative multimodal analysis and for the computational identification of forms of association between modes.
This is surprising, considering recent advances in computer science and computational linguistics, which have opened new analytical possibilities. They have also made evident the necessity of translating methodological advances generated elsewhere to further theory development in management (Gruber & Bliese, 2024). With this objective in mind, we propose a quantitative approach for the simultaneous analysis of visual and verbal information. The approach is motivated by the need to understand “how visuals work together with other cultural elements in multimodal communication” (Barberá-Tomás et al., 2019, p. 1810).
Multimodal analysis is hampered by the fact that each mode uses a distinct set of semiotic and semantic resources to construct meaning (Höllerer et al., 2019). The issue of mode compatibility has so far been addressed through qualitative methods (see Boxenbaum et al., 2018; Meyer et al., 2018), but we harness instead the ability of quantitative methods to distill complex data into more digestible forms (Aceves & Evans, 2021). To this end, we propose an approach that involves the identification of a key connecting element between the modes, as guided by theoretical considerations. This element informs the subsequent choice of computational techniques to capture its mode-specific manifestations. The degree of correspondence between the mode-specific manifestations is then statistically established.
The proposed method avoids the full automation of the procedure found in the recent machine-learning data fusion method for analyzing multimodal data (Luo, Jia, Ouyang, & Fang, 2024). Researchers retain control in determining both the form of connection between the modes and the appropriate analytical tools. It integrates the pursuit of theory–method package “fit” (Gehman et al., 2018), where methodological tools are tailored to the research context and are aligned with the research question and objectives of the analysis (p. 297). Our objective is to quantitatively extend, rather than surmount, the qualitative approach. We share the “pragmatist” assumption that qualitative and quantitative data are inextricably bound together, and that the method should be adapted to context, enabling the identification of general patterns in local contexts (Gillespie, Glăveanu, & Saint-Laurent, 2024).
The key advantage of the proposed method is its capacity to capture a “strong” form of multimodality, based on the “co-emergence” of elements in different modes (Iedema, 2007; Zilber, 2017). The strong form of multimodality allows scholars to ask new questions and offer new answers (Zilber, 2017), as illustrated by the establishment of a new substantive connection in our study between research on multimodality and creativity. Our multimodal method is applied to capture the continuous nature of creativity, demonstrating how the accumulation of experiences and experiments over time feeds into the present, motivating creative expression (Deleuze, 1991, 2014; Karakilic & Painter, 2022). Our main contribution is to scholarship on multimodality, providing a quantitative template for “strong” multimodal research. We illustrate the potential of the method for generating new theoretical insights in the domain of creativity. Our secondary contribution is in expanding the possibilities for analyzing creativity as a process of “creative-becoming.”
To illustrate the application of our method, we analyze the creative process of Vincent Van Gogh, as reflected in his letters and paintings. We connect the structure of the letters and paintings by means of a key element suggested by creativity scholarship: opposition (Rothenberg, 1971). Combining a computational method for color analysis with natural-language-based methods for textual analysis, we establish the association between the opposition of colors (“contrast”) in Van Gogh’s paintings and the opposition of emotions, ideas, and relations (“ambivalence”) in his letters. The analysis demonstrates the feasibility of the method, which is illustrated in the context of artistic creation, but is broadly applicable in organizational research.
Multimodality
The question of the correspondence between text and image is of fundamental significance in the history of culture. Despite a general expectation that visual culture is firmly anchored in language, many scholars subscribe to the view that visual and verbal modes of communication realize separate “messages” (Barthes, 1977). This debate has entered organizational scholarship over the last decade. The body of work is steadily growing, suggesting that multimodality should be at the core of our engagement with organizations (Höllerer et al., 2019). However, the priority attributed to the verbal over non-verbal sources of information endures (Meyer et al., 2013). It is customary to analyze organizational and institutional change through the production, dissemination, and consumption of texts, which then shape the conditions for the production, diffusion, and consumption of new texts (e.g., Phillips, Lawrence, & Hardy, 2004). The preoccupation with the spoken and written words is pervasive (Alvesson & Kärreman, 2011; Iedema, 2007).
As Höllerer et al. (2019) observe, by focusing on the verbal mode and treating other modes as if they worked the same, studies ignore empirical material and misrepresent the life-worlds of actors. The visual dimension is an “absent present” (Styhre, 2010): it is recognized, but often relegated to a secondary status. The visual and verbal modes both reflect social reality and assist in constructing it by materializing, organizing, communicating, storing, and passing on social knowledge within communities (Meyer et al., 2018). However, the performativity of visual and verbal framing proceeds differently, activating different cognitive processes (Boxenbaum et al., 2018). Scholarship attests to the ways in which visual information facilitates experimentation and construction of meaning, helping to externalize classification schemes, to realize statements of action, and to signify degrees of social intimacy and distance (Meyer et al., 2013).
The nature of the correspondence between the two modes remains contested. Boxenbaum et al. (2018, p. 604) observe that “most research points to complementary, mutually reinforcing roles” (e.g., De Rond et al., 2019; Heizmann & Liu, 2022). Similarly, Halgin, Glynn, and Rockwell (2018) articulate “likely synchronicity” in which each mode amplifies and extends the other. Yet, the evidence for synchronicity is limited, based largely on case studies and linguistic analyses. The need for simultaneous analysis of different modes, in order to capture the complexities of their association, is widely recognized (Boxenbaum et al., 2018; Meyer et al., 2013). Zilber (2017) observes that research on multimodality tends to be “weak” in form, looking at both discursive and non-discursive modes, but giving primacy to discourse and marginalizing the non-discursive. She advocates for “strong” multimodal research that regards the material, verbal, visual, and other modes not as separable, but as “co-emergent” (Zilber, 2017, p. 65). The key characteristic of this research is its focus on the co-emergence of the discursive and non-discursive (Iedema, 2007).
Advancing multimodality research requires the development of new methods (Zilber, 2017). We adopt this objective in the design of a quantitative method for the systematic analysis of verbal and visual data. It incorporates three principles identified in past research. First, it attributes a central place to time. Emphasis is not on occurrences in a confined period of time, but on a series of events that influence the present (Abbott, 2001). Actors are “historical,” as the past contributes to their actions at each moment (Deleuze, 1991, 2014). Second, it embraces complexity by abandoning the fixation on results and causality, and making room for understanding organizational dynamics as ongoing and unfinished (Zilber, 2017, p. 76). We consider as more appropriate the principle of “synchronicity”: when two elements are connected, but neither is the cause of the other (Jung, 1952, 1959). And third, it shifts attention from actors to objects and elements (Zilber, 2017, p. 76), analyzing artifacts as a configuration of symbolic and material elements (e.g., Godart & Galunic, 2019; Sgourev, Aadland, & Formilan, 2023). The unit of analysis is not the actor, but an element in relation to other elements. We describe our approach next.
Analytical Approach
The verbal and visual modes are both multilayered and rich in meaning. Visuality affords spatial, integrative, and concurrent depictions of social reality, in contrast to the more linear, sequential, and temporally bound progression of written and spoken language (Meyer et al., 2018). Gentzkow, Kelly, and Taddy (2019, p. 535) emphasize the semantic density of the verbal mode by stating that “a sample of thirty-word Twitter messages that use only the one thousand most common words in the English language [. . .] has roughly as many dimensions as there are atoms in the universe.” The visual mode possesses its own lexicon, featuring elements such as perspective, color, and geometry. According to Meyer et al. (2013, p. 494), the system of visual meanings offers “an accuracy and plenitude of description that verbal language cannot match.”
To reduce this complexity, we propose the adoption of quantitative methods that synthesize high-dimensional data into an accessible format (Aceves & Evans, 2021). Reduction should not lead to oversimplification, retaining a level of precision that allows for meaningful comparisons across modalities. To this end, we propose a four-step procedure that provides a template for “strong” multimodal analysis that is temporal and “co-emergent” in nature, incorporating the principles discussed above.

Figure 1. A quantitative multimodal approach.
Toward a Multimodal Approach to Creativity
A key feature of the proposed multimodal method is that it integrates time, capturing the “co-emergence” of elements in different modes of communication. This makes it particularly suitable for analyzing processual social mechanisms (Abbott, 2001) and organizational “becoming,” as a continuous process of accumulation of experiences that shape diverse outcomes (Tsoukas & Chia, 2002). A notable advantage of the method is its capacity to provide a dynamic perspective on phenomena typically analyzed through observation at a given point in time. A particularly appropriate example is creativity: the invention of new configurations of cultural or material elements (Godart & Galunic, 2019). The study of creativity usually proceeds through experiments and case studies lacking a significant temporal dimension and multimodal focus (see Amabile, 1996; Csikszentmihalyi, 1997; George, 2007), despite the understanding that creativity emerges at the intersection of ideas and modes of expression in a logic of recombination over time (Godart & Galunic, 2019).
As Sonenshein (2016) remarks, organizational analysis is becoming more dynamic, but application of the dynamic perspective to some domains, such as creativity, remains limited. Research on creativity is still dominated by stage-based accounts (e.g., Amabile, 1996). Since its emergence in the 1950s, the key objective of this scholarship has remained the identification and refinement of the stages of the creative process, analyzing subprocesses and factors (Botella, Zenasni, & Lubart, 2018). Thus, the highly influential “componential” theory identifies five stages of the creative process, running from problem identification to outcome assessment (Amabile, 1996), or from preparation to elaboration (Csikszentmihalyi, 1997).
The last decade has witnessed the growing criticism of the stage-based approach for ignoring the unpredictability of the creative process (Bilton, 2010), and for presenting an oversimplified, teleological picture that leaves little room for surprise and serendipity (Karakilic & Painter, 2022). This perspective promotes theoretical approaches and empirical methods that conceptualize and operationalize the creative process as continuous in nature, rather than as a configuration or a sequence of isolated events. What this change in perspective highlights is the need to develop methods that capture in empirically tractable ways “the relations and affects that have shaped creative production contemporaneously and historically, to make sense of the dynamics of production and the processes that shape creativity over time” (Fox, 2015, p. 533).
This perspective finds theoretical footing in the work of Deleuze (1991, 2014), for whom “becoming” is a process of transformation, through which an identity is constructed through experimentation in responding to changes in the surrounding world. Events are dynamic, parts of an ongoing process of change. Temporality is essential in this perspective, as the creative insight emerges from playful experimentation over time, from embracing unexpected flows of personal experience in relating to others. Creativity is the result of active engagement: of learning a knowledge base, struggling with contradictions, addressing problem spaces, filling gaps in information, and recombining symbols or elements in the form of a new configuration (Brower, 2000, p. 202).
An appropriate method for the analysis of the process of creative-becoming integrates experimentation over time, examining interwoven, co-emergent elements, but always retaining a degree of unpredictability about the observed process (Karakilic & Painter, 2022). This method is multimodal and temporal in nature, applicable to individual and organizational creation. An example is the interdependence between writing and the visual image in the creative process of one of the most original movements in the history of art: Surrealism. The Surrealists’ dazzling, mysterious art had its initial impetus in the field of writing. In their journals Surrealists frequently juxtaposed writings and poems with drawings, photographs, and collages (Caws, 2010, p. 24). Their creative-becoming can be understood and analyzed in terms of mutating, interwoven networks of visual and verbal elements.
The next section presents the application of our multimodal method at the individual level, to the creative process of a highly prominent artist. It attests to the method’s feasibility and demonstrates the key role of opposition in creation through a simultaneous longitudinal analysis of text and image.
Demonstration of the Method
Establishing a common analytical framework: Van Gogh’s creative process (1881–1890)
Höllerer et al. (2019) observe that the first source of intellectual inspiration for the engagement with multimodality is art history. The choice of art history as a research context facilitates the identification of a common analytical framework. Perhaps the most widely encountered unit of observation in art history is the artistic career (e.g., White & White, 1965). It represents a series of choices, developments, outcomes, and plans particular to an artist that accumulate over time in ways that in turn affect her mentality and creative process (Abbott, 2001). The creative process of an artist is the manner in which she conceives ideas and implements them by engaging with materials in applying paint on the canvas and positioning her work vis-a-vis her peers (Sgourev, 2021; Sgourev et al., 2023). This process is reflected in the artworks, as visual representations of aesthetic concepts, but also in textual artifacts, such as workbooks, studio notes, and personal diaries. Our temporal perspective on the creative process postulates that time allows for minor differences from one’s peers to accumulate into artworks as a result of the exposure to flows of experience.
We set our sights on the creative process of Vincent Van Gogh (1853–1890) for three main reasons. First, his prominence. He created more than 2000 works, many of them featuring a distinctive combination of bold colors and highly expressive brushwork that would establish his reputation as a pioneer of modern art. He decided to pursue an artistic career in 1880: the natural starting point of our period of observation. He honed his skills at drawing and painting in Nuenen and Antwerp (1883–1886), before moving to Paris in 1886. His most prolific period was in Arles in 1888. He struggled with his mental health, spent time in psychiatric hospitals and is believed to have shot himself on July 27, 1890. Second, the availability of complete digital editions of both the paintings and letters. A particularly important development in this regard is the publication of the complete, multilingual digital edition of Van Gogh’s letters (see Luijten, Jansen, & Bakker, 2009). This edition allows us to identify links between the visual and verbal sources and explore the creative process. And third, the multimodal nature of his creative process, unfolding in back-and-forth movement between writing and painting. As Brower (2000) notes, his letters are “thinking aloud” materials. The artist carried paper with him and put down his thoughts in the process of creation. The writings are not distorted by hindsight bias, as is often the case with artistic memoirs. The letters and paintings emerge together, complementing each other (Porter, 1982). The artist insisted that he was not just copying the images in his notebook, as he was “translating them into another tongue” (Naifeh & Smith, 2011, p. 947).
The creative process is a configuration of interwoven elements, shaped by affects and relations both contemporaneously and historically. We analyze the letters and paintings as clusters of elements that are related to each other and also to the social context (Godart & Galunic, 2019). Letter-writing is a relational act of conveying feelings and thoughts (Tamboukou, 2011). Collections of letters have a narrative structure that highlights key aspects of the writer’s identity (Stanley, 2004). Such collections provide an opportunity to identify a narrative sense from an agglomeration of elements and topics (Tamboukou, 2011). Likewise, paintings constitute an agglomeration of elements or dimensions related to the composition, forms or colors featured on the canvas. We analyze Van Gogh’s creative process in the period from 1881 to 1890, which encompasses his career as an artist.
Identifying a key connecting element: Opposition
Having constructed a unitary temporal axis, we proceed to the next stage of the procedure. We draw on an established tradition in creativity research to identify opposition as our key connecting element. Based on the foundational work of Jung (1952, 1959) on the role of opposite attitudes in mental processes, the “Janusian” theory of creativity (Rothenberg, 1971) focuses on the ability to imagine two opposite ideas, concepts, or images existing simultaneously. This ability is fundamental in the creative process. As Runco (1994, p. 102) observes: “Some kind of tension must precede the intrinsic motivation that characterizes the creative effort.” In this view, creativity tends to arise from the interaction of opposing dualities, paradoxes, or contradictions (George, 2007). Attempts to integrate divergent elements as a result of the juxtaposition of conflicting logics foster transformation, as tensions become creative material (Jones, Maoret, Massa, & Svejenova, 2012).
Rothenberg (1971) cautions that opposition is a complicated phenomenon, and can sometimes be so idiosyncratic as to have no bearing on creation. A wide variety of forms of opposition exists, from mild to strong. The methodological challenge is to assess the appropriateness of opposition in each context and pick the type corresponding to a specific creator. He gives an example of important opposition pairs that feature colors, such as red and green or blue and yellow, and recommends the measurement of creativity through opposite word associations (Rothenberg, 1971, p. 204). We adopt his suggestions by treating color contrast as the form of opposition in the visual domain and ambivalence in the verbal one.
Contrast
Contrast refers to how opposites (e.g., rough vs. smooth, light vs. dark) are arranged in aesthetic space to create noteworthy visual effects (Aloumi, Noroozi, Eves, & Dupac, 2013). Contrast is a key factor in aesthetic evaluation. Studies attest that high-contrast artworks or visual stimuli are preferred over lower-contrast ones; works featuring contrasting elements are better remembered and processed more easily (Winkielman & Cacioppo, 2001).
The concept of contrast refers in aesthetics to two related processes: the juxtaposition of dissimilar elements (e.g., color, tone, or emotion) and the degree of difference between the lightest and darkest sections of a painting. The latter is defined as “chiaroscuro” (“light-dark”). Artists known for their use of chiaroscuro include Leonardo, Caravaggio, Rembrandt, and Goya. The scientific principles underlying the other form of contrast were defined by Newton (1704). He presented a circle showing a spectrum of seven colors, observing that certain colors around the circle were opposed to each other, forming the greatest contrast. The circle was later divided into twelve sections to produce the now-familiar “color wheel.” In the late 18th century, the colors opposite each other on the wheel, most contrasting with each other, were defined as “complementary.” The chemist Eugène Chevreul (1839) conceptualized and popularized the notion of complementary colors. The theory was further refined by Charles Blanc (1867).
Ambivalence
Ambivalence encapsulates the experience of opposing forces, positive and negative, toward an object, a person, or an action (Ashforth, Rogers, Pratt, & Pradies, 2014). It may refer to the experience of opposite attitudes or feelings, but also to the continual fluctuation between them (Palmberger, 2019, p. 75). There are multiple ambivalence types in the literature, but we highlight three types capturing the main cognitive, emotional, and behavioral mechanisms that have been conceptualized so far (Rothman, Pratt, Rees, & Vogus, 2017, p. 35).
Affective (or emotional) ambivalence refers to the disparity between feelings. It transpires when individuals oscillate between positive and negative feelings, such as confidence and frustration (e.g., De Vaujany & Aroles, 2019; Resch & Steyaert, 2020). Affect is inherently relational (Endrissat & Islam, 2022), emerging from configurations of relations within a space (Gherardi, 2018). Affective ambivalence is often embedded in the tension between forces of belonging and individuation, observable in many contexts (e.g., De Vaujany & Aroles, 2019). Cognition and emotions are typically intertwined (Elfenbein, 2007), particularly in conditions of high ambivalence. Cognitive ambivalence captures the disagreement between beliefs or concepts that an individual is exposed to or adopts (Ashforth et al., 2014). A state of behavioral ambivalence derives from exposure to opposing social demands, regarding norms or relations (Merton, 1976). The presence of opposing demands in the social environment explains why a state of ambivalence may endure over time (Meyerson & Scully, 1995).
Harnessing appropriate computational techniques: Color and text analysis
Stage one: Measures
For the first step of our analysis, we drew on the compendium of digital images in the “Wikiart” art encyclopedia. To control for stylistic differences due to technical features of the medium (e.g., charcoal, oil, watercolor), we downloaded the 759 images of works executed in oil in the period 1881–1890. Van Gogh considered oil painting the most demanding of all media and believed that he needed to develop his skills at drawing before he took up oil painting (Brower, 2000).
To retain control over the procedure and ensure interpretability and replicability, we departed from unsupervised computer vision techniques (such as machine-learning methods), building instead on the computational technique in Sgourev et al. (2023). We first resized each image by 40% and converted it into an Nx3 matrix, where rows represent the pixels composing the image and columns are their respective red, green, and blue (RGB) color components. We then used k-means clustering to group the pixel vectors into 11 clusters (average quality = 94.240, sd = 2.646) and converted each cluster’s RGB centroid into its coordinates in the HSV (hue, saturation, value) and CIELAB color spaces (CIE, 2020).
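The clustering step described above can be sketched roughly as follows. This is a minimal standard-library illustration under stated assumptions, not the authors' implementation: the pixels are assumed to be already available as RGB tuples (loading and resizing the image is omitted), the helper names `kmeans` and `rgb_to_lab` are hypothetical, the clustering is a plain Lloyd's k-means, and the CIELAB conversion uses the usual sRGB formulas with a D65 white point. The synthetic pixels and `k=2` are for illustration only, and L* is returned here on the 0 to 100 scale.

```python
import colorsys
import random


def kmeans(pixels, k, iters=20, seed=0):
    """Plain Lloyd's k-means over a list of (r, g, b) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(pixels, k)  # initial centroids drawn from the data
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            groups[nearest].append(p)
        # Recompute each centroid as its group mean; keep it if the group is empty.
        centroids = [
            tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def rgb_to_lab(r, g, b):
    """Convert 8-bit sRGB to CIELAB (D65 white point); L* is on a 0-100 scale."""
    def linear(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linear(r), linear(g), linear(b)
    x = (0.4124 * rl + 0.3576 * gl + 0.1805 * bl) / 0.95047
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = (0.0193 * rl + 0.1192 * gl + 0.9505 * bl) / 1.08883

    def f(t):
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116

    fx, fy, fz = f(x), f(y), f(z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


# Synthetic "image": 50 reddish and 50 bluish pixels (hypothetical data).
pixels = [(200, 30, 30)] * 50 + [(30, 40, 200)] * 50
centroids = kmeans(pixels, k=2)  # the text reports 11 clusters per painting

for r, g, b in centroids:
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    lab = rgb_to_lab(r, g, b)
    print("hue:", round(h * 360), "L*a*b*:", [round(c, 1) for c in lab])
```

In practice a library implementation of k-means would be preferable for full-size images; the pure-Python version only makes the logic of the step explicit.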
For each cluster, we recorded its perceptual lightness (CIELAB’s L*) and its red to green and yellow to blue coordinates (CIELAB’s a* and b*, respectively). For each image we computed a measure of
The next step was identifying and coding colors in the images. Color identification is challenging due to variations in perception and naming across individuals, as well as biases in digital image capture by cameras and scanners. To address these issues, we opted for a moving-range strategy instead of imposing a univocal name on colors. Drawing on the HSV color space, we first divided the 360° hue distribution into twelve 30-degree intervals using three different starting points, at 345°, 340°, and 350° (Figure 2). We coded each cluster’s color according to the position of its hue degree in any of the twelve hue intervals. We also coded as black or white all clusters with a very low (L* < 0.05) or very high (L* > 0.95) degree of perceptual lightness, respectively. This produced three different color specifications, each composed of 14 different colors (Figure 2).
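The moving-range coding described above can be illustrated with a short sketch. It assumes that lightness has been rescaled to the 0 to 1 range (consistent with the 0.05 and 0.95 thresholds) and that the twelve hue names are ordered around the wheel as listed in `HUE_NAMES`; that ordering and the function name `code_color` are assumptions made for illustration, since Figure 2 is not reproduced here.

```python
# Twelve hue names in an assumed order around the wheel, one per 30-degree interval.
HUE_NAMES = [
    "red", "orange", "yellow", "spring green", "green", "turquoise",
    "cyan", "ocean blue", "blue", "violet", "magenta", "raspberry",
]


def code_color(hue_deg, lightness, start=345.0):
    """Name a cluster from its hue (degrees) and its lightness (0-1 scale).

    `start` is the hue degree at which the first interval begins.
    """
    if lightness < 0.05:
        return "black"
    if lightness > 0.95:
        return "white"
    # Shift the wheel so that `start` maps to 0, then find the 30-degree slot.
    index = int(((hue_deg - start) % 360) // 30) % 12
    return HUE_NAMES[index]


# The same cluster coded under the three starting points (hypothetical values).
for start in (345, 340, 350):
    print(start, code_color(15, 0.5, start))
```

A hue near an interval boundary can receive different names under different starting points, which is the ambiguity the three specifications are meant to absorb.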

Figure 2. Colors’ degree ranges (left) and starting range for different specifications (right).
We then set a few additional parameters to extract the grays, the browns, and the pinks from the initial color specifications, as experimental research has confirmed that most individuals identify these tints as distinctive colors (Berlin & Kay, 1969; Regier, Kay, & Khetarpal, 2007). Again, instead of imposing a univocal value on these parameters, we generated discrete value vectors for each. For each color specification, we ran 200 models, altering one parameter per model. The average Pearson correlation and pairwise coding agreement between models were significant (correlation between 0.947 and 0.951, agreement 0.968), attesting to the robustness of the color specification method. We inspected the colors obtained at various parameter values and, for the next steps, used the three color specifications with the same parameters. Finally, based on experimental findings on color classification (Berlin & Kay, 1969), we condensed the color spectrum by grouping spring green, green, and turquoise as green; cyan, ocean blue, and blue as blue; and violet, magenta, and raspberry as purple. This color space corresponds to the universal glossary system of color categories (Lindsey & Brown, 2009), encompassing black, gray, white, red, yellow, orange, green, blue, purple, brown, and pink.
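The final grouping step amounts to a simple lookup, sketched below. The mapping follows the groups named in the text; the names `GROUPS` and `condense` are hypothetical, and the extraction of grays, browns, and pinks is not shown because its parameters are not given here.

```python
# Fine-grained hue names mapped to the condensed categories named in the text.
GROUPS = {
    "spring green": "green", "green": "green", "turquoise": "green",
    "cyan": "blue", "ocean blue": "blue", "blue": "blue",
    "violet": "purple", "magenta": "purple", "raspberry": "purple",
}


def condense(color_name):
    """Return the condensed category; names outside GROUPS are kept as they are."""
    return GROUPS.get(color_name, color_name)


fine_names = [
    "red", "orange", "yellow", "spring green", "green", "turquoise",
    "cyan", "ocean blue", "blue", "violet", "magenta", "raspberry",
    "black", "white", "gray", "brown", "pink",
]
categories = sorted({condense(name) for name in fine_names})
print(categories)
```

Applied to the full set of coded names, the lookup leaves the eleven categories listed above.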
We coded each image as 1 if its clusters featured complementary colors according to any of the three color specifications (Figure 2, left), and 0 otherwise. We then computed the yearly averages of all measures, with
Since absolute contrast cannot tell if any luminance difference exists between dark and bright yellows, or between blue and yellow, we used the yearly averages to develop two measures of contrast.
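The binary coding of complementary colors and its yearly averaging, described above, might look like the following sketch. The set of complementary pairs is an assumption: the pairs actually used are defined in Figure 2, which is not reproduced here, so the traditional painter's pairs (red and green, blue and orange, yellow and purple) stand in for them. The function names and the sample records are hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: traditional painter's complementary pairs, standing in for Figure 2.
COMPLEMENTARY_PAIRS = {
    frozenset(("red", "green")),
    frozenset(("blue", "orange")),
    frozenset(("yellow", "purple")),
}


def has_complementary(color_sets, pairs=COMPLEMENTARY_PAIRS):
    """Return 1 if any specification's color set contains a complementary pair.

    `color_sets` holds one collection of color names per color specification.
    """
    for colors in color_sets:
        if any(pair <= set(colors) for pair in pairs):
            return 1
    return 0


def yearly_average(records):
    """Average 0/1 flags by year; `records` is an iterable of (year, flag)."""
    by_year = defaultdict(list)
    for year, flag in records:
        by_year[year].append(flag)
    return {year: sum(flags) / len(flags) for year, flags in sorted(by_year.items())}


# Hypothetical paintings: (year, colors under each of the three specifications).
paintings = [
    (1885, [{"brown", "yellow"}, {"brown", "orange"}, {"brown", "yellow"}]),
    (1888, [{"blue", "yellow"}, {"blue", "orange"}, {"blue", "yellow"}]),
    (1888, [{"green", "yellow"}, {"green", "yellow"}, {"green", "yellow"}]),
]
flags = [(year, has_complementary(specs)) for year, specs in paintings]
print(yearly_average(flags))
```

Passing a different `pairs` argument is enough to test an alternative definition of complementarity.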
Stage one: Results
Figure 3 presents key dimensions of Van Gogh’s use of color. For visualization purposes, all values were mean-centered and scaled. The dots in the graphs represent the observed values, and the solid line approximates the average trend. The trendlines show how the colors in the artist’s palette changed over time. The results for colorfulness and luminance confirm that Van Gogh’s palette was darker and less colorful in the first half of the decade than in the second half (e.g., Bekker & Bekker, 2009; Naifeh & Smith, 2011). The most somber period is 1884/85, featuring darker shades. The second pair of graphs illustrates the tendency toward an increasing number of colors and a growing presence of complementary colors. The period 1888–1890 attests to the emergence of a colorful, vibrant, and luminous palette.

Figure 3. Six dimensions describing Van Gogh’s use of color.
Figure 4 further highlights the fundamental shift in the oeuvre of Van Gogh. Chiaroscuro contrast, an already well-established technique in painting, gives way to complementary contrast, a distinctly new form of capturing the effect of light. The turning point in the transition from chiaroscuro to complementary contrast is the sojourn in Paris.

Figure 4. Transition from chiaroscuro to complementary contrast.
Figure 5 summarizes the relationship between the colors dominating Van Gogh’s palette. The graphs were produced by counting the co-occurrence of pairs of colors in 1881–1883, 1884–1887, and 1888–1890. The width of the ties connecting the colors represents the frequency with which two colors appear together in a painting. The initial somber brown-orange-yellow palette undergoes a transformation during 1884–1887, becoming increasingly green and blue in the subsequent years. Brown, yellow, and orange are still key elements in the 1888–1890 palette, attesting to continuity in his evolution. This has to do with his technique, juxtaposing tiny brush strokes of colors to capture the effects of light on a surface.
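The co-occurrence counts behind such a network can be obtained along the following lines. This is only a sketch with hypothetical data and a hypothetical function name; it assumes that each painting is represented by the set of color names coded for it and counts each unordered pair once per painting.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(paintings):
    """Count, for each unordered color pair, the paintings in which both appear."""
    counts = Counter()
    for colors in paintings:
        # Sorting makes the pair key independent of the order of the colors.
        for pair in combinations(sorted(set(colors)), 2):
            counts[pair] += 1
    return counts


# Hypothetical coded paintings from a single period.
period = [
    {"brown", "orange", "yellow"},
    {"brown", "yellow"},
    {"blue", "yellow"},
]
counts = cooccurrence(period)
for pair, n in counts.most_common():
    print(pair, n)
```

The count for a pair would then set the width of the tie between the two colors in the graph for that period.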

Figure 5. Relationships between dominant colors in three periods of Van Gogh’s oeuvre.
Stage two: Measures
The second stage of our analysis examines Van Gogh’s letters. As at stage 1, we focused on the period 1881–1890. During this period, Van Gogh wrote 654 letters, mostly to his brother Theo, but also to other family members and to fellow artists. We used diverse computational-linguistics tools to derive language-based measures of ambivalence. Computational methods for quantitative analyses of unstructured textual collections allow us to uncover meaningful patterns in large texts, reducing the high dimensionality of data (Evans & Aceves, 2016). They are very effective at capturing temporal trends and historical variations in concepts (Hamilton, Leskovec, & Jurafsky, 2016). We combine three of the most widely used methods for reducing text dimensionality: dictionaries, word embedding, and topic modeling, to analyze the ways in which intentions, hesitations, and emotions are expressed in Van Gogh’s letters.
We used a lexicon-based approach to construct our measure of affective ambivalence.
where
We derived our measure of cognitive ambivalence from word embedding.
Given that word embedding represents concepts as points within multidimensional, geometric spaces (Kozlowski, Taddy, & Evans, 2019), we used the information encoded within geometric space to formulate a representation of the breadth of concepts evoked in Van Gogh’s writing (Aceves & Evans, 2021). To account for ambivalent cognitive representations, we represented each letter as a list of words, and measured the conceptual distance of each word pair. Formally, we define the conceptual distance between words
where
where
We measure behavioral ambivalence as the co-existence of two opposing concepts: belonging and isolation.
We created a list of words that define these two concepts, and for each list we calculated its centroid vector in the embedding model. We then computed the cosine similarity between each of these centroids and the centroid vector of the words defining the eight topics. Following standard practice in NLP, we assume that the similarity between the two vectors reflects how closely associated a topic is with the concept in terms of meaning. Using this association, we identified the topics that manifest most strongly the co-existence of opposing forces of belonging and isolation.
where
Finally, we considered the three topics with the highest values of behavioral ambivalence and used their topic distribution in each letter to measure behavioral ambivalence at the level of the letter.
where
Stage two: Results
We begin the analysis of the letters with affective ambivalence, capturing the co-emergence of words expressing opposite emotional states of joy and sadness.
Figure 6 shows a clear tendency for increasing affective ambivalence, which unfolds in two stages. In the first stage (1881–1885), there is a slight decrease in ambivalence, which gives way to a linear increase starting in 1885. The second half of the decade was marked by intensifying emotional turbulence in Van Gogh’s life, as indicated by the co-existence of positive and negative emotions in his letters. Additional evidence is provided by a related analysis of the co-existence of opposite emotions, implemented using an aggregated measure of positive and negative emotion, returning a very similar pattern to the one in Figure 6 (available from the authors upon request). Van Gogh is often described as a violently emotional painter (Dow, 1964). Our analysis attests to growing emotional volatility towards the end of his life, as embodied in the choice of words composing his letters.

Figure 6. Affective ambivalence (joy and sadness).
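To make the idea of a lexicon-based co-occurrence measure concrete, the sketch below scores a letter by the smaller of its joy-word share and its sadness-word share, so that the score is high only when both emotions are present. This scoring rule, the two mini-lexicons, and the function name `affective_ambivalence` are assumptions made for illustration; they are not the formula or the dictionary used in the analysis.

```python
import re

# Hypothetical mini-lexicons; a real analysis would rely on an
# established emotion dictionary.
JOY_WORDS = {"joy", "happy", "delight", "glad"}
SADNESS_WORDS = {"sad", "sorrow", "grief", "melancholy"}


def affective_ambivalence(text):
    """Return min(joy share, sadness share) for one letter (assumed index)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    joy_share = sum(t in JOY_WORDS for t in tokens) / len(tokens)
    sad_share = sum(t in SADNESS_WORDS for t in tokens) / len(tokens)
    # The minimum is zero unless both emotions occur in the same letter.
    return min(joy_share, sad_share)


example_score = affective_ambivalence("Joy and sorrow arrive together.")
```

Averaging such scores over all letters written in a given year would produce a yearly series comparable in spirit to the one plotted in Figure 6.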
Next, we consider the cognitive type of ambivalence, represented in Figure 7. The linear upward trend attests to increasing average conceptual distance over time. Van Gogh’s writing is indicative of a state of growing cognitive ambivalence or openness to divergent ideas and perspectives. This reinforces observations that the experience of affective ambivalence is associated with the tendency for recognition of unusual relationships between concepts (Fong, 2006). It also demonstrates that Van Gogh engaged in relational-synthetic thinking: a process of associating previously unrelated elements and synthesizing them (Brower, 2000, p. 185). The artist was looking for ways to establish a synthesis of elements, drawing on an extensive reading list, including Tolstoy, Hugo, Stowe, Dickens, and Carlyle, among others (Brower, 2000, p. 193). We confirmed the increase in cognitive ambivalence by measuring oppositional orientations toward another key concept for the artist (“God”). The evidence (available from the authors upon request) reveals a high degree of ambivalence vis-à-vis religion in the early 1880s, which then declines before increasing steeply from 1885/86. Scholars corroborate Van Gogh’s dual attitude toward religion, vacillating between passionate embrace and resolute dismissal (Apostolopoulou & Issari, 2022; Dow, 1964), finding expression in a subjectively experienced, modern sacred art.

Figure 7. Cognitive ambivalence (conceptual breadth).
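The notion of average conceptual distance can be illustrated with a short sketch that averages the cosine distance over all pairs of distinct words in a letter. The use of cosine distance (one minus cosine similarity), the toy two-dimensional vectors, and the function names are assumptions made for this example; the embedding model and the exact formula used in the analysis may differ.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def conceptual_breadth(words, vectors):
    """Average pairwise cosine distance between the distinct known words."""
    known = sorted({w for w in words if w in vectors})
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    total = sum(cosine_distance(vectors[u], vectors[v]) for u, v in pairs)
    return total / len(pairs)


# Hypothetical 2-d embeddings; real models have hundreds of dimensions.
toy_vectors = {"sower": [1.0, 0.0], "wheat": [1.0, 0.0], "star": [0.0, 1.0]}
narrow = conceptual_breadth(["sower", "wheat"], toy_vectors)
broad = conceptual_breadth(["sower", "wheat", "star"], toy_vectors)
```

A letter that draws on semantically distant words receives a higher score than one that stays within a single semantic neighborhood, which is the intuition behind the upward trend in Figure 7.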
Figure 8 presents the temporal dynamics of behavioral ambivalence. The content of the letters tends to become more ambivalent with respect to social relations over time. Figure 8 reveals the same shape as that for affective ambivalence (Figure 6): a slight early tendency for decreasing ambivalence, followed by a linear tendency for increasing ambivalence. We reached the same conclusions through two methods for capturing behavioral ambivalence. The first one measures the simultaneous presence of social pronouns (“we”) versus self-pronouns (“I”). The co-occurrence of these pronouns is an indication of behavioral ambivalence, as it reflects sources of tension within the individual’s sense of self (Berger & Packard, 2022). The second measure tracks the frequency of “I” pronouns in relation to words of affiliation, utilizing a pre-existing dictionary. Both measures yield a very similar pattern to that in Figure 8, revealing a state of oscillation between isolation and connectedness (the two figures are available from the authors upon request).

Figure 8. Behavioral ambivalence (belonging and isolation).
Validation of co-emergence: Trend test and association
The last analytical stage ascertains the validity of the measures and establishes the degree of association between manifestations of the key connecting element in the two modes. Table 1 presents the results of the Mann-Kendall trend test for the five measures in our analysis. This is a non-parametric test that checks whether data collected over time follow a consistently increasing or decreasing trend. The tests for chiaroscuro and complementary contrast and for affective and cognitive ambivalence are significant at the conventional level (p < 0.05), confirming the presence of significant trends over time. The trend for behavioral ambivalence is weaker, reaching significance only at the 10% level (p < 0.1).
Table 1. Mann-Kendall Trend Test.
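For readers unfamiliar with the test, the following is a minimal sketch of the Mann-Kendall statistic with the usual tie correction and a two-sided p-value from the normal approximation. It is offered as an illustration under these assumptions, not as the code used to produce Table 1; the function name `mann_kendall` and the example series are hypothetical.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Return (S, z, two-sided p) for a time-ordered numeric series."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    # S sums the signs of all later-minus-earlier differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, reduced for groups of tied values.
    var_s = n * (n - 1) * (2 * n + 5)
    for t in Counter(series).values():
        if t > 1:
            var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18
    # Continuity-corrected z score; zero when there is no net trend.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value under the normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical yearly series with a steady upward trend.
s_stat, z_stat, p_value = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

The normal approximation is conventionally considered adequate for series of roughly ten or more observations; for shorter series, exact tables are preferable.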
Next, we tested the strength of the association between the five measures. The estimates presented in Table 2 corroborate the associations between the time series corresponding to contrast and ambivalence. There is a high probability of co-emergence of opposition: that the manner in which Van Gogh articulated contrast on the canvas (transitioning from chiaroscuro to complementary contrast) was associated with his exposure to affective, cognitive, and behavioral ambivalence during the observation period.
Table 2. Tests of Association between Contrast and Ambivalence Measures.
Significance codes: ** p < 0.05, *** p < 0.001.
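As an illustration of how the association between two yearly series might be quantified, the sketch below computes Spearman's rank correlation with average ranks for ties. The choice of statistic is an assumption made for this example and is not necessarily the one reported in Table 2; the function names and the toy series are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical yearly values of a contrast measure and an ambivalence measure.
contrast = [0.10, 0.12, 0.20, 0.35, 0.50]
ambivalence = [0.02, 0.03, 0.03, 0.06, 0.09]
rho = spearman(contrast, ambivalence)
```

A rank-based statistic is used here because it makes no assumption of linearity or normality, in keeping with the non-parametric trend test applied to the same series.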
Internal and External Validity
A necessary step in the testing of a new methodological approach is the establishment of the validity of the results generated through its application. We found support in the literature on Van Gogh and in his letters for the appropriateness of the use of opposition as the key connecting element (contrast and ambivalence). Scholars observe that contrast was ever-present in Van Gogh’s style. In the early 1880s it was dominated by muted colors, expressing the contrast between light and dark (Bekker & Bekker, 2009). In early 1883 he read Blanc’s book on the theory of color. He discovered the power of color in Paris in 1886, moving away from the muted colors to the brighter tones of the Impressionists (Bekker & Bekker, 2009). His exposure to Japanese prints in Paris reinforced his penchant for bright colors and intense expression. His objective was to release the expressive force that transfigures technique into art (Grant, 2014, p. 117). In his own words: “I want to reach the point where people say of my work, that man feels deeply. . .” (on or about July 21, 1882). From the mid-1880s his paintings embody his belief in the use of contrast of complementary colors to structure the composition and express the intensity of emotions (Dow, 1964). He recognized that “these things that are relevant to complementary colors, to the simultaneous contrasting and the mutual devaluation of complementary colors, are the first and most important issue” (October 1885). In another letter (August-October 1887) he states his objective as: “...seeking oppositions of blue with orange, red and green, yellow and violet... Trying to render intense color.” The patterns and timeline presented in Figures 3 and 4 are consistent with his writings and with observations in scholarship on the evolution of his style.
We also identified additional supporting evidence for behavioral ambivalence. For example, Porter (1982, p. 54) describes a “profound split between the urge for domesticity, marriage, calm, dailiness and home, and the attraction to isolation, chaos, anguish, visionary transcendence and even madness.” On the one hand, he expresses a desire for belonging and companionship, to found a colony of artists, to live with Theo or seek out company in an asylum. On the other hand, he admits to his reclusiveness: “One is afraid of making friends” (November 26 and 27 1882). The letters veer back and forth between these two moods (Porter, 1982). Similarly, Apostolopoulou and Issari (2022, p. 109) identify oscillation between two states: asceticism, allowing him to concentrate on his work, and a web of social relations: the centrifugal need to relate with contemporaries, friends or family, and the centripetal urge to withdraw from social life and relations, and devote himself to his art (p. 105).
Topic modeling provided evidence corroborating the growing behavioral ambivalence. Topic 1 is defined as “practical”, as it deals with money and the practical aspects of being an artist. Topic 2 is defined as “technical”, dealing with artistic techniques and improving skills. Topic 3 is defined as “romantic”, reflecting the expression of intimate sentiments. Topic 4 is about the role of color in art. Topic 5 is about light, weather, and nature; it is Impressionistic in character, capturing the effects of light in changing weather. Topic 6 is defined as “realism”, dealing with the representation of real life. Topic 7 is about artistic exchange and the desire to found a community of artists. Topic 8 is defined as “existential”, featuring thoughts on the mysteries and challenges of life. The analysis reveals that the topics with the highest level of behavioral ambivalence are Topics 8, 7, and 3.
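The association between a topic and the concepts of belonging and isolation rests on centroid vectors and cosine similarity, which can be sketched as follows. The toy two-dimensional embeddings, the word lists, and the function names are hypothetical and serve only to show the mechanics; the rule that combines the two similarities into a single ambivalence score is not reproduced here.

```python
import math


def centroid(words, vectors):
    """Average the embedding vectors of the words found in the vocabulary."""
    found = [vectors[w] for w in words if w in vectors]
    if not found:
        raise ValueError("none of the words are in the embedding vocabulary")
    dim = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 2-d embeddings standing in for a trained model.
toy_vectors = {
    "friend": [1.0, 0.0], "together": [0.8, 0.2],
    "alone": [0.0, 1.0], "solitude": [0.2, 0.8],
    "colony": [0.9, 0.1], "artists": [0.7, 0.3],
}

belonging = centroid(["friend", "together"], toy_vectors)
isolation = centroid(["alone", "solitude"], toy_vectors)
topic = centroid(["colony", "artists"], toy_vectors)  # top words of one topic

sim_belonging = cosine_similarity(topic, belonging)
sim_isolation = cosine_similarity(topic, isolation)
```

In this toy example the topic lies closer to belonging than to isolation; a topic that is comparably close to both centroids would be a candidate for high behavioral ambivalence.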
Figure 9 presents the distribution of topics in the corpus over time. One tendency that emerges is that the practical and technical topics decrease in importance as Van Gogh approaches the end of his life. This is probably related to his changing priorities, but also to the fact that he had already mastered the technical aspects of painting. Similarly decreasing in relative weight are topics related to the Realist and Impressionist styles, as Van Gogh forged ahead with his idiosyncratic style. In the second half of the 1880s he moved away from Realism, looking to represent on the canvas his subjective reality, not objective reality (i.e., Nature). Unsurprisingly, his interest in rendering light, weather, and nature (Topic 5) declines too.

Figure 9. Distribution of topics over time.
The evidence indicates that what becomes increasingly important to him is captured in two topics: artistic exchange (the creation of an artistic colony), and color. His letters reveal growing preoccupation with the principles of application of color, and the pursuit of shared experiences with fellow artists. The extent to which the letters in the last years of his life feature the topic of artistic exchange is remarkable. This attests to increasing behavioral ambivalence, as this topic captures the tension between belonging and isolation. The distribution of the Realist topic suggests that the artist is moving away from objective reality and toward the inner self, in pursuit of a highly expressive style. Brower (2000, p. 197) observes that Van Gogh became more alone, as he evolved a more novel statement of his art. It is only natural that delving into the inner self would exacerbate his feelings of loneliness, increasing the need for artistic exchange and for participating in a peer community. The ambivalence between productive isolation and pursuit of belonging is fundamental to his identity (Dow, 1964; Porter, 1982), intensifying in the last years of his life.
We also captured the relational source of ambivalence in a more direct manner by examining his social network. Figure 10 presents the aggregate number of mentions of social contacts in his letters. The results attest to a dramatic upturn in the density of the network from 1885, supporting the tendency (Figure 9) of increasing interest in “artistic exchange.” A closer look at the network reveals a twofold development: it becomes simultaneously denser and more centralized, dominated by Paul Gauguin and Émile Bernard.

Figure 10. Mentions of social actors over time (aggregated).
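The aggregation of mentions per year can be illustrated with a simple counting routine. The sketch assumes that each letter is available as a (year, text) pair and that contacts are matched by a case-insensitive substring search on their names; both the data layout and the function name `mentions_per_year` are hypothetical, and a real analysis would need to handle name variants and ambiguous surnames.

```python
from collections import Counter


def mentions_per_year(letters, names):
    """Sum, for each year, how often any of the given names is mentioned."""
    totals = Counter()
    for year, text in letters:
        lowered = text.lower()
        # Naive substring matching; name variants are not resolved.
        totals[year] += sum(lowered.count(name.lower()) for name in names)
    return dict(sorted(totals.items()))


# Hypothetical letters and contact list.
toy_letters = [
    (1887, "I saw Bernard today."),
    (1888, "Gauguin arrives soon. Gauguin and Bernard write to me."),
    (1888, "No news this week."),
]
yearly_mentions = mentions_per_year(toy_letters, ["Gauguin", "Bernard"])
```

Counting each name separately, rather than summing across names, would additionally show how concentrated the mentions are on a few contacts.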
Having identified the centrality of Gauguin, we examined affective and cognitive ambivalence toward him. The results confirm that this ambivalence increased in the last three years of Van Gogh’s life. The results for affective ambivalence (Figure 11) show that Van Gogh alternated between positive and negative emotions with regard to Gauguin. Cognitive ambivalence returned an identical pattern.

Figure 11. Emotional ambivalence in sentences mentioning Paul Gauguin.
These findings demonstrate that ambivalence had a strong relational component. The mystery of Van Gogh’s last years is captured in his relational duality: the density of the network increased, but so did his ambivalence toward his peers. He was mentioning more people but becoming more focused on a few of them. Contrary to popular wisdom, he was not an outcast, but increasingly alternated between the relational states of isolation and connection.
A single case and a distinctive context naturally raise questions about the generalizability of the findings. What alleviates this concern is that our findings are based on a 10-year period, where the unit of analysis is an element observed in its evolution over time. Distinctive contexts can be analytically advantageous when presenting the opportunity to examine a key element or principle in considerable detail (Siggelkow, 2007). It has long been recognized that Van Gogh’s letters are a precious resource, capturing his creative process in unusual depth (Brower, 2000). Our method allowed us to portray the complexity and temporality of his creative-becoming. However, because so few (if any) similarly extensive artistic diaries are available, it is difficult to assess how common the observed pattern is. The procedure can be applied in any context featuring concomitant verbal and visual data, upon adjusting for contextual factors. We encourage the application of the procedure in diverse contexts in order to establish its external validity.
Discussion
Scholarship recognizes the need for multimodal analyses integrating text and images (Boxenbaum et al., 2018; Höllerer et al., 2019; Quattrone et al., 2021), encouraging the development of new methods that are broadly applicable (Zilber, 2017). To this end, we devised a multimodal procedure for the analysis of the association between visual and verbal modes, incorporating guidelines for “strong” multimodal research, such as temporality and co-emergence of elements in the two modes (Halgin et al., 2018; Zilber, 2017). We described the method and illustrated its application to the creative process, analyzing the letters and paintings of Van Gogh.
The proposed multimodal method partially integrates machine-learning techniques but eschews full automation of the procedure (e.g., Luo et al., 2024), retaining scholarly control in determining the key connecting element between modes and selecting the suitable computational tools. This was motivated by two considerations. First, we sought to develop a framework that would broaden its applicability and transferability across research contexts. Whereas machine-learning approaches are typically used to identify patterns of differentiation from population-level data within a specific context and for specific types of data (e.g., Le Mens, Kovács, Hannan, & Pros, 2023), our approach enables the adjustment of the computational tools to suit the context. We applied our method to the domain of individual creativity, integrating contextual knowledge of Van Gogh’s oeuvre. Second, we designed the method with the explicit intention to quantitatively extend the established qualitative approach to multimodality. As a result, the approach facilitates alignment between theory and methods, ensuring that the computational tools are well suited to the research question and theoretical framework. As machine-learning methods feature high prediction accuracy (the ability to make correct predictions) and low theoretical interpretability (the degree to which a model allows for human understanding) (Luo et al., 2024), we adopted the recommendation to prioritize the interpretability of models (Rudin, 2019).
Methodological contributions
Our main contribution is in providing a template for quantitative multimodal analysis, which can be adapted to serve diverse research objectives. It can be used as an exploration or a validation tool. It may help illuminate how legitimation draws on different communication modes (Meyer et al., 2018; Vaara, Aranda, & Etchanchu, 2024) or how organizational actors construct a new market category leveraging visual and narrative elements (Jones et al., 2012). It can also be scaled to massive databases to enable comparison of text and images over time. Recent methodological advances offer unprecedented opportunities to access organizational websites and capture both visual and textual data longitudinally (Haans & Mertens, 2024). Websites are valuable historical repositories of information on how organizations reach customers, attract talent, or connect with stakeholders. The proposed method is particularly suitable for analyzing the degree of alignment between the visual content of a website, the projected image of a company, and the strategy or identity derived from texts, such as company reports or marketing materials.
The method facilitates the analysis of how organizations convey a key value (e.g., “sustainability”) across both textual narratives (e.g., website text, mission statements, or press releases) and visual representations (e.g., photos on the website, packaging design, brand imagery, product catalogs). This bolsters our capacity to measure the consistency of multimodal representation and capture the level of authenticity of organizations (Lehman, O’Connor, Kovács, & Newman, 2019). Furthermore, by combining new methodological tools, scholars become better equipped to register and interpret how the meaning attributed to concepts, such as “sustainability,” changes over time.
It is also possible to shift the analytical perspective by exploring how the process of co-emergence affects the evaluation of products or producers by audiences. Our account is on the producers’ side, but the method can be applied to explore how the perceived alignment between the communication modes of a producer (e.g., PowerPoint slides and oral pitch: Elsbach & Kramer, 2003, or images and text on social media: Roccapriore & Pollock, 2023) shapes audience evaluations.
New insights about how organizational culture is constructed, contested, and continuously reshaped may be obtained by exploring how cultural values are manifested across modes. Consider as an illustration the decision of Leiden University to relocate a painting depicting a group of elderly white men, members of the 1974 university board, because some faculty members found it in disagreement with the values of diversity and inclusivity espoused by the university and actively promoted in teaching. By mapping textual and visual sources, scholars can explore the alignment between the values these sources embody, providing new insights into how cultural conflicts emerge, persist, and are resolved over time (Gregory, 1983). It is also possible to examine in greater depth the impact of the social context. We captured this impact through the concept of behavioral ambivalence, through topic modeling and the analysis of a key social relation (Paul Gauguin), but more comprehensive treatment necessitates the collection of extensive relational data.
Our procedure is relatively flexible, leaving discretion to the user in identifying a key connecting element, choosing secondary principles of mode connection that are spatial or thematic in nature, and selecting suitable computational tools. A spatial approach would consider the locations in which modes are produced or used, thereby capturing how meaning is constructed across distributed communicative environments. Thus, one may explore the role of space within the temporal framework. This may consist of examining how the association between text and image changed as Van Gogh traveled or selecting a spatial theme as the key connecting element, exploring the co-emergence of the word “Japan” and of Japanese stylistic elements in his paintings. In the organizational context, one can explore the correspondence between text and image across company subsidiaries or regional markets. Exploring spatial variation in this manner can be helpful in capturing the degree of consistency of organizational culture or identifying differences in the proclivity for innovation across geographical units or markets.
A thematic approach groups modes by content or purpose, enabling the examination of how different modes become aligned on key ideas or concepts. For example, in the organizational context one can investigate the co-emergence of the words “ecology” or “sustainability” and images of trees or the use of green color in company reports. Or analyze the correspondence between the word “international” in strategic documents and the degree to which slideshow presentations or marketing materials feature references to external markets. Focusing on the theme of “design,” for example, can help to understand how new designs emerge in the flow of communication between team members.
The use of a key connecting element makes our approach flexible and adaptable to context. However, as with any other method, ours has limitations and boundary conditions. Most importantly, the method is designed to capture a “strong” form of multimodality through the analysis of co-emergence over time and is not appropriate for “weak” forms of multimodality and cross-sectional analysis. The thematic and spatial options in our framework are necessarily longitudinal in nature; they provide an opportunity for customization within a temporal structure. We encourage the development of cross-sectional applications using clustering techniques or cross-tabulations to evaluate how modes converge around similar meanings, and call for more attention to the possibility of integrating cross-sectional features within the longitudinal design.
Our method can be further extended by developing a capability to use two (or more) connecting elements, as well as by capturing the alignment between text and materials or image and materials. It is relatively unproblematic to integrate diverse tactile measurements within the framework. The key connecting element can be a particular physical property, such as color, hardness, or malleability. It can also be a smell or the sentiment provoked by touch. We envision our method as a platform onto which applications can be added.
Theoretical contributions
Our secondary contribution is in establishing a new substantive link between multimodality and creativity scholarship. The application of the method enabled the analysis of creativity as continuous in nature: as creative-becoming through iterative experimentation over time (Karakilic & Painter, 2022; Sawyer, 2012). By exploring the co-emergence of elements between modes, we made explicit the ways in which past experiences and relational tensions feed into the present and find aesthetic expression. The proposed method contributes to the theoretical reorientation of research on creativity from instrumental, linear, and stage-based accounts, to those emphasizing the continuous, improvisational, unpredictable, and fractious nature of the process (Fox, 2015; Karakilic & Painter, 2022). The volatile and contradictory creative process that emerges from our context is not the one in accounts that emphasize balance, predictability, and “flow” (e.g., Amabile, 1996; Csikszentmihalyi, 1997). It echoes Jung’s (1952, 1959) expectation that creative people are more likely to be divided within themselves, motivated to make sense of the divisions in the inner world. The pursuit of a tentative solution to perpetual tensions propels the quest for original forms of expression (Storr, 1988).
To identify a theoretically grounded connecting element, we drew on the “Janusian” tradition (Rothenberg, 1971) and adopted the principle of opposition, as reflected in word associations and visual forms. Results confirmed the appropriateness of this choice, corroborating the pertinence of a state of exposure to oppositions to the ability to connect unrelated elements (Fong, 2006). Ours may be the first quantitative, non-experimental analysis to substantiate Rothenberg’s (1971, p. 202) suggestion that “the particular content of a Janusian thought is very likely highly related to conscious and unconscious emotional conflicts in the creator himself.” Our quantitative multimodal approach enabled us to confirm that this relation is longitudinal and cumulative in nature, as the creative insight emerges from alternating exposure to opposite stimuli: a process that is difficult to reproduce in experimental conditions.
The proposed method improves our capacity to capture the ambivalence of modern organizations, defined by simultaneous demands for autonomy, control, order, freedom, imagination, and effectiveness (March, 2008). The analysis illustrated the process whereby contradictory social and psychological forces are connected by their simultaneous presence. It contributes to scholarship on ambivalence by reinforcing arguments that ambivalence can be an enduring state, defining identities (e.g., Ashforth et al., 2014; Meyerson & Scully, 1995). The results highlighted a dynamic state of ambivalence, revealing a tendency for oscillation between states, rather than their static co-existence. We propose text-based measures of ambivalence that can be instrumental in facilitating the study of emotions, which remain difficult to examine empirically (see Zietsma, Toubiana, Voronov, & Roberts, 2019). Multimodal tools have the potential to offer substantively new insights into the interface between cognition and emotions (Elfenbein, 2007) by allowing us to explore variation in individual and collective affective states identified through textual sources. This can be particularly useful in capturing the dynamics of organizational crises or the genesis of collective resistance by analyzing personal correspondence or the minutes of meetings.
The growing accessibility of large-scale, longitudinal, and multimodal organizational data (Haans & Mertens, 2024) has made plain the lack of suitable methodological tools to systematically engage with such data. By expanding the toolkit available to scholars with a flexible and scalable method, we hope to spark new research and obtain fresh insights into the dynamic interplay of texts, images, and organizational processes. As demonstrated, the application of the method can also help illuminate the complexity of the creative process and of personal identities. Van Gogh’s embeddedness in a rich emotional and cognitive world (Brower, 2000) propelled the pursuit of new forms to express the intensity of opposition to which he was exposed. This pursuit was simultaneously fulfilling and distressing, requiring psychologically costly perseverance (Rothman et al., 2017). The artist suffered as a result of his increasing volatility, but volatility motivated self-expression. The legacy of this extraordinary artist is his aptitude for creating a visual idiom that articulates the harmony of opposites. He turned ambivalence into an image even before it became a word.
Acknowledgements
The authors would like to thank Dennis Jancsary, three anonymous reviewers and participants in the 2023 AOM PDW on Qualitative and Quantitative Analysis of Visual Data, for their helpful comments and suggestions.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Donato Cutolo gratefully acknowledges financial assistance from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 101103930.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
