Abstract
This article proposes that, through synthesizing language acts with visualization processes, generative AI is currently altering the architecture of human cognition, necessitating a new way of understanding our relationship with images. The multifaceted concept of ‘image thinking’ offered in its title refers to: (1) our human thinking about images – a key area of enquiry for Visual Culture; (2) image generation through cognitive processes, be they human or machinic; and (3) images themselves being agents of thinking. Both Visual Culture and Neuroscience use the notion of ‘the mental image’, which is premised on the belief that humans form internal pictures in their minds from words and concepts. Those pictures subsequently facilitate how we see – and know – objects and phenomena. Today a novel feedback loop between language, vision and imagery has emerged, with machines being able to generate images from textual prompts. This latest technological development blurs the line between human imagination and algorithmic generation, thus calling for a radical reassessment of how we perceive, imagine and understand the world. Moving beyond the frequent dismissals of synthetic imagery as average, banal or ‘mean’, the article offers a Vilém Flusser-inspired critical approach that seeks cognitive opportunities for us within the current cultural moment.
Introduction: human and machine thinking
Alan Turing’s (1950) landmark question, ‘Can machines think?’, still resonates. As computing machines now perform feats once seen as uniquely human, from composing symphonies to drafting legal arguments, the enigma persists: how, exactly, do they think and, more importantly, do they think at all? Their algorithms, datasets and neural networks form a conceptual opacity: a black box that covers up our ignorance not just about machinic cognitive architectures but also, more importantly, about the machinery of human thinking. 1 For centuries, Western philosophy has mapped thinking through metaphors of vision, equating seeing the world with knowing it, and knowing with seeing more generally. The Latin videre (‘to see’) sharing a stem with verus (‘true’) and veritas (‘truth’) accentuates this connection. Philosophers from Bishop Berkeley to the early phenomenologists entwined cognition with mental imagery, treating those transient inner pictures as a vital link between perception and understanding, sight and insight. Recent work in philosophy of mind and cognitive science marks a shift from the introspective model of mind premised on the static ‘pictures in the head’ model to the generative account of the mind’s operations: it understands perception as prediction under uncertainty, imagery as a generative workspace and language as a prior (i.e. initial assumption, or entry data) that shapes what becomes visible. It is through this register that we approach ‘image thinking’ here. But if the architecture of human thinking remains opaque, how can its artificial counterpart be engineered with confidence? How can thinking be made visible, including to the thinker, and what part do images now play in this process?
The article explores what we term the ‘latent space’ of culture (Somaini, 2025) where human and machine thinking intertwine in ways that are both disorienting and promissory. To this end, merely asking whether machines can think is inadequate (Hayles, 2017). 2 We must also examine how their posited thinking redefines our own, enabling us to embrace a more complex, multisensory and experiential model of cognition – one that integrates concepts, images, percepts and various forms of unsymbolized experience (Hurlburt and Schwitzgebel, 2007: 104, 137–138). Yet our primary interest here lies in the evolving reciprocity between thoughts and images in humans and machines. By tracing the transformation of this reciprocity, we aim to illuminate the contours of a cognitive landscape currently being reshaped by synthetic imaging. The multifaceted concept of ‘image thinking’ we offer refers to: (1) our human thinking about images – a key area of enquiry for Visual Culture; (2) image generation through cognitive processes, be they human or machinic; and (3) images themselves being agents of thinking. With this, we aim to take some initial steps towards developing a new conceptual framework for Visual Culture. This framework will allow us to reposition both our research objects and our own role as researchers within the contemporary epistemic landscape which is being increasingly shaped by machine learning and generative AI. Central to this enquiry is an intensive engagement with the philosopher of technology who took both thinking and images seriously: Vilém Flusser. We will map Flusser’s programmatic account of technical images onto diffusion-based synthesis in machine learning while clarifying why prompt-driven imaging is epistemically different from mimetic capture. 
With this, we will bracket the important infrastructural issues of platform economics and labour analysed by others (Crawford, 2021; Pasquinelli, 2023) in order to position image thinking itself as a cognitive infrastructure: a recursive circuit in which concepts become images and images, in turn, mediate concepts. This approach, jointly elaborated in response to the current developments around synthetic imaging, extends our earlier philosophical work on perception, computation and photography (Toister, 2020; Zylinska, 2023) towards an explicitly cognitive model, albeit one anchored in critical perspectives drawn from Visual Culture.
Generative AI can assist us in this attempt to rethink human and machine image thinking. Models such as Stable Diffusion, Midjourney or DALL·E conjure images from textual prompts with great fluency, producing outputs that blur the line between imitation and imagination. Yet, their operations, driven by natural-language prompts coupled with vectorization processes, which place probabilities of this or that image cluster occurring within a multidimensional space, hinge on a not unproblematic assumption: namely, that human thinking can be distilled into computable routines. This state of affairs demands a deeper reckoning with the concept of the ‘mental image’, not merely as a visual construct in the mind but as a dynamic interplay of memory, imagination and sensory perception. Neuroscientific research reveals that imagining an object (i.e. bringing it up in one’s mind) activates neural pathways that are akin to those triggered by actually seeing it, thus suggesting that mental imagery may be a cornerstone of not just visualization but also thinking (Nanay, 2021). However, this cornerstone is not universal. Some people experience aphantasia, the inability to form mental images, while others report hyperphantasia, the ability to form intensely vivid mental depictions (Milton et al., 2021). This phenomenon reveals that human thinking is a complex network of neurological variations which resists simplification into a unified algorithmic form.
Importantly, AI image generators do more than just replicate our mental images translated into prompts; they actually intervene in the cognitive loop, reshaping how we think about both imaging and thinking. The feedback loop is recursive: machines generate images that humans view and internalize, recalibrating our imagination in the process. The line between human and machine cognition and perception thus begins to blur, while something novel arguably emerges: a composite space where human thinking is being gradually reconfigured by computational operations. We propose to call this process ‘cognitive hacking’. This is not hacking in the traditional sense of breaching secure systems; rather, it is an epistemic hack, one that involves reconfiguring the perceptual and mental system through which we process the world. Moving beyond the explicitly negative connotations of ‘hacking’ adopted by cybersecurity discourses, as well as beyond the scepticism implied in statements describing AI as merely a ‘meaning-devoid stochastic parrot’ (Bender et al., 2021), we embrace Ted Underwood’s (2022) proposition to ‘approach neural models as machines for mapping and rewiring collective behaviour’. Through this framework, AI models are seen not so much as models of human intelligence or cognition, but primarily as models of culture. Important economic and labour questions aside, the near-existential panic around AI imagery currently manifested by many thinkers, journalists and members of the general public may therefore have less to do with any specific aesthetic or conceptual flattening those models supposedly enact than with the epistemic void they expose: the possibility of there being nothing essential ‘there’ at the heart of our human thinking and imagination.
Western knowledge between projection and prediction
The assumption that clear sight naturally yielded true knowledge, which had shaped Western thinking since at least Plato, began to fray under philosophical scrutiny by the early-modern period. Descartes’ Meditations (1984[1641]) unsettled confidence in sight, demonstrating how easily the senses misled even the most disciplined thinker. Berkeley (1949[1713–1734]) intensified this doubt through his radical esse est percipi tenet, relocating reality into the ambit of perception and inner experience. In Kant’s (1998[1781–1787]) philosophical framework, perception synthesized sensory inputs through the categories imposed a priori by the mind. Vision thus ceased to be understood as just a transmission of reality: it became a perpetual negotiation between the outer world’s multiplicity and the mind’s organizing faculties. Twentieth-century phenomenology embraced this epistemological shift. Merleau-Ponty (1945, 1968) poetically described perception as a contingent weave, emphasizing that seeing emerged within fluid and ever-shifting interactions between the human body and the world(s) it inhabited. This embodied outlook exposed sight as inherently interpretative, perpetually reshaped by subjective context, memory, emotion and expectation. Cognitive science corroborates Merleau-Ponty’s insight, revealing perception to be less a recording of external stimuli and more a continual prediction and hypothesis: an anticipatory dialogue between neural hierarchies and sensory evidence (Bruner, 1957; Neisser, 2014[1967]). In this framework, seeing is positioned as quintessentially generative. We do not merely discover what is but actively ‘categorize’ (Bruner, 1957: 123–125), and therefore ‘construct’ (pp. 127–138), what can be known, creating and recreating the visible world with every glance.
There are clear parallels between the current neuroscience research and the intimations about the constructedness of both images and vision coming from Art History and Visual Culture – many of which actually precede the latest scientific discoveries. Aby Warburg’s (1999[1932]) concept of ‘pathos-formulae’ highlights how emotional gestures and expressive motifs, such as intense movement of bodies or natural elements, traverse history, shaping cultural memory and conditioning how we respond to images. John Berger’s (2008[1972]) argument that ‘ways of seeing’ are culturally and historically situated emphasizes, in turn, that sight is inextricably tied to the ideological context in which the image was produced. Martin Jay’s (1988) articulation of ‘scopic regimes’ expands this position by detailing specific historical configurations through which visuality is governed, regulated and rendered meaningful. Collectively, these scholars argue that the private, internal act of seeing is always already embedded within broader social practices, cultural conventions and symbolic economies. This epistemological negotiation is reincarnated in WJT Mitchell’s (1984, 1986) influential taxonomy which gathers graphic, optical, perceptual, mental and verbal images into a ‘family’ cluster. Mitchell makes it clear that images are never only received optically, nor are they ‘just’ graphic markings to be deciphered by us. Rather, images are actively constructed, thought into existence through layered processes that encompass linguistic, symbolic and affective codes, all of which constantly interpenetrate, shape, skew and inform one another. ‘Like an actor on the historical stage, a presence or character endowed with legendary status’ (Mitchell, 1986: 9), the image also possesses a unique form of agency.
Crucially for Visual Culture, for centuries images have been summoned into existence by encoded scientific and aesthetic formulae, yet these formulae are now being radically reconfigured by computation. Generative AI, with its diffusion models trained on vast image corpora, alters the structural dynamics of visuality, shaking its ‘existential ground’ (Denson, 2024: 170) and expanding its imaginative horizons ever further into realms of invisibility. Such models do more than reproduce familiar conventions: they evaluate, reinvent and propagate novel tropes and motifs. For Visual Culture this amounts to a state of events whereby computational processes actively determine aesthetic norms and the cognitive frameworks that accompany them. We now inhabit an environment in which machines not only generate (‘write’) and interpret (‘read’) images, thereby operationalizing them, but also prescribe the terms of visual meaning-making and, by extension, the visual per se. Consequently, the remit of Visual Culture now extends beyond the interpretation of imagery to scrutinizing the infrastructural conditions and technological substrates that govern its production, reception and circulation. Herein, our interest is in image production with and through language – and the novel role of imagery within human thinking. Over the last decade media theory has shown how much of contemporary Visual Culture operates beyond human-facing display, functioning primarily as platform seeing (Mackenzie and Munster, 2019; Toister, 2024) within operational chains that rank, verify and optimize imagery for machine readers. Building on this work, which provides a bridge between the historical accounts of vision sketched out earlier and our own focus on images as procedures that co-produce cognitive priors, we use the concept of in-visuality (Parikka, 2023) to name the space where the most consequential work is performed before images appear (to humans) as pictures.
The ‘imagetext’ and the prompt loop
In 1986, Mitchell proposed the concept of the ‘imagetext’ as a way to understand the interpenetration of image and linear text as a new hybrid form that demanded a joint reading (Mitchell, 1986, 1994). ‘All media’, he wrote, ‘are mixed media, and all representations are heterogeneous; there are no “purely” visual or verbal arts’ (Mitchell, 1994: 5). The imagetext designated ‘composite, synthetic works (or concepts) that combine image and text’ (p. 89) but it also named the ‘problematic’ of the relations between the visual and the verbal (p. 7). It was a site of semiotic enmeshment, which Raymond Bellour (1996) described as ‘a double helix’. The braiding inside this helix was ‘a site of conflict’ where ‘political, institutional, and social antagonisms play[ed] themselves out’ (Mitchell, 1994: 91). Mitchell’s formulation was timely, enabling scholars to account for the mutual imbrication of language and image, and to do so across a wide range of (mostly static) artefacts. Yet it still assumed a human ‘image-reader’ at its epicentre, one who deciphered signs, negotiated tensions and interpreted semiotic interplay. It did not therefore account for readers augmented by, and entangled with, machines – or for the consequences of this entanglement for our relationship with images.
Enter Vilém Flusser, whose media philosophy, also outlined in the 1980s, prophesied such a shift. Because Flusser took images seriously – not just as artefacts or, more narrowly, art, but as paths towards cognition and knowledge (Flusser, 2011) – his work offers a unique epistemic perspective for developing image thinking today. In a striking passage, Flusser (2000: 12) proclaimed: ‘Behind one’s back, the hierarchy of codes is overturned. Texts, originally a metacode of images, can themselves have images as a metacode.’ By positing this reversal, and without ever conceding that it rested on a false binary, Flusser sought to dismantle the longstanding metaphysical scaffolding that had privileged linguistic abstraction over visual immediacy. The ‘metacode’, once the hallmark of human superiority, returns here as substrate, a source code for a new loop of image production. Accordingly, the move from text to image entails a tectonic shift: media forms are subsumed by the operational logic of ‘semiotic machines’ (Nake, 2005: 61), which manipulate sign-processes rather than matter – and which thereby usher in a new regime of sign production.
The black-box apparatus of text-to-image synthesis in generative AI models literalizes Flusser’s insight, with language serving not just as an abstraction of imagery but as an engine for its computational becoming. With this, text no longer merely describes, captions or anchors images; it conditions their very emergence. It is not that words illustrate pictures, or vice versa, but that the distinction itself dissolves into the recursive logic of a deep learning network. Words become embeddings, prompts become operations, while images become probabilities, modulations and derivatives. Yet the prompt window merely foregrounds what has long been implicit in all kinds of ‘synthetic images’ (a term first used in the context of computer vision; Horn and Bachman, 1978): namely, that they are ‘programmed possibilities’, to use Flusser’s (2000: 69) phrasing. 3 In AI models specifically, when language enters machines through a box of text to travel through the model’s proprietary parameters and throttles, it returns as an image modulated by content filters and style weights. Crucially, each such iteration reorganizes the palette of collective imagination: someone’s fetish or fantasy becomes a probability field tuned to platform convenience.
Traditional iconology as an interpretive method within Art History decoded meanings once the image had (supposedly) stabilized. By contrast, diffusion in AI models compels us to begin interpreting much further upstream: at the stages of model choice, dataset construction and parameter tuning. Asking what, if anything, a synthetic image might signify is inseparable from asking which captions were scraped, whose labour annotated the corpus and who owns the weights that steer the output. And, since such systems mediate Visual Culture, the AI model as its cognitive infrastructure rather than the isolated output should be our unit of analysis. What returns to users as an ‘image’ is a distribution produced earlier, by dataset composition and labelling, model selection and parameterization, safety filters and style weights. Interpreting such images then necessitates attending to dataset provenance, as well as information on watermarking, because these prescribe the epistemic conditions within which imagination is made operational. In short, the model is not only a medium of (pictorial) representation; it is a system that organizes the likelihoods of what can be pictured at all.
Visual Culture thus needs a more dynamic term than imagetext to capture how images become images today. As they are not only a site of readability but also of writability and re-writability, and as encounters with durable and static imagery are increasingly scarce, we want to offer textimagetext to foreground how images now circulate in incessant loops. Not only do they appear and reappear in news, promotional campaigns or as memes on social media; they are also fed into invisible training datasets for AI that are then used to generate new images. When prompts become images and those images feed back into further prompts, we enter another loop where the boundaries between image making and thought generation blur. In a textimagetext circuit, language initiates an image, which in turn becomes legible to both humans and machines as new text. This loop is distinct from the earlier models of image–text interplay. In Mitchell’s imagetext, the relationship between image and text was still largely human-controlled and, more importantly, human-oriented – a hieroglyph to be deciphered, a cartoon to be read, a film to be interpreted (notwithstanding Mitchell’s [2005] recognition of image agency in his later work). In the textimagetext, the image is to be read by both humans and machines, serving as data for further iterations of the generative process. 4 The boundary between thinking images and visualizing them becomes porous here, with language actively designing images rather than just describing or annotating them. Images thus revise and reformat the linguistic space from which they emerge, but they also bring in the human user as part of the loop. The textimagetext draws our attention to the temporal and procedural structure of meaning production in AI systems, whereby language-based prompts, images, captions and viewers co-produce meaning across multiple cycles of generation, feedback and refinement.
Mental imagery across minds and models
A constitutive ground for articulating the textimagetext loop is the ‘mental image’ – a term that has gradually come to supplant the classical notion of ‘imagination’ and that now traverses philosophy, psychology and cognitive science. At its simplest, a mental image is a ‘perceptual representation that is not directly triggered by sensory input’ (Nanay, 2023: 4). Without the causal anchor of immediate sensation, it functions as a rehearsal of perception. A mental image need not be exclusively visual and might incorporate auditory, olfactory or other cues. Moreover, percepts are never just generated by and for the self. The mental image operates within a cultural space even if it may feel intensely personal.
Generative-AI image models furnish a mechanical counterpart to the mind’s process of imaginative rehearsal. A diffusion engine begins, much as the mind does in the absence of sensory input, with structured indeterminacy: a canvas of noise embedded in a high-dimensional latent space. This is to the model what mental imagery is to one of us: a generative workspace where half-formed shapes negotiate with memory before surfacing into sight. A compressed linguistic cue (the prompt) then warps this information cluster towards coherence. At each de-noising step, the model advances a provisional hypothesis of form, consulting the statistical memory of hundreds of millions of captioned images. Possibly like the brain’s predictive loops, it revises its guess iteratively, weighting regions of the image according to the model’s attention mechanism, so that ‘red pomegranate’ pulls emerging colour and contour away from, say, ‘red planet’. After 30-odd passes, the loop collapses the latent representation into pixels, producing an image that approximates a highly probable pairing of visual pattern and verbal cue: an executable surface of what, within the model’s constraints, the prompt ought to look like.
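The iterative logic described above can be sketched in a few lines of Python. This is a deliberately reduced toy, not the procedure used by any actual image model: the function name `toy_denoise`, the three-value ‘image’ and the shrinking noise schedule are all assumptions made for illustration. Above all, the prompt-consistent target is simply handed to the loop here, whereas a trained network would have to infer it from the prompt and from its training data.

```python
import math
import random


def toy_denoise(target, steps=30, seed=0):
    """Toy sketch of iterative de-noising; NOT a real diffusion sampler.

    `target` stands in for whatever a trained network would infer from the
    prompt; here it is simply handed to the loop as a list of numbers.
    """
    rng = random.Random(seed)
    # Begin with structured indeterminacy: one Gaussian sample per value.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    # Track the Euclidean distance to the target before and after each pass.
    distances = [math.dist(x, target)]
    for t in range(steps, 0, -1):
        # Fresh noise shrinks every pass and is exactly zero on the last one.
        noise_scale = 0.1 * (t - 1) / steps
        x = [
            xi + (ti - xi) / t + noise_scale * rng.gauss(0.0, 1.0)
            for xi, ti in zip(x, target)
        ]
        distances.append(math.dist(x, target))
    return x, distances


# Hypothetical three-value "image" that the prompt is assumed to favour.
image, distances = toy_denoise([0.9, 0.1, 0.1])
print(f"start: {distances[0]:.3f}, end: {distances[-1]:.3f}")
```

Each pass closes a fraction of the remaining gap between the noisy state and the target, while injecting a little less randomness than the pass before it. The sketch thus shows only the shape of the loop (a provisional guess, revised repeatedly); it shows nothing of how a model learns what the prompt ‘ought to look like’.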
The analogy between mental imagery and generative AI is justified. Both mechanisms draw on prior experience, personal memory in the first case, vast scraped datasets in the other, while making predictions guided by cues. Each fills occlusions, interpolates trajectories and biases its output towards familiar styles. This analogy allows for both systems to be described as predictive. In the human case, the cognitive apparatus proposes hypotheses that the incoming sensory signals then correct, with imagery being part of this generative workspace. Words modulate the workspace: even minimal verbal cues act as priors that tilt inference towards certain features and away from others. In the model’s case, a prompt similarly reweights a latent trajectory in the de-noising process, biasing the emerging form towards label-consistent features. Combined, this reframes the textimagetext not just as a semiotic braid but as a predictive circuit whereby language conditions image, image returns as language and the loop updates the priors that will guide subsequent perception and prompting. 5
The divergences within this analogy are nonetheless significant. Human imagery is embodied, affect-laden and multisensory. Its priors are weighted by emotional salience and proprioceptive feedback. Its ‘loss function’ is pragmatic success in lived situations. Conversely, a model’s priors are statistical regularities harvested from the web and its loss function is the minimization of prediction error measured against captioned training images. Where the biological mind optimizes for survival and meaningful lived experience, the synthetic mind optimizes for statistical fidelity. This contrast can be specified without promoting anthropomorphism – or, conversely, computational reductionism. Human perception minimizes prediction error under its embodied constraints (such as metabolic budgets, affective salience, social context), so ‘accuracy’ is continually negotiated with viability and value. Contemporary generative models minimize loss on training distributions, optimizing statistical fidelity to caption–image pairings with no organismic stakes. The consequence is not that the former ‘thinks’ whereas the latter does not (Bratton et al., 2022; Zylinska, 2024), but that they optimize different things. Coming together in our use of machine learning models, those different objective functions co-produce our cognitive environment.
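The model-side objective mentioned above can be made concrete. In one widely used formulation of diffusion training (not necessarily the one adopted by any particular commercial system), noise is added to a captioned training image and the network is trained to predict that noise. All the symbols below are notational assumptions introduced for this illustration; none comes from the sources discussed in this article.

```latex
% Assumed notation (not from the article): x_0 is a training image, c its
% caption, \epsilon Gaussian noise, t a noise level, \bar{\alpha}_t a fixed
% noise schedule, and \epsilon_\theta the network being trained.
\begin{align}
  x_t &= \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
      \qquad \epsilon \sim \mathcal{N}(0, I), \\
  \mathcal{L}(\theta) &= \mathbb{E}_{x_0,\, c,\, \epsilon,\, t}
      \left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t, c) \right\rVert^2 \right].
\end{align}
```

Nothing in this expression refers to viability, affect or value: the only quantity being reduced is the squared difference between the noise that was added and the noise that the network predicts. The human ‘loss function’ invoked above has no comparably compact expression, which is part of the contrast being drawn.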
As posited before, the operations of both human and synthetic minds remain opaque to us. Human introspection only ever reveals the finished qualia, concealing the neural routes that produced them. Likewise, a diffusion model’s pathway is mathematically explicit yet cognitively obscure: billions of parameters encode dispositions that no unaided human can read or fathom. This reciprocal blindness fuels the ‘blackbox’ anxiety surrounding AI outputs. It clarifies that transparency and intelligibility are not synonyms: what is transparent to machine code remains enigmatic to phenomenology, and vice versa. Mental imagery is still tied to perception, though not causally, while diffusion imagery, by contrast, prompts ‘in-the-wild’ iterations only after it has been viewed and recaptioned (although this is likely to change). Nonetheless, both propagate textimagetext loops as synthetic images now seed new visual expectations that migrate into advertising, game design and personal fantasy, thereby altering the cultural priors on which the next training set will rely. In this sense, the model does more than imitate culture: it shifts something within it.
Cognitive hacking
Recognizing this shift encourages speculation on what machines might teach us about our own thinking. Because models lack the human trait of motivational grounding, their outputs often arrive hyper-detailed yet can be interpreted as semantically shallow. Engaging with such models nevertheless highlights how much our human imagery is conditioned by the pre-existing language which carries multiple priors: every prompt externalizes inner speech, exposing persistent clichés, biases and lacunae within our human communication. To generate images, AI models gradually transform noise into pattern in the process of negentropic unfolding. That this process can yield misaligned or surreal outputs (e.g. distorted anatomy) should not therefore be read as failure to represent reality, but as the visual signalling of an emergent epistemology: one in which error may indicate generative divergence rather than systemic failure (Zylinska, 2024). Of course, generative AI also risks reinforcing clichés and market-driven sameness. But to stop there, as many critics of generative AI do, is to ignore its latent cognitive affordances. Generative AI images can instead serve as windows opening onto our own thought processes, which are increasingly entangled with those of computing machines. We propose to call the consequences of this entanglement cognitive hacking.
While its origins lie in the 1960s communal activities of enthusiastic coders at MIT, in contemporary parlance the term ‘hacking’ conjures images of rogue actors infiltrating digital infrastructures, bypassing firewalls and commandeering servers. We want to revisit that original sense of radicalism implied by this term and propose to see hacking as a form of systemic reprogramming: the manipulation and transformation of a given structure to produce new, and not always controllable, outcomes. It is in this wider sense that we offer the concept of cognitive hacking as a description of the transformations currently being wrought by generative AI on the structures of human perception, thinking and imagination. This form of hacking is not being carried out covertly by malicious agents; rather, it is an emergent feature of contemporary cultural and technical systems. Instead of targeting our cognitive apparatus directly, it works by rerouting our cognitive circuits in their interactions with computational processes. Cognitive hacking therefore should not be seen as just a metaphor: it refers to actual material processes with real-life consequences. It is indeed our contention that the rise of generative AI is resulting in a systematic reconfiguration of the conditions under which thinking occurs, in a process that is collective, recursive and largely involuntary. When we thus speak of AI’s incursion into cognition, we are referring to processes that reshape the fundamental structures by which information is transformed into cultural meaning. Each new synthesis modifies the user’s imaginative repertoire. Over time, users learn to anticipate the system’s ‘signature’ patterns, modulating their verbal expressions and conceptual frames to coax specific outputs, whether for artistic expression, scientific visualization or quotidian use.
To these ends, generative AI operates imperceptibly, gradually recalibrating the user’s own internal processes of thinking and imagination, silently ‘hacking’ them from within. We thus use ‘hacking’ non-dramatically, to denote the gradual alignment of users’ priors, preferences and expectations through repeated interaction with model-mediated images.
Textimagetext loops do not of course affect all users equally: not everyone is a prompt engineer or even a casual user of AI tools. Yet the permeation of AI-generated imagery into public discourse via social media and other online platforms means that we are all exposed to its visual logics. The result is a gradual harmonization of our collective imagination: we begin to think with, and through, the image grammars inscribed by the AI models. The recursive nature of AI image generation means that each new image is shaped by the cumulative weight of previous outputs. What we call ‘style’ or ‘coherence’ is actually a form of statistical inertia. Artist Felicity Hammond’s mixed-media series V3 Model Collapse (2025) distils this dynamic into a single gesture: by looping synthetic data back into itself, the work enacts the very ‘model collapse’ that plagues generative AI and, in doing so, invites reflection on a parallel ‘human model collapse’, the erosion of our imaginative range when we, too, recycle AI-conditioned imagery. As users adapt to the visual languages of platforms, their imagination becomes modulated by those languages. We don’t just write images: they write (us) back, recalibrating our cognitive frames through continuous exposure. This entanglement raises the question as to whether cognitive hacking is a matter of degree or kind. Some might argue that every medium rewires cognition: the printing press standardized our knowledge base; photography shifted our perception; cinema refocused our attention. Generative AI may be just the latest in the line of cognitive prostheses that have transformative effects. Yet the recursive, collective and automated nature of generative AI arguably makes it qualitatively different, as the loops it initiates are not sequential but exponential. 
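The ‘statistical inertia’ and ‘model collapse’ dynamic described above can be illustrated with a minimal, fully deterministic Python sketch. It rests on one loudly stated assumption: that each round of recycling slightly over-represents whatever is already common. The four ‘motif’ shares, the `sharpening` exponent and both function names are hypothetical; the sketch does not model any real training pipeline, nor Hammond’s work.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; higher means a more varied repertoire."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def next_generation(probs, sharpening=1.5):
    """Assumed feedback rule: common motifs are over-represented next time.

    Raising each share to a power above 1 and renormalizing is a stand-in
    for retraining on one's own outputs, not a model of any real system.
    """
    powered = [p ** sharpening for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


# Hypothetical starting shares of four visual motifs in a corpus.
shares = [0.4, 0.3, 0.2, 0.1]
history = [entropy(shares)]
for _ in range(5):
    shares = next_generation(shares)
    history.append(entropy(shares))

print([round(h, 3) for h in history])
```

Under this assumed rule the entropy of the repertoire falls with every generation, as the most common motif absorbs an ever larger share. Whether human imagination behaves in this way is precisely the open question raised above; the sketch shows only that a modest bias towards the familiar, when looped, is enough to narrow a distribution.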
The year 2025 saw multiple concerns being raised about ‘the creeping uniformity of AI voice’ but also, more importantly, about human voice and expression in the aftermath of our engagement with models such as Midjourney or ChatGPT. An article in The Verge captures the more existential aspect of this anxiety by pointing out that the speakers themselves ‘don’t realize their language is changing’ (Parker, 2025). While the article recognizes that AI models and systems will become more expressive over time, reducing their bland eeriness, it locates ‘the deepest risk’ not so much in linguistic uniformity as in ‘losing conscious control over our own thinking and expression’.
What are the philosophical stakes of this development? If, in the traditional Enlightenment schema, vision was a pathway to singular truth, in the current paradigm, vision itself has become plural and contingent. We confront a world where reality is routinely fabricated, where ‘our experiences are the content that the brain predicts from the inside out’ (Seth, 2021; see also Zylinska, 2023: 147–165). Neuroscientist Anil Seth argues that reality is essentially a ‘controlled hallucination’ held together by just enough sensory data to keep us alive (Seth, 2021). Generative AI extends this condition outward: it provides external hallucinations that sync up with our internal ones. It hacks the prediction machine of the brain by offering ready-made, hyperreal predictions (i.e. images) that can overwrite or enhance our own. The danger is that we might become passive consumers of these ready-made visions, our imagination overfit to AI’s training data. But we might also learn to play with the collective latent possibilities that no individual mind could encompass alone.
Rather than oppose this development or romanticize a purer pre-AI form of human cognition, our task here is to critically map its vectors. As previously stated, cognitive hacking should not be seen as inherently malign. But, like all hacks, it reveals the vulnerabilities of a system: here, our epistemological system and the values that underpin it. If our thoughts are now co-produced by machines with unprecedented intensity and on a scale never experienced before, then Visual Culture scholarship must rigorously interrogate the infrastructural forces that delimit the field of the visible. It is to Flusser that we turn once again in our attempt to seek openings within this (currently dominant) narrative of model collapse, with a view to exploring some alternative possibilities of thinking with and through the current computational system.
How to think like AI: a Vilém Flusser–Mira Schendel remix
In his philosophical reflections on the work of artist Mira Schendel, a fellow Jewish émigré with whom he maintained a long-term friendship in their adopted home of São Paulo, Flusser outlines two distinct modes of thinking, with the image playing an important role in both. These thinking styles are not just cognitive preferences: they reflect distinct ways of engaging with the world – and of creating meanings in it. In the current cultural moment, where generative AI systems become active agents in the production of our visual culture, Flusser’s insights are worth revisiting because, as is often the case with his work, they reveal an uncanny prescience about future media. Schendel’s work as read by Flusser becomes a site for a conceptual experiment – on Flusser’s part, but also ours here – with how humans see, imagine and conceptualize the world. The comparison is not incidental: both Schendel’s artworks and generative AI systems operate at the threshold between language and image, concept and artefact.
Flusser begins his ‘Mira Schendel’ essay by outlining what he terms the ‘traditional’ mode of thinking, which he describes as a sequential operation that moves from the concrete to the abstract. This is a style of thinking which Flusser the philosopher identifies with, but which also represents for him Western scientific rationality tout court. He sets it out as follows:
I encounter something concrete. I form an image of this concrete thing (I ‘imagine’ it), so as to acquaint myself with it. And then I translate my image into a concept, so as to understand the concrete thing and be able to handle it. (Roth, 2014)
In other words, imagination comes first and analysis follows: the mind ‘sees’ an object, then explains it by generating an idea from it. Flusser links this successive process to historical stages of consciousness – a progression from a ‘mythical–magical’ imagination (which involves making images to grasp the world) to an ‘epistemological–technical’ reasoning (which derives conceptual knowledge from those images). In this schema, the image functions as a stopping point; it is a representational placeholder en route to a more rigorous, and supposedly truer, conceptual grasp. Transparency is largely metaphorical here: to understand is to ‘see through’ the image in order to arrive at its hidden meaning. Meaning itself is something to be discovered through layers of symbolic reduction. The image is initially opaque and must be interpreted, step by step, to reveal the concept behind it. In Flusser’s view, this mode treats images as subordinate to ideas: one looks through the image to get to the idea behind it. The concept (i.e. the ‘message’, or abstract meaning) only becomes clear after this sequential intellectual effort, much as a reader processes words line-by-line to grasp their full significance. The cognitive sequence from object to image to concept reflects a historical process in which human beings gradually removed themselves from immediate experience, layering symbolic orders atop the receding world of concrete things. As Flusser notes, this double abstraction results in a condition where ‘we ourselves are somehow not there. For to be there, to exist, means to be in a world of objects’ (Roth, 2014). In other words, through abstraction we arrive at our own alienation – which also involves sensory detachment from the world.
Schendel’s mode of thinking, according to Flusser, operates in the reverse direction. It begins with the concept – an abstract idea – and seeks to materialize or image this idea into visual and spatial form. Rather than use imagination to ascend toward abstraction, Schendel the artist employs it to give form to what would otherwise remain immaterial. Flusser perceives this reversal as a radical epistemological inversion: ‘Mira tries to translate the concept of transparency and the concept of meaning into a realisation (performance) [image] of transparency and a realisation [image] of meaning’ (Roth, 2014). Put simply, she takes an idea encountered in daily life and gives it visible form, and finally that image is materialized as an (art) object. This inversion upends ‘the traditional relationship between imagination and discursive reason’ (Flusser, 2017: 244). Rather than use imagination to eventually generate a concept, Schendel starts with a concept and uses her imaginative–artistic act to incarnate it. The result is a concept made perceivable: it is embodied while demanding a corporeal response from the viewer. Rendering concepts imaginable and concrete, Schendel’s approach has a ‘violently dealienating function’ (p. 245). Our cognitive processes become ‘imaginatively synchronised’ (p. 245) rather than being successive; meaning is grasped in a unified structure, not deferred along a chain of reasoning.
Crucially, this alternative thinking style engages transparency and meaning in a literal, immediate sense, with the image serving as an instantiation. Flusser refers to Schendel’s Untitled [Disks] artwork from 1972 to illustrate this point. The work features two transparent round sheets of acrylic layered with diverse scripts and symbols, hanging in an exhibition space. The work’s transparency is literal as well as conceptual: the viewer sees both the image and through the image, engaging with a field of signifiers whose meaning must be actively constructed while resisting closure. Flusser admits, through a mixture of emotions with some intriguing gender undertones, that Schendel ‘force[s] my vision into a reverse dynamic . . . I am not supposed to “explain” it . . . I am meant to “assemble” it’ (Roth, 2014). This inversion of the traditional cognitive order marks a shift in how knowledge itself comes to be: not as an abstract proposition to be verified but as a visual situation to be experienced. With the symbolic and the sensory collapsing into each other, meaning is not discovered but actively generated in the act of viewing. Concepts are made visible – and this visibility has a reality effect. It allows us the opportunity, in Flusser’s words, to ‘live among realised concepts’. By transforming abstract ideas into images and then into things, Schendel exemplifies a world where ideas are not confined to minds or written texts but float out there in the world, as objects to be viewed, grasped and (re)imagined. We begin to ‘live . . . not among concepts, but among images of concepts’ as Flusser (2017: 245) puts it.
It is within this cognitive architecture embraced by Schendel that the operations of generative AI can be situated. An AI model begins not with objects but with concepts encoded in text prompts. Stripped of any direct phenomenological grounding, these prompts function as abstract linguistic cues. The system then traverses its latent space, a statistical realm of learned associations and visual probabilities, to generate an image that embodies, or at least approximates, the conceptual structure embedded in the prompt. This process is structurally homologous to Schendel’s method. The model, like Schendel, starts from the symbolic order and moves towards its visual manifestation. The result is not an image of the world but an image of the concept. As such, AI-generated images cannot be seen as representations in the classical sense; they are visual syntheses of semantic vectors. They do not reflect reality but reconfigure the conditions under which reality is imagined and perceived.
In spite of its originary opacity, the AI image, like Schendel’s transparent sheet, invites iterative engagement. It emerges within a feedback loop where the prompt is refined in response to the generated image, and the image in turn reshapes the prompt. This recursive dynamic collapses the boundaries between author, viewer and system. It produces a compound form of cognition and perception, in which concept and image co-evolve. Rather than being extracted from the image, meaning is inscribed into the process of its generation. The philosophical stakes of this shift are considerable. If, in the traditional schema, knowledge required the stabilization of meaning through conceptual clarity, the contemporary moment, exemplified by both Schendel and AI systems, offers a different epistemology: one grounded in conceptual flux, aesthetic instability and the posited transparency of the symbolic. To borrow Flusser’s phrasing, ‘everything means everything, which is essentially to say it means nothing at all’ (Roth, 2014). The world is reconfigured as a field of potential images – and the act of seeing becomes a form of writing.
For many thinkers today, this shift in image-making inaugurated by generative AI is both deeply dangerous and deeply unwelcome. Flusser’s insightful prediction could thus be read as its unambiguous repudiation. A recent article in New Statesman encapsulates this sentiment in its assessment of AI imagery as depressing and dangerous. Predicting not just ‘a homogenisation of art, but also a homogenisation of culture’ as a result of the wide adoption of AI models across all spheres of our creative lives, it sees this trend as a manifestation of the pessimistic belief that ‘we have reached the outer limits of what we can achieve – that only robots can advance our content from here. In reality, generative AI leaves little room for anything new as it continues to pull from what already exists’ (Manavis, 2025). Yet this critique still very much relies on Flusser’s first mode of thinking, believing that the creation of artefacts can only proceed in a linear manner, from the supposed richness of concepts associated with human culture to dehumanizing and self-same ‘AI slop’. The transformational parameters of both images and thinking already seem pre-decided in this narrative, with their direction of travel only ever being entropic, i.e. full of chaos and noise. In a similar vein, media theorist Roland Meyer assumes that the very rationale of image generation is the production of images ‘that already exist’ (Meyer, 2025: 3), ideally matching textual descriptions and customer expectations. Meyer posits that realism is ‘the holy grail of AI image synthesis’ (p. 3), yet this seems more like an expectation issued by media and visual culture critics, who then proceed to castigate those images for failing the realism mark (e.g. humans having an ‘incorrect’ number of fingers; see Zylinska, 2024) rather than by the wider public – who seem to be engaging in a variety of knowing uses of this technology, beyond just generating polarizing memes and deepfake porn.
This reading positions AI as a mere echo chamber for cultural realism – at best, a mimetic machine; at worst, a generator of banal distortions. But such a framing forecloses on the deeper epistemological and cognitive transformations unfolding in generative AI systems, particularly when approached through Flusser’s alternative mode of thinking as exemplified by Schendel’s work. Where Meyer sees generative AI as blindly reconstituting a visual database, Flusser offers a more generative lens through which the architecture and potentiality of the latent space could be understood. In Schendel’s reversed cognitive model, thinking begins not with the object but with the concept – an abstract structure – made visible through an imaginative, and often spatial, form. This inversion does not aim to reproduce reality but to reconfigure it, inviting the viewer to assemble meaning rather than extract it. Generative AI, when viewed in this light, operates similarly. We can only describe it as a ‘mash-up of visual elements’ (Meyer, 2025: 4) if we are prepared to see, for example, Hilary Mantel’s Wolf Hall as a mash-up of the British Library. While generative AI does statistically recombine elements within a latent space to synthesize images that are not directly present in the training data, this is not a retrieval or a mash-up: it is projection, an imaging of thought. Generative AI does not search for pre-encoded visuals; it projects them through probabilistic recombination of features abstracted from prior data. What is therefore synthesized is a new instantiation of visual possibility: it is a construction rather than a recollection.
Latent space, the mathematical core of generative AI models, is not a vault of images but a vectorized map of visual potential. Each coordinate represents a weighted entanglement of styles, forms and semiotic weightings – none of which exist as discrete indexable objects. To prompt the model is to traverse this multidimensional landscape and settle at a point where various visual traits converge. What emerges from this process is a speculative materialization of associations, a probabilistic capture of what the model has inferred from culture. Meyer is aware of the statistical operations within latent space, of course, but to him they are overdetermined by the expectations ‘we’ supposedly have of images, be they photographic or AI-generated ones. He thus overlooks the possibility of anything novel, surprising or creative emerging through the process. The political economy critique of the platforms and their big tech owners overdetermines and overshadows for him (and many others) any theoretical potential inherent in the technology. His reading echoes Hito Steyerl’s (2023) notorious repudiation of synthetic images as simply ‘mean’: ‘averaged versions of mass online booty’, converging ‘around the average, the median; hallucinated mediocrity’ (p. 82).
The need for such a political critique notwithstanding, it may be worth considering whether generative AI, beyond merely flattening thought, is engaging in what we earlier called the imaging of thought: externalizing a conceptual possibility-space in visual terms. Through this process it allows us to see our own ideas from outside. The prompt becomes a heuristic probe: in return we get a tableau of what the model thinks we mean. The results are often surprising, revealing unanticipated associations or creative options, effectively expanding the user’s imaginative horizons. The process recalls what Flusser identifies in Schendel’s practice, where concepts, once regarded as an immaterial support system for representation, are made concrete through visual form. For both Schendel and generative AI, the image is thus not a mirror held up to nature (or culture), but a surface upon which thinking inscribes itself. This projectional character of generative AI stands in marked contrast to traditional mimetic regimes. While photographic realism aims to stabilize presence, AI images are precarious constructions, contingent on prompt, model architecture and user interaction. Their generativity lies not in their resemblance to the real, but in their capacity to instantiate the unrealized – to visualize thought experiments, metaphors and counterfactuals. Thus, to frame these images as too constrained by the standardized visual and cultural preferences of data engineers or promoters is principally to reveal the critic’s own familiar modernist dismay at popular tastes, while foreclosing on the possibility of seeing the images’ structural novelty. Yet, for us, such images can be considered impressions of thinking, not mean or poor imprints of the world. The panic around AI imagery demonstrated by many scholars and artists perhaps results from the sense of horror vacui these images evoke, revealing as they do the predictive generativity of our own supposedly original ideas and aesthetic effects.
It is not just latent space and its statistical outputs that are feared to be vacuous: it is the fear of the ultimate void at the core of our own thinking, of there being nothing ‘there’, that presents itself in the transfer of negative affect from us to those artefacts.
Conclusion: towards image thinking
This article has traced the shifting locus of Visual Culture into what we have called ‘image thinking’: a recursive domain where cognition, computation and visuality entwine. Image thinking unfolds along three interlaced threads: (1) analytical–critical perspective, diagnosing how images operate before, beneath and beyond visuality while mapping the infrastructural and cultural forces that precondition what can be seen and said; (2) generative process, involving the algorithmic coupling of language and vision that turns prompts into pictures and pictures back into textual data, inflecting both human and machinic ways of imaging – and imagining; (3) autonomous operation, with images acting as semiotic agents that circulate, decide and anticipate for us, rewriting the coordinates of attention and perception inside a distributed cognitive stack. Having now become infrastructures of thought, images shape decision-making processes, circulate affect at planetary scales and frame what is thinkable within any given techno-cultural regime.
From caves to windows, from imagination to the predictive-brain theories of today’s neuroscience, images have always underpinned cognition. What has now changed is the velocity, scale and agential force with which they do so. Two features are distinctive today: recursion at scale (synthetic outputs re-enter training corpora) and prompt-level authorship (linguistically steered synthesis becomes a form of mass literacy). In the process, images shift from being primarily artefacts to serving as interfaces in cognitive infrastructures, resulting in the recently observed uniformities. Generative AI and its attendant systems have introduced a recursive modality into thinking, bringing forth not just a new medium but a new epistemological condition. Contemporary image systems do not simply represent the world, be it well or badly; they shape the very conditions under which representation is possible. This recursive logic of the textimagetext instantiates a new feedback system, wherein images give rise to prompts, prompts to further images and so on. This loop, unlike earlier image–text pairings, is neither stable nor humanly readable in its entirety. It remains partially occluded by the image infrastructure: by latent spaces, image–text–data embeddings and vector fields that operate well below the level of our conscious perception. It is precisely within this in-visual substrate that much of contemporary image thinking now unfolds.
To return to Flusser and Schendel, image thinking today does not proceed linearly, from perception to imagination to concept, but unfolds as a recursive dynamic in which concept and image come together, and minds, machines and media co-constitute one another and co-generate Visual Culture. The future we visualize will already have passed through the machine’s filter of plausibility. Put plainly, machines trained on human culture are now generating visual traits that humans adopt, effectively sealing a feedback loop in which AI-driven patterns become irreversibly woven into cultural evolution. Here the stakes for Visual Culture intensify. Semiotics, iconography and textual as well as art-historical analysis remain important, yet they must now join forces with digital humanities methods that embrace computation and ecological systems thinking. If images actively configure the very conditions under which representation becomes possible, image thinking must attend to the interplay of surface and pipeline, sign and circuit, tracing how invisible infrastructures of statistics and calculation co-author the visible manifestations of thought. Moving beyond interpreting images as cultural artefacts, scholars must now ask how images shape cognition: how images think and, in doing so, how they reconfigure our own thinking.
Address: Tampere University, Kalevantie 4, 33100 Tampere, Finland.
Address: King’s College London, Strand Campus, London WC2R 2LS, UK.
