Abstract
This demo introduces prompt design as a research approach to studying visual generative AI, distinguishing it from prompt engineering practices that focus on producing aesthetically pleasing or technically polished outputs. Drawing on the query design framework from digital methods, we outline several strategies for cultural and social research: ambiguous prompting for bias research; comparative prompting for cross-model and cross-term analysis; evocative prompting for probing model logic and training data; provocative prompting for examining content moderation; and reverse-engineered prompting for machine critique. We illustrate these strategies through a series of experiments using biodiversity as a case study, examining how visual generative AI represents this concept across models, geographical contexts, and time (2023–2024). These experiments reveal recurring patterns in AI-generated imagery, including idealized, aesthetically driven depictions and the persistence of distinct model “house styles” over time. We further explore the use of large language models as research assistants for analyzing AI-generated images, a process characterized by iterative, supervised collaboration rather than full automation. Finally, we position prompt design as a critical and interventionist method that not only audits but also engages with and reshapes generative AI systems, foregrounding their extractive dynamics and opening possibilities for more reflexive and participatory forms of AI research.
Introduction: Prompt design strategies for biodiversity 1
This demo proposes a shift from prompt engineering, a more technical approach to mastering the language of generative AI, to prompt design, which employs visual generative AI for cultural and social research on themes such as bias, AI aesthetics, machine critique, and content moderation (Niederer and Colombo, 2024). Building on the query design framework developed by the Digital Methods Initiative (Rogers, 2017), we outline a conceptual and practical agenda for prompt design that enables research about and with visual generative AI.
We begin by exploring the history of prompt engineering, which has evolved from the creation of awe-inspiring images to the specialization and commodification of what constitutes a “good prompt.” As we outline, prompt engineering for generative visual AI has transitioned from natural language prompts to more machine- and code-oriented prompting strategies. Following this, we present several prompt design strategies for using visual generative AI for research purposes, which can serve as starting points for studying issues as mediated by visual generative AI.
These strategies include what we call ambiguous prompting for bias research, comparative prompting for cross-model analysis, comparative prompting of controversial issues, evocative prompting for training set analysis, provocative prompting for content moderation research, and reverse-engineered prompting for machine critique. We also describe experiments using AI as a research assistant for the visual analysis of generated images, an approach that at this point requires active collaboration rather than the efficient outsourcing of tasks.
We present and test these strategies through a series of experiments that examine how visual generative AI represents the concept of biodiversity. While our primary goal is to showcase various approaches to prompt design, the theme of biodiversity serves as a consistent thread throughout the text, allowing comparison of their analytical affordances.
Prompt engineering: Learning the language of the machine
The term “prompt” traditionally evokes the idea of inspiring creativity, stemming from its origins in theatre, where a prompter, hiding in a prompt box, would cue actors who had forgotten their lines. In creative writing or drawing, prompts are used to spark the imagination. In the context of generative AI, the meaning of “prompt” has evolved, especially as models have shifted from traditional graphical user interfaces (GUIs) to text-based user interfaces (TUIs). In text-based interfaces, users input natural language descriptions, or “prompts,” to guide AI systems in generating visual content.
This shift transforms the creative process by replacing the direct manipulation of graphical elements (e.g., selecting shapes, colors, and tools from menus, sliders, and buttons) with a more intuitive interaction that enables users to express their ideas in everyday language. The text-based interaction model is marketed as democratizing creative processes, allowing users without technical expertise to engage in image making. However, the shift from graphical to text-based interfaces also redefines creativity itself, as users become conceptual directors and the AI executes their ideas. This raises questions about authorship, agency, and the direction in which human-machine collaboration is evolving.
Early prompts often explored the creative capabilities and limitations of visual generative AI by pushing it to produce absurd or surreal combinations, highlighting the technology's ability to merge unrelated concepts into unexpected images and remixes. Well-known examples of such prompts include scenarios such as “Freddie Mercury eating spaghetti inside a washing machine” or “Donald Trump as the Nirvana Nevermind album cover baby.” These prompts challenge the AI model to navigate cultural references and imaginative juxtapositions (Figure 1). Additionally, since their release, visual generative AI platforms have promoted their services with similar images on their splash pages, displaying outputs generated from prompts such as “an armchair in the shape of an avocado” or “a snail with the texture of a harp” (DALL·E, Figure 2).

Images generated with DALL·E (2022) for the prompts [Freddie Mercury eating spaghetti inside a washing machine] (source: https://archive.is/MtvgN) and [Donald Trump as the Nirvana Nevermind album cover baby] (source: https://archive.md/MtvgN).

Sample prompts and their corresponding images featured on DALL·E’s splash page. The platform promotes its technology by emphasizing its imaginative capabilities, encouraging users to combine “unrelated concepts” and generate visuals of things “unlikely to exist in the real world.” Source: OpenAI, 2021.
As visual generative AI has evolved, prompting has shifted from a mere playful experiment to a skill that users can refine and master. The emergence of prompt books, tutorials, and guides has provided users with strategies to achieve specific styles or outputs. The art of crafting the “perfect prompt” has become a commodity, with paid services and marketplaces, including Etsy, offering pre-crafted prompts tailored to various desired effects or themes (Mileva, 2023). While the effectiveness of such prompts sold online may vary, their wide availability and circulation nonetheless signal that prompting skills have become a digital service in demand. Additionally, the rise of social events such as “prompt battles” (Schmidt and Schmieg, 2022), where participants compete to produce the most impressive AI-generated outputs, underscores the growing recognition of prompting as a form of creative expression. These developments point to a mainstreaming of prompting, from a recreational activity for a niche group of early adopters eager to test this technology, to a marketable and creative discipline, complete with a dedicated economy and community of expert “AI creators” (Figure 3).

Landing page of PromptBase, a marketplace for buying and selling prompts for different generative models (left) or hiring a prompt engineer (right).
Good prompting goes beyond simple natural language descriptions and moves toward more structured and precise input that resembles machine language (see Figure 4). This may entail deconstructing a prompt into distinct components such as “subject, medium, style, art sharing website, lighting, additional details” (Stable Diffusion Art, 2023a) to achieve a more fine-tuned result. More structured prompts can be used to enhance results by allowing users to tweak individual keywords and observe the effects of small changes, such as replacing “misty” with “bright,” on the output. Negative prompts may be used to exclude unwanted elements or steer the output toward a specific style; for instance, adding “painting, cartoon” as a negative prompt can direct the output toward greater realism (Stable Diffusion Art, 2023b).
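As a minimal sketch of what such a structured prompt might look like in practice, the following Python snippet splits a prompt into components and keeps the negative prompt separate, so that a single keyword can be swapped while everything else stays fixed. The class name `StructuredPrompt`, its field names, and the example values are illustrative assumptions, not part of any model's interface.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StructuredPrompt:
    """A prompt split into components (field names are illustrative assumptions)."""
    subject: str
    medium: str = ""
    style: str = ""
    lighting: str = ""
    details: tuple[str, ...] = ()
    negative: tuple[str, ...] = ()

    def positive_text(self) -> str:
        # Join the non-empty components into one comma-separated prompt string.
        parts = (self.subject, self.medium, self.style, self.lighting, *self.details)
        return ", ".join(p for p in parts if p)

    def negative_text(self) -> str:
        # Negative prompts are usually submitted in a separate input field.
        return ", ".join(self.negative)


base = StructuredPrompt(
    subject="a dense, misty forest at sunrise",
    lighting="soft golden light filtering through the trees",
    negative=("painting", "cartoon"),
)

# Tweak one keyword and keep everything else constant.
variant = replace(base, subject=base.subject.replace("misty", "bright"))

print(base.positive_text())
print(variant.positive_text())
print(base.negative_text())
```

Keeping the components apart makes it easier to attribute a change in the output to the one keyword that was altered.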

Example of two images generated using highly engineered prompts. Left: “a dense, ((misty)) forest at sunrise, with soft golden light filtering through the trees, in the style of impressionist painting—no people.” Right: “a dense, misty forest at (((sunrise))), with soft golden light filtering through the trees, in the style of impressionist painting—no people.” Source: Images generated by the authors using Stable Diffusion SDXL, accessed through AUTOMATIC1111 via Google Colab in March 2026. A single image was generated for each prompt submission.
Additionally, weighted prompts allow users to assign varying levels of importance to specific elements, often using syntax such as parentheses or symbols to indicate that the AI model should prioritize certain components over others. For example, writing ((misty)) within double parentheses indicates greater emphasis on this atmospheric aspect compared to other elements (Stable Diffusion Art, 2023a).
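To make the parenthesis syntax concrete, the sketch below reads the nesting depth of each emphasized term and converts it into a numeric weight. It assumes the convention used by interfaces such as AUTOMATIC1111, where each pair of parentheses multiplies attention by roughly 1.1; other tools may use a different factor or a different syntax altogether, and the function name `emphasis_weights` is hypothetical.

```python
import re

# Assumed per-parenthesis multiplier (AUTOMATIC1111-style convention).
EMPHASIS_STEP = 1.1


def emphasis_weights(prompt: str) -> dict[str, float]:
    """Map each parenthesized term to the weight implied by its nesting depth."""
    weights: dict[str, float] = {}
    for match in re.finditer(r"(\(+)([^()]+)(\)+)", prompt):
        opening, text, closing = match.groups()
        # Use the smaller count so unbalanced parentheses do not inflate the depth.
        depth = min(len(opening), len(closing))
        weights[text.strip()] = round(EMPHASIS_STEP ** depth, 3)
    return weights


left = "a dense, ((misty)) forest at sunrise, with soft golden light"
right = "a dense, misty forest at (((sunrise))), with soft golden light"

print(emphasis_weights(left))   # {'misty': 1.21}
print(emphasis_weights(right))  # {'sunrise': 1.331}
```

Under this assumption, double parentheses raise the weight of a term to about 1.21 and triple parentheses to about 1.33, relative to an unweighted term at 1.0.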
These advanced prompting techniques produce outputs that are more closely aligned with the user's vision, while simultaneously making engineered prompts resemble less natural language and more a new language composed of codes, numbers, repetitions, tag salads, and symbols. For example (see Figure 4), instead of simply requesting a rendition of “a forest scene,” a well-crafted prompt might specify “a dense, (((misty))) forest at sunrise, with soft golden light filtering through the trees, in the style of impressionist painting—no people.”
Prompt engineering is primarily a matter of matching input with output “to get the most interesting outcomes” (Bridle, 2023). The prompt engineer masters the skill of crafting precise and detailed instructions articulated “in terms most clearly understood by the system, so it returns the results that most closely match expectations or perhaps exceed them” (Bridle, 2023). When prompts underperform, tools such as PromptPerfect (Jina AI, 2023) offer automated prompt rewriting and optimization, returning a more detailed prompt with its syntax fine-tuned to a specific model. In contrast to the practice of prompt engineering, we propose prompt design, which involves formulating prompts tailored for critical and visual research with and about visual generative AI.
From prompt engineering to prompt design
In the following, we discuss several strategies for prompt design. Prompt design is a direct reference to the digital methods work on “query design” (Rogers, 2017) and aims to contribute to its approach of “search as research.” While “search as research” involves crafting queries to investigate cultural and social dynamics through platform content and search engine results, prompt design adapts this methodology for visual generative AI. It focuses on developing conceptual and applied research strategies that repurpose the generative capacities of visual AI for cultural and social inquiry. Below, we briefly introduce several prompt design strategies.
First, we describe how ambiguous prompting enables bias research and illustrate this strategy with a study of the visual representation of animals in Midjourney (a text-to-image service). Second, we present comparative prompting, that is, prompting for comparative analyses across models and across competing terminology. Third, we introduce evocative prompting for training set analysis, where model outputs provide insight into their functioning. Fourth, we detail strategies for comparing moderation levels across different models using provocative prompting. Fifth, we describe reverse-engineered prompting as a means to explore and compare the aesthetics of different AI models. The topic of biodiversity, which runs as a recurring theme throughout this demo, serves to make these strategies more comparable. Finally, we discuss how using AI for visual research involves close collaboration rather than the efficient outsourcing of research tasks.
Ambiguous prompting for bias research
The rise of visual generative AI goes hand in hand with growing concerns about bias inherent in its training datasets, the design of its algorithms, the context in which these are developed, and the resulting outputs. Academic research and media reports have consistently shown that many visual generative AI models developed in Western contexts exhibit Western-centric, racial, and gender biases when tasked with producing images of human subjects. In response to these ongoing concerns, we present “ambiguous prompting” as a strategy for investigating different forms of bias in AI systems.
Ambiguous prompting involves the formulation of generic, or ambiguous, prompts, inviting the AI model to fill in the gaps and then examining its outputs. This approach contrasts with prompt engineering, where users strive for specificity and precision to achieve optimal results: when prompting for visual bias research, one aims for vagueness, designing the prompt to be as generic as possible rather than long and detailed. The method of ambiguous prompting we describe here is inspired by a query design approach from the “search as research” (Rogers, 2013) framework. With ambiguous queries, one might use generic terms such as [rights] and rely on the search engine to specify the term, creating “hierarchies of concerns.” Just as an ambiguous query relies on the search engine's ability to disambiguate a term, an ambiguous prompt pushes the generative system to compensate for the lack of detail and further specify the term, for example, when prompting for “firefighter” without mentioning a gender.
Testing racial, gender, and cultural bias with ambiguous prompts
With ambiguous prompting, one moves the prompt away from the quest for specificity, as suggested by prompt books and optimization tutorials, towards ambiguity, allowing the model to interpret and represent a particular topic in more detail. Ambiguous prompting has supported the study of different kinds of biased representation produced by visual generative AI, including gender bias. For example, the Stable Diffusion Bias Explorer (Luccioni, 2022), a research tool for studying the outputs of Stable Diffusion and other generative AI models, allows for the side-by-side presentation of AI outputs in response to generic prompts associated with traditionally gendered professions. The tool shows how the model perpetuates traditional stereotypes, for example by depicting men in professional roles such as CEO or doctor, while associating women with jobs such as nurse or social worker. Similarly, others have found that some generative models have substantial difficulty producing accurate representations of a ‘confident Black female instructor’ (York et al., 2024, p. 111) teaching a class (Figure 5).

Images generated with DALL·E, struggling to depict Black female instructors in a classroom, consistently placing them under the supervision of white figures in authoritative positions. Source: York et al., 2024.
Ambiguous prompts have also been employed to investigate how AI models often simplify global cultural diversity into stereotypes. An analysis of 3000 AI-generated images created with Midjourney by the tech news outlet Rest of World revealed that representations of people from various countries were frequently reduced to stereotypes, for example, consistently portraying an [Indian person] as an older, bearded man wearing a turban and a [Mexican person] as a man wearing a sombrero (Turk, 2023).
Moreover, ambiguous prompts have revealed additional forms of bias in visual generative AI, including those related to age, sexual orientation, and disability (Vázquez and Garrido-Merchan, 2024). For example, the Washington Post (2023) reported that prompts for [attractive people] predominantly generate images of young, light-skinned individuals. This issue of stereotypical outputs generated by text-to-image AI tools becomes increasingly pressing as these synthetic outputs are used as training data for AI models (Nyce, 2024), a process that poses a risk of perpetuating and amplifying these biases within the systems.
Testing visual bias beyond human subjects
Research on overrepresentation and underrepresentation in visual generative AI has predominantly focused on human subjects (Figures 5 and 6), but could also be applied to examining bias in the models’ depictions of animals and plants. A smaller number of studies have examined biases in the categorization of animal and plant species by AI models, such as computer vision and natural language processing models, highlighting problematic representations of animals, or “speciesism” (Hagendorff et al., 2023), and of plant species (Ghajargar, 2024).

Images generated by Stable Diffusion in response to the prompt [A color photograph of a drug dealer] starkly illustrate the racial bias embedded within generative AI systems. Such models disproportionately associate darker skin tones with criminality, producing images that amplify harmful stereotypes far beyond real-world statistics. Source: Nicoletti and Bass, 2023.
In broader discussions about climate issues, certain animals have emerged as more prominent than others. These “flagship species” (Meffe and Carroll, 1997) and their images proliferate across news outlets, blogs, scientific publications, and social media, thereby becoming emblematic of ecological conservation efforts and climate change. For years, the polar bear symbolized the climate crisis, prominently displayed on protest signs and magazine covers (Adkins, 2023). A study by the Digital Methods Initiative (Weltevrede and Niederer, 2008) found that images of the polar bear were the most used for the issue of climate change across different online platforms, or spheres, to use a throwback term, not only through direct depictions of the animal itself but also through photos of protesters dressing up in polar bear costumes at climate rallies. Other species, such as the monarch butterfly (since the late 1990s) and the bee (recently popularized with hashtags such as #savethebees), have also gained prominence as flagship species for communicating the biodiversity crisis.
These examples highlight how visual representation and aesthetic appeal play a significant role in determining which species capture the public imagination and receive conservation attention, creating a hierarchical system of environmental visibility that may not reflect actual ecological importance. Therefore, it may be worthwhile to examine how text-to-image models, which are increasingly used to generate images for various purposes, might reinforce these visibility hierarchies when depicting non-human species. In one experiment, we investigated biases in the visual representation of biodiversity generated by visual AI. Specifically, we asked which species are emphasized and which are excluded when a generative model is given the ambiguous prompt to depict [biodiversity]. Our analysis focused on the presence and absence of various species in images generated by Midjourney, which at the time of the experiment already returned high-quality images, with particular attention to geographical specificity. Which animals are represented by Midjourney for the issue of biodiversity across different continents?
Analyzing the presence or absence of different species in these generated images provides insights into taxonomic bias, extending this line of inquiry to the realm of AI-generated images. Taxonomic bias has been widely documented in conservation funding and research (Clark and May, 2002) as well as in offline media (Clucas et al., 2008). Recent studies have also highlighted the uneven visibility of different species in online spaces such as Instagram (Heathcote, 2021; Shaw et al., 2022), Facebook (Rose et al., 2018), and Twitter (Kidd et al., 2018), often focusing on threatened species.
In this experiment, we focus on analyzing diversity by continent, paying attention to which species are foregrounded and which are left out when a visual generative AI model is asked to generate images of biodiversity for a specific geographic location. In the spirit of ambiguous prompting, we crafted a series of prompts composed of [biodiversity in (name of the continent)] and used each to generate 20 images with the text-to-image tool Midjourney (version 5.2). For each continent-specific prompt, the first 10 generated images were retained and analyzed further. We then extracted up to three animals per image (Figure 7), selecting only those that clearly occupied the largest proportion of the image. In cases where fewer than three animals were visually prominent, only those were extracted. The selection was performed using the Object Selection Tool in Adobe Photoshop, which automatically identifies and isolates objects within an image. We then arranged the cut-out animals flat in the shape of each continent to create a world map (Figure 8). This composition method draws on the technique of knolling 2 and is used here to observe, in a compact way, the variety of animals in the image set.
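The sampling protocol described above can be written down as a short sketch. The prompt template, the number of generated and retained images, and the cap of three animals per image come from the text; the list of seven continents follows those named in the analysis, and all function and constant names are hypothetical. The snippet does not call any image generation service, it only records the design.

```python
CONTINENTS = [
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
]
IMAGES_GENERATED = 20      # images generated per prompt
IMAGES_RETAINED = 10       # only the first ten are analyzed
MAX_ANIMALS_PER_IMAGE = 3  # up to three prominent animals are cut out


def continent_prompts(term: str = "biodiversity") -> list[str]:
    """Build one deliberately generic prompt per continent."""
    return [f"{term} in {continent}" for continent in CONTINENTS]


def retain_first(images: list[str], n: int = IMAGES_RETAINED) -> list[str]:
    """Keep the first n generated images, in the order they were returned."""
    return images[:n]


def max_cutouts() -> int:
    """Upper bound on the number of animal cut-outs in the final map."""
    return len(CONTINENTS) * IMAGES_RETAINED * MAX_ANIMALS_PER_IMAGE


prompts = continent_prompts()
print(prompts[0])     # biodiversity in Africa
print(len(prompts))   # 7
print(max_cutouts())  # 210
```

Because fewer than three animals may be prominent in a given image, the number of cut-outs in the map can be lower than this upper bound.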

Example of the process of extracting animal cut-outs from an image generated with Midjourney (version 5.2) for [biodiversity in South America]. Using the Object Selection Tool in Adobe Photoshop, all prominent animals were selected and cut out of the image. This allows the analysis to focus on animal representations, often embedded in highly complex images, by isolating individual species and centering the analysis on those represented more or less frequently. Source: Image produced by the authors, based on an image generated with Midjourney (version 5.2).

World map composed of cut-outs of animals extracted from images generated with Midjourney version 5.2 in 2023 (above) and version 6.1 in 2024 (below), using the prompt [biodiversity in (name of the continent)]. Source: Image produced by the authors, based on images generated with Midjourney (versions 5.2 and 6.1).
The world map (Figure 8) displays geographical hierarchies in the representation of different species: across continents, Midjourney associates biodiversity with different species; in some cases, certain species dominate a continent. For example, Antarctica is solely populated by penguins, unsurprisingly. South America is exclusively associated with colorful tropical birds, while North America features a more diverse fauna, including colorful butterflies, eagle-like birds, and some hybrid mammals. Oceania focuses entirely on marine biodiversity, with mainly fish and only a few birds. The continents of Africa, Europe, and Asia show the greatest diversity of animal species. Africa features elephants, zebras, giraffes, a flock of birds, and one hybrid animal resembling an okapi. Europe holds mostly birds, butterflies, and some mammals, including the platypus. Asia shows a strange array of animals, including several fictional hybrids and ill-shaped creatures.
Overall, birds are the most prominent species associated with biodiversity across continents, while insects, aside from colorful butterflies, are nearly absent. This observation aligns with other research showing that birds are significantly overrepresented in biodiversity data, while insects are notably underrepresented (Troudet et al., 2017). In conservation studies, flagship species, particularly vertebrates, tend to “draw financial support more easily than stinging insects or obscure mussels” and “extract more sympathy from the public than do most plants or insects” (Meffe and Carroll, 1997). In addition, the fact that the only insects depicted in the generated images are colorful butterflies, and that, in general, all animals are quite colorful and visually appealing, highlights a common, and for some problematic, emphasis on the aesthetic value of biodiversity (Mikkonen and Raatikainen, 2024).
The analysis of the most and least prominent species in the generated images was repeated one year later, in 2024, to assess the extent to which Midjourney updated its depiction of biodiversity. In 2024, many more insects have entered the realm of biodiversity, and the colors have become more muted in some continents, for example, Europe and North America, indicating that the model tries to better replicate the fauna of the continent, at least in its color palette. The model still has the greatest difficulty depicting biodiversity in Asia, rendering fictional species, including giraffe-hybrids. In depictions of biodiversity in Oceania, most species are still shown underwater, even the birds, and Antarctica is still only populated by penguins. The species bias found in 2023, with insects mainly represented by colorful butterflies, seems to have been partially compensated for in 2024, when more insects, including fictional ones, are depicted.
Comparative prompting
Comparative prompting is a method for systematizing research through comparative analysis, enabling the comparison of outputs across models and across different issues and terminologies. This approach is similar to the comparative analysis of issues across social media platforms (see, e.g., Rogers and Niederer, 2020; Pearce et al., 2020) or the comparison of search engine results over time (Pearce and De Gaetano, 2021).
Ambiguous prompts are intentionally vague in order to invite a model to specify them and thereby reveal some of its biases. This approach can be implemented more systematically and developed into a comparative exercise, either across models prompted with the same term or through the use of differently loaded terms, as outlined in the next sections.
Comparative prompting for cross-model analysis
Comparative prompting for cross-model analysis entails inputting the same ambiguous terms into different visual generative tools in order to observe how they interpret the same concept. By allowing for the side-by-side analysis of different models and their visual representations of particular topics, comparative prompting can reveal distinct aesthetic “house styles” (Manovich, 2024).
When comparing the outputs of different models given the same prompt, it is important to note that models may not interpret prompts equivalently, as they can be fine-tuned to accept different prompt grammars and exhibit varying levels of prompt adherence, that is, the ability of a model to follow the instructions provided in a prompt. Nevertheless, it is precisely by using the same minimal prompt, without attempting to engineer it to cater to each model's specific grammar, that one can compare their interpretative tendencies and stylistic defaults.
As part of our investigation into the depiction of biodiversity by visual generative AI, we analyzed model specificity in response to the generic prompt [biodiversity]. To understand how different models interpret this concept, we posed the following questions: Which styles dominate the representation of biodiversity in visual generative AI? Do the models exhibit distinct “house styles,” and have these styles evolved over time (2023 to 2024)?
We used the prompt [biodiversity] across four models: Midjourney, DALL·E 2 and 3 (via Bing and ChatGPT), Stable Diffusion (via the web interface DreamStudio), and Adobe Firefly. As different models are offered through slightly different interfaces, we adapted the prompt accordingly. For Midjourney in 2023, used through its dedicated Discord channel, we used the command /imagine followed by the word “biodiversity,” while in 2024 we entered the term into the prompt input field of the new web interface, similar to what we did for Stable Diffusion and Adobe Firefly. For DALL·E, a model accessed through a conversational chat interface, we simply asked it to “generate an image of biodiversity.”
We generated 20 images per model in July 2023 and replicated the process in July 2024. We kept the number of generated images low to minimize the study's environmental footprint, well aware of the irony of studying climate issues through a highly extractive and energy-intensive technology. From each image set, we selected the five most emblematic images. The selection was conducted through collective close reading by groups of three to five researchers, who examined each batch of twenty images per model and identified recurring features, compositional structures, and color palettes. The five retained images were those that together reflected the diversity of outputs within each set. These images were then aggregated into a composite and described, enabling comparison between outputs generated in 2023 and those generated one year later (Figure 9).
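The comparative design can be summarized as a grid of models and years, as in the following sketch. The model names, years, and image counts are taken from the text; the data structure and names such as `build_plan` are illustrative assumptions, and the selection of emblematic images remains a manual, collective close reading that this snippet does not automate.

```python
from itertools import product

MODELS = ["Midjourney", "DALL·E", "Stable Diffusion", "Adobe Firefly"]
YEARS = [2023, 2024]
PROMPT = "biodiversity"
IMAGES_PER_CELL = 20     # images generated per model and year
EMBLEMATIC_PER_CELL = 5  # images retained after collective close reading


def build_plan() -> list[dict]:
    """One entry per (model, year) cell, all using the same minimal prompt."""
    return [
        {"model": model, "year": year, "prompt": PROMPT, "n_images": IMAGES_PER_CELL}
        for model, year in product(MODELS, YEARS)
    ]


plan = build_plan()
total_generated = sum(cell["n_images"] for cell in plan)
total_in_composite = len(plan) * EMBLEMATIC_PER_CELL

print(len(plan))           # 8
print(total_generated)     # 160
print(total_in_composite)  # 40
```

Using the same unengineered prompt in every cell is what makes the cells comparable; only the model and the year vary.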

Five most emblematic images for the prompt [biodiversity] for each of the four models, comparing images generated in July 2023 (top row) with those generated in July 2024 (bottom row). The most emblematic images were selected by groups of three to five researchers through close reading each set of twenty images per model and retaining the five that best represented each set. Source: Composite images produced by the authors using images generated by different models.
Looking at images from 2023, despite differences in style and depicted subjects, each model produces an idealized and stereotypical portrayal of biodiversity, framing it positively in the present. Notably, the human drivers of biodiversity decline are absent from all the images, and humans themselves are almost entirely omitted. While models share a similar positive framing of biodiversity, that is, as abundant and thriving rather than under threat, they each “do” biodiversity in their own distinctive style, offering a glimpse into the kind of imagery likely associated with the term “biodiversity” in their training sets.
Midjourney generates symmetrical images in the style of classical nature and botanical art, all featuring abundant birds, winged creatures, and luxuriant vegetation. DALL·E adopts a mysterious, fantasy illustration style, often depicting mystical underwater forests. The tone of these images is primarily cyan, with a focus not only on animals but also on the broader ecosystem. Stable Diffusion showcases the most diverse range of styles, including photographic landscapes, macro-focused animal images, diagrams, and illustrations. Adobe Firefly consistently adopts the style of educational posters for children, highlighting diverse wildlife and flora.
The comparison with images generated in 2024 shows that prompting the same word one year later yields substantial changes in colors, subjects, and styles across all models except Midjourney, which remains consistent in color, layout, and details while introducing only slight variations compared to 2023. DALL·E's mysterious atmosphere from 2023 has given way to brighter and more colorful daytime jungle scenes with a detailed and colorful illustration style. While in 2023 Stable Diffusion showcased diverse styles, in the 2024 dataset all images share the same style, consisting of simple scientific botanical drawings presented in 2D. Adobe Firefly shifted to a highly detailed, ad-like style in 2024, with fewer animals and a focus on fruits. Overall, Adobe Firefly has undergone a more substantial style change than DALL·E and Stable Diffusion, while Midjourney remains the most consistent. Interestingly, despite certain models changing significantly in style over the course of a year, they have not become more similar to one another. Instead, each has retained a distinctive style.
This finding partially challenges the concept of AI homogenization, a process in which the outputs of various AI image generators converge toward a shared visual language, with AI-generated images becoming increasingly similar (Nyce, 2024). Such homogenization is exacerbated by a feedback loop in which AI-generated images are re-incorporated into training datasets (Gibney, 2024), potentially reinforcing and amplifying specific styles. While this phenomenon requires further systematic exploration, potentially using larger datasets, our analysis of a small subset focused on biodiversity suggests that these models have maintained their distinctive styles over time without converging toward a homogeneous aesthetic.
In the example above, comparative prompting is used to investigate differences and similarities among models given the same prompt, and to examine longitudinally how the representation of a concept and the models’ “house styles” evolve over time. Other potential applications of comparative prompting include exploring the dynamics of controversial issues by prompting competing terms (see next section).
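The cross-model, longitudinal design described above can be laid out as a run sheet before any image is generated. What follows is a minimal sketch, assuming a single prompt and the four models and two years discussed in this section. The field names and the `images_per_cell` default are illustrative assumptions and do not document the original study protocol.

```python
from itertools import product

MODELS = ["Midjourney", "DALL·E", "Stable Diffusion", "Adobe Firefly"]
YEARS = [2023, 2024]


def build_run_sheet(prompt, models=MODELS, years=YEARS, images_per_cell=5):
    """Return one row per (model, year) cell of the comparison grid."""
    return [
        {"prompt": prompt, "model": model, "year": year, "n_images": images_per_cell}
        for model, year in product(models, years)
    ]


run_sheet = build_run_sheet("biodiversity")
for row in run_sheet:
    print(f'{row["year"]}  {row["model"]:<17} [{row["prompt"]}] x {row["n_images"]}')
```

Keeping the grid explicit makes it easy to see which cells of the comparison are still missing, and to add a further model or year without changing the rest of the design.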
Comparative prompting of (controversial) issues
Another approach to comparative prompting involves using opposing sentences or terms to evaluate differences and similarities in their visual representation. One way into this strategy is to think about prompts as “side-taking,” building on research on query design, in which keywords are understood as aligning with or opposing particular positions. In this approach, keywords are considered as parts of “programmes or anti-programmes,” where “programmes refer to efforts made at putting forward and promoting a particular proposal, campaign or project,” and “anti-programmes oppose these efforts or projects through keywords” (Rogers, 2017: 8). Similarly, prompting competing sentences or terms enables the exploration of contrasting outputs, highlighting underlying assumptions embedded in visual generative AI.
For example, in social media analysis, one can use competing terms to collect data for the comparative analysis of an issue, such as comparing content posted with the hashtags #guncontrol and #gunownership (Niederer and Colombo, 2023). The two datasets make it possible to compare the kinds of content that animate two opposing online spaces organized around the same issue under different names. Similarly, one can prompt the two competing terms [gun control] and [gun ownership] to observe which cultural and symbolic references an AI model will generate: for gun control, a scale and peace symbols such as a dove or an olive branch; for gun ownership, the Bill of Rights on a rugged wooden table with someone holding a gun (Figure 10).

Figure 10: Images generated with ChatGPT-4o to illustrate the topics of [gun control] (left) and [gun ownership] (right). The two images were generated in January 2025 by asking the conversational chat interface to “generate an image of gun control” and “generate an image of gun ownership.” Source: Images generated by the authors with ChatGPT-4o.
Evocative prompting for training set analysis
Evocative prompting entails formulating a prompt that includes non-existing evocative terms or invites the model to exaggerate its output. Evocative prompting builds on the tendency of generative AI models to always return an output. For example, ChatGPT has been dubbed a “mansplaining machine: often wrong and yet always certain” (Harrison Dupré, 2023). With evocative prompts, one can gain a sense of the training sets or even audit the mechanisms of visual generation. The underlying logic is that, in the visual generation process, made-up words with common morphological features trigger consistent associations with real concepts (Millière, 2022).
Prompting made-up words
One notable example of a crowd-sourced evocative prompt was [Crungus], a made-up word that consistently generated similar visuals when prompted repeatedly. Using the otherwise unknown term in DALL·E mini, a free, lightweight open-source model inspired by, but not developed by, OpenAI's DALL·E, repeatedly produced images of a “snarling, naked, ogre-like figure” (Bridle, 2023). Since the word had no historical visual reference, it was assumed to be entirely a creation of the machine's imagination.
In another experiment (Millière, 2022), prompting different models with made-up words resembling regional location names in various languages produces images linked to the features of these places. “Woldenbüchel” generates scenes resembling typical views of a German or Austrian village, while “Valtorigiano” evokes images of an old Italian town (Figure 11). It is also worth noting that when these behaviors are made public, whether through news reports or academic papers, model developers often rush to patch and “fix” the issues, making it difficult for researchers to replicate findings over time. For example, when attempting to replicate the aforementioned case, we find that ChatGPT is unable to “find any references” to the Italian-sounding village “Valtorigiano” and therefore does not generate an image immediately.

Figure 11: Images generated by DALL·E 2 with the prompts [woldenbüchel] (upper row) and [valtorigiano] (lower row). Examples reported as they appear in Millière (2022).
“Make it more” prompting
Another approach to evocative prompting is to ask the machine to exaggerate its output in order to observe how it develops a particular visual concept. This method is inspired by the “make it more” trend (Know Your Meme, 2023), a collective experiment in which users repeatedly prompt AI generation models to intensify specific attributes or qualities in an image. Each iteration pushes the AI to exaggerate particular visual characteristics, creating increasingly surreal or extreme representations. In this approach, the model is compelled to reveal its underlying logic by showing which elements it considers representative of a concept, making them more evident with each iteration. For example, a progressively “great” America may be depicted through repeated iconic buildings, an eagle, flags, and eventually an aggressive, armed Uncle Sam riding a large eagle (Know Your Meme, 2023).
The “make it more” evocative prompting approach can be applied to different concepts, such as biodiversity (Figure 12). When tasked with creating an image of [biodiversity], ChatGPT tends to respond to the instruction to “make it more biodiverse” by increasing visual density and the abundance of species. The resulting images show how ChatGPT's conception of “more biodiversity” is rendered as sheer visual abundance, with large mammals and colorful birds dominating the foreground, often placed in impossible proximity to one another. The “make it more” method thus suggests that ChatGPT interprets biodiversity primarily as species count rather than as complex ecological relationships, habitat diversity, or the critical but less visually striking microbial and plant communities.

Figure 12: ChatGPT-4o image generation produces increasingly biodiverse images of biodiversity. The four images were generated in January 2025 by asking the conversational chat interface to “generate an image of biodiversity” and subsequently, in the same chat, to “make it more biodiverse” three times. Source: Images generated by the authors with ChatGPT-4o.
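The “make it more” sequence can be written down as a fixed script of chat turns, so that the same escalation can be repeated for other concepts or models. The sketch below only assembles the messages, using the wording reported for the biodiversity experiment. The function name and parameters are hypothetical, and no model is contacted.

```python
def make_it_more_script(concept, adjective, rounds=3):
    """Return the ordered list of chat messages for one 'make it more' session."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    opening = f"generate an image of {concept}"
    follow_up = f"make it more {adjective}"
    return [opening] + [follow_up] * rounds


script = make_it_more_script("biodiversity", "biodiverse")
for turn, message in enumerate(script, start=1):
    print(turn, message)
```

All turns are meant to be sent within the same chat, since each follow-up only makes sense in relation to the image produced in the previous turn.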
The evocative prompting technique can also be used to circumvent content moderation systems and blacklisted prompts (Millière, 2022). Users can design prompts with morphological features similar to filtered terms, triggering the same visual associations without activating safeguards. The next section, dedicated to provocative prompting, explores how prompts can be crafted to test and evaluate the moderation regimes of different generative AI models.
Provocative prompting for content moderation research
Behind the promise of “imagine anything,” AI companies implement guardrails to prevent the generation of certain content, using a range of moderation strategies. These include blocking specific words and rewriting user prompts in the backend to ensure compliance with platform guidelines.
Designed for research on content moderation, provocative prompting entails crafting prompts that test the boundaries of what an AI system is willing or able to generate, providing insights into moderation policies and thresholds. This can entail prompting sensitive keywords to audit how different tools enforce content moderation. It can also entail compiling a list of prompts, such as thematic keywords or political figures, and inputting them into different models to characterize and compare their content moderation policies, or comparing a single model's output over time. For example, Midjourney blocked “Donald Trump” and “Joe Biden,” and later also “Kamala Harris,” during the 2024 US presidential elections, while other political figures were not blocked.
Another domain for inquiry into the moderation of AI models with provocative prompting is violence, as many interfaces block violent terms. Researchers can compile lists of terms to identify which ones trigger flags or even temporary or permanent blocks. It is important to note that prompt-based testing of moderation might not provide a complete picture, as prompt-based filtering can be bypassed using alternative terms that produce similar outputs. These are referred to as “visual synonyms” (Gavves et al., 2012) and might be used to circumvent filters by prompting for objects or concepts visually similar to those that are filtered, such as substituting ketchup for blood.
Testing words that trigger warnings can be applied over time or across models to evaluate shifts in moderation policies, consistency of enforcement, and potential biases in what content different AI systems deem inappropriate or harmful. In addition, one can take note of the different kinds of responses that models return for violent or otherwise “risky” prompts, ranging from flags and alerts to temporary or permanent bans (Figure 13), akin to what others have studied as “moderate speech,” rhetorical strategies used by Large Language Models (LLMs) to react to harmful prompts inserted by users (de Keulenaar, 2025).

Figure 13: Examples of different models reacting to risky, problematic, or flagged prompts. From the top, Midjourney responds to a flagged prompt, refusing to generate images of political candidates during election season; Midjourney temporarily blocks the account following a risky request; ChatGPT dialogically refuses to generate an image due to content policy restrictions. Source: Authors’ screenshots.
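Once a term list has been run through several models, the observed reactions can be tallied per model to compare moderation regimes. The following sketch assumes that researchers record each attempt by hand as a (model, prompt, outcome) row. The outcome labels are an assumption loosely based on the response types mentioned above, and the example rows are placeholders rather than observed results.

```python
from collections import Counter, defaultdict

# Outcome labels are assumptions for this sketch, loosely following the
# response types named in the text (flags, temporary blocks, refusals).
OUTCOMES = {"generated", "flagged", "temporary_block", "permanent_ban", "refusal"}


def tally_outcomes(log):
    """Count moderation outcomes per model from (model, prompt, outcome) rows."""
    table = defaultdict(Counter)
    for model, prompt, outcome in log:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        table[model][outcome] += 1
    return {model: dict(counts) for model, counts in table.items()}


# Placeholder rows for illustration only; these are NOT observed results.
example_log = [
    ("model_a", "term_1", "flagged"),
    ("model_a", "term_2", "generated"),
    ("model_b", "term_1", "refusal"),
    ("model_b", "term_2", "refusal"),
]

print(tally_outcomes(example_log))
```

Rejecting unknown labels keeps the coding scheme consistent across researchers and across repeated runs over time.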
Reverse-engineered prompting for machine critique
Aimed at machine critique, this approach involves deducing the latent logic and aesthetic preferences of an AI model by asking it to craft a textual prompt from an input image. Feeding the same image to different models and comparing the suggested prompts can help test whether a model is skewed toward a particular kind of terminology. Do different models each speak a slightly different language? In some sense, this approach can also be described as ekphrastic prompting (Verdicchio, 2024), alluding to the literary practice of describing artwork through text.
In our experiments on how the concept of biodiversity is portrayed by visual generative AI, we asked two dominant generative AI tools, Midjourney and ChatGPT, to produce text describing a composite of images generated by Midjourney for the prompt [biodiversity]. The style and content of each set of five images were manually described by human researchers, as well as by ChatGPT and Midjourney (Figure 14). Comparing the descriptions generated by the two models with those produced by the researchers offers a glimpse into what each model prioritizes in terms of content and style.

Figure 14: Composite image as described by Midjourney: “5 designs of beautiful landscapes with birds, flowers and butterflies, fantasy forest, magical creatures, dark background, hyper realistic, hyper detailed, in the style of hans holbein the younger. --ar 128:93”; by ChatGPT: “Enchanted forest of birds and butterflies: A mesmerizing collage of vibrant birds and butterflies set against a lush, fantastical forest backdrop teeming with diverse flora and ethereal light”; and by a group of researchers: “Vision of Paradise: This dataset contains images in the style of classical nature and botanical art and all feature abundant birds and creatures with wings and luxuriant vegetation.”
While Midjourney focuses on visual references (Holbein the Younger), visual adjectives (dark, realistic, detailed), and image proportions, with “--ar 128:93” as Midjourney code for specifying the width-to-height ratio of the image, ChatGPT uses more discursive and evocative, not strictly visual, language (enchanted, mesmerizing, ethereal). Humans were the only ones interpreting this composite as a “dataset” and referring to “creatures with wings,” as many of the creatures depicted in the images were neither birds nor butterflies.
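The comparison of the three descriptions can also be approached computationally, as a complement to close reading. The sketch below uses the descriptions quoted in the Figure 14 caption (with Midjourney's aspect-ratio parameter left out) and measures word overlap with the Jaccard index. This measure and the small hand-picked stop list are assumptions of the sketch, not the method used in the study.

```python
import re

DESCRIPTIONS = {
    "Midjourney": (
        "5 designs of beautiful landscapes with birds, flowers and butterflies, "
        "fantasy forest, magical creatures, dark background, hyper realistic, "
        "hyper detailed, in the style of hans holbein the younger."
    ),
    "ChatGPT": (
        "Enchanted forest of birds and butterflies: A mesmerizing collage of "
        "vibrant birds and butterflies set against a lush, fantastical forest "
        "backdrop teeming with diverse flora and ethereal light"
    ),
    "Researchers": (
        "Vision of Paradise: This dataset contains images in the style of "
        "classical nature and botanical art and all feature abundant birds and "
        "creatures with wings and luxuriant vegetation."
    ),
}

# A tiny, hand-picked stop list (an assumption of this sketch).
STOPWORDS = {"a", "of", "and", "the", "in", "with", "this", "all", "against", "set"}


def vocabulary(text):
    """Lowercase word set of a description, minus stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def jaccard(a, b):
    """Share of words two vocabularies have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


vocab = {name: vocabulary(text) for name, text in DESCRIPTIONS.items()}
shared_by_all = set.intersection(*vocab.values())
print("shared by all three:", sorted(shared_by_all))
pairs = [("Midjourney", "ChatGPT"), ("Midjourney", "Researchers"), ("ChatGPT", "Researchers")]
for left, right in pairs:
    print(left, "vs", right, round(jaccard(vocab[left], vocab[right]), 2))
```

With only three short texts the scores are indicative at best. The more useful output is the list of words unique to each describer, for example `vocab["ChatGPT"] - vocab["Midjourney"]`.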
While reverse-engineered prompting can be a one-step process, feeding one image into the model and asking for a textual description, it can also be developed into an iterative process. Inspired by the “broken telephone game,” this approach involves generating an image, asking the model to describe it, and then using that new description to generate another image. Repeating this cycle several times and then comparing the initial and final images can reveal biases and patterns in AI image description and generation, for example, a tendency to progressively Westernize results regardless of the starting point, or to shift from political to religious and cultural contexts across iterations (Bastos dos Santos, 2024).
Using a similar iterative image-to-text-to-image approach with GPT-4o, others have found that ChatGPT systematically neutralizes sensitive or controversial content, “transforming tension into atmosphere, conflict into harmony, and controversy into an aestheticized mood” (Pilipets and Geboers, 2025).
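The iterative image-to-text-to-image cycle can be expressed as a short loop that records every intermediate step, so that drift can be traced round by round and not only between the first and last image. In this sketch, `generate_image` and `describe_image` are hypothetical callables supplied by the researcher. The toy stand-ins below exist only to make the loop runnable and do not call any model.

```python
def broken_telephone(seed_prompt, generate_image, describe_image, rounds=3):
    """Alternate image generation and description, keeping every step.

    generate_image: callable taking a text prompt, returning an image object.
    describe_image: callable taking an image object, returning a text description.
    Both are supplied by the researcher; nothing here calls a real service.
    """
    history = []
    prompt = seed_prompt
    for round_number in range(1, rounds + 1):
        image = generate_image(prompt)
        description = describe_image(image)
        history.append(
            {"round": round_number, "prompt": prompt, "image": image, "description": description}
        )
        prompt = description  # the description becomes the next prompt
    return history


# Toy stand-ins so the loop can be run without any model access.
def fake_generate(prompt):
    return {"rendered_from": prompt}


def fake_describe(image):
    return "a picture of " + image["rendered_from"]


steps = broken_telephone("biodiversity", fake_generate, fake_describe, rounds=2)
for step in steps:
    print(step["round"], "|", step["prompt"], "->", step["description"])
```

Passing the two model calls in as arguments keeps the loop independent of any particular tool, which also allows the generating and the describing model to differ.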
Reverse-engineered prompting repurposes AI models’ ability to generate textual descriptions from images. In the following section, we explore deeper collaborations with AI in the research process.
AI as a research assistant: Using AI to analyze AI-generated images
Despite the non-deterministic nature of LLMs and their tendency to “lack self-awareness of their inconsistencies” (Jacomy and Borra, 2024), models such as ChatGPT have been employed as assistant researchers in numerous studies, particularly for text classification tasks (Gilardi et al., 2023). LLMs have been used by researchers to automatically tag social media posts’ alignment on charged topics, such as the war in Ukraine (Rogers and Zhang, 2024), identify hateful, offensive, and toxic language (Li et al., 2024), and perform summarization tasks with notable success (Pu et al., 2023; Zhang et al., 2024).
There is growing interest in assessing LLMs’ capabilities in image analysis (Johnson et al., 2023), particularly for tasks such as diagnosing medical conditions from images (see, e.g., Tian et al., 2024). Although less explored, promising studies have examined the use of LLMs for image analysis in the social sciences, including interpreting political imagery (Wang, 2024), evaluating urban aesthetics through street view images (Malekzadeh et al., 2025), and early applications in media forensics, such as detecting deepfakes (Jia et al., 2024).
In our study on biodiversity, we leveraged ChatGPT for detailed visual analyses of species represented in AI-generated images, employing the “prompt perturbation and iteration” approach (Mishra et al., 2025). This method involves refining natural language prompts iteratively to improve model outputs, including modifying phrasing or providing examples. Drawing from Törnberg's (2023) recommendations, we established a “research persona” for ChatGPT, tasking it with acting as a biodiversity expert.
To conduct the analysis, we uploaded eight image composites representing [biodiversity] generated by Midjourney, DALL·E, Stable Diffusion, and Adobe Firefly for 2023 and 2024. For each composite, the model was prompted to identify species within the images. ChatGPT generated categorized lists, dividing species into groups such as birds, insects, mammals, plants, and ecosystems. The model initiated this categorization without specific prompting, and we adopted it for subsequent analyses.
The process required iterative refinements to address errors in species classification. For example, frogs were initially miscategorized as mammals. Once this was corrected, we asked ChatGPT to create a structured table for each AI model, listing the identified species and elements under their respective categories. As a final step, we explored ChatGPT's potential as an information designer by asking it to assign emojis to each species in the table (Figure 15). This required further iterations to correct errors and maintain consistency in the table structure. The resulting data were transferred to Google Sheets for layout adjustments.

Figure 15: Emoji table created with ChatGPT representing the species identified across all image sets, by model and by year. The process required several iterative refinements and human oversight to correct errors in species identification.
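Part of the oversight described above, such as catching a frog listed under mammals, can be supported by checking the model's category table against a reference list. The sketch below assumes a small, hypothetical reference table that a domain expert would supply in practice. Apart from the frog error reported in the text, the example entries are invented for illustration.

```python
# Hypothetical reference table; in practice a domain expert would supply it.
REFERENCE_CATEGORY = {
    "frog": "amphibians",
    "parrot": "birds",
    "butterfly": "insects",
    "deer": "mammals",
}


def review_categories(model_table, reference=REFERENCE_CATEGORY):
    """Split a model's {category: [species]} table into errors and unknowns.

    Returns (mismatches, unverified): mismatches lists species whose assigned
    category differs from the reference; unverified lists species the reference
    does not cover and that therefore need manual checking.
    """
    mismatches, unverified = [], []
    for category, species_list in model_table.items():
        for species in species_list:
            expected = reference.get(species)
            if expected is None:
                unverified.append(species)
            elif expected != category:
                mismatches.append(
                    {"species": species, "assigned": category, "expected": expected}
                )
    return mismatches, unverified


# Illustrative model output reproducing the frog error described in the text;
# the other entries are invented for the example.
model_table = {
    "mammals": ["deer", "frog"],
    "birds": ["parrot"],
    "insects": ["butterfly", "dragonfly"],
}
mismatches, unverified = review_categories(model_table)
print(mismatches)
print(unverified)
```

A check of this kind only flags disagreements with the reference. Hybrid or fictional creatures, which have no reference entry, still end up in the list for manual review.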
An important aspect of this collaboration involved prompting ChatGPT to reflect on both the biodiversity representations within the image sets and the analytical process itself. ChatGPT identified notable gaps in these collections, including the underrepresentation of microorganisms, small mammals, amphibians, and reptiles. It also highlighted the relative absence of certain habitats, such as wetlands and grasslands, compared to more visually striking ecosystems such as forests. Furthermore, it noted that ecological dynamics, such as species behaviors including foraging or group interactions, were minimally depicted. These observations underscore potential biases in AI-generated imagery, reflecting both the limitations of training data and the aesthetic preferences embedded within generative models.
This experimental collaboration demonstrates how LLMs, in this case GPT-4o, can support visual analysis, particularly for researchers unfamiliar with computer vision algorithms and their outputs. Using natural language prompts enables quick content recognition in small image datasets. However, to study larger image datasets, researchers might still want to rely on dedicated computer vision algorithms, more specifically trained on the types of images in their datasets. When asked to take on the role of an information designer, ChatGPT helped generate an emoji-based visualization, though this simplification introduced inaccuracies.
Additionally, ChatGPT misidentified hybrid or fictional creatures, reflecting its characterization as a “mansplaining machine, often wrong, yet always certain” (Harrison Dupré, 2023) and reinforcing the need for expert verification. Despite these limitations, one of the strengths of this approach is ChatGPT's capacity to remember and reflect on the entire conversation. This self-reflectivity allowed it to point out gaps in the collection, identifying underrepresented habitats and species and missing interspecies behaviors. While ChatGPT cannot replace human expertise, whether as a biodiversity expert or an information designer, this iterative prompting approach highlights both its potential and the necessity of careful oversight when using generative AI to support visual analysis.
Conclusions
This demo has explored prompt design as a research-oriented approach to studying visual generative AI and to experimenting with an LLM as a visual research assistant. By drawing on digital methods and query design, we have demonstrated a set of techniques for conducting research with and about visual generative AI, namely ambiguous prompting, comparative prompting, evocative prompting, provocative prompting, and reverse-engineered prompting.
In conclusion, we highlight several reflections and implications arising from this work, including methodological considerations, ethical responsibilities, and the potential for researchers to move from auditing AI to actively shaping its behaviors.
Studying models alongside their outputs
An important consideration when working with generative AI is the model itself: different systems, and different versions of the same system, produce distinct outputs, making it necessary to be explicit about which model is being used, when, and how. Tools such as Prompt Compass (Borra and Plique, 2024) can support cross-model comparisons, yet the rapid pace of model updates underscores the importance of documenting research conditions carefully if we want to study these models as dynamic cultural objects rather than static tools.
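One lightweight way to document research conditions is to store a small record alongside every generated image. The sketch below is one possible shape for such a record. The field names are assumptions, and the example values follow the ChatGPT-4o experiments reported above, except for the day of the month, which is a placeholder.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class PromptRun:
    """One generation request, with the conditions needed to interpret it later."""
    prompt: str
    model: str
    model_version: str   # as reported by the tool; "unknown" if not exposed
    interface: str       # e.g. chat interface, web app, API
    run_date: str        # ISO 8601 date
    notes: str = ""


def to_log_line(run):
    """Serialize a run as one JSON line, keeping non-ASCII model names readable."""
    return json.dumps(asdict(run), ensure_ascii=False, sort_keys=True)


run = PromptRun(
    prompt="generate an image of biodiversity",
    model="ChatGPT-4o",
    model_version="unknown",
    interface="conversational chat interface",
    run_date=date(2025, 1, 15).isoformat(),  # day of month is a placeholder
)
print(to_log_line(run))
```

Appending one such line per request to a plain text file yields a log that can later be filtered by model, version, or date when comparing outputs over time.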
Attending to extractive practices
Generative AI systems not only generate images but also reproduce extractive logics. They demand vast computational resources and depend on unremunerated creative labor, disproportionately externalizing costs to the Global South while serving “elite users” in the Global North (Klein and D’Ignazio, 2024). Depending on one's research question, it may be worthwhile to consider using existing or limited datasets or to work with smaller models. In the biodiversity case running through this demo, we decided to work with small datasets for our comparative analysis rather than render hundreds of images in a big data approach.
Moreover, the use of creative works as training data without consent has sparked lawsuits and protests, highlighting the need for alternatives. Data feminist approaches call for consent-based practices that aim “to build more relational, inclusive, and liberating systems, and to reject them if they are not” (Klein and D’Ignazio, 2024: 108).
Prompt design can help make these extractive dynamics visible while also experimenting with less extractive ways of working with generative AI, for example by engaging with forms of “small AI” grounded in transparently documented datasets, lower energy and more locally hostable infrastructures, and participatory forms of governance that keep AI systems accountable to the specific communities, cultural contexts, and environments in which they operate (Niederer and De Gaetano, 2026).
Studying AI beyond academia
Prompt design does not need to remain confined to academic settings. Expert involvement and public participation can transform it into a tool for dialogue about how AI shapes collective imaginaries. Involving biodiversity experts can enrich analysis with domain-specific knowledge, making visible ecological inaccuracies or blind spots that may otherwise go unnoticed. At the same time, public-facing events that invite critique of AI-generated imagery can reveal alternative cultural associations and ethical concerns, while also raising broader AI literacy. Rather than keeping prompt design within specialist communities, bringing it into public and interdisciplinary arenas underscores its potential as a method for shared inquiry into the role of AI in visual culture.
Such participation should not be limited to interpreting outputs but should also inform how systems are evaluated and governed, including which datasets are used, what forms of knowledge are prioritized, and which risks or uses are considered acceptable, potentially leading to the development of smaller alternative models.
From auditing to collaborating
Prompt design also signals a shift in how researchers engage with AI. Earlier work largely focused on auditing bias; more recently, researchers have begun experimenting with AI as a collaborator. In our study, large language models assisted in the categorization and close reading of visual outputs, but their contributions required oversight.
Prompt design, therefore, moves beyond bias detection to enable systematic and interventionist forms of machine critique. By varying prompts, comparing outputs, and tracing stereotypes across models and time, researchers can reveal how AI constructs visual meaning, structures knowledge, and embeds ideological biases. Yet these same practices also shape model outputs, especially in systems that retain memory. In this space between auditing and influencing, prompt design foregrounds the double role of researchers as both analysts and participants. This dual role introduces epistemological responsibilities, including transparency about methods, reflexivity about positionality, and recognition that research itself becomes an intervention.
Seen in this light, prompt design opens possibilities for critical intervention. Researchers can use iterative and conversational prompting not only to study but also to counter or reimagine the model's behaviors. Emerging work demonstrates how prompt design can be used to address representational gaps, for instance by foregrounding overlooked species (Livio and Sánchez-Querubín, 2025) or queering datasets and systems to imagine alternative ecologies (Tran et al., 2024). In this sense, prompt design is not only a method for AI critique but also a means of intervention in the cultural and ecological imaginaries that generative AI produces.
Acknowledgements
We would like to thank the participants of the project “Prompting for biodiversity: visual research with generative AI,” which we co-facilitated with Maud Borie at the 2023 Digital Methods Summer School in Amsterdam: Piyush Aggarwal, Bastian August, Meret Baumgartner, Tal Cohen, Sunny Dhillon, Alissa Drieduite, Xiaohua He, Julia Jasińska, Shaan Kanodia, Soumya Khedkar, Fangqing Lu, Helena Movchan, Janna Joceli Omena, Jasmin Shahbazi, Bethany Warner, Xiaoyue Yan. We also thank the participants of the second edition of the project, which we facilitated at the 2024 Digital Methods Summer School in Amsterdam: Catherine Bouko, Elif Bozkurt, Matilde Ficozzi, Nicoletta Guglielmelli, Mark Mets.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research has received funding from the Project “National Biodiversity Future Center - NBFC” funded under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.4 - Call for tender No. 3138 of 16 December 2021, rectified by Decree n. 3175 of 18 December 2021 of the Italian Ministry of University and Research, funded by the European Union – NextGenerationEU; Project code CN_00000033, Concession Decree No. 1034 of 17 June 2022 adopted by the Italian Ministry of University and Research, CUP D43C22001250001.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
