Abstract
Generative artificial intelligence (AI) is poised to transform how brands communicate with consumers. Recent research demonstrates AI's benefits in producing text, but marketing research has not yet explored how marketers can leverage AI to create visual advertising. Despite their impressive capabilities, “off-the-shelf” generative AI models are not aligned with marketing objectives, raising the question of whether it is possible to fine-tune generative AI directly on conventional advertising objectives (e.g., evoking attention, driving interest). In this research, the authors train an open-source generative AI model on marketing mindset metrics and show that the resulting visual content can match and even exceed conventionally produced advertising content in associated performance metrics. The results demonstrate that generative AI can be fine-tuned on multiple communication objectives simultaneously and adapted to specific audiences. In addition to highlighting generative AI's potential in marketing, this article explores the limitations of aligning visual generative AI with marketing objectives.
Visual content is a critical element of marketing communications. Indeed, images play a central role in online advertising, which accounts for more than 60% of advertising expenditure (Voicebooking 2022). According to industry estimates, more than $300 billion is invested globally every year into visual online advertising (Cramer-Flood 2023), with several multi-billion-dollar industries, such as advertising agencies ($380 billion) and market research companies ($80 billion), involved in ad production and testing (Statista 2024).
Against this backdrop, interest in generative AI's potential for efficiency gains is growing. Large-scale AI models such as Midjourney (Oppenlaender 2022), Stable Diffusion (Rombach, Blattmann, and Lorenz 2022), and Adobe Firefly can generate convincing high-quality images, suggesting that the costs of producing visual marketing content could drop precipitously. Marketers such as Coca-Cola (2023), Heinz (Campaigns of the World 2022), and Amazon (Amazon Ads n.d.) have explored the potential of visual generative AI. While the capabilities of these models are impressive, they were not designed with marketers’ objectives in mind (Dall’Olio and Vakratsas 2023; Keller and Lehmann 2006). For example, when prompted to “generate an effective advertising image,” Google DeepMind's Imagen 2 cautioned that doing so would be “impossible because it's a subjective concept that wouldn’t translate well into a visual representation” (Google 2024).
While clear links can be established between image objects such as ketchup bottles and the corresponding text prompts, the links between visual cues and the mindset metrics that marketers often study for visual communication, such as attention or desire, are far less straightforward. What works in one product category and for one target market may not work in other contexts, making suitable marketing content much less generic than the image objects on which generative AI is typically trained. For example, while sports gear ads are often more effective when they showcase wide camera angles to signify action, the fashion industry relies on close-ups to convey quality, and automobiles benefit from highlighting desirable usage experiences. Given the complexity that arises from the combination of industry context and desired brand image, the extent to which generative AI can be trained to create effective marketing content is unclear. Brands’ communication objectives and market positions vary. Although well-established brands can leverage brand familiarity (Keller 1993), less established ones might require stronger visual communication. Brands that appear in diverse contexts and those with high levels of differentiation might seek elements of surprise that can be more challenging to create without creative human input.
The goals of this research are (1) to demonstrate empirically the opportunities that generative AI affords marketing researchers and practitioners to produce visual content that is aligned with marketing objectives and (2) to identify generative AI's potential limitations of which marketers should be aware. Through a series of studies, we investigate the following:
(1) whether generative AI can be trained to perform comparably with conventional advertising in terms of marketing objectives, (2) how generative AI can be trained to produce effective visual advertising content, and (3) when AI-generated content is expected to be most effective for marketing objectives.
First, we develop and empirically demonstrate the potential to incorporate consumer feedback to produce effective marketing communication with generative AI. Second, we examine the efficacy of several training and prompting techniques. Last, we assess factors that may moderate AI-produced content's performance in achieving marketing objectives.
We demonstrate visual generative AI's capabilities with respect to marketing objectives by aligning it with consumers’ perceptions. Using the automotive industry as a primary empirical context, we collect and analyze publicly accessible online car advertisements. We find that many AI-generated ads produced through our workflow score better on advertising mindset metrics than conventionally produced ads that appeared online. In exploring alternative approaches to applying generative AI, we highlight the importance of fine-tuning and incorporating consumer feedback. In addition, we identify and discuss boundary conditions to the use of visual generative AI. We also find stable hyperparameter settings across products, markets, and training material, suggesting the potential for standardization and automation.
The remainder of this article is structured as follows. We review related literature on advertising evaluation and computer vision in marketing. We then describe how we fine-tune generative AI to align with conventional mindset metrics. Next, we use the resulting model in a series of studies to show generative AI's potential to drive AIDA mindset metrics (attention, interest, desire, and activation), integrate brand-image-building, and cater to targeted segments. Though we predominantly measure the impact of visual communication on consumers’ minds, we also evaluate whether AIDA training generalizes to click-through rates (CTRs) as a behavioral performance measure. We conclude by exploring boundary conditions that affect the performance of AI-generated visuals, including brand familiarity, the product category (durables vs. consumables), and the differentiation of the brand's positioning.
Our findings suggest that a new era of visual advertising communication has begun. Marketers can funnel consumer feedback information directly into ad creation. They can create essentially unlimited variants of advertisements aligned with consumer preferences, thus enabling novel business models with more versions of ads that can support personalization and combat advertising fatigue at lower costs. We conclude with implications for marketers and advertisers and suggestions for applications of generative AI in marketing science.
Related Literature
Developing and Pretesting Visual Communication
Conventional advertising production involves engaging consumers’ minds (Chaudhuri and Holbrook 2001; Costello, Walker, and Reczek 2023; Malär et al. 2011; Park et al. 2010), building brand personality (e.g., Aaker 1997, 1999; Dew, Ansari, and Toubia 2022; Dzyabura and Peres 2021; Liu, Dzyabura, and Mizik 2020), and driving measurable performance outcomes, particularly in digital environments (e.g., Hartmann et al. 2021). Crafting compelling visual ads involves a range of complex activities, from concept development and photo shoots to CGI modeling, design, and image processing. These processes incur significant costs. For example, producing simple ads using stock material ranges from U.S. $2,000 to $4,000,¹ while multiple days of professional ad production with a dedicated photographer, traveling expenses, and licensed imagery might come at an additional price tag of up to $50,000.²
Given these investments, pretesting ad creatives is a critical success factor (e.g., Rutz, Sonnier, and Trusov 2017), allowing advertisers to refine key messaging elements to ensure that the creatives resonate with target audiences. Conventional pretesting involves establishing communication objectives related to consumers’ hearts and minds (Malär et al. 2011; Park et al. 2010), mostly guided by market research on target audiences (Geuens and De Pelsmacker 2017). Hierarchy-of-effects models (Smith, Chen, and Yang 2008) such as AIDA outline consumer processing stages and are often assessed via surveys (e.g., Park et al. 2010). Mindset metrics are especially valuable in visual advertising because they capture intermediate effects, such as awareness, attitudes, and engagement, which are foundational for achieving broader marketing goals (Mizik and Jacobson 2008).
Mindset-based pretests allow advertisers to develop a nuanced understanding of consumer responses (Pieters and Wedel 2004). Moreover, recent research highlights the value of standardized, tightly controlled survey-based measures over noisier field data collected with little control (Lewis, Rao, and Reiley 2015). Longitudinal studies have established the relevance of mindset metrics for diverse key performance indicators (KPIs) like consumers’ acquisition and retention decisions (Stahl et al. 2012), word of mouth (Lovett, Peres, and Shachar 2013), employees’ motivation and retention (Moorman, Sorescu, and Tavassoli 2024), and corporate financial performance (Mizik and Jacobson 2008).
Beyond advertising's impact on purchase funnel objectives, brands use advertising to cater to specific target groups by creating brand perceptions. Aaker (1997) identifies distinct dimensions of a brand's personality including “sincerity,” “excitement,” “competence,” and “ruggedness.” This has proven useful to marketers, as consumers prefer brands that are consistent with their self-image (Aaker 1999; Liu, Dzyabura, and Mizik 2020). Recent research continues to apply brand personality dimensions to understand how consumers perceive a brand (Dzyabura and Peres 2021) and to recommend the visual content that will yield desired perceptions (Dew, Ansari, and Toubia 2022). Given the importance of conveying a brand's personality, the capability of generative AI to evoke such perceptions through advertising must be addressed, as it is unclear whether multiple communication objectives might conflict with each other.
CTRs have become a prominent measure to assess the field performance of digital advertising (e.g., Hoban and Bucklin 2015; Wang, Xiong, and Yang 2019) in part because they are readily available from advertising networks and represent actual consumer behavior tied to individual advertisements. While clicks directly relate to advertising expenses, their link to ad performance is less precise—not all clicks result in conversions, and ads without clicks may still contribute to brand building (e.g., Lewis, Rao, and Reiley 2015)—making them a useful complement to other KPIs but poor performance measures on their own (Lambrecht and Tucker 2024).
Analogous to conventional ad crafting, we integrate pretesting into a generative framework of marketing objectives to yield high-performing visual content (Rutz, Sonnier, and Trusov 2017; Srinivasan, Vanhuele, and Pauwels 2010; Smith, Chen, and Yang 2008).
Applications of Computer Vision in Marketing Research
Image classification applications in marketing suggest that it is possible to train deep-learning models to predict marketing outcomes from visual information (e.g., Dzyabura et al. 2023; Liu, Dzyabura, and Mizik 2020; Zhang and Luo 2022). These applications were built on developments in computer science, most prominently convolutional neural networks (e.g., Dzyabura and Peres 2021; Hartmann et al. 2021) and (variational) autoencoders ((V)AEs) (Dew, Ansari, and Toubia 2022). Such models have been trained on large image databases with associated tags of visible objects (e.g., chairs, tables, cats, dogs, people) that enable classification models to “see” what is visible. Dzyabura et al. (2023) extract objective features like color distribution, texture, pattern, and shape, as well as product representation, from product photos. Li and Xie (2020) extract and evaluate features of social media images like colorfulness, the presence of a human face, and the image's source. Other research fine-tunes existing models and applies them to marketing contexts, such as Shi et al. (2021), who classify fashion-show images to identify textures and dress shapes.
Whereas the aforementioned applications focus on objective image features, other research proposes or fine-tunes models that extract consumer perceptions. Dew, Ansari, and Toubia (2022) extract visual attributes from logos that can be related to consumer brand associations. Researchers can also fine-tune existing models to relate image content to stated purchase intent (Hartmann et al. 2021) or perceived brand personality (Liu, Dzyabura, and Mizik 2020). This line of research suggests that computer vision can capture aspects of subjective perceptions, which relates to the types of mindset objectives that we examine.
Visual generative AI builds on these image classification models by improving images iteratively, testing text prompts with corresponding output. The work that is most closely related to our research is Burnap, Hauser, and Timoshenko's (2023) model, which combines VAE and generative adversarial networks to generate car design concepts based on consumer preferences, a technology similar to that discussed in Dew, Ansari, and Toubia (2022). While these approaches work particularly well for a single visual stimulus that is relatively homogeneous (e.g., cars, faces, logos), they have not been tested for their ability to meaningfully combine features from a large list of candidates (e.g., representing a product in the right context to achieve desired advertising objectives).
In contrast, transformer-based visual generative AI models, such as Stable Diffusion, are designed to generate complex visuals from text prompts (Rombach, Blattmann, and Lorenz 2022). These models are pretrained on labeled images, enabling the assemblage of any objects that are included in the training data. Marketers using off-the-shelf generative AI need to discern which prompts to enter and how to assess the marketing potential of the output, requiring experience, intuition, and trial and error. Consequently, the marginal costs of each advertising image remain significant. Recent research explores generative AI prompting without model fine-tuning for advertising applications (e.g., Hartmann, Exner, and Domdey 2025; Kapoor and Madhav 2025; Zamudio, Grigsby, and Michelsen 2025). This approach, however, precludes novel products from being prompted and may result in the inclusion of unwanted artifacts from competitors. Such ads may also fail to align with consumer communication objectives. Ruiz et al. (2023) suggest a fine-tuning procedure to incorporate novel image objects that the AI model has not “seen” before. Daviet and Nishimura (2024) show that fine-tuning can improve the representation of advertising landscape backgrounds. Whether fine-tuning can be used to train perceptual communication objectives and create entire images is an open question that we investigate in this research.
To the best of our knowledge, researchers have not used fine-tuning to train generative AI directly on marketing mindset metrics to ensure that visual content will appeal to consumers. In this research, we empirically demonstrate a replicable and practical workflow that creates advertising content designed to align with a specified marketing objective through the incorporation of consumer feedback into generative AI. Table 1 summarizes how this research relates to extant marketing and computer science research.
Table 1. Related Research on Visual Content Models.
Leveraging Visual Generative AI for Advertising
To explain how we fine-tune visual generative AI, we first summarize how standard generative diffusion models (GDMs) turn text prompts into images and how novel visual concepts and objects can be added. We then clarify how we translate these concepts to advertising objectives. Specifically, we create custom prompts that automatically align advertising visuals with marketing objectives, without trial-and-error prompting or creative human input.
Generative Diffusion Models and Model Fine-Tuning
GDMs like Stable Diffusion learn to map textual prompts (e.g., “dog”) to corresponding visual representations of objects (e.g., an image of a dog) by training on vast text–image datasets. This enables GDMs to generate images with a virtually unlimited number of combinations of prompted objects. GDMs function similarly to an artist sketching a picture from a blank canvas. They start with a noisy image and iteratively refine it into a coherent visual representation of any input prompt (e.g., “a dog sitting on a car hood”). During pretraining, GDMs are exposed to images with varying degrees of noise, ranging from complete noise to none, enabling them to iteratively denoise images into coherent representations of objects (Rombach, Blattmann, and Lorenz 2022).
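The noising-and-denoising idea can be illustrated with a toy example. This is a minimal sketch, not Stable Diffusion's implementation: it assumes the linear noise schedule of the original DDPM formulation (β rising from 1e-4 to 0.02 over 1,000 steps), uses the deterministic DDIM update as the sampler, operates on a short list of numbers instead of image latents, and replaces the trained denoising network with a hypothetical "oracle" that knows the true noise. With a perfect noise prediction, the sampler recovers the clean signal exactly, which isolates the mechanics of iterative denoising.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors (alpha-bar_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, noise, alpha_bar):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    return [math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * e
            for a, e in zip(x0, noise)]


def ddim_step(xt, eps_hat, ab_t, ab_prev):
    """Deterministic DDIM update from noise level ab_t to the less noisy level ab_prev."""
    # Estimate the clean signal implied by the predicted noise ...
    x0_hat = [(x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
              for x, e in zip(xt, eps_hat)]
    # ... then re-noise it to the (smaller) noise level of the next step.
    return add_noise(x0_hat, eps_hat, ab_prev)


random.seed(0)
x0 = [random.uniform(-1.0, 1.0) for _ in range(8)]   # stand-in for image latents
eps = [random.gauss(0.0, 1.0) for _ in range(8)]      # Gaussian noise
alpha_bars = linear_alpha_bars()
timesteps = list(range(999, -1, -20))                 # 50 inference steps: 999, 979, ..., 19

x = add_noise(x0, eps, alpha_bars[timesteps[0]])      # almost pure noise
for i, t in enumerate(timesteps):
    ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
    # Hypothetical oracle: a trained network would *predict* eps from (x, t, prompt).
    x = ddim_step(x, eps, alpha_bars[t], ab_prev)

max_error = max(abs(a - b) for a, b in zip(x, x0))
print(f"signal left at t=999: {alpha_bars[-1]:.1e}, reconstruction error: {max_error:.1e}")
```

In Stable Diffusion, the noise predictor is a neural network conditioned on the text prompt, and the loop runs on compressed image latents; the length of `timesteps` corresponds to what the text calls the number of inference steps.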
In our empirical setup, we rely on the pretrained Stable Diffusion 2.1 model,³ which is trained on an aesthetically curated subset of the LAION-5B dataset (Beaumont 2022) to ensure aesthetically pleasing output. This makes Stable Diffusion particularly useful for advertising applications where aesthetic quality plays a role.
While Stable Diffusion is capable of producing a vast range and combination of objects and settings in aesthetically pleasing ways, it is not optimized to achieve specific advertising objectives. In practice, generating tailored advertising content requires careful prompt engineering and iterative refinement, which is time-consuming and costly. To enhance Stable Diffusion's capability, we build on Ruiz et al.'s (2023) fine-tuning techniques, training on new and unique text–image pairs (e.g., the name and images of a specific dog breed) while maintaining coherence of preexisting visual knowledge of the corresponding object class (e.g., dogs). To prevent language drift (i.e., Stable Diffusion forgetting what dogs in general look like), unique, semantically unrelated text identifiers need to be connected to new image content. Fine-tuning can also augment the style capabilities of Stable Diffusion, such as allowing images to be created in the style of new or unknown artists.
Stable Diffusion allows for independent fine-tuning of objects and stylistic attributes. This means that core visual features remain unchanged when different stylistic adaptations are applied. This is particularly valuable in advertising, where brands seek to balance creative differentiation (e.g., building a unique brand personality) and customer engagement (e.g., driving short-term AIDA purchase funnel outcomes). Independent fine-tuning might allow for brand differentiation without sacrificing engagement objectives within a single advertising image.
Fine-tuning success is measured by comparing generated images with actual input images based on a loss function that balances reconstruction loss (i.e., accurate object recreation) and class-specific prior preservation loss (i.e., maintaining the integrity of category-related visual patterns of the base model) (see Ruiz et al. [2023] for details). Hyperparameters are chosen to achieve the best possible results. These include the “learning rate” (LR), which determines the step size for updating the model weights; the “LR scheduler,” which dynamically adjusts the learning rate during training to optimize convergence; the initial “warm-up steps,” during which the learning rate increases gradually; and the “training steps” (i.e., the total number of optimization iterations). Optimizing hyperparameters ensures accurate recreation of visual content while avoiding overfitting (e.g., limitations in flexibly positioning content across contexts and viewing angles; Ruiz et al. 2023). While too few training iterations can make representations inaccurate, too many can result in overfitting and poor representations across contextual settings.
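For reference, the fine-tuning objective of Ruiz et al. (2023) can be written as below; this is our transcription of their notation, so the original paper should be consulted for the exact weighting. Here, $\hat{x}_\theta$ is the denoising model, $x$ an instance image (e.g., a product shot) with conditioning vector $c$ from the prompt containing the unique identifier, $x_{\mathrm{pr}}$ an image generated by the frozen base model from the class prompt $c_{\mathrm{pr}}$ (e.g., “a car”), $\alpha_t, \sigma_t$ the noise schedule, $w_t$ a time-dependent weight, and $\lambda$ the weight of the prior preservation term:

```latex
\mathcal{L}(\theta) = \mathbb{E}_{x, c, \epsilon, \epsilon', t}\Big[
  w_t \,\big\lVert \hat{x}_\theta(\alpha_t x + \sigma_t \epsilon,\, c) - x \big\rVert_2^2
  + \lambda\, w_{t'} \,\big\lVert \hat{x}_\theta(\alpha_{t'} x_{\mathrm{pr}} + \sigma_{t'} \epsilon',\, c_{\mathrm{pr}}) - x_{\mathrm{pr}} \big\rVert_2^2
\Big]
```

The first term is the reconstruction loss; the second is the class-specific prior preservation loss, which penalizes the model for drifting away from its pretrained notion of the class.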
Actual image generation involves additional hyperparameters, including the “diffusion sampler”, which controls the process of image generation; the number of “inference steps” associated with the number of iterations that generate the final image; and the “classifier free guidance” (CFG) scale, which determines how closely the generative model follows the text prompt. These values are typically chosen based on visual inspection of the output (Diab et al. 2022). In our applications, we seek to minimize image distortions and hallucinations.
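Classifier-free guidance is commonly implemented by running the denoiser twice per step, once with the text prompt and once with an empty prompt, and then extrapolating from the unconditional toward the conditional noise prediction. A minimal sketch of that combination rule, with hypothetical toy vectors standing in for the network's two predictions:

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and prompt-conditioned noise predictions.

    guidance_scale = 1 returns the conditional prediction unchanged; larger
    values push the sample further in the direction the prompt implies.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Hypothetical predictions for a 3-element latent.
eps_uncond = [0.10, -0.20, 0.00]
eps_cond = [0.30, -0.10, 0.05]
for scale in (1.0, 7.5):
    print(scale, classifier_free_guidance(eps_uncond, eps_cond, scale))
```

Under this rule, a CFG scale of 7.5 (the value selected in Study 1) weights the prompt's direction 7.5 times as heavily as a scale of 1; very high scales often produce oversaturated or distorted images, which is one reason the scale is tuned rather than maximized.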
A Generative AI Workflow for Ad Creation
Ad agencies design visuals to communicate effectively with consumers. Similar in spirit to how Reisenbichler et al. (2022) produce text, we train visual generative AI on high-performing content to identify the visual elements linked to successful marketing outcomes and incorporate these elements into newly created visuals. Figure 1 displays our proposed four-step generative AI workflow that creates unique text identifiers associated with successful advertising patterns, allowing marketers to prompt advertising objectives.

Figure 1. Schematic of the Proposed Generative Workflow.
In Step I, we collect three types of image input related to (1) the visual appearance of the product of interest, (2) the visual elements associated with funnel performance, and (3) the visual elements associated with the desired brand image or brand personality dimensions. To ensure an accurate representation of the product of interest, the model is trained on product-specific images (1). Note that this training is restricted to images of only the product. In our empirical applications, we use product shots from multiple angles with backgrounds removed. This is to ensure that other items that appear in the training images are not misidentified as being part of the product of interest. We associate the unique product identifier with the class prompt of the product category (e.g., “car”).
To identify desirable visual patterns (2), we collect online banner ads. To find ads for the relevant product category, we use the You Only Look Once (YOLO) object detector (Jocher, Chaurasia, and Qiu 2023). As fine-tuning visual generative AI based on images with text can create hallucinations and unwanted or distorted content, we use Keras-OCR (Faustomorales 2023) to detect text in the images and CV2 inpainting (Senyaev et al. 2023) to remove it. To capture brand personality traits (3), we follow Liu, Dzyabura, and Mizik (2020) and gather visuals from Flickr that are associated with each trait.
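The category-filtering step can be sketched as a simple rule over detector output. The sketch assumes that detections have already been extracted (e.g., by a YOLO model) into hypothetical (label, confidence, box-area-share) tuples; the function name, the confidence threshold, and the minimum-area rule are illustrative assumptions, not settings reported in the study.

```python
from typing import Dict, List, Tuple

# Hypothetical detector output: (class label, confidence, share of image area covered by the box)
Detection = Tuple[str, float, float]


def select_category_ads(detections_by_ad: Dict[str, List[Detection]],
                        target_label: str = "car",
                        min_confidence: float = 0.5,
                        min_area_share: float = 0.05) -> List[str]:
    """Return IDs of ads in which the target object is detected confidently and visibly."""
    selected = []
    for ad_id, detections in detections_by_ad.items():
        if any(label == target_label and conf >= min_confidence and area >= min_area_share
               for label, conf, area in detections):
            selected.append(ad_id)
    return sorted(selected)


example = {
    "ad_001": [("car", 0.91, 0.40), ("person", 0.80, 0.10)],
    "ad_002": [("bottle", 0.88, 0.30)],
    "ad_003": [("car", 0.35, 0.20)],      # low confidence: excluded
    "ad_004": [("car", 0.75, 0.01)],      # tiny background car: excluded
}
print(select_category_ads(example))
```

The minimum-area rule is one way to keep ads in which the car is the advertised product rather than incidental scenery; whether the study applied such a rule is not stated.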
To assess consumer engagement and funnel performance, we measure how visual communication resonates with the target audience in Step II. We collect seven-point Likert scale ratings across all AIDA funnel phases (see the “Empirical Analyses” section for details) because such measures can be collected at low cost for any application. If available, other target metrics (e.g., word of mouth/virality, ad clicks, aided/unaided ad recall) could be employed. To ensure diversity in visuals, we gather the broadest possible range of competitive advertisements available to us. Survey ratings for all ads allow us to identify the style of high-performing visual content that we use as the basis for fine-tuning Stable Diffusion.
In contrast to Ruiz et al. (2023), we do not limit training to specific image objects, but extend it to training more abstract advertising objectives. We consider advertisers interested in purchase funnel objectives, with additional possible interest in brand image building. This requires fine-tuning the model on all three types of text–image pairs mentioned previously (see Point B in Figure 2). To ensure that the novel unique text identifiers remain semantically unrelated to the base model, we use random letter combinations as identifiers. In the interest of clarity, we replace these with descriptive identifiers (indicated by […] throughout the manuscript).
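Random letter identifiers can be generated and screened against a list of ordinary words so that they carry no obvious prior meaning. A minimal sketch; the identifier length, the tiny stand-in vocabulary, and the function name are assumptions for illustration. A more thorough check would also inspect how the text encoder's tokenizer splits each candidate, which this sketch does not do.

```python
import random
import string


def make_identifiers(concepts, vocabulary, length=5, seed=0):
    """Map each concept to a unique random lowercase letter string not found in `vocabulary`."""
    rng = random.Random(seed)
    used = set()
    identifiers = {}
    for concept in concepts:
        while True:
            candidate = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
            if candidate not in vocabulary and candidate not in used:
                break
        used.add(candidate)
        identifiers[concept] = candidate
    return identifiers


# Hypothetical stand-in vocabulary; a real check would use a full word list.
vocab = {"car", "ad", "web", "banner", "style", "image", "high", "rugged"}
ids = make_identifiers(["product", "high_aida", "rugged_style"], vocab)
print(ids)
```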

Figure 2. Ad-Performance-Guided Fine-Tuning of Generative Diffusion Models.
In direct application of Ruiz et al. (2023), we pair a unique text identifier with the isolated visuals of the focal product (e.g., the Polestar 3 vehicle) referring to the product category “car” class descriptor of the Stable Diffusion base model. That leads Stable Diffusion to train on what the focal product (a Polestar 3 vehicle) looks like, while preserving and drawing from typical visual representations of the class “car.” This allows us to depict the vehicle across contexts and viewing angles, but it does not guarantee alignment with funnel objectives.
To train the model to produce effective advertising, we fine-tune on high-performing competitive advertisements based on AIDA mindset metrics. The best-rated visuals are paired with unique identifiers for each objective (e.g., [high attention]). We assign the class descriptor “web banner ad” to ensure alignment with typical banner content. Creating a high-AIDA ad requires the model to (1) identify useful image object(s) and (2) combine and represent them. As conventional fine-tuning is limited to a correct representation of a single new object, it is not a priori clear that fine-tuning more abstract concepts like ad success is possible. It likely requires a different number of training images and hyperparameter settings that might deviate from conventional fine-tuning settings (Troncoso and Luo 2022).
Even if high funnel performance can be attained in this way, there is no guarantee that ad visuals convey the desired brand image. We address this by fine-tuning unique brand personality identifiers using Stable Diffusion's style framework. This ensures that brand personality styles can be combined with any funnel metric (e.g., [rugged] associated with the style descriptor “image style” and [high attention] with the class descriptor “web banner ad”).
Based on this fine-tuning, we can combine the three unique identifiers in Step IV to generate ads. The generative prompt structure follows: c = {a [high AIDA] ‘web banner ad’ of a [Polestar 3] ‘car’ in a [rugged] ‘image style’}, where […] indicates the unique identifier and ‘…’ represents the class and style descriptors of the Stable Diffusion base model. Each trained concept can be selectively included or excluded, allowing for the generation of visuals that maintain product accuracy, as well as integrate funnel optimization and brand personality. To further refine the output, we conduct hyperparameter optimization, adjusting model settings to minimize hallucinations and ensure visually coherent high-quality ad creatives (Diab et al. 2022; Rombach, Blattmann, and Lorenz 2022) (see Web Appendix A).
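The modular prompt structure can be expressed as a small function in which each fine-tuned concept is optional. The function name and the identifier strings in the usage example are hypothetical placeholders (the study uses random letter combinations), and the surrounding wording follows the prompt template given in the text.

```python
from typing import Optional


def compose_prompt(product_id: str,
                   funnel_id: Optional[str] = None,
                   style_id: Optional[str] = None) -> str:
    """Build 'a [funnel] web banner ad of a [product] car in a [style] image style'.

    Each identifier is paired with a base-model descriptor: 'web banner ad' for the
    funnel objective, 'car' for the product, and 'image style' for brand personality.
    Omitting an identifier drops its clause from the prompt.
    """
    parts = []
    if funnel_id:
        parts.append(f"a {funnel_id} web banner ad of")
    parts.append(f"a {product_id} car")
    if style_id:
        parts.append(f"in a {style_id} image style")
    return " ".join(parts)


# Hypothetical identifiers for [Polestar 3], [high AIDA], and [rugged].
print(compose_prompt("zqxv", funnel_id="kbrt", style_id="mlwp"))
print(compose_prompt("zqxv", funnel_id="kbrt"))
```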
This workflow can create as many ad variants as desired without creative human prompting or other human visual work. While collecting and rating training images has initial setup costs, the marginal cost of creating additional ad variants is low. Importantly, unlike conventional ad production, the workflow does not require expert input or market intuition. Contemporary ad images can be collected from prominent ad networks and uploaded into online surveys to collect mindset ratings and extract the best-rated ads to fine-tune Stable Diffusion and generate visuals. While consumer input is essential, all of these steps can be automated, particularly if coherent hyperparameter settings are effective across applications.
Empirical Analyses
We conduct a series of studies to explore visual generative AI's advertising performance. Studies 1–3 assess whether generative AI ads perform comparably to conventionally produced ads, whereas Study 4 considers alternative training and prompting strategies. Study 5 tests potential boundary conditions in terms of brand familiarity, while Study 6 tests another product category. In Study 7, we evaluate the role of product differentiation on the performance of generated ads. Table 2 provides an overview of the studies and their findings.
Table 2. Study Overview.
Study 1: Generative AI and Performance-Related Advertising Goals
Study 1 investigates whether AI-generated ads can compete with conventionally produced online display ads in terms of performance on mindset metrics throughout the AIDA funnel. We study automotive ads because cars have functional (e.g., commuting), emotional (e.g., driving experience), and social (e.g., self-expression) brand value (Aaker 1996) to which advertising can appeal. Thus, we could test whether generative AI can produce competitive ads when multiple benefits may play a role.
Method
To obtain training material, we collect 211,429 publicly available online advertisements. Based on YOLO object detection, we identify 543 automotive ads that have been paid for and displayed online. To identify the top-rated visuals, we mirror traditional advertising pretesting and construct a survey that measures consumers’ responses on seven-point scales for each AIDA funnel phase: attention (“This advertisement would stand out in comparison to other advertisements”), interest (“I find the product in this advertising interesting”), desire (“I like the product in this advertisement”), and activation (“If I were in the market for a car right now, I would consider buying the car in this advertisement”). We recruit participants from Prolific and administer the survey to 572 potential car buyers, as evidenced by ownership of driver's licenses (46.2% female, 53.5% male, .3% preferred not to state; age range = 25–73 years). Each participant rates ten randomly selected ads on all four AIDA phases, ensuring at least five ratings per ad.
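One way to guarantee a minimum number of ratings per ad while still randomizing is to give each participant the ads that currently have the fewest ratings, breaking ties at random. This is a hypothetical assignment scheme for illustration only; the study reports random selection with at least five ratings per ad. With 572 participants rating 10 ads each, 5,720 ratings are spread over 543 ads, or about 10.5 per ad.

```python
import random
from collections import Counter


def assign_ads(num_participants, ad_ids, ads_per_participant=10, seed=0):
    """Give each participant distinct ads, favoring those with the fewest ratings so far."""
    rng = random.Random(seed)
    counts = Counter({ad: 0 for ad in ad_ids})
    assignments = []
    for _ in range(num_participants):
        # Shuffle first so that ties in rating counts are broken at random.
        pool = list(ad_ids)
        rng.shuffle(pool)
        # sorted() is stable, so the shuffled order survives among equal counts.
        chosen = sorted(pool, key=lambda ad: counts[ad])[:ads_per_participant]
        for ad in chosen:
            counts[ad] += 1
        assignments.append(chosen)
    return assignments, counts


ads = [f"ad_{i:03d}" for i in range(543)]
assignments, counts = assign_ads(572, ads)
print(min(counts.values()), max(counts.values()))
```

Because the least-rated ads are always served first, rating counts never differ by more than one across ads; sorting the whole pool per participant is inefficient for very large designs but keeps the logic transparent.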
We chose the Polestar 3 car as the object of our advertisements because it caters to a broad market. During this study, the Polestar 3 had been recently introduced and was not included in the standard Stable Diffusion model. Thus, generative AI could not simply reproduce an existing ad. For this investigation, we collect product images from the Polestar website with different viewing angles. We remove all background information to fine-tune the model exclusively on the Polestar 3 design. Since these images are nonstandardized, 23 images were needed to ensure correct vehicle representation. Following Ruiz et al. (2023), we refer to the “car” class prompt when fine-tuning.
We train five models, one for each of the four AIDA funnel phases and one for the average across all AIDA phases to determine whether training on individual phases improves performance in a specific phase. We find that 30 training images per abstract concept suffice. This is more than the 3–5 images recommended by Ruiz et al. (2023), which is likely due to the complexity of more abstract marketing objectives. For model training, we use the 30 highest-rated images of the 543 car ads (see Web Appendix A for further details).
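Assembling the five training sets amounts to averaging each ad's ratings per AIDA phase, computing an overall AIDA score, and keeping the highest-rated ads for each objective. A minimal sketch, assuming ratings arrive as hypothetical (ad ID, phase, score) records, that every ad has ratings on all four phases, and that the overall score is the unweighted mean of the four phase means; ties keep input order.

```python
from collections import defaultdict
from statistics import mean

PHASES = ("attention", "interest", "desire", "activation")


def build_training_sets(ratings, top_k=30):
    """Return the top_k ad IDs for each AIDA phase and for the average AIDA score.

    ratings: iterable of (ad_id, phase, score) tuples with 1-7 Likert scores.
    Assumes every ad has at least one rating on each of the four phases.
    """
    by_ad_phase = defaultdict(list)
    for ad_id, phase, score in ratings:
        by_ad_phase[(ad_id, phase)].append(score)

    ad_ids = sorted({ad_id for ad_id, _ in by_ad_phase})
    phase_means = {ad: {p: mean(by_ad_phase[(ad, p)]) for p in PHASES} for ad in ad_ids}
    scores = {p: {ad: phase_means[ad][p] for ad in ad_ids} for p in PHASES}
    scores["aida_average"] = {ad: mean(phase_means[ad].values()) for ad in ad_ids}

    # One training set per objective: the top_k ads by that objective's score.
    return {objective: sorted(ad_ids, key=lambda ad: s[ad], reverse=True)[:top_k]
            for objective, s in scores.items()}


toy = [
    ("ad_a", "attention", 6), ("ad_a", "interest", 5), ("ad_a", "desire", 5), ("ad_a", "activation", 4),
    ("ad_b", "attention", 3), ("ad_b", "interest", 6), ("ad_b", "desire", 6), ("ad_b", "activation", 6),
    ("ad_c", "attention", 2), ("ad_c", "interest", 2), ("ad_c", "desire", 3), ("ad_c", "activation", 2),
]
print(build_training_sets(toy, top_k=1))
```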
Using random search over the parameter space, we tune the generative AI's hyperparameters “LR scheduler” [Polynomial, Constant], the “learning rate” [1e-6, …, 1e-1], the “learning rate warm-up steps” [0, …, 200], and “training steps” [1,000, …, 10,000] for model fine-tuning. Our best results deviate from the typical settings for conventional image objects. While conventional recommendations include a constant learning rate scheduler, with a learning rate of 5e-6, zero warm-up steps, and 1,000 training steps, we find that a polynomial scheduler with a learning rate of 1e-6, paired with 100 warm-up steps and 4,000 training steps, better balances reconstruction loss and class-specific prior preservation loss for our purposes. Similarly, for generating output visuals, we search over “diffusion samplers” [DDIM Scheduler, PLMS Solver, DPM Solver], “inference steps” [20, …, 150], and “CFG scales” [5, …, 15] for image denoising. While the standard hyperparameter settings are set to the DDIM Scheduler, 50 inference steps, and a CFG scale of 7, we find that the “DPM Solver” diffusion sampler, 100 inference steps, and a CFG scale of 7.5 result in the most coherent visual output, balancing detail and image quality while minimizing distortions and hallucinations (see Web Appendix A for full details).
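The fine-tuning search space and the selected learning-rate schedule can be sketched as follows. Sampling the learning rate log-uniformly, drawing warm-up and training steps as uniform integers, and decaying linearly (a polynomial of power 1) to an end rate of 1e-7 are our assumptions, similar in shape to the polynomial-with-warm-up schedulers found in common training libraries; the text reports only the ranges, the scheduler type, and the chosen values. Scoring each sampled configuration (fine-tuning and inspecting outputs) is the expensive part and is not shown.

```python
import math
import random

SEARCH_SPACE = {
    "lr_scheduler": ["polynomial", "constant"],
    "learning_rate": (1e-6, 1e-1),      # sampled log-uniformly (assumption)
    "warmup_steps": (0, 200),
    "training_steps": (1_000, 10_000),
}


def sample_config(rng):
    """Draw one random-search candidate from SEARCH_SPACE."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "lr_scheduler": rng.choice(SEARCH_SPACE["lr_scheduler"]),
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "warmup_steps": rng.randint(*SEARCH_SPACE["warmup_steps"]),
        "training_steps": rng.randint(*SEARCH_SPACE["training_steps"]),
    }


def polynomial_lr(step, peak_lr=1e-6, warmup_steps=100, total_steps=4_000,
                  end_lr=1e-7, power=1.0):
    """Linear warm-up to peak_lr, then polynomial decay to end_lr at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return (peak_lr - end_lr) * (1.0 - progress) ** power + end_lr


rng = random.Random(42)
candidates = [sample_config(rng) for _ in range(5)]
print(candidates[0])
print([polynomial_lr(s) for s in (0, 50, 100, 2_050, 4_000)])
```

Log-uniform sampling is a common choice for learning rates because the range spans five orders of magnitude; uniform sampling would place almost all candidates near the upper bound.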
We generate 10 display ads for the Polestar 3 from each of the five models, resulting in 50 ads. We neither edit the images nor add other contextual prompts, which allows us to study the performance of visual content produced without human input. While the online ads used for training undoubtedly involve some degree of human creativity and may also have been selected based on consumer feedback or observed performance, the visuals we produce are a direct output of the proposed AI workflow. To gauge the AI-generated ads’ performance, we benchmark them against 499 randomly selected actual ads from competitors and the 10 Polestar 3 ads we found online (see illustrations in Figure 3 and Web Appendix B). We recruit 569 potential car buyers from Prolific (49% female, 51% male; age range = 25–73 years; all had a driver’s license) and randomly assign 10 ads to each participant, who rate each ad on the four AIDA funnel phases (>20,000 ratings in total).

Figure 3. Illustrations of Conventional and AI-Generated Ads.
For each of the four individual AIDA phase models (e.g., attention), we use the prompt c = {a [high individual AIDA phase] ‘web banner ad’ of a [Polestar 3] ‘car’}. We also use the average AIDA score across all four measures and the custom prompt c = {a [high AIDA] ‘web banner ad’ of a [Polestar 3] ‘car’} to generate images associated with high performance across all stages of the funnel. The ratings of the generated ads on each funnel phase are similar regardless of which of the five models generated the images (p-values between .231 and .980; see Web Appendix C for further detail). A Cronbach's α of .89 across the four funnel phase ratings on all ads suggests that this similarity reflects participants’ highly correlated responses across phases (see Web Appendix D for correlations). Thus, for the remainder of our studies, we pool all 50 AI-generated ads from the five models and compare their average AIDA scores with those of the conventionally produced ads.
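Cronbach's α for the four AIDA ratings follows the standard formula α = k/(k − 1) · (1 − Σ var(item) / var(total)). The sketch below assumes the ratings are arranged with one row per rated ad and one column per AIDA phase; how the authors arranged the ratings (per ad or per participant-ad pair) is not stated here, so the data layout is an assumption.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rows, each holding k item scores.

    Here the k items would be the four AIDA phases (attention, interest,
    desire, action); each row is one rated ad (an assumed layout).
    """
    k = len(ratings[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item (column) across rows.
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of the per-row sum score.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values near 1 indicate that the items move together; an α of .89 is conventionally read as high internal consistency, which is why pooling the phase-specific models is defensible.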
Results
The performance of conventionally produced display ads varies widely. The best-scoring of these ads receives an average AIDA rating of 5.57 on a seven-point scale and the lowest-scoring receives 1.88, with an average of 3.79 across all conventional ads. This average is more than .75 scale points lower than that of the AI-generated ads (M = 4.55; t = 9.33, p < .001). Only 3 of the 50 AI-generated ads perform below the average conventional ad (3.08, 3.61, and 3.71 vs. 3.79), while all other generated ads perform better, suggesting that an AI-generated ad selected at random is likely to score at least as well as the average conventionally produced ad.
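The text reports t statistics without stating which two-sample test was used. As an illustration only, the sketch below computes Welch's unequal-variance t statistic and its Welch–Satterthwaite degrees of freedom, a common choice when groups differ greatly in size (50 AI-generated vs. several hundred conventional ads). The per-ad scores are not reproduced here, so the inputs must be supplied by the reader.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: lists of per-ad average ratings for the two groups being compared.
    """
    # Squared standard errors of each group mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Unlike the pooled-variance test, Welch's version does not assume equal variances, which matters when one group (conventional ads) spans a much wider range of scores than the other.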
Figure 4 shows the distribution of ratings for the 543 ads from competitors, the 10 actual Polestar ads, and the 50 AI-generated ads. The best-performing conventionally produced ad scores approximately one scale point higher than the average AI-generated ad (highest market competitor: M = 5.57; AI trained on AIDA: M = 4.55; t = 13.64, p < .001), suggesting that human creativity can outperform randomly selected AI-generated content. However, the best AI-generated ad performs even better (M = 6.00). The average rating of the conventional Polestar ads (M = 3.80) is almost identical to the average of all market competitors (M = 3.79; t = .05, p = .963). However, the Polestar ads perform worse than the AI-generated ads (AI trained on AIDA: M = 4.55; t = 4.16, p = .001), and none of the Polestar ads outperforms the average AI-generated ad (highest actual Polestar: M = 4.53; AI trained on AIDA: M = 4.55; t = .27, p = .791).

Figure 4. Average AIDA Ratings of Conventional and AI-Generated Ads.
Discussion
These results yield several noteworthy conclusions. Comparing AI-generated ads with a large selection of conventionally produced ads reveals that the AI-generated ads systematically outperform the conventionally produced ones. We deliberately did not use any Polestar marketing content for fine-tuning so that we could meaningfully compare the original Polestar ads with the AI-generated ads; in practice, one could ensure visual consistency with a brand's existing communication by also fine-tuning on the brand's advertisements (see Web Appendix B for illustrations). In this application, only a few AI-generated ads perform worse than the average of the conventionally produced ads from which the training material was drawn, suggesting that it is possible to produce reasonably performing ads at a substantially lower marginal cost.
The competitor ads feature brands and vehicle types other than the focal Polestar 3, which makes it difficult to disentangle the effects of brand and body type from the effects of ad generation. The original Polestar advertisements perform similarly to the average competitor ad, suggesting that the Polestar brand and body style are evaluated at roughly the market average. To test whether brand or car preferences play a role, we run a regression with average AIDA funnel performance as the dependent variable and the type of ad production (conventionally produced vs. AI-generated) as a binary independent variable, controlling for 58 brands and 10 car body types as fixed effects. The results are substantively and statistically similar to the unadjusted comparison, with a negative coefficient for conventional ads (β = −.88, p = .001), suggesting that our findings are not driven by brand or body style preferences.
Another potential explanation for the observed performance differences is that AI-generated ads benefit from superior aesthetic quality because they are trained on images with high aesthetic scores. AIDA performance may hinge on aesthetic appeal, and conventional production may not achieve similar aesthetics.⁴ To investigate this, we compute LAION aesthetic quality scores (range 1–10; Schuhmann 2022) to compare AI-generated ads with conventional ads. If AIDA performance were driven exclusively by aesthetics, the ads with the highest aesthetic scores should also have the highest AIDA scores. This is not the case. On aesthetic scores, the 30 most aesthetic conventional ads (M = 6.40) surpass both the 30 ads used for model training (M = 5.17; t = 6.96, p < .001) and, more importantly, the 50 AI-generated ads (M = 5.62; t = 9.40, p < .001). However, these 30 high-aesthetic conventional ads score lower on AIDA metrics than both the AI-generated set and the training set (high-aesthetic: M = 4.14 vs. AI-generated: M = 4.55, t = 2.87, p = .006; vs. training set: M = 4.75, t = 3.86, p < .001). This suggests that aesthetic quality alone does not explain the superior performance of the AI-generated ads.⁵
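The logic of this check can be sketched directly: take the k ads with the highest aesthetic scores and the k ads with the highest AIDA scores (the training set), and compare both subsets on both measures. This is a minimal sketch assuming aesthetic and AIDA scores have already been computed per ad; the dictionary keys `aesthetic` and `aida` are hypothetical, and the scoring itself (e.g., with the LAION predictor) is not shown.

```python
from statistics import mean


def top_k_by(ads, key, k):
    """Return the k ads with the highest value of `key` (stable for ties)."""
    return sorted(ads, key=lambda ad: ad[key], reverse=True)[:k]


def aesthetics_vs_aida(ads, k=30):
    """Mean (aesthetic, AIDA) scores of the k most aesthetic ads and of the k highest-AIDA ads.

    If aesthetics alone drove AIDA, the top-aesthetic subset should also
    have the highest mean AIDA score.
    """
    by_aesthetic = top_k_by(ads, "aesthetic", k)
    by_aida = top_k_by(ads, "aida", k)
    return {
        "top_aesthetic": (mean(a["aesthetic"] for a in by_aesthetic),
                          mean(a["aida"] for a in by_aesthetic)),
        "top_aida": (mean(a["aesthetic"] for a in by_aida),
                     mean(a["aida"] for a in by_aida)),
    }
```

In the study's data, this comparison yields the pattern reported above: the top-aesthetic subset wins on aesthetics (6.40 vs. 5.17) but loses on AIDA (4.14 vs. 4.75).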
While the performance of AI-generated ads is impressive, this study has limitations. Because we collect ads from the internet, some were distributed earlier than others. The generative model may have picked up more recent advertising trends, so the AI-generated ads may appear more contemporary to respondents. However, more than 90% of the conventional ads first appeared within the three years before our model training. Within this short time frame, the correlation between time of appearance and average ad rating is low (r = .09), making timing an unlikely driver of our results. Moreover, if results were driven by recency alone, we would expect the best-performing (more recent) conventional ads to perform comparably to or better than the best-performing AI-generated ad, which is not the case (M = 5.57 vs. M = 6.00).
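The recency check rests on a Pearson correlation between each ad's time of first appearance and its average rating. A minimal sketch follows; encoding time as, for example, months before model training is our assumption. Squaring the reported r = .09 shows why it is small: timing would account for roughly .09² ≈ .008, or less than 1%, of the variance in ad ratings.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```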
While generative AI is likely to achieve the best results for the communication objectives on which it is fine-tuned, its performance might generalize to related KPIs. To investigate this, we measure ad virality as respondents’ likelihood of sharing each ad on a seven-point scale (Akpinar and Berger 2017). We test the 10 AI-generated ads from the full AIDA fine-tuning against the 10 conventional Polestar ads and a subset of 155 randomly selected competitor ads, using an independent sample of 127 Prolific participants (39.4% female, 57.5% male, 3.1% preferred not to state; age range = 25–72 years; all with a driver's license) and the same setup as in the main study. On average, respondents report a likelihood of sharing the AI-generated ads (M = 4.84) that lies significantly above the scale midpoint (t = 4.43, p < .001). For conventional competitor ads, the reported likelihood is lower (M = 3.98) and well below that for the AI-generated ads (t = 4.42, p < .001). The reported likelihood of sharing conventional Polestar ads is also lower than for the AI-generated ads (M = 4.29; t = 1.98, p = .049).
Furthermore, we have not yet considered brand image objectives, which are important in visual communications for brand differentiation and for targeting specific market segments. While diffusion models can combine complex prompts, this technology was developed for concrete, object-level content (e.g., multiple objects in an image). Multiple abstract concepts, such as perceptual brand image dimensions, could prove more challenging to train, and adding them could degrade performance on the AIDA funnel. We investigate this capability next.
Studies 2a and 2b: Brand Image Building and Targeting with Generative AI—Balancing Multiple Advertising Goals
We build on Liu, Dzyabura, and Mizik (2020) to investigate brand image objectives and study two brand personalities that feature opposing associations. We extend Study 1 by adding the personality dimensions of “ruggedness” and “luxury” (Aaker 1997). We select these dimensions because the Polestar 3 can be positioned as an off-road-capable vehicle or a luxury car with comfortable suspension. To reinforce these associations while preserving AIDA performance, we train the AI on distinct visual languages for each brand personality.
Study 2a: Branding and generative AI
For AIDA and the Polestar 3 product representation, we follow Study 1. We again search the hyperparameter space and find no relevant improvement over a configuration consistent with the previous study. To induce the brand personalities, we add 24 randomly selected Flickr images tagged with “ruggedness” and 24 images tagged with “luxuriousness” to the fine-tuning. Because we now train three concepts (Polestar 3, AIDA, and ruggedness or luxury) instead of two as in Study 1, we increase the number of training steps proportionally from 4,000 to 6,000, keeping everything else constant (see Web Appendix A).
Using the prompting language specified previously, the prompt for Model 1 is c = {a [high AIDA] ‘web banner ad’ of a [Polestar 3] ‘car’ in a [rugged] ‘image style’}. The prompt for Model 2 replaces [rugged] with [luxury]. We generate 10 ads from each model. We compare these AI-generated ads with a random selection of 155 actual competitor ads (i.e., automotive ads collected from online advertising), the 10 actual Polestar ads from Study 1, and 10 newly generated ads from the Study 1 model trained exclusively on the highest average AIDA mindset metric, using the prompt c = {a [high AIDA] ‘web banner ad’ of a [Polestar 3] ‘car’}. See Figure 5 for illustrations and Web Appendix E for all generated ads.
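The prompts used across models follow one pattern, which a small helper makes explicit. This is an illustrative sketch: the bracketed names stand for the learned identifier tokens that DreamBooth-style fine-tuning (Ruiz et al. 2023) binds to each concept, and the unbracketed words are the class nouns. The actual token strings used in training are not given in the text, so the placeholders below are hypothetical.

```python
def build_prompt(objective, product="Polestar 3", style=None):
    """Compose a prompt of the form
    a [objective] web banner ad of a [product] car (in a [style] image style).

    Bracketed parts are placeholders for learned concept identifiers
    (hypothetical strings here); unbracketed nouns are the class words.
    """
    prompt = f"a [{objective}] web banner ad of a [{product}] car"
    if style is not None:
        prompt += f" in a [{style}] image style"
    return prompt


# The three Study 2a conditions expressed with the helper.
PROMPTS = {
    "AIDA only": build_prompt("high AIDA"),
    "AIDA + rugged": build_prompt("high AIDA", style="rugged"),
    "AIDA + luxury": build_prompt("high AIDA", style="luxury"),
}
```

Keeping the class nouns fixed while swapping only the concept identifiers is what lets the same product and ad-style concepts be reused across conditions.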

Figure 5. Illustrations of AI-Generated Ads Including Brand Dimensions.
We recruit 245 participants from Prolific (42.9% female, 56.7% male, .4% preferred not to state; age range = 25–64 years; all with a driver's license) to complete an online survey. Participants rate 10 randomly chosen ads on seven-point Likert scales for whether each ad conveys “ruggedness” (“This advertisement looks rugged to me”) or “luxury” (“This advertisement looks luxurious to me”), as well as on the purchase funnel (AIDA) metrics from the previous study.
Tuning the model on “ruggedness” indeed results in visuals that participants rate highest on that dimension (M = 4.91, above the scale midpoint of 4; t = 6.15, p < .001), as shown in Figure 6 (see also Web Appendix F). While we do not know the branding objectives of the conventional ads we found online, 4.91 is higher than the average of all actual competitor ads (M = 3.71; t = 7.13, p < .001) and the average of the actual Polestar ads (M = 4.09; t = 3.01, p = .008). It is also more than one scale point higher than the rating of ads from a model trained solely on mindset metrics without brand personalities (M = 3.53; t = 4.07, p = .001) and differs even more from the rating of ads from a model trained on “luxury” (M = 3.11; t = 8.50, p < .001).
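Comparisons against the scale midpoint are one-sample t-tests. A minimal sketch follows; whether the unit of analysis is the individual rating or the per-ad mean is not specified in this section, so the input list is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(ratings, mu0=4.0):
    """t statistic and degrees of freedom for H0: mean(ratings) == mu0.

    mu0 defaults to 4, the midpoint of a 1-7 Likert scale.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    standard_error = stdev(ratings) / sqrt(n)
    return (mean(ratings) - mu0) / standard_error, n - 1
```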

Figure 6. Average Brand Personality Rating of Conventional and AI-Generated Ads.
Training a model on the personality dimension of luxury results in higher perceptions of luxury (M = 5.44) than either actual Polestar ads (M = 4.50; t = 5.05, p < .001) or competitor ads (M = 3.99; t = 8.21, p < .001), and marginally higher perceptions than AI-generated ads without brand image training (M = 4.96; t = 1.90, p = .074). A model trained on “rugged” results in the lowest perceptions of luxury (M = 3.70; t = 7.10, p < .001). These results suggest that perceptions of “luxury” and “ruggedness” may be (at least partially) at odds, as also evidenced by the negative correlation between these ratings (r = −.45; see Web Appendix G). Nevertheless, both abstract concepts can be trained effectively in Stable Diffusion.
While these results demonstrate that generative AI can create content that evokes specific brand personality dimensions, it remains to be determined whether doing so hampers performance on mindset metrics (AIDA). Branding and engagement may both be goals of advertising. As Figure 7 shows, infusing brand personalities into AI-generated visual content need not compromise performance on mindset metrics. Replicating Study 1, participants rate ads generated by the model trained on high-engagement outcomes more favorably on AIDA mindset metrics than ads from market competitors (AI trained on AIDA: M = 4.79 vs. market competitors: M = 3.70; t = 5.49, p < .001) or Polestar ads (M = 4.30; t = 2.39, p = .034).

Figure 7. Average AIDA Rating of Conventional and AI-Generated Ads.
More importantly, the ads generated by the model trained on AIDA and ruggedness continue to outperform the actual ads from competitors (AI trained on AIDA + rugged: M = 4.88 vs. market competitors: M = 3.70; t = 7.27, p < .001) and from Polestar (M = 4.30; t = 3.40, p = .004). A model trained on AIDA and luxury also outperforms competitors (AI trained on AIDA + luxury: M = 4.57 vs. market competitors: M = 3.70; t = 5.36, p < .001) but is only directionally superior to the actual Polestar ads (AI trained on AIDA + luxury: M = 4.57 vs. actual Polestar: M = 4.30; t = 1.55, p = .144). Notably, luxury perceptions correlate more strongly with AIDA ratings than ruggedness perceptions do (see Web Appendix G). Perhaps more people find luxury desirable in cars than ruggedness, and Polestar is already relatively strongly associated with luxury. Even so, it is possible to train associations that are less typical for the category while maintaining AIDA performance.
Study 2a demonstrates that it is possible to train AI to produce visual content for ads that convey a chosen brand personality dimension and still perform at least as well as conventionally produced ads on mindset metrics. However, Study 2a is limited to the average market. We therefore next examine whether generative AI can be used to create ad variants that target specific customer segments.
Study 2b: Targeting consumer segments with ads created by generative AI
Marketing research contains a rich stream of literature on targeting and positioning (Morgan et al. 2019). Targeting has long been a prominent research topic (Fennell 1978), and in practice it has kept pace with consumers’ increasing technology use and the resulting possibilities for fine-grained targeting, including in online advertising (Schumann, Wangenheim, and Groene 2014). However, applications have so far focused mainly on selecting which ad to show to each target group, because manually creating ads for every group remains costly. Generative AI could support targeting more nuanced segments without incurring incremental production costs. For example, Polestar could be interested in strengthening brand perceptions among consumers with an interest in ruggedness. Such consumers may be identifiable by the websites they visit, and ads highlighting ruggedness would appear more contextually relevant on those websites.
To investigate whether generative AI can support such targeting, we use five randomly selected AI-generated ads from the Study 2a model trained on AIDA metrics and ruggedness, and five AI-generated ads from Study 2a produced by the model trained exclusively on AIDA metrics (and the Polestar 3 vehicle). We recruit 402 online survey participants from Prolific (53.5% female, 45.5% male, 1% preferred not to state; age range = 25–74 years; all with a driver's license) and measure brand attitude (“I like the product in this advertisement”) on a seven-point scale as the dependent variable. To avoid priming, we ask participants to report the perceived importance of ruggedness (“Please rate how desirable ruggedness is for you when you buy a car”), along with three other car-related aspects (innovativeness, luxury, and reliability), only after they have provided their brand attitude ratings, using a seven-point scale ranging from 1 (“extremely undesirable”) to 7 (“extremely desirable”).⁶
On average and for the total sample, we find no difference between the mean brand attitudes of participants who saw ads generated to evoke ruggedness (M = 4.95) and participants who saw ads generated without an added brand image (M = 4.84; t = .87, p = .387). This replicates Study 2a: it is possible to add brand image training without impeding generative AI's ability to engage the average market. Table 3 reports regression results with brand attitude as the dependent variable and, as independent variables, the desire for ruggedness, an indicator for generative AI visuals with (1) vs. without (0) ruggedness training, and their two-way interaction (F(3, 398) = 25.58, p < .001, R² = .16). To aid interpretation, we rescale ruggedness desire to range from 0 to 6. Notably, we observe a two-way interaction between fine-tuning on ruggedness and the desire for ruggedness: individuals who desire ruggedness respond more favorably to ads generated from the model fine-tuned on ruggedness (.69, p < .001; see Web Appendix H for a visual representation of the interaction and crossover pattern).
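The moderated regression described above can be written out explicitly, which also clarifies why the rescaling helps. The symbols are our notation, not the paper's: D is participant i's rescaled desire for ruggedness (the original 1–7 rating minus 1, so 0–6), and T is 1 if the participant saw ads from the ruggedness-trained model and 0 otherwise. We assume the reported interaction estimate of .69 corresponds to β₃.

```latex
\begin{align*}
\text{Attitude}_i &= \beta_0 + \beta_1 D_i + \beta_2 T_i + \beta_3 (D_i \times T_i) + \varepsilon_i \\
\Delta(D) &= \mathbb{E}[\text{Attitude} \mid D, T = 1] - \mathbb{E}[\text{Attitude} \mid D, T = 0]
           = \beta_2 + \beta_3 D \\
D^{*} &= -\beta_2 / \beta_3 \qquad \text{(desire level at which } \Delta(D) = 0\text{)} \\
F &= \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}, \qquad k = 3,\; n = 402 \;\Rightarrow\; df = (3,\, 398)
\end{align*}
```

Because D = 0 now corresponds to the lowest possible desire, β₂ is the effect of ruggedness training for the least interested participants, and a crossover inside the observed range requires 0 < D* < 6. As a consistency check, plugging the rounded R² = .16 into the last line gives F ≈ (.16/3)/(.84/398) ≈ 25.3, in line with the reported F(3, 398) = 25.58 once rounding of R² is taken into account.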
Table 3. Effect of Generative AI Training and Consumer Preferences on Brand Attitude.
These results suggest that visual generative AI can be used for targeting by evoking desirable brand personality dimensions. While the average market responds neutrally, individuals who find the personality dimension undesirable respond more favorably to generative AI ads without additional brand personality training. Therefore, when adding brand personality training, it is critical to target the right audience on the right distribution channels. Using consumer preference information, brands can deliver tailored ads to the segment for which they are most relevant, without incurring the production costs of conventional visual ads.
Study 3: Behavioral Indicators of Consumer Interest
Studies 1, 2a, and 2b use standard advertising pretests to evaluate ad performance (e.g., Batra and Keller 2016; Vakratsas and Ambler 1999), allowing us to control for factors such as placement context, prior brand exposure, and self-selection. However, the survey measures used in these studies do not reveal how the AI-generated ads will perform in the field. In Study 3, we test whether training on one performance metric (AIDA mindset metrics) generalizes to another (click-through rates [CTRs]).
Method
We use Meta's advertising platform for this study, leveraging its A/B testing feature, which allows comparison of five ad variants.⁷ We select five actual Polestar ads used in the previous studies and five generated ads geared toward AIDA metrics and luxury from Study 2a. We chose a combination of AIDA and brand image training because this combination is likely what most practical applications will require, and we chose the luxury association because it best matches the brand's actual positioning. Because text captions are required for all ads, we add the same caption (“Uncover the ideal vehicle for your journey”) to all 10 ads. Users who clicked on the ads were redirected to a car comparison website with relevant content that we set up for this study (see Web Appendix I for further information). Because the platform limits A/B tests to five ads per experiment, we structure our investigation as two concurrent A/B tests. The first test includes three AI-generated ads geared toward AIDA and luxury and two actual Polestar ads, while the second test includes two AI-generated ads and three actual Polestar ads.
We ensure consistency by targeting a U.S. audience aged 25–65 years with an interest in the automotive industry. To maintain comparability across experiments, we use a cost-per-click bidding setup with a cost cap of $.80 per click and the ad objective “traffic.” The experiments ran from February 1 to February 4, 2025, with a daily budget of $5 per ad condition, a random letter combination for each ad, and a varied order of ad uploads. Inspecting the sociodemographic data that Meta provides, we detect no relevant differences in audience composition across the 10 ads (gender: p = .946; age: p = .121; see Web Appendix I).
Results and discussion
In the first experiment, a total of 15,306 impressions and 213 clicks were recorded. The AI-generated ads achieve a mean CTR of 1.63%, which is within the range of industry reports (Bailyn 2024), whereas the actual Polestar ads achieve a lower mean CTR of only 1.02% (χ² = 9.22, p = .002). The second experiment, with a total of 13,469 impressions and 283 clicks, produces a similar pattern: the AI-generated ads have a mean CTR of 2.72%, whereas the conventional Polestar ads achieve a mean CTR of 1.64% (χ² = 17.37, p < .001). Across both experiments (28,775 impressions and 496 clicks), the AI-generated ads achieve a CTR of 2.04%, whereas the conventional Polestar ads achieve only 1.37%, a 48.9% relative increase in CTR for the AI-generated ads (χ² = 18.09, p < .001). Web Appendix I reports impressions, clicks, and CTRs for each individual ad.
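The CTR comparisons can be reproduced in outline with a 2 × 2 table of clicks versus non-clicks for the two ad groups. This is a minimal sketch under two assumptions: the test is a Pearson χ² without continuity correction (the text does not say which variant was used), and the per-group impression counts, which are in Web Appendix I and not reproduced here, must be supplied. The relative-lift line uses only the pooled CTRs reported above.

```python
import math


def ctr(clicks, impressions):
    """Click-through rate as a fraction of impressions."""
    return clicks / impressions


def relative_lift(ctr_treatment, ctr_control):
    """Percentage increase of the treatment CTR over the control CTR."""
    return (ctr_treatment / ctr_control - 1.0) * 100.0


def chi_square_2x2(clicks_a, impressions_a, clicks_b, impressions_b):
    """Pearson chi-square test (no continuity correction, df = 1) on a
    clicks vs. non-clicks table for two ad groups; returns (chi2, p)."""
    observed = [
        [clicks_a, impressions_a - clicks_a],
        [clicks_b, impressions_b - clicks_b],
    ]
    total = impressions_a + impressions_b
    row_totals = [impressions_a, impressions_b]
    col_totals = [clicks_a + clicks_b, total - clicks_a - clicks_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For df = 1, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Reported pooled CTRs: 2.04% (AI-generated) vs. 1.37% (conventional Polestar).
lift = relative_lift(2.04, 1.37)  # about 48.9%
```

The lift check confirms the reported figure: 2.04 / 1.37 − 1 ≈ .489.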
We acknowledge that we have less control in this field study than in the experiments of our previous studies; in particular, we cannot control Meta's actual ad distribution choices within its A/B tests. While each individual sees only one of the five ads within an A/B test, individuals could be exposed to ads from both A/B tests. However, from a potential audience of 43.5 million accounts for the broad target group, we reached only 28,775, and the expected overlap between our two A/B tests is 4.76 exposures (.017% of total exposures),⁸ which is too small to affect our conclusions. Nonetheless, it is not clear that these results generalize beyond the Meta platform. While Meta is an important channel, other ad networks may differ in audience targeting, which can affect ad effectiveness (Braun and Schwartz 2025). We therefore run a study on another platform (Taboola) including all 10 actual Polestar ads and the 10 AI-generated ads from Study 2a trained on AIDA and luxury. While CTRs are slightly lower overall, the pattern again favors the AI-generated ads, with an average CTR of 1.41% for AI-generated ads compared with .64% for the actual ads (χ² = 121.65, p < .001; see Web Appendix J). Despite the consistently superior performance of the AI-generated ads, effect sizes for observed CTRs can vary across applications, and we urge readers to interpret our CTR results in combination with other metrics.
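A back-of-the-envelope version of the overlap estimate assumes that each A/B test reaches a uniformly random, independent subset of the same audience; the expected number of accounts reached by both is then n₁·n₂/N. This is a simplified sketch, not necessarily the derivation in note 8, and treating each test's impressions as unique accounts is our assumption.

```python
def expected_overlap(reach_a, reach_b, audience_size):
    """Expected number of accounts reached by both tests, assuming each test
    reaches an independent, uniformly random subset of the same audience.

    Each account is in test A with probability reach_a / audience_size and in
    test B with probability reach_b / audience_size; summing the product over
    all accounts gives reach_a * reach_b / audience_size.
    """
    if audience_size <= 0:
        raise ValueError("audience_size must be positive")
    return reach_a * reach_b / audience_size


# Impressions per experiment treated as unique accounts (an assumption).
overlap = expected_overlap(15_306, 13_469, 43_500_000)
```

With the rounded audience size of 43.5 million, this yields about 4.74 accounts, close to the reported 4.76; small differences are expected from rounding of the audience size or a more detailed calculation.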
Taken together, Studies 1, 2a, 2b, and 3 demonstrate that generative AI can produce effective ads in terms of AIDA mindset metrics, brand attitudes, and CTRs. We employ one approach of using AI to generate effective ads, but simpler procedures might achieve similar (or better) results. While Study 1 sheds light on the role of the aesthetic capabilities of Stable Diffusion, the value of fine-tuning (i.e., performance relative to simply prompting the Stable Diffusion base model without fine-tuning) remains an open question. We next investigate whether the aesthetic capabilities of the standard model might suffice for successful advertising. We also assess alternative AI-based approaches to identify the most critical component(s) of our proposed process.
Study 4: Alternative Approaches to Applying Generative AI
In Study 4, we compare three alternative approaches that (1) do not leverage prior ad images for training, (2) train on randomly chosen ads rather than on high-performing ads, and (3) train the generative AI on both high-performing and low-performing ads. We compare these with the models used in Studies 1 and 2a, trained on AIDA alone, AIDA + ruggedness, and AIDA + luxury. This allows us to better understand the role of fine-tuning by (a) omitting all fine-tuning on ads and (b) fine-tuning on typical advertisements without using ad performance data. Both options are also practically meaningful: (a) is readily available to any advertiser, and (b) can save significant costs by avoiding the collection of mindset metrics. Because both (a) and (b) share the aesthetic capabilities of our proposed fine-tuning, they also offer insight into whether the aesthetic capabilities of Stable Diffusion itself suffice to create effective ads.
Method
We compare the alternative approaches with our proposed workflow, referred to as “trained on AIDA”: the model “(High) AIDA” is trained using the same images as in Study 1, and the models “(High) AIDA + rugged” and “(High) AIDA + luxury” additionally incorporate training on images associated with the brand personalities, using the same hyperparameter settings as in Study 2a. We employ the 10 high-AIDA images and the 10 images each with the rugged or luxury brand style generated in Study 2a.
The first alternative we consider does not involve model training on high-AIDA web banner ads or brand personalities and instead relies on descriptive prompting of the standard Stable Diffusion model. The only fine-tuning performed is on product images of the Polestar 3, to ensure that the focal product of the ad is reproduced accurately. For this conventional object fine-tuning, we follow Ruiz et al. (2023) in all aspects of hyperparameter tuning (see Web Appendix A). Relying on the generative model's base training, we prompt descriptively for AIDA funnel performance using the prompt c = {an engaging web banner ad of a [Polestar 3] ‘car’} and include brand personalities by extending this prompt with “in a rugged image style” or “in a luxury image style.” In this condition, the generative model is fine-tuned on neither brand personalities nor conventionally produced ads; instead, it relies on the model's off-the-shelf understanding of these concepts. For each prompt, we generate 10 images.
In a slightly more advanced approach, again following Ruiz et al.'s (2023) suggestions, we imitate the visual language of the industry by training on a random set of ads from the 543 market competitor ads together with Polestar 3 vehicle visuals; we refer to this model as “trained on random ads.” In contrast to Studies 1, 2a, and 2b, this approach does not require gathering consumers’ ad evaluations, but it still trains the model on the visual advertising language of automotive ads. With this model, we generate 10 images using the prompt c = {a [random AIDA] ‘web banner ad’ of a [Polestar 3] ‘car’}, where [random AIDA] is a custom prompt trained on randomly selected ads (see Web Appendix A for details). As in the previous condition, we rely on the model's base understanding of luxury and ruggedness and generate 10 images for each brand personality by extending the prompt with “in a rugged image style” or “in a luxury image style.”
While the approaches described previously are simpler to execute than our proposed workflow, another option is to incorporate additional information to refine the generative process: the AI could learn the visual language of both high-performing and low-performing ads in the category. That is, in addition to prompting the generative AI with the visual language of an effective car ad, we could use negative prompting to steer it away from visuals in the style of ineffective car ads.⁹ For this purpose, we select the 30 ads with the lowest average AIDA rating from the 543 market competitor ads. To incorporate training on ineffective ads, we augment our proposed workflow with the negative prompt c = {a [low AIDA] ‘web banner ad’}, where [low AIDA] is a custom prompt trained on the poorly performing ads based on participant ratings. We refer to this model as “trained on high and low AIDA.” To make these models comparable to the proposed approach, we also train two models that include the concepts luxury and rugged as before and generate 10 images with each model using the same prompt structure.
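Negative prompting acts through classifier-free guidance. In common implementations such as the Hugging Face diffusers Stable Diffusion pipeline, the negative prompt takes the place of the empty (unconditional) prompt, so at each denoising step the combined noise prediction moves away from the negative concept and toward the positive one. The sketch below shows that combination on plain Python lists standing in for noise tensors. It is not the authors' code, and the example values in the test are arbitrary; the default scale of 7.5 mirrors the CFG scale selected in Study 1.

```python
def guided_noise(eps_positive, eps_negative, guidance_scale=7.5):
    """Classifier-free guidance with a negative prompt.

    eps_positive: noise prediction conditioned on the positive prompt
        (e.g., the high-AIDA ad prompt).
    eps_negative: noise prediction conditioned on the negative prompt
        (e.g., a low-AIDA concept); without a negative prompt this would be
        the prediction for the empty prompt.
    Returns eps_negative + guidance_scale * (eps_positive - eps_negative).
    """
    if len(eps_positive) != len(eps_negative):
        raise ValueError("noise predictions must have the same shape")
    return [n + guidance_scale * (p - n) for p, n in zip(eps_positive, eps_negative)]
```

This mechanism also suggests why the high-and-low model can struggle: if the positive and negative concepts produce similar predictions because they share visual features, their difference, and thus the guidance signal, becomes weak or noisy.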
In summary, we compare a total of 30 generated images (10 without brand personality, 10 infused with a rugged brand personality, and 10 infused with a luxury brand personality) from each of four generative AI models: (1) the proposed workflow with fine-tuning on high-performing ads, (2) a model relying only on descriptive prompting of the standard generative model without fine-tuning on conventionally produced ads and brand dimensions, (3) fine-tuning on randomly chosen ads, and (4) fine-tuning on high- and low-performing ads (see Web Appendix K for visuals). As benchmarks, we again include the 10 actual Polestar ads from the previous studies and 155 randomly selected competitor ads.
We measure average AIDA performance across funnel stages and brand perceptions for all images using the same survey procedure as in Study 2a, with 359 Prolific participants (45.1% female, 53.5% male, 1.4% preferred not to state; age range = 25–74 years; all with a driver's license), each randomly assigned to rate 10 images. This ensures that all versions of the generative AI ads are rated by the same participant pool.
Results and discussion
The column “Average AIDA Rating” in Table 4 compares the average AIDA funnel mindset metric of our proposed approach with those of market competitors, actual Polestar ads, and the alternative AI models. None of the alternative approaches that we consider results in higher average mindset ratings than the actual Polestar ads (M = 4.35). More importantly, none of the alternative approaches produces AI-generated ads with mindset ratings as high as those of the ads produced by our proposed framework, which fine-tunes on high-performing ad images (M = 4.61; all p < .050). This suggests that fine-tuning on ad performance is critical and that relying merely on the aesthetic capabilities of base generative diffusion models is insufficient to yield visual content that performs well on specific marketing objectives.
Table 4. Performance of Alternative Generative AI Models and Conventional Ads.
Notes: Superscripts a–e indicate differences from the respective groups in the first column in a two-sample t-test at p < .05. Underlined cells highlight the values that are expected to be highest.
Specifically, prompting the standard generative model performs about .62 scale points worse than our proposed approach (prompting engagement: M = 3.99 vs. AI trained on AIDA: M = 4.61; t = 5.11, p < .001). Interestingly, training on randomly selected ads performs even worse (M = 3.82; t = 5.89, p < .001 compared with our proposed model). This type of training might lead Stable Diffusion to mimic the performance of the average market competitor ad (M = 3.48). It seems insufficient to simply learn the visual language of the product category; rather, learning the visual language of high-impact ads in the category is necessary to produce high-quality ads. Omitting training on ads altogether, simple prompting of the standard Stable Diffusion model performs on par with training on random ads, suggesting that such training adds little value (AI trained on random ads: M = 3.82 vs. prompting engagement: M = 3.99; t = 1.32, p = .203).
The ads generated by the model trained on both high- and low-performing ads perform similarly to the actual Polestar ads (MAI trained on high and low AIDA = 4.21 vs. Mactual Polestar = 4.35; t = .69, p = .501). Incorporating the additional information from the low-performing ads, however, does not match the performance of our proposed workflow (M = 4.61; t = 2.51, p = .023). A possible explanation is that high- and low-performing ads share certain visual elements. While some feature combinations may correlate with high AIDA performance, other patterns may appear in both high- and low-performing ads. Because both sets of ads are associated with the same class definition “web banner ad,” training the model to emulate the high performers while avoiding the low performers may introduce contradictory signals. This might make it difficult for the model to determine which visual features drive performance, ultimately leading to poorer results.
These observations shed further light on the role of aesthetic quality. If the aesthetic capabilities of the standard Stable Diffusion model were the main driver of AIDA ratings, then prompting the model should generate ads with higher AIDA scores than conventionally produced Polestar ads. To investigate this, Table 4 lists average aesthetic quality based on LAION Aesthetic scores (Schuhmann 2022), computed as in Study 1. Although the alternative approaches produce ads with higher aesthetic quality than actual Polestar ads and competitor ads, they fail to generate ads that yield higher AIDA ratings.
Specifically, prompting for engagement, training on random ads, and training on both high- and low-AIDA ads all result in directionally higher aesthetic scores than our proposed procedure based on the highest-rated ads. However, all of these alternatives perform worse on AIDA metrics. These observations suggest that the relative performance of AI-generated ads trained on high-AIDA ads is not driven exclusively by the aesthetic capabilities of the standard Stable Diffusion base model. Rather, training on effective ads appears to improve ad performance beyond what aesthetic quality can explain.
The Table 4 columns “Rugged Perception” and “Luxury Perception” reveal a similar pattern of results for brand personality. When we train models on AIDA performance together with each brand personality dimension, with each dimension prompted independently of the other, we find that our proposed model communicates brand personality effectively to consumers, as evidenced by differences from the scale's midpoint (MAI trained on AIDA + rugged = 4.65; t = 2.84, p = .019; MAI trained on AIDA + luxury = 5.37; t = 11.09, p < .001). Furthermore, participants rate their perceptions of both dimensions higher than for the actual Polestar ads (rugged: t = 3.57, p = .002; luxury: t = 2.62, p = .017) and for conventionally produced competitor ads (rugged: t = 2.76, p = .020; luxury: t = 11.77, p < .001).
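Testing whether a brand personality perception differs from the scale's midpoint amounts to a one-sample t-test against a fixed value. The sketch below assumes a seven-point (1 to 7) scale with midpoint 4 and treats ratings as independent observations; neither detail is stated for these items in this passage, and `t_vs_midpoint` is a hypothetical helper.

```python
import math
from statistics import mean, stdev


def t_vs_midpoint(ratings: list[float], midpoint: float = 4.0) -> tuple[float, int]:
    """One-sample t statistic for H0: mean rating equals the scale midpoint.

    The default midpoint of 4.0 assumes a 1-7 rating scale.
    Returns (t, degrees_of_freedom).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    # Standard error of the mean from the sample standard deviation.
    std_err = stdev(ratings) / math.sqrt(n)
    return (mean(ratings) - midpoint) / std_err, n - 1
```

With toy ratings `[5, 6, 7]`, the mean of 6 sits two points above the midpoint and t is about 3.46 with 2 degrees of freedom.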
Using descriptive prompting with the base Stable Diffusion model and no fine-tuning (e.g., “an engaging web banner ad” and “in a rugged image style” vs. trained concepts like “a [high AIDA] ‘web banner ad’” and “in a [rugged] ‘image style’”) yields associations approximately .80 scale points lower than those of our proposed workflow trained on high-performing ads and on rugged (MAI trained on AIDA + rugged = 4.65 vs. Mprompting engagement + rugged = 3.86; t = 2.60, p = .018) or luxury imagery (MAI trained on AIDA + luxury = 5.37 vs. Mprompting engagement + luxury = 4.50; t = 4.37, p < .001). The model trained on randomly chosen ads was likewise not trained on brand image perceptions, and we consequently observe nearly identical associations for both dimensions. These results indicate that there is value in training the base generative model on additional images related to the desired brand personality dimensions.
These results further highlight the importance of consumer feedback. Fine-tuning a visual generative AI on well-performing ads and brand personality associations improves both performance on the AIDA mindset metrics and performance in building brand image. Our approach that trains the model on high-performing ads consistently outperforms alternative approaches, including prompting and fine-tuning without consumer feedback.
Next, we probe possible boundary conditions to identify the contexts under which advertisements produced by generative AI may and may not be expected to perform well.
Study 5: The Role of Brand Familiarity
Strong brand equity allows brands to leverage the associations in their existing communications (Keller 1993). While the Polestar 3 is a new vehicle, respondents may have already formed favorable associations with the Polestar brand that make it easier for generative AI to activate existing brand knowledge. To ensure that existing brand equity and familiarity did not contribute to the AI-generated images’ performance, we replicate Study 1 using an electric vehicle brand with which respondents are not familiar. Specifically, we study the Chinese car brand NIO, which was not available in Western markets at the time of the study.
Method
To obtain the AI model training material for the NIO brand, we collect 24 images of NIO’s EL7 vehicle from the NIO website and remove all background image information. We train Stable Diffusion on these 24 NIO EL7 images, along with the same highest-rated competitor ads used previously, following the procedure described in Study 1. We use the same hyperparameter settings as in Study 1, which again achieve the lowest loss value, and follow the same prompt logic to generate 10 NIO ads tuned on mindset-metric funnel performance.
To evaluate the resulting visual content, we recruit 229 Prolific participants (51.1% female, 48.5% male, .4% preferred not to state; agerange = 25–73 years; all with a driver's license) to rate 170 conventionally produced online ads, 10 AI-generated ads for the NIO EL7 car, and 5 actual NIO EL7 ads from which we remove all text as before (see Web Appendix L for visuals). We randomly assign each participant to 10 ads for which they answer the same funnel performance (AIDA) questions as before.
Results and discussion
As in all previous studies, we test the average score across all phases of the AIDA funnel because the phases are highly correlated with each other (rrange = .71–.90, p < .001, Cronbach's α = .95). The results for the unknown NIO brand are similar to those of Study 1 for the Polestar brand, with the ads produced using generative AI (M = 4.94) outperforming actual NIO ads (M = 4.02; t = 6.53, p < .001) and ads from market competitors (M = 3.75; t = 9.43, p < .001) on mindset metrics. This result also holds in a regression that controls for the effects of the 43 brands and 9 car body types, which yields a negative coefficient for conventional ads (β = −.93, p = .019).
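Averaging across the four AIDA phases is justified by their internal consistency, summarized by Cronbach's α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The sketch below implements that standard formula, assuming a complete respondents-by-items matrix with no missing values; the function names and toy data are illustrative.

```python
from statistics import mean, variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix.

    Each inner list holds one respondent's ratings on the k items
    (here, hypothetically, the four AIDA phases). Assumes no missing values.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose rows into item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def aida_composite(phase_ratings: list[float]) -> float:
    """Average over funnel phases: the single score analyzed once items cohere."""
    return mean(phase_ratings)
```

On the toy matrix `[[1, 2], [2, 1], [3, 4], [4, 3]]`, each item has variance 5/3 and the total score has variance 16/3, so α = 2 × (1 − 10/16) = .75.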
The range of ratings for the AI-generated ads (Mrange = 4.33–5.50) indicates that the lowest-rated AI-generated ad's mean rating still outperforms the mean of all conventionally produced ads from competitors (M = 3.75, t = 9.15, p < .001) and the mean of the actual NIO ads (M = 4.02, t = 3.48, p = .025). These results suggest that brand familiarity does not explain the efficacy of the fine-tuned generative AI ads, as ads that were produced for unknown brands also perform better than conventionally produced ads do.
Study 6: Consumables Versus Consumer Durables
Like many durable goods, cars are depicted in a limited number of contextual settings (roads, nature, city, parking), which may facilitate AI's ability to learn the visual language of ads for the product category. Other categories, such as fast-moving consumer goods (FMCG), are often represented in a greater variety of contexts (e.g., soft drinks indoors, outdoors, in artificial CGI settings). For this reason, we test the performance of generative AI in an FMCG category to assess the capabilities for nondurable goods.
Method
To rule out brand familiarity, we select a brand that has not been established in the market where we conduct our study. Specifically, we test the sunscreen brand Bondi Sands. The brand, as well as its market competitors, depicts the product in various contexts (beach, towel, pillow, in hand, CGI background, etc.) (see Web Appendix M for illustrations). To train the AI model, we collect 120 actual sunscreen banner ads. A notable aspect of these ads is that the images display brand logos more prominently than the automotive ads. As we want to control for brand familiarity, we remove all brand-related content from the ads, as well as any text to reduce AI hallucinations. Following Studies 1 and 2, we collect ratings from Prolific participants (130 participants; 57.7% female, 39.2% male, 3.1% preferred not to state; agerange = 25–74 years) and randomly assign each participant to 10 ads for which they answer the same funnel performance (AIDA) questions as before.
Given the substantial correlations among the four phases of the AIDA funnel (rrange = .45–.58, p < .001, Cronbach's α = .80), we fine-tune the generative AI on the sunscreen ads with the 30 highest average AIDA scores across all funnel phases, as well as on 11 preprocessed images of the Bondi Sands product (with background, text, and brand logo removed) to train the model on the visual appearance of the product. Because the product category and competitive advertising images differ, we test hyperparameter settings to find the lowest loss value. We again arrive at the same settings as in Study 1 (see Web Appendix A). Following the procedures of the previous studies, we then generate images with the same prompt logic, c = {a [high AIDA] ‘web banner ad’ of a [Bondi Sands] ‘sunscreen bottle’}, and the same hyperparameters.
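The training-set selection and prompt pattern described above can be sketched in a few lines. This sketch assumes each ad's score is the mean over participants of the mean over the four phases, and it passes the bracketed terms through as placeholder strings; in a DreamBooth-style setup, which the identifier-plus-class-noun pattern resembles, they would be rare tokens bound during fine-tuning. The helper names and the scoring rule are assumptions, not the article's code.

```python
from statistics import mean


def select_training_ads(aida_ratings: dict[str, list[list[float]]], n: int = 30) -> list[str]:
    """Return the IDs of the n ads with the highest average AIDA score.

    aida_ratings maps a hypothetical ad ID to a list of ratings, each being
    one participant's four AIDA phase scores for that ad.
    """
    def ad_score(ratings: list[list[float]]) -> float:
        # Average over phases within each rating, then over participants.
        return mean(mean(phases) for phases in ratings)

    ranked = sorted(aida_ratings, key=lambda ad: ad_score(aida_ratings[ad]), reverse=True)
    return ranked[:n]


def build_prompt(perf_token: str, ad_class: str, brand_token: str, product_class: str) -> str:
    """Assemble the pattern c = {a [perf] 'ad class' of a [brand] 'product class'}."""
    return f"a {perf_token} {ad_class} of a {brand_token} {product_class}"
```

For example, `build_prompt("[high AIDA]", "web banner ad", "[Bondi Sands]", "sunscreen bottle")` returns `a [high AIDA] web banner ad of a [Bondi Sands] sunscreen bottle`.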
We compare these generated ads with the baseline performance of competitors’ conventionally produced ads (i.e., 120 sunscreen banner ads from various brands), as well as with 10 conventionally produced ads from Bondi Sands, our focal brand. Illustrative images are shown in Figure 8 (see Web Appendix M for further detail). An independent sample of 144 Prolific users (56.9% female, 40.3% male, 2.8% preferred not to state; agerange = 25–74 years) rates ten randomly assigned ad images each on seven-point scales, as before.

Illustrations of Conventional and AI-Generated Sunscreen Ads.
Results and discussion
The fine-tuned generative AI model again outperforms the competitor advertisements, with the average AIDA performance (M = 4.61) exceeding those of the average conventionally produced ads from competitors (M = 3.61, t = 10.25, p < .001) and the Bondi Sands brand (M = 4.24, t = 2.43, p = .028). Since the Bondi Sands brand does not operate in the market where we conduct our study and all brand-related text has been removed, these results are also not likely to have been driven by brand familiarity. 10
The pattern of results is similar to what we observe in the automotive industry. No generated ad performs worse than the average of all conventionally produced online ads of competitors (Mworst generative AI = 4.18 vs. Mmarket competitors = 3.61; t = 10.99, p < .001). Although we purposefully did not use any prompts beyond the fine-tuned unique identifiers, this study suggests that generative AI is capable of learning the effective visual language of mindset performance for different product categories. Moreover, despite the more heterogeneous contexts in which products may be depicted, generative AI can be trained to produce effective ads using mindset metrics. Because fine-tuning relied on consistent hyperparameter settings across both categories, manual hyperparameter search appears to add little value for these two product categories.
Thus far, we have investigated mainstream vehicles and sunscreen products with broad market appeal. While we incorporated two brand image dimensions, we have not yet considered generative AI's ability to attract attention to and drive interest in highly differentiated products. Next, we assess generative AI's ability to produce such content.
Study 7: Visual Communication of Differentiated Product Benefits
We use the Smart Fortwo vehicle to investigate the role of differentiation. As one of the smallest two-seater cars available, the Smart Fortwo offers unique benefits that combine easy city parking with agility. Because its target market is smaller, marketers often integrate elements of surprise to drive attention and arousal (Tellis 2004).
To test the performance of generative AI for more differentiated products, we generate ads for the Smart brand by collecting 12 Smart Fortwo images and removing all background information. We simultaneously train Stable Diffusion on the 12 Smart Fortwo images and the same highest-rated competitor ads from Study 1. We again find the same set of hyperparameters to achieve the lowest loss value and most coherent visuals.
We generate five ads using the same prompt logic as in all previous studies as well as the same hyperparameters. To test the performance of these AI-generated images, we randomly select five actual Polestar 3 ads, five actual Smart Fortwo ads that we collect online, and five randomly selected AI-generated Polestar 3 ads from Study 1. We test these four different types of ads in a 2 (AI-generated vs. actual ads) × 2 (Polestar vs. Smart Fortwo) full factorial experiment with random assignment to either five actual Polestar images, five actual Smart images, five AI-generated Polestar images, or five AI-generated Smart images (see Figure 9 for representative illustrations and Web Appendix O for further detail).
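The 2 × 2 between-subjects design assigns each participant to exactly one of the four ad sets. The sketch below shows balanced random assignment, assuming a shuffle-then-deal scheme that the text does not specify; `assign_conditions` and the condition labels are hypothetical.

```python
import random


def assign_conditions(participant_ids: list[str], conditions: list[str], seed: int = 0) -> dict[str, str]:
    """Randomly assign participants to conditions in near-equal group sizes.

    Shuffles a copy of the participant list, then deals participants out to
    the conditions in turn, so group sizes differ by at most one.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    shuffled = participant_ids[:]
    rng.shuffle(shuffled)
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(shuffled)}


# Hypothetical labels for the four cells of the 2 x 2 design.
CONDITIONS = [
    "actual Polestar",
    "actual Smart",
    "AI-generated Polestar",
    "AI-generated Smart",
]
```

With 301 participant IDs, three conditions receive 75 participants and one receives 76, in line with the groups of about 75 per condition reported in the method.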

Illustrations of Conventional and AI-Generated Polestar and Smart Ads.
Note that the actual Smart Fortwo ads contain unique creative visual content to attract attention and emphasize the car's distinctive size. Notably, we lack relevant training material for this specific positioning that is unique to the Smart Fortwo. Instead, the generative AI is trained on effective imagery from the automotive industry, captured in fine-tuning the unique text identifier [high AIDA]. Given the car's intended benefits, placing the Smart Fortwo vehicle in imagery that is common and effective for mainstream vehicles may lack the elements of surprise and unusualness that the Smart Fortwo ads leverage.
As in our prior studies, we collect the four AIDA ratings for all ads by recruiting 301 Prolific users (55.5% female, 45.5% male; agerange = 25–74 years; all with a driver's license) and assigning groups of 75 to each of the four experimental conditions. We also ask participants to rate the product benefits of parking space and city utility on hedonic and utilitarian seven-point scales. As expected, participants rated the Polestar vehicle as catering to more general hedonic benefits and the Smart Fortwo vehicle as offering more specific utilitarian value, as evidenced by a composite difference score that subtracts the utilitarian scores from the hedonic ratings (Mactual Polestar = .05 vs. Mactual Smart = −.34, t = 4.25, p < .001).
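The difference scale described above can be written out explicitly. Here H_ij and U_ij denote participant i's hedonic and utilitarian ratings for ad j (notation introduced for illustration); forming the score per rating before averaging is an assumption, and the group gap uses only the two means reported in the text.

```latex
\begin{aligned}
D_{ij} &= H_{ij} - U_{ij}\\
\bar{D}_{\text{Polestar}} - \bar{D}_{\text{Smart}} &= .05 - (-.34) = .39
\end{aligned}
```

Positive values of D indicate a more hedonic than utilitarian perception.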
Results and discussion
As Figure 10 shows, AI-generated ads for Polestar 3 perform better than conventionally produced Polestar ads on average AIDA mindset ratings (MPolestar AI trained on AIDA = 4.40 vs. Mactual Polestar = 3.99; t = 4.23, p < .001). A different pattern emerges for the Smart Fortwo, where AI-generated ads result in slightly lower AIDA ratings (MSmart AI trained on AIDA = 3.62) than the actual Smart ads (Mactual Smart = 3.81; t = 1.85, p = .064). An ANOVA with average mindset-metric funnel performance as the dependent variable and the two experimental factors (AI-generated vs. actual; Polestar vs. Smart) as independent variables confirms the interaction apparent in Figure 10 (F(1, 1,501) = 18.14, p < .001). These findings suggest that generative AI is better suited to producing content for mass-market products than for products with unique benefits. Ads for such products would require additional creative input (e.g., more sophisticated prompting, specific imagery for fine-tuning), and their production could not be automated to the degree our prior studies demonstrated.
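The interaction tested by the ANOVA corresponds to a difference-in-differences of the four cell means reported above. The arithmetic below is descriptive only and does not reproduce the F-test; the Δ notation is introduced here for illustration.

```latex
\begin{aligned}
\Delta_{\text{Polestar}} &= \bar{y}_{\text{AI, Polestar}} - \bar{y}_{\text{actual, Polestar}} = 4.40 - 3.99 = .41\\
\Delta_{\text{Smart}} &= \bar{y}_{\text{AI, Smart}} - \bar{y}_{\text{actual, Smart}} = 3.62 - 3.81 = -.19\\
\Delta_{\text{Polestar}} - \Delta_{\text{Smart}} &= .41 - (-.19) = .60
\end{aligned}
```

The AI advantage thus reverses sign across brands: positive for the mass-market Polestar and slightly negative for the differentiated Smart Fortwo.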

AIDA Ratings of Conventional and AI-Generated Ads for Polestar and Smart.
Discussion
Visual content is a mainstay for brand-building and driving sales in today's digital landscape. However, crafting visual content manually is complex, time-consuming, and costly. Despite marketers’ interest in using visual generative AI, we know little about its suitability for supporting marketing objectives. Conversations with agencies and brands suggest that this interest is driven largely by the potential for cost savings, raising the question of whether generative AI can create visual content that performs at least as well as conventionally produced content in the eyes of consumers.
In this research, we have examined whether, how, and under what circumstances generative AI can create visual advertising content that is aligned with marketing objectives. Our proposed workflow guides an open-source AI image generator toward the objectives of performing well on mindset metrics and conveying a desired brand personality. Through a series of studies, we demonstrated that generative AI can produce ads that perform at least as well as conventionally produced ads on the metrics it is trained on. In some cases, generative AI ads can perform well beyond their training: we found above-average CTRs despite fine-tuning on AIDA survey data. For other objectives, such as brand image positioning, additional fine-tuning was needed. More generally, advertisers should not expect fine-tuning on one metric to translate automatically to all other conceivable metrics. Generative AI performs as instructed and can easily miss the mark when it lacks relevant training material.
Our insights are of value to both researchers and practitioners. Generative AI can produce images that evoke various dimensions of brand personality, regardless of consumers’ familiarity with the brand, and can create engaging ads for both durable and nondurable products. Combined with consumer-level data, a generative AI workflow can provide a cost-effective means of creating ad content designed to resonate with the targeted segments.
We demonstrated the importance of consumer feedback in the fine-tuning process. In contrast to conventional ad testing, data needs to be collected only once to create unlimited ad variants aligned with performance on desired marketing metrics. This allows marketers to avoid the need for specialized creative skills and to combat ad fatigue with more variation in visual content at a fraction of the cost. Together with the consistency of hyperparameter settings across products, markets, and training material, this suggests the potential to automate the production of some visual marketing content.
Despite these possibilities, there are salient limitations to the generative AI workflow that we have developed in this research. Fine-tuning generative AI on effective advertisements essentially creates visual averages based on the types of elements that make advertisements successful. Research has found that visual averages can result in fluent processing by consumers, creating a sense of familiarity, trust, and liking (Heitmann et al. 2020). However, fluent visuals often fail to stand out (Landwehr, Wentzel, and Hermann 2013), and standing out can be precisely the objective for highly differentiated products.
We investigate this by randomly assigning 998 Prolific participants (54.3% female, 45.5% male, .2% preferred not to state; agerange = 25–75 years) to rate 20 ads each, drawn from the pool of AI-generated and conventional ads in Study 1. We ask about “the process of studying this advertisement” on a seven-point scale (1 = “very difficult,” and 7 = “very easy”) (Graf, Mayer, and Landwehr 2019). We find that generated ads do indeed feature higher fluency than conventional ones (M = 5.17 vs. 4.80, t = 7.30, p < .001). Ad-level fluency scores also correlate with AIDA performance (r = .37, p < .001). This suggests that the AI workflow can create fluency, which has proven important for appeals around familiarity and trust (Landwehr, Wentzel, and Hermann 2013).
Our proposed workflow may struggle, however, to support advertising objectives that require an element of surprise. For example, humor in advertising builds on incongruity with expectations (Eisend 2009). 11 Similar to other elements of surprise, humor can drive attention, establish memorability, and generate word of mouth. Our proposed generative AI workflow may be unable to achieve this because it is predicated on averaging across visual content to effectively learn the category. While this training may be conducive to building fluent ads, the workflow would not learn the (context-dependent) surprising elements that are needed in humorous advertising. We study the dog food category by training generative AI both on the highest-AIDA-performing ads and highest-rated humor ads. While we replicated above-average AIDA performance, we could not create above-average humor ratings or reach the level of humor in the training materials (see Web Appendix P).
In advertising, brand recall and brand recognition are relevant objectives. Human-created ads with humorous elements could prove memorable to consumers and might score better than generative AI ads because elements of humor have proven difficult to recreate with AI. To probe this, we display three dog dental care, three sunscreen, and three car ads (two competitive ads and one ad from the focal brand per category) in random order to 302 participants. For a random subset of 152 participants, we replaced the ad of the focal brand in each category with a generative AI ad of the same brand taken from the previous studies. Based on Study 4, we cannot expect above-average recall or recognition performance from generative AI when it is not fine-tuned on recall or recognition. For humorous ads, conventional ads may even outperform generative AI (which lacks similar levels of humor).
For dog dental care, the actual humorous ad of the same brand (Pedigree) has directionally higher recall than the generative AI ad (60.7% vs. 51.3%; χ² = 2.31, p = .128) and significantly higher recognition from a list of 15 brands (84.7% vs. 70.4%; χ² = 8.02, p = .005). The sunscreen and car brands, on the other hand, result in nearly identical recall (39.0% vs. 37.2%; χ² = .24, p = .887) and recognition (84.0% vs. 83.9%; χ² = .08, p = .961) for conventional and AI-generated ads of the same brands. For advertising centered on trust and familiarity, generative AI fine-tuning limited to AIDA therefore creates value by driving important funnel metrics without inhibiting other objectives such as recall or recognition (see Web Appendix Q).
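The recall and recognition comparisons are chi-square tests of independence between ad version and outcome. The sketch below handles a 2 × 2 table with one degree of freedom and no continuity correction, both assumptions since the text does not state them; `chi_square_2x2` is a hypothetical helper, and any counts passed to it are toy values rather than the study's data.

```python
import math


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Rows: ad version (e.g., conventional vs. AI-generated).
    Columns: outcome (e.g., recalled vs. not recalled).
    Returns (chi2, p) with one degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    # Shortcut formula for the 2 x 2 Pearson statistic.
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With df = 1, chi2 is a squared standard normal, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

A table with identical rows, such as `[[10, 10], [10, 10]]`, gives χ² = 0 and p = 1, while `[[20, 0], [0, 20]]` gives χ² = 40.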
Collectively, these observations highlight a limitation of our proposed procedure. When elements that deviate from category conventions are essential to the intended objective, generating fluent visual averages with AI becomes less effective. Our approach performs well across applications where such averages foster familiarity and trust, but it is not a panacea. For cases where differentiation is key, whether for a distinctive product or a bold advertising message, alternative approaches are needed. These may involve providing additional training images or crafting creative prompts to elicit consumer responses, followed by fine-tuning as demonstrated in this article. While such methods require greater creative and technical expertise today, future advances may lower the costs of creative and evidence-based customization.
From a practical standpoint, our findings suggest that the creative process of marketing that involves entire industries (e.g., creative advertising agencies, media agencies, advertising pretesting) may undergo fundamental change. Generative AI can increasingly support execution on specific communication objectives, brand associations, and target groups, enabling marketers to produce and adapt far more visual content with minimal creative input or manual effort. While in our application initial market research remains necessary to train and calibrate models, ongoing ad-by-ad testing is less critical as generative systems autonomously create and adjust content at scale. At the same time, generative AI introduces new risks. Across studies, visuals produced by generative AI displayed notable commonalities—such as the frequent appearance of Monstera leaves in sunscreen ads and similar road settings across automotive ads (see Web Appendix M). As more advertisers rely on comparable technologies and optimization objectives, visual content may become increasingly homogeneous, less differentiated, and ultimately less effective (Wenger 2024). A central challenge for marketers will be to balance the efficiency and capabilities that generative AI offers with the creative infusions needed to capture and maintain consumer interest in both the short and long term.
Supplemental Material
sj-pdf-1-jmx-10.1177_00222429251356993 - Supplemental material for Picture Perfect: Engaging Customers with Visual Generative AI
Supplemental material, sj-pdf-1-jmx-10.1177_00222429251356993 for Picture Perfect: Engaging Customers with Visual Generative AI by Mark Heitmann, Tijmen P.J. Jansen, Martin Reisenbichler and David A. Schweidel in Journal of Marketing
Footnotes
Acknowledgments
The authors thank the JM review team, the participants of the Frankfurt School’s Marketing Research Camp, and attendees of the Workshop on AI in Marketing at NOVA School of Business and Economics for their helpful feedback on earlier versions of this research.
Author Contributions
Authorship is alphabetical to denote overall equal contributions.
Coeditor
Detelina Marinova
Associate Editor
Sha Yang
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the German Research Foundation (DFG), research grant number 460037581.
Notes
References
