Abstract
The evaluation of AI-generated art has attracted increased interest since tools for generating such art (e.g., DALL-E or Stable Diffusion) became widely accessible. While previous studies have suggested that there are preferences for human-generated art, the research remains far from robust, with numerous contradictory findings. One potential reason for this discrepancy is differing experimental designs employing comparative or non-comparative methods. To shed light on this problem, two experiments were conducted: one using a Likert scale (N = 250) and another using a 2-alternative forced choice design (N = 102). The conflicting results between the two designs suggest that traditional Likert-based art appraisals in non-comparative formats may not be sensitive enough to reliably detect preferences that a forced-choice task can reveal. As AI-generated art continues to become more mainstream, people tend to prefer human art in their liking and valuation appraisals when these are measured in comparative designs that better approximate real-world interaction with art.
Introduction
The last few years have seen a tremendous rise in the use of artificial intelligence (AI) to create works of art. Popular platforms such as DALL-E (OpenAI, 2025) or Midjourney (Midjourney, 2025) enable millions of users to quickly generate stunning images from simple prompts via generative algorithms (e.g., generative adversarial networks or diffusion models). While this form of AI-generated art is relatively new, many researchers have already been examining attitudes towards and perceptions of AI-generated art. For example, numerous studies have empirically investigated how people value and judge a work of art based on its provenance being AI or human (Chiarella et al., 2022; Fortuna & Modliński, 2021; Ragot et al., 2020).
It has been hypothesised that people value human-generated art for the agency required to make it, including the artist's intentionality (Snapper et al., 2015) or effort (Kruger et al., 2004). Relatedly, research on mind perception (i.e., the degree to which people perceive minds in others) has revealed that while people do attribute agency to AI, albeit to a reduced degree compared to humans, they are less willing to ascribe experiential features such as feelings to AI (Gray et al., 2007; Jacobs et al., 2022). This has led some (Wu et al., 2021) to suggest that it could be the uncanniness (e.g., Gray & Wegner, 2012) associated with a perceived lack of emotion in AI art that leads to it being appraised less positively than human-generated art.
Most research has supported this preference for human-generated art over AI art (e.g., Fortuna & Modliński, 2021; Ragot et al., 2020). However, not all studies find such a preference (e.g., Gangadharbatla, 2022; Hong & Curran, 2019). While a number of possible reasons might give rise to these discrepant results (e.g., the painting styles used, true provenance, sample sizes), one likely suspect concerns the context in which preferences are being measured, especially whether or not preferences are measured in a comparative situation. Supporting this idea, a recent study by Neef et al. (2024) found that when art labelled as either AI-generated or human-created was presented sequentially to observers, a positive bias emerged: people preferred human-created artworks, but only when the presentation of those images was intermixed. Neef et al. (2024) attribute this preference for art labelled as human to the mixed presentation order creating a “subliminal competitive scenario” between the two categories of art. This interpretation is supported by the null effects observed in previous studies that presented AI- or human-labelled art in distinct and separate sets (e.g., Gangadharbatla, 2022; Hong & Curran, 2019).
A potential limitation of the Neef et al. (2024) design is that one can only assume that the positive effects emerged because individuals “subliminally” compared the AI- and human-labelled art. An alternative and more direct approach is to employ explicit comparative measures, specifically asking individuals to directly choose between the two types of art. It is our working hypothesis that when people are asked to judge AI and human art in isolation, rather than being asked to make a relative comparison, differences in preferences may become masked. Not only does this hypothesis dovetail with the mixed human versus AI art research findings, but it is consistent with what is known in the field of psychometrics. When people are instructed to give an absolute judgment of how much they value or like an item on a numerical scale of, say, one to five, the same numbers may represent very different perceptions for different individuals, and conversely, different numbers may represent the same perception (Kreitchmann et al., 2019; Stadthagen-González et al., 2018; Wetzel et al., 2016; Wildt & Mazis, 1978). Moreover, even when numerical scales are administered in within-subjects designs, acquiescent responding (i.e., the general tendency to agree with items) can limit variability and obscure underlying preferences (Kreitchmann et al., 2019). In contrast, when people are asked to make relative comparisons, for example choosing which of two items they prefer, these potential difficulties with numerical scaling are eliminated and any prevailing preferences in perception can be exposed. The present paper aims to directly test this possibility, using a non-comparative numerical-scale design with separate blocks of artist labels in Experiment 1 and a forced-choice comparative design in Experiment 2.
Experiment 1
In Experiment 1, participants were asked to evaluate four categories of paintings (abstract expressionism, abstract geometrical expressionism, romanticism, and naturalism). Artist labels were manipulated such that each painting category was attributed to one of two kinds of artists (human or AI) at one of two levels of expertise (Human: amateur or professional; AI: weak or strong). The true provenance of all the included paintings was human artists, and the proficiency manipulation was included as an exploratory factor in case preferences interacted with both prestige and artist type. There was also a control condition in which no labels were assigned to the art, included as a baseline for label comparisons. We administered the dependent measures using traditional numerical scales ranging from one to five.
Method
Participants
An a priori power analysis using WebPower (Zhang & Yuan, 2018) for a mixed ANOVA with 90% power, a medium-sized effect (f = .25), and 5 groups suggested a total sample size of 252 participants. A total of 250 participants took part in the online experiment, with 50 participants per group. All participants were recruited using CloudResearch (Litman et al., 2017) and took part from IP addresses listed in the United States. No participants were excluded from the analyses. Participants’ mean age was 34.38 (
Demographic Characteristics of Experiment 1 (N = 250).
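As a rough illustration, the a priori power analysis described above could be approximated with a WebPower call along the following lines; the exact function and arguments used are not reported in full, so this sketch rests on assumptions (e.g., the default nonsphericity correction), and the suggested N will vary with them.

# Approximate reproduction of the a priori power analysis (assumed arguments):
# 5 between-subjects groups, 4 within-subjects measures, medium effect f = .25,
# alpha = .05, 90% power, testing the between-subjects (artist-label) effect.
library(WebPower)

wp.rmanova(ng = 5,        # number of groups (artist-label conditions)
           nm = 4,        # number of repeated measures (painting styles)
           f = 0.25,      # assumed medium effect size
           alpha = 0.05,
           power = 0.90,
           type = 0)      # 0 = between-subjects effect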
Materials and Procedure
The study was administered as a Qualtrics survey (Qualtrics, 2023). Each survey question was composed of a painting, a one-line description of the artist's identity (Human [amateur or professional] or AI [weak or strong]), and two rating scales (value and liking). The description for the amateur artist was “This painting was created by an amateur artist,” for the professional artist it was “This painting was created by a professional artist,” for weak AI it was “This painting was created by an AI system from Calgary College of Fine Arts,” and for the strong AI label it was “This painting was created by an AI system from Google and MIT.”
In total, there were 20 paintings organised into four distinct blocks: five abstract, five patterned abstract, five romantic, and five realistic paintings. These four blocks remained consistent across the five groups to which participants were randomly assigned. Four of these groups viewed the same blocks of images but with the labels systematically manipulated so that each block was connected to either the amateur artist, professional artist, weak AI, or strong AI label. These labels were rotated across the four blocks between the four groups, so each block was presented with each of the four types of labels exactly once (a Latin-square rotation, sketched below). The fifth group of participants saw the same four blocks of images but with no labels, serving as the control condition.
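As a minimal sketch of this rotation scheme (the block and label names here are illustrative, not the authors' survey logic):

# Latin-square rotation of artist labels across the four painting blocks.
# The fifth (control) group sees all blocks unlabelled. Names are illustrative only.
blocks <- c("abstract", "patterned_abstract", "romantic", "realistic")
labels <- c("amateur_human", "professional_human", "weak_AI", "strong_AI")

# Each row is one labelled group; labels shift by one block per group, so every
# block appears with every label exactly once across the four groups.
assignment <- t(sapply(0:3, function(shift) labels[(seq_along(blocks) + shift - 1) %% 4 + 1]))
dimnames(assignment) <- list(paste0("group_", 1:4), blocks)
assignment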
Paintings were selected with the intention of obtaining a diverse range of images that looked professional but were not well-known or famous. The images used in this study were sourced from a variety of online repositories, including public domain collections from the Metropolitan Museum of Art, Wikimedia Commons, and freely licensed images from Pexels (see Appendix for examples). A small number of images were also used under Canada's fair dealing provisions for research purposes. All images were either in the public domain, available under free licences, or included under fair dealing.
After consenting to participate, participants filled in two questionnaires. The first was a standard demographic questionnaire querying the participant's age, sex, and education. Next, the participants were presented with an image of a painting and a short description of the artist's identity. Participants were asked to rate each of the paintings for value (“How much would you pay for this painting?”) from 1-“None at all” to 5-“A great deal,” and liking (“How much do you like this painting?”) from 1-“Dislike a great deal” to 5-“Like a great deal.” After rating each painting, participants were informed that they had completed the study and were then debriefed and compensated.
Data Analysis and Availability
Data analyses were conducted in R (v4.2.1) using the packages tidyverse (v1.3.2; Wickham et al., 2019), afex (v1.1; Singmann et al., 2015), and ggplot2 (v3.4.2; Wickham et al., 2016). This study was approved by the Behavioural Research Ethics Board of the University of British Columbia (H10-00527). Data are available on OSF at: https://osf.io/wtqxz/.
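As an illustration of the main analysis reported below, the 5 × 4 mixed ANOVA on liking could be fit with afex roughly as follows; the data frame and column names are hypothetical and the ratings are simulated placeholders (the real data are available on OSF).

library(afex)

# Hypothetical long-format data: one row per participant x painting style,
# with simulated placeholder ratings.
set.seed(1)
ratings <- expand.grid(id = factor(1:250), style = factor(1:4))
ratings$label  <- factor((as.integer(ratings$id) - 1) %/% 50 + 1)   # 5 groups of 50
ratings$liking <- sample(1:5, nrow(ratings), replace = TRUE)

# 5 (artist label, between-subjects) x 4 (painting style, within-subjects) ANOVA on liking.
liking_anova <- aov_ez(id = "id", dv = "liking", data = ratings,
                       between = "label", within = "style")
liking_anova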
Results and Discussion
We conducted a 5 (artist labels) × 4 (painting style) analysis of variance on liking attributions with labels as a between-subjects factor and painting style as a within-subjects factor. There was a significant main effect of painting style,

Mean Liking Attributions by Artist Labels and Painting Style. Error Bars Represent Standard Error.

Mean Valuation Attributions by Group and Painting Style. Error Bars Represent Standard Error.

Painting Selections by Artist Label and Measure.
The results of Experiment 1 failed to find significant differences between appraisals for art displaying human provenance compared to AI provenance. These null findings replicate the results of a few previous studies (e.g., Gangadharbatla, 2022; Hong & Curran, 2019), but conflict with myriad other studies that find significant preferences for human-generated art (e.g., Ragot et al., 2020). These findings support the idea that asking people to provide numerical ratings of each piece on its own may be less sensitive for detecting a reliable preference for human-generated art relative to AI-generated art. A more direct and sensitive method may be to ‘force’ participants to choose which of two pieces of art they prefer: human-generated or AI-generated, while keeping all other factors the same.
Experiment 2
In Experiment 2, instead of using a numerical rating task and a blocked design, we required participants to indicate their preference in a two-alternative forced choice (2AFC) task. To focus purely on the effect of the comparative design, we manipulated only the artist labels of the paintings (human vs. AI) without providing information about the prestige of the artist. We again used a variety of painting styles, both to remain consistent with the stimuli used in Experiment 1 and because previous findings indicate that certain types of art, specifically abstract art, are more likely to be associated with AI generation (Gangadharbatla, 2022). See Figure 3.
Method
Participants
Another power analysis, this time for a 2 × 2 contingency table test with 90% power and a medium-sized effect (w = .32), suggested 102 participants. A total of 102 individuals using IP addresses from the United States took part in the study through CloudResearch (Litman et al., 2017). Participants’ mean age was 33.72 (
Demographic Characteristics of Experiment 2 (N = 102).
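This power analysis can be approximated as follows; note that this sketch uses the pwr package rather than the tool the authors report, so it is an approximation under assumed arguments (df = 1 for a 2 × 2 table, alpha = .05).

library(pwr)

# A priori power for a 2 x 2 contingency-table test: medium effect w = .32,
# df = (2 - 1) * (2 - 1) = 1, alpha = .05, 90% power.
pwr.chisq.test(w = 0.32, df = 1, sig.level = 0.05, power = 0.90)
# Returns a required total N of roughly 103, close to the 102 reported above.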
Materials and Procedure
Experiment 2 was also administered as a Qualtrics survey (Qualtrics, 2023) and followed the same CloudResearch (Litman et al., 2017) procedure, including the consent process. Each survey question was composed of two paintings, instructions (“Please examine the following paintings for comparison. One of the paintings was created by an AI system and the other one was created by a human artist.”), a one-line description of the artist identity (human or AI) above each painting (e.g., “Artist Type: AI”), and two forced-choice questions (value and liking; same wording as Experiment 1). In total, there were 16 paintings: 4 abstract, 4 patterned abstract, 4 romantic, and 4 realistic. The paintings were the same as in Experiment 1, except that the lowest-rated image from each category was removed. Each identity label was assigned to the same side of the screen throughout the survey, but the arrangement was counterbalanced across participants (i.e., the AI label was always presented on the left for half of the participants and vice versa).
Data Analysis and Availability
Data analyses were conducted in R (v4.2.1) using the packages tidyverse (v1.3.2; Wickham et al., 2019) and ggplot2 (v3.4.2; Wickham et al., 2016). This study was approved by the Behavioural Research Ethics Board of the University of British Columbia (H10-00527). Data are available on OSF at: https://osf.io/wtqxz/.
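As an illustration of the kind of test reported below, a chi-squared test of independence on the choice counts could be run as follows; the counts shown are placeholders rather than the observed data.

# Hypothetical counts of which painting was chosen in each counterbalancing group;
# the two groups differ only in which painting of a pair carries the AI label.
choices <- matrix(c(34, 17,    # group A: chose painting 1, chose painting 2
                    15, 36),   # group B (labels switched): chose painting 1, chose painting 2
                  nrow = 2, byrow = TRUE,
                  dimnames = list(group  = c("labels_A", "labels_B"),
                                  choice = c("painting_1", "painting_2")))

chisq.test(choices)  # does the chosen painting depend on which label it carries?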
Results
A chi-squared test of independence revealed that switching labels of human and AI artists between groups made a significant difference both for liking preferences,
General Discussion
The rapid rise in public access to high-quality AI-generated art has coincided with more scientific research investigating how the nature of the art's creation influences evaluations of the artwork. A commonly cited finding is that there exist preferences for human-generated art, perhaps because of the enhanced intentionality or effort integral to human creation of art (Elgammal et al., 2017; Fortuna & Modliński, 2021; Ragot et al., 2020). However, the findings for this preference remain mixed (e.g., Gangadharbatla, 2022; Hong & Curran, 2019). Critically, the mixed findings have typically occurred in designs that do not promote comparison by intermixing human and AI artist labels (Neef et al., 2024). However, intermixing artist labels can only be assumed to create competitive comparisons. Researchers can employ more direct methodologies, such as forced-choice designs, to explicitly induce competitive comparisons and measure how provenance affects aesthetic preferences. Although forced-choice designs may be less common, they can have greater sensitivity for detecting biases by compelling participants to make an explicit judgment between options. To shed light on these conflicting findings, we conducted two experiments using human-generated art from four painting styles that was randomly paired with deceptive artist labels and presented either without a comparative design (Experiment 1) or within a two-alternative forced choice design (Experiment 2).
In our first experiment, we found no evidence that people had preferences for human-generated art, nor preferences for greater degrees of artist prestige, suggesting that the artist labels had a minimal impact on participants’ ratings. Similar to Hong and Curran (2019) and Gangadharbatla (2022), participants expressed no observable differences in their liking and valuation preferences based on whether the artist label indicated the creator was human or AI. The null result with the blocked design also supports Neef et al.'s (2024) finding that a lack of intermixing of artist labels can obscure underlying preferences for human art or negative biases toward AI-generated art. Though a null finding does not necessarily support the null hypothesis (Leppink et al., 2017), we hypothesised that the lack of emergent preferences may be related to the non-comparative context of the design, which may have reduced the impact of the artist labels. Moreover, given limitations in probing people's attitudes towards art using Likert or continuous-scale ratings (Kreitchmann et al., 2019; Watrin et al., 2019), we thought it prudent to follow up the results using a forced-choice dichotomous scale. In this way we could tap more directly into preferences regarding liking and valuation decisions between human- and AI-generated art by forcing participants to choose.
To this end, participants were tasked with choosing between two paintings in terms of their liking and valuation—a situation better approximating a prospective buyer perusing an art gallery. Indeed, this difference in method led to results completely different from Experiment 1, with a large preference for human-generated art emerging. These results suggest that people may have underlying preferences for human-created art that are not consistently captured using more traditional Likert or continuous-scale probes, especially when the art is presented without intermixing artist labels. Direct comparative measures also provide greater ecological validity in that they better approximate art appraisal in the real world, as a decision to purchase a particular piece is typically informed by comparisons with other art. It may be that in comparative contexts—both in research designs and in real-world situations—factors such as intentionality and effort become larger considerations in evaluating artworks.
There are some notable limitations to this work. For example, we only used artworks that were originally produced by human artists, which may have influenced our results by skewing the believability of the deception. The AI artist labels were always deceptive and may have been less believable than the human artist labels, which may have moderated emergent preferences for human-provenanced art. Moreover, our assumption that participants would differentiate between types of artist prestige (i.e., amateur/professional or weak/strong AI) was not borne out, which may be related to the labels’ wording not inducing a strong separation for either the human-labelled or the AI-labelled art. We did not check whether participants believed the identity labels, so we are unable to test for differences in believability between identity conditions or for an effect of believability on liking and valuation. However, we note that this limitation did not prevent the Experiment 2 finding of preferences for human-generated art. An additional limitation of Experiment 2 was the omission of the artist prestige manipulation. While this was done to streamline the methodological comparison, there could be an interaction, however unlikely, between prestige and measurement style. We suggest future studies use AI-generated artworks in similar comparative designs to increase external validity in this line of research. We also did not investigate the roles of individual differences or more complex situations such as co-created art, as other studies have examined (e.g., Fortuna & Modliński, 2021). An interesting future direction would be to investigate whether human co-creation of art with AI (Oh et al., 2018; Wu et al., 2021) influences appraisals and also benefits from more sensitive comparative designs. Similarly, future work could explore how positive biases for human-generated art might affect real-world decision making in conjunction with other beliefs, such as whether a piece of art will appreciate in value.
In conclusion, our results suggest that the previously mixed findings regarding preferences for human-generated art may reflect traditional numerical-rating or other non-competitive methods obscuring those preferences. By employing comparative designs, such as forced-choice questionnaires, underlying differences in preference can be exposed. Crucially, this finding has broader implications for the field of human-computer interaction, suggesting that methodological measures that more explicitly reinforce direct comparisons between humans and AI can induce stronger contrast effects. Finally, comparative measures also provide greater ecological validity for distinguishing between human- and AI-generated art, as people typically make decisions about art in comparative contexts rather than through Likert-like appraisals in the real world.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Sciences and Engineering Research Council of Canada (grant number RGPIN-2022-03079).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
