Abstract
The introduction of ChatGPT—an artificial-intelligence (AI) chatbot capable of text recognition and generation—has been transformative for numerous academic research communities, including psychology. We propose using ChatGPT to reduce researchers’ cognitive load and time spent creating text materials for psychological studies (e.g., vignettes). We present examples of ChatGPT-generated text materials for relationship-science (N = 60) and social-cognition (N = 67) studies and provide evidence of their effectiveness. Furthermore, we discuss ethical considerations and make recommendations related to using text materials generated by ChatGPT or similar AI tools. We end with a brief discussion of the importance of this work and encourage others to leverage AI in the field of psychology.
Rapid advancements in artificial intelligence (AI), particularly in large language models (LLMs), have drastically altered how people use digital technologies. An extensive review of LLMs is out of the scope of this article and has been thoroughly discussed by others (e.g., see De Angelis et al., 2023; Demszky et al., 2023). However, a brief introduction to key features of LLMs, and of ChatGPT specifically, is relevant. LLMs are advanced and complex statistical systems with two key capabilities: natural language understanding and natural language generation (i.e., text recognition and generation; IBM, 2023). These capabilities are possible through a training process in which LLMs are exposed to diverse data sets of textual sources, such as books, articles, webpages, social media, and technical documents (Hua et al., 2024). Exposure to these textual sources allows LLMs to learn the nature of human language and to statistically predict the next word in a sequence more accurately, producing coherent responses (Hua et al., 2024; IBM, 2023). LLMs are a versatile technological tool because they provide the foundational capabilities for a variety of applications rather than being specialized for a single task (Hua et al., 2024; IBM, 2023). For instance, variations in the prompts given will result in variations in the tasks the model performs (Lin, 2024; Meyer et al., 2023). To illustrate, LLMs can perform a wide range of tasks, such as question answering, summarization, translation, and data analysis, to name a few (Demszky et al., 2023; Fraiwan & Khasawneh, 2023; Ke et al., 2024; Lund et al., 2023; Yenduri et al., 2024). ChatGPT specifically harnesses OpenAI’s advanced generative-pretrained-transformer (GPT) technology, which uses deep learning (i.e., neural networks with many hidden layers; IBM, 2023; LeCun et al., 2015). The greater the number of hidden layers, the more sophisticated and complex the inputs and outputs the model can understand and produce (LeCun et al., 2015). This architecture gives ChatGPT the ability to produce remarkably human-like text responses compared with other LLMs (for a detailed description of ChatGPT and its evolution, see Hua et al., 2024). Furthermore, among the available LLMs, ChatGPT is arguably the most user-friendly (De Angelis et al., 2023; Meyer et al., 2023). Therefore, even individuals without technical expertise can take advantage of this technology.
Multiple uses of LLMs have been proposed to increase productivity in and to advance psychological research. Among the proposed uses are conducting literature reviews, generating hypotheses, designing experiments (e.g., generating stimuli), analyzing data, academic writing, peer review, and clinical practice (e.g., case management; Demszky et al., 2023; Ke et al., 2024). Although exploring these possibilities is exciting and valuable, numerous limitations of ChatGPT and other LLMs that may hinder these goals have been discussed. Specifically, some concerns involve hallucinations (i.e., errors), biases, plagiarism, copyright violations, privacy concerns, and many others (Hua et al., 2024; Khowaja et al., 2024; Wu et al., 2024; Yenduri et al., 2024). Nonetheless, many researchers have used LLMs to benefit their academic and research goals (Ke et al., 2024; Raman, 2025; Salah, Al Halbusi, & Abdelfattah, 2023).
We are particularly interested in the role of ChatGPT in generating experimental text stimuli. Although others have suggested using AI tools for stimuli generation, as mentioned earlier, we are one of the first to try it and evaluate its effectiveness. Many psychological-research studies use specific sets of text materials (e.g., vignettes) to test their hypotheses. Depending on the design of the study, creating the materials can be a tedious, time-consuming task. ChatGPT can reduce a researcher’s cognitive load and overall time spent on creating these materials. In this article, we discuss the utility and success of ChatGPT in generating experimental text stimuli for psychological research. We specifically selected ChatGPT because it is one of the LLMs at the forefront of this technological revolution and is easy to use. In the first part of this article, we present examples of GPT-4-generated text materials and empirical data testing their effectiveness. In the second part of this article, we discuss ethical considerations and make recommendations related to using text materials generated by ChatGPT or similar AI tools. In the last part of this article, we discuss the importance of this work and encourage others to leverage AI tools in the field of psychology.
Part 1
We discuss two examples of materials created using GPT-4. Both sets of materials were created to mimic materials previously developed by a researcher for studies that have been conducted. These examples demonstrate and support our claim that GPT-4, with its current capabilities, can be used for this purpose. A brief description of each study is provided to give context for the materials required to test its research question. Along with each description, our predictions are detailed. Then, we address how the materials were generated using GPT-4. Finally, we present our findings. All studies discussed in this article were approved by the Institutional Review Board at The University of Texas at El Paso. Our predictions and methods were preregistered and are available on our OSF profile (https://osf.io/w7scm/?view_only=4b49de111fbe45bb916f96c13e149ab7). Our materials, data, and syntax are also available on our OSF profile (https://osf.io/kjtqx/?view_only=94b2c8a1939c48e1b2af9a7b3694cbc9). Any deviations from our preregistration are detailed in Table 1, as suggested by Willroth and Atherton (2024), to maintain transparency and increase the credibility of research findings.
Table 1. Preregistration Deviations
Note: TOST = two one-sided tests; AI = artificial intelligence.
Prompt tuning
The output generated by LLMs can be improved in multiple ways (Demszky et al., 2023). However, prompt tuning is one way of doing this without changing the underlying model parameters (i.e., how the model has been trained; Demszky et al., 2023). Although prompt engineering (i.e., developing effective instructions to request LLMs to generate output) also requires technical expertise, compared with other forms of fine-tuning LLMs, prompt engineering is more accessible. Some guidelines for prompt tuning have been put forward. Lin (2024) summarized different strategies for writing effective prompts for LLMs. These strategies include guiding the model to solutions (i.e., providing incremental instructions), adding relevant context, being explicit in the instructions, asking for multiple options, assigning characters (i.e., instructing the model to role-play), showing examples, declaring a preferred response format, and experimenting (i.e., making tweaks; Lin, 2024). The prompts entered to generate materials for the two studies presented here used several of these strategies, as illustrated in the sketch below.
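Although we entered our prompts through the ChatGPT chat interface, the same strategies can be applied programmatically. The following is a minimal sketch only; the model name, prompt wording, and use of the httr2 package are our assumptions for illustration and are not part of our studies’ workflow.

# Illustrative sketch (not the workflow used in our studies): sending a
# prompt to OpenAI's chat-completions API from R with the httr2 package.
# The prompt combines three of Lin's (2024) strategies: assigning a
# character, being explicit, and showing an example.
library(httr2)

prompt <- paste(
  "Imagine that you are a college student who is creating an online dating profile.",  # role-play
  "Answer the question prompt below in one or two sentences that are low in self-disclosure.",  # explicit instruction
  "Here is an example of a low self-disclosure answer: 'That I am still alive.'",  # example
  "Question prompt: For what in your life do you feel the most grateful?"
)

resp <- request("https://api.openai.com/v1/chat/completions") |>
  req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>  # API key from environment
  req_body_json(list(
    model = "gpt-4",  # model name is an assumption
    messages = list(list(role = "user", content = prompt))
  )) |>
  req_perform()

# Extract the generated text from the JSON response
resp_body_json(resp)$choices[[1]]$message$content

Scripting requests in this way also creates an exact, shareable record of the prompt, which complements the transparency recommendations we offer in Part 2.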
Example 1
Description of research study
One of the purposes of this research study was to test how the level of inclusion of other in the self is associated with disregarding red flags (i.e., an indicator that a person is potentially bad for one’s mental and physical well-being) in romantic partners. This research question was tested by using online dating profiles that incorporate an inclusion-of-other-in-the-self manipulation and reveal a transgression committed by a potential romantic partner.
In this investigation, we tested and compared only the effectiveness of an inclusion-of-other-in-the-self manipulation using human-generated and AI-generated (i.e., GPT-4) materials. It was predicted that inclusion of other in the self would be greater in the high (vs. low) self-disclosure condition for both the human-generated and AI-generated stimuli. It was also predicted that the effectiveness of human-generated and AI-generated stimuli would be equivalent. Finally, it was predicted that participants would indicate low confidence in their ability to distinguish between human-generated and AI-generated stimuli and would demonstrate this inability.
Method
Inclusion-of-other-in-the-self manipulation
The level of inclusion of other in the self (i.e., overlap in identities; Aron et al., 1992) was manipulated by controlling the level of self-disclosure in the answers to six question prompts included in the online dating profiles. The question prompts differ in their level of intimacy (i.e., low, moderate, and high), and they were selected from an interpersonal closeness-generating task (Aron et al., 1997). Higher levels of self-disclosure are associated with higher levels of inclusion of other in the self (Aron et al., 1997). Therefore, online dating profiles with high (low) self-disclosure answers are high (low) on inclusion of other in the self. First, we entered the following in the chat box: Imagine that you are a college student who is creating an online dating profile. Create an online dating profile answering only the questions prompts below and create responses that are low in self-disclosure. Question Prompt #1: For what in your life do you feel the most grateful? Question Prompt #2: If you could wake up tomorrow having gained any one quality or ability, what would it be? Question Prompt #3: What is your most terrible memory? Question Prompt #4: If you knew that in one year you would die suddenly, would you change anything about the way you are now living? Why? Question Prompt #5: Complete this sentence: “I wish I had someone with whom I could share . . . ” Question Prompt #6: If you were to die this evening with no opportunity to communicate with anyone, what would you most regret not having told someone? Why haven’t you told them yet?
Here is an example of an online dating profile with answers that are low in self-disclosure.
For what in your life do you feel most grateful? That I am still alive. If you could wake up tomorrow having gained any one quality or ability, what would it be? I would like the ability to fly or have super strength. What is your most terrible memory? I don’t feel comfortable sharing this. Sorry. If you knew that in one year you would die suddenly, would you change anything about the way you are now living? Why? No. I like how it is. Complete this sentence: “I wish I had someone with whom I could share . . . ” My Food. I think it’s delicious and I always make extra. If you were to die this evening with no opportunity to communicate with anyone, what would you most regret not having told someone? Why haven’t you told them yet? My deepest secret. I haven’t told people because it is a secret. Avoid repetitiveness between the answers in the profile that is generated and the answers in the profile that was provided as an example.
After its first iteration of output, GPT-4 was told the following: Recreate the profile. Keep the same answers but make them less expressive of personality and interests.
For GPT-4’s response to this final request, see Table 2 (OpenAI, 2024). Then, we entered the same prompt but changed our request to create responses that are high in self-disclosure and to make them different from the responses it generated for low self-disclosure.
Imagine that you are a college student who is creating an online dating profile. Now, create an online dating profile answering only the questions prompts below and create responses that are high in self-disclosure. Question Prompt #1: For what in your life do you feel the most grateful? Question Prompt #2: If you could wake up tomorrow having gained any one quality or ability, what would it be? Question Prompt #3: What is your most terrible memory? Question Prompt #4: If you knew that in one year you would die suddenly, would you change anything about the way you are now living? Why? Question Prompt #5: Complete this sentence: “I wish I had someone with whom I could share . . . ” Question Prompt #6: If you were to die this evening with no opportunity to communicate with anyone, what would you most regret not having told someone? Why haven’t you told them yet?
Table 2. Artificial-Intelligence-Generated Online Dating Profiles
Here is an example of an online dating profile with answers that are high in self-disclosure.
For what in your life do you feel most grateful? I am grateful for having a family who loves and cares about me. I am also grateful for the love and endless support I receive from my friends. They have become my second family. I can’t imagine going through life without them. If you could wake up tomorrow having gained any one quality or ability, what would it be? I would like the ability to teleport myself anywhere I want. Teleporting would give me the opportunity to travel around the world, see the beauty that this world has to offer, and experience new cultures. What is your most terrible memory? My most terrible memory is receiving a call from a family member to inform me that my mother had died. For months, I felt anxious every time I received a call. I was always worried that someone would call me to tell me something bad had happened again. If you knew that in one year you would die suddenly, would you change anything about the way you are now living? Why? I would attempt to experience life more. I would try to do the things that I keep saying I will but had been reluctant to do like traveling on my own. I would also reconsider my priorities. I would spend more time making memories with my loved ones. Complete this sentence: “I wish I had someone with whom I could share . . . ” My life. I want to find someone with whom I can share my successes and struggles and someone that feels they can do the same with me. If you were to die this evening with no opportunity to communicate with anyone, what would you most regret not having told someone? Why haven’t you told them yet? My biggest regret would be not telling my family and friends how much I love them and how much better my life is because of them. I think I assume that they already know, but I should express my feelings to them more often. Avoid repetitiveness between the answers in the profile that is generated and the answers in the profile that was provided as an example.
For GPT-4’s response to this request, see Table 2 (OpenAI, 2024). The human-generated online dating profiles served as the examples embedded in our prompts to create the AI-generated profiles.
Procedure
Our final sample consisted of 60 college students from a Hispanic-Serving Institution who participated in this study in exchange for class credit (38.33% female, 58.33% male, 3.33% nonbinary; 3.33% African American, 8.33% White, 1.67% Asian, 85% Hispanic, 1.67% other; age: range = 18–27, M = 20.28 years, SD = 1.96). Participant responses were excluded from data analyses if participants were not single (n = 3) or did not reconsent after being debriefed (n = 3). No participant responses were excluded for extreme responding (i.e., 2.5 SD above or below the mean on outcome variables), a rule sketched in code below. Participants first completed an online dating profile of their own by answering the same question prompts that appeared in the four online dating profiles they subsequently reviewed. Half of the online dating profiles were human-generated (i.e., created by a researcher), and the other half were AI-generated (i.e., created by GPT-4). In addition, each set of online dating profiles had one profile high on inclusion of other in the self and one low on inclusion of other in the self. Participants were presented with the online dating profiles in random order, and they rated each online dating profile on various attributes using a 7-point Likert scale. Specifically, for each online dating profile, participants rated the perceived level of self-disclosure (“Please rate the level of self-disclosure for [NAME]”), the expected level of inclusion of other in the self (“Please select the set of circles that best represent how close you anticipate feeling towards the person in this dating profile”), and the likelihood that the profile was created by a human versus AI (“Please rate the likelihood that this was created by a human versus being generated by artificial intelligence”). At the end of the study, participants rated their overall confidence in distinguishing between human- and AI-generated materials (“Overall, how confident do you feel in your ability to distinguish between content created by humans and content generated by artificial intelligence”). Finally, participants were debriefed and thanked for their participation.
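For concreteness, the exclusion rule can be expressed in a few lines of R. This is a hypothetical sketch; the data frame dat and the variable ios_rating are placeholder names, not objects from our analysis scripts.

# Hypothetical sketch of the preregistered outlier rule: drop responses
# more than 2.5 SD above or below the mean on an outcome variable.
z <- as.numeric(scale(dat$ios_rating))  # z-score the outcome; 'dat' and 'ios_rating' are placeholders
dat_clean <- dat[abs(z) <= 2.5, ]       # keep responses within +/- 2.5 SD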
Results and discussion
Data analytical plan
First, the effectiveness of our materials was tested separately by source (i.e., human-generated vs. AI-generated). Then, two one-sided tests (TOST) procedures were used to test the equivalence between the effect sizes for human-generated and AI-generated materials (Lakens et al., 2018) using the TOSTER R package (Lakens & Caldwell, 2025). The equivalence bounds were set at Cohen’s d = ±0.15, representing the smallest effect size of interest. This effect size was selected based on a recent meta-analysis, which calculated the average small (d = 0.15), medium (d = 0.36), and large (d = 0.65) effect sizes in social psychology (Lovakov & Agadullina, 2021). Finally, a Bonferroni correction was used to adjust for multiple tests (i.e., nine statistical tests) using the same sample. The alpha level was adjusted to .006, and all interpretations of significance were based on this threshold. When we compared human-generated and AI-generated materials, our interpretation was based on the results of both null-hypothesis significance testing (NHST) and TOST (i.e., lower and upper bounds; Lakens et al., 2018). Equivalence was concluded when both one-sided tests were significant, indicating that differences (if any) were within the bounds of the predetermined smallest effect size of interest. Nonequivalence was concluded when the NHST was significant and one or both TOST were nonsignificant, indicating that mean differences corresponded to differences in performance. Results were considered inconclusive when the NHST and one or both TOST were nonsignificant, indicating that there was insufficient evidence to distinguish between equivalence and nonequivalence. One such comparison is sketched in code below.
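To make the plan concrete, the following is a minimal sketch of one paired comparison in R. The data frame and column names are hypothetical placeholders, and we assume the t_TOST() interface of the TOSTER version cited above.

# Minimal sketch of one paired comparison from the analytic plan; 'dat',
# 'human_high', and 'ai_high' are placeholder names.
library(TOSTER)

alpha_adj <- .05 / 9  # Bonferroni adjustment for the nine tests (~ .006)

# Equivalence test with bounds of Cohen's d = +/- 0.15 (the smallest
# effect size of interest); output also includes the standard NHST t test.
t_TOST(
  x = dat$human_high,    # inclusion-of-other ratings, human-generated profile
  y = dat$ai_high,       # inclusion-of-other ratings, AI-generated profile
  paired = TRUE,
  eqb = 0.15,            # symmetric equivalence bounds
  eqbound_type = "SMD",  # bounds on the standardized-mean-difference scale
  alpha = alpha_adj
)

Because the function reports both the NHST result and the two one-sided tests, the decision rules described above can be applied directly to its output.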
Human-generated online dating profiles
A paired samples t test indicated that levels of inclusion of other in the self were higher for potential romantic partners paired with high self-disclosure responses (M = 4.62, SD = 1.91) than for potential romantic partners paired with low self-disclosure responses (M = 2.68, SD = 1.89), t(59) = 5.95, p < .001, Hedges’s g = 1.02. Therefore, the manipulation was effective.
AI-generated online dating profiles
A paired samples t test indicated that levels of inclusion of other in the self were higher for potential romantic partners paired with high self-disclosure responses (M = 3.75, SD = 1.86) than for potential romantic partners paired with low self-disclosure responses (M = 2.38, SD = 1.79), t(59) = 5.06, p < .001, Hedges’s g = 0.75. Therefore, the manipulation was effective.
Comparing human-generated and AI-generated online dating profiles
A paired samples t test (NHST) indicated that the human-generated online dating profile (M = 4.62, SD = 1.91) performed better than the AI-generated online dating profile (M = 3.75, SD = 1.86), t(59) = 3.53, p < .001, Hedges’s g = 0.45, when manipulating high levels of inclusion of other in the self. Furthermore, a TOST equivalence test (equivalence bounds = ±.15, α = .006) was nonsignificant, t(59) = 4.69, p < .001 (lower bound), t(59) = 2.37, p = .989 (upper bound), 98.8% confidence interval [CI] = [0.23, 1.51]. This test indicated a notable difference in the strength of the manipulation of high inclusion of other in the self between human-generated and AI-generated profiles, favoring the human-generated materials. Another paired samples t test (NHST) indicated no difference between the human-generated online dating profile (M = 2.68, SD = 1.89) and the AI-generated online dating profile (M = 2.38, SD = 1.79) for low levels of inclusion of other in the self, t(59) = 1.23, p = .224, Hedges’s g = 0.16. Furthermore, a TOST equivalence test (equivalence bounds = ±.15, α = .006) was also nonsignificant, t(59) = 2.39, p = .010 (lower bound), t(59) = 0.07, p = .527 (upper bound), 98.8% CI = [−0.33, 0.93]. Therefore, there is insufficient evidence to distinguish between a significant difference and equivalence in the strength of the manipulation of low inclusion of other in the self between human-generated and AI-generated materials.
Confidence and ability to distinguish source
Overall confidence in the ability to distinguish between human-generated and AI-generated content was close to the scale’s neutral point (M = 4.15, SD = 1.61). A paired samples t test (NHST) indicated no difference in ratings of perceived source (human-generated vs. AI-generated) between the human-generated (M = 5.02, SD = 1.37) and AI-generated online dating profiles (M = 4.73, SD = 1.32), t(59) = 1.64, p = .106, Hedges’s g = 0.21. Furthermore, a TOST equivalence test (equivalence bounds = ±.15, α = .006) was also nonsignificant, t(59) = 2.80, p = .003 (lower bound), t(59) = 0.48, p = .683 (upper bound), 98.8% CI = [−0.17, 0.75]. Therefore, there is insufficient evidence to distinguish between a significant difference and equivalence in the ability to discern between human-generated and AI-generated materials.
Our findings provide partial support for our predictions. Our manipulation was effective regardless of the source (i.e., human-generated vs. AI-generated) of the online dating profiles. In addition, participants lacked confidence in their ability to distinguish between human-generated and AI-generated materials. However, our data were insufficient to properly distinguish between a significant difference and equivalence for the low inclusion-of-other-in-the-self online dating profiles and for participants’ ability to discern between sources. Based on our one interpretable comparison, human-generated materials were superior to AI-generated materials, but this result should be interpreted with caution.
Example 2
Description of research study
One of the purposes of this research study was to test the influence of stereotypes on spontaneous trait inferences (i.e., traits inferred from observable behavior; Bray, 2019). This research question was tested using a recognition-probe paradigm in which participants read sentences and identified whether a word (i.e., a probe) was present or absent from the sentence. Slower reaction times to words that were implied but not present in the sentence demonstrate the occurrence of a spontaneous trait inference.
In this investigation, we tested and compared only the effectiveness of sentences in implying a trait through the description of behavior. It was predicted that the effectiveness of human-generated and AI-generated stimuli would be equivalent. Finally, it was predicted that participants would indicate low confidence in their ability to distinguish between human-generated and AI-generated stimuli and would demonstrate this inability.
Method
Spontaneous-trait-inference sentences
Sentences for spontaneous trait inferences must describe a behavior that indicates a trait without explicitly saying what the trait is. A total of 24 traits (12 positive, 12 negative) were used to create the sentences. We entered the following in the chat box: Create a short sentence for each of the following traits: Caring, Smart, Helpful, Polite, Honest, Loyal, Lazy, Selfish, Jealous, Rude, Mean, Annoying, Considerate, Ambitious, Friendly, Brave, Respectful, Creative, Impolite, Nosy, Stubborn, Bossy, Stupid, Clumsy. Each sentence should imply the trait through observable behavior. The sentences should not explicitly mention the trait that it is implying. Here is an example. The sentence below implies the trait of caring through the description of behavior without explicitly mentioning the trait of caring. Helped the elderly lady pack her groceries into the car. Here is a second example. The sentence below implies the trait of smart through the description of behavior without explicitly mentioning the trait of smart. Aced the neuroscience project for their psychology class.
For GPT-4’s response to this request, see Table 3 (OpenAI, 2024). Our human-generated sentences are also presented in Table 3.
Table 3. Spontaneous-Trait-Inference Sentences
Procedure
Sixty-seven college students from a Hispanic-Serving Institution participated in this study in exchange for class credit (56.70% female, 37.31% male, 2.99% nonbinary; 5.97% African American, 17.91% White, 1.49% Asian, 74.63% Hispanic; age: range = 18–46, M = 20.99 years, SD = 4.17). There were no inclusion criteria, and no participant responses were excluded from data analyses based on failure of attention checks or extreme responses (i.e., 2.5 SD above or below the mean on outcome variables). Participants rated 48 spontaneous-trait-inference sentences presented in a random order. Half of the sentences were human-generated (i.e., created by a researcher), and the other half were AI-generated (i.e., created by GPT-4). Participants rated each sentence on various attributes using a 7-point Likert scale. Specifically, after each sentence, participants rated the valence (“How good/bad do you think this behavior is?”), the trait agreement (“How well does the trait [TRAIT] describe this behavior?”), and the likelihood that this was created by a human versus AI (“Please rate the likelihood that this was created by a human versus being generated by artificial intelligence”). At the end of the study, participants rated their overall confidence in distinguishing between human- and AI-generated materials (“Overall, how confident do you feel in your ability to distinguish between content created by humans and content generated by artificial intelligence”). Finally, participants were debriefed and thanked for their participation.
Results and discussion
Data analytical plan
We used the same approach as in Example 1. However, the alpha level was adjusted to .013 based on the multiple tests conducted using this sample (i.e., four statistical tests). All interpretations of significance were based on this threshold and, when relevant, on the alignment between the results of the NHST and TOSTs.
Trait-agreement and -valence ratings
Effective sentences must demonstrate high trait-agreement ratings for both positive and negative traits. For a summary of the descriptive information for each sentence across sources, see Table 4. All sentences, irrespective of source, strongly implied the intended traits. The only exception was stupid, for which the AI-generated sentence did not reach a trait rating above a neutral score. A paired samples t test (NHST) indicated no difference between average trait ratings for human-generated (M = 6.16, SD = 0.49) and AI-generated (M = 6.04, SD = 0.48) sentences, t(66) = 2.00, p = .049, Hedges’s g = 0.24. Furthermore, a TOST equivalence test (equivalence bounds = ±.15, α = .013) was also nonsignificant, t(66) = 3.23, p < .001 (lower bound), t(66) = 0.78, p = .78 (upper bound), 97.4% CI = [−0.02, 0.26]. Therefore, there is insufficient evidence to distinguish between a significant difference and equivalence in mean trait ratings between human-generated and AI-generated sentences.
Table 4. Descriptive Information for Spontaneous-Trait-Inference Sentences Across Sources of Materials
Effective sentences based on valence ratings must demonstrate ratings above the scale midpoint for positive traits and below it for negative traits. As shown in Table 4, behaviors described in the sentences were indeed rated as positive when the implied trait was positive and as negative when the implied trait was negative, irrespective of source. A paired samples t test (NHST) indicated no difference between average valence ratings for human-generated (M = 6.23, SD = 0.56) and AI-generated (M = 6.23, SD = 0.54) sentences for positive traits, t(66) = 0.00, p = 1.00, Hedges’s g = 0.00. Furthermore, a TOST equivalence test (equivalence bounds = ±.15, α = .013) was also nonsignificant, t(66) = 1.23, p = .112 (lower bound), t(66) = −1.23, p = .112 (upper bound), 97.4% CI = [−0.16, 0.16]. Therefore, there is insufficient evidence to distinguish between a significant difference and equivalence in mean valence ratings for positive traits between human-generated and AI-generated sentences. Another paired samples t test (NHST) indicated that human-generated sentences (M = 2.17, SD = 0.47) performed better than AI-generated sentences (M = 2.60, SD = 0.48) at conveying behaviors that implied negative traits, t(66) = −7.49, p < .001, Hedges’s g = −0.90. Furthermore, a TOST equivalence test (equivalence bounds = ±.15, α = .013) was nonsignificant, t(66) = −6.26, p = 1.00 (lower bound), t(66) = −8.72, p < .001 (upper bound), 97.4% CI = [−0.56, −0.30]. Therefore, there was a meaningful difference in mean valence ratings for negative traits between human-generated and AI-generated sentences, favoring the human-generated sentences.
Confidence and ability to distinguish source
Overall confidence in the ability to distinguish between human-generated and AI-generated content was below the scale’s neutral point (M = 3.25, SD = 1.85). A paired samples t test (NHST) indicated no difference in ratings of perceived source (human-generated vs. AI-generated) between the human-generated (M = 5.07, SD = 1.00) and AI-generated sentences (M = 5.02, SD = 1.00), t(66) = 0.41, p = .684, Hedges’s g = 0.05. Furthermore, a TOST equivalence test (equivalence bounds = ±.15, α = .013) was also nonsignificant, t(66) = 1.64, p = .053 (lower bound), t(66) = −0.82, p = .208 (upper bound), 97.4% CI = [−0.23, 0.33]. Therefore, there is insufficient evidence to distinguish between a significant difference and equivalence in the ability to discern between human-generated and AI-generated materials.
Our findings provide partial support for our predictions. Our materials were effective regardless of the source (i.e., human-generated vs. AI-generated) of the sentences. In addition, participants lacked confidence in their ability to distinguish between human-generated and AI-generated materials. However, our data were insufficient to properly distinguish between a significant difference and equivalence across multiple comparisons of our materials and of participants’ ability to discern between sources. Based on our one interpretable comparison, human-generated materials were superior to AI-generated materials, but this result should be interpreted with caution.
Part 2
The increased interest in AI tools has generated discussions about the ethics of their use. Nonetheless, these discussions have primarily focused on simply listing and describing various ethical concerns. Even when recommendations to address these ethical concerns are provided, they tend to be either too broad or too narrow. Some recommendations simply encourage researchers to follow AI guidelines and policies without necessarily specifying what those guidelines and policies are and/or where they can be found (Behrend & Landers, 2025; Ghandour et al., 2024; Haque & Li, 2025; Salah, Al Halbusi, & Abdelfattah, 2023; Tawfeeq et al., 2023; van Berlo et al., 2024; Zhou et al., 2024). On the other hand, some recommendations focus on addressing only specific ethical concerns (Calderon & Herrera, 2025; Dehbozorgi et al., 2025; Farmer et al., 2025; Haltaufderheide & Ranisch, 2024; Youssef et al., 2025). For instance, a recent systematic review of the ethical discourse on ChatGPT concluded that the use of this AI tool in academic writing (i.e., authorship) is at the forefront of these ethical concerns (Stahl & Eke, 2024). Considering the array of AI tools and their possible applications, it is reasonable that as ethical guidelines and policies are developed, they will be focused on specific uses, such as academic writing. However, it is important to extend our ethical discourse beyond academic writing (Stahl & Eke, 2024). Currently, little to no discussion exists about the ethics of using output generated by this technology in other contexts, such as in experimental designs. Moreover, no guidelines have been proposed for using the generated output ethically. We start with a brief discussion of concerns related to plagiarism and copyright, which are the most closely related to whether and how ChatGPT output can be used. Then, we propose a few recommendations on how to use these AI-generated materials.
Plagiarism
Most definitions of plagiarism involve the accidental or purposeful appropriation of someone else’s work or ideas as one’s own (Merriam-Webster, n.d.). A primary plagiarism concern with ChatGPT and similar AI tools is whether the generated text plagiarizes other people’s work or ideas. One major argument related to this concern, specific to ChatGPT, is that the generated text is a combination of the multiple corpora of text to which ChatGPT has been exposed (Brown et al., 2020; Henderson et al., 2023; Vincent, 2022). Therefore, ChatGPT’s generated text is arguably original and does not infringe on anyone’s intellectual property (Meyer et al., 2023). In the education field, for instance, when ChatGPT’s generated text has been submitted to popular plagiarism checkers (e.g., Turnitin), the content has not been found to be plagiarized (Gao et al., 2023; Khalil & Er, 2023). When asked, ChatGPT does warn that its generated text can “resemble existing content” and potentially infringe copyright, our next point of discussion.
Copyright
Intellectual property and creativity in many forms of work (e.g., photographs, music, videos, text) can be protected by copyright laws (U.S. Copyright Office, n.d.). Copyright gives owners exclusive rights to their work (e.g., to distribute copies) and thus limits how others can use this work. If someone uses copyrighted work without formal permission from the owners, they infringe on the owners’ rights over their work. There is concern that AI tools—including ChatGPT—were trained using text that includes copyrighted work (Epstein et al., 2023; Henderson et al., 2023; Sag, 2024). Some discussions related to this particular concern invoke “fair use” to justify the potential use of copyrighted work for the training of this technology (e.g., Henderson et al., 2023). Fair use is an exception to copyright law that allows the limited usage of copyrighted work without legal repercussions. In general, four factors are considered when evaluating whether the unlicensed usage of copyrighted work is fair use. According to the U.S. Copyright Office (2023b), these factors are (1) purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes; (2) nature of the copyrighted work; (3) amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) effect of the use on the potential market for or value of the copyrighted work. For a detailed discussion on how the training of AI tools meets these criteria, see Henderson et al. (2023). Our primary focus is to discuss how fair use applies to the usage of output generated by ChatGPT and similar AI tools.
Experimental text stimuli created for psychological research with ChatGPT or similar AI tools would most likely meet the requirements detailed in these factors. First, fair use applies when the content is being used for educational and research purposes (Factor 1). Second, fair use applies when the content being used is more technical and factual than creative (Factor 2). Experimental stimuli are an important part of scientific research, and they serve as a tool rather than an expression of creativity, thus meeting these first two requirements. Third, fair use applies when the amount of copyrighted content being used is small (Factor 3). The amount of copyrighted content used is likely to be minimal because ChatGPT’s generated responses are a combination of multiple corpora of text, thus meeting this requirement. Finally, fair use applies when the original work is not negatively affected by the usage of some of its content (Factor 4). The usage of experimental stimuli is limited to scientific research and seen only by a specific audience (i.e., participants, researchers), thus not affecting the marketability of the original work and meeting this last requirement.
Although fair use can potentially provide legal protection for the usage of materials created with ChatGPT and similar AI tools, note that courts decide fair use on a case-by-case basis, and decisions depend on the interpretation of the four factors mentioned above (U.S. Copyright Office, 2023b). The U.S. Copyright Office has created an index with a quick summary of all fair-use cases (https://www.copyright.gov/fair-use/fair-index.html). Closely related to this, guidelines have now been created for registering materials generated with AI tools, such as ChatGPT, for copyright protection. Materials generated with AI tools are not themselves eligible to be registered, but the changes made to those materials can potentially be protected by copyright regulations (U.S. Copyright Office, 2023a).
Recommendations
The developers of ChatGPT warn that the responsibility to avoid plagiarism and copyright violations falls to users when they generate text using this tool. To date, there are no detailed guidelines on how to approach this responsibility. Consequently, the development of standardized guidelines to facilitate the ethical usage of AI tools while maintaining academic integrity has been encouraged by others (Abdelhafiz et al., 2024; Guleria et al., 2023; Malik et al., 2025; Raman, 2025; Rupp, 2024). Below is a summary of existing guidelines for the usage of ChatGPT and similar AI tools specifically in scientific publications, followed by our recommendations on how to ethically use the text generated by ChatGPT as experimental stimuli. We invite experts in psychology and other fields to join the conversation.
Existing guidelines for the usage of AI tools
Currently, existing guidelines for the usage of ChatGPT and similar AI tools are mostly focused on the role of these tools in the creation and publication of scientific articles. For a summary of the positions taken by multiple professional societies and journals on the usage of AI tools, see Table 5. The consensus among these entities is that AI tools cannot be given authorship (American Geophysical Union [AGU], n.d.; American Psychological Association [APA], 2023; Committee on Publication Ethics [COPE], 2023; “Science Journals: Editorial Policies,” n.d.; “Tools Such as ChatGPT Threaten Transparent Science,” 2023; U.S. Copyright Office, 2023a; “World Scientific’s Position Statement on Authorship and AI Tools,” n.d.; Zielinski et al., 2023). The main argument made is that authors commit to taking responsibility for the materials and ideas presented in an article, which is something that an AI tool cannot do. There is also a consensus that the usage of AI tools in any capacity (e.g., editing writing) should be made transparent in the methods section or any other relevant section (AGU, n.d.; APA, 2023; COPE, 2023; “Science Journals: Editorial Policies,” n.d.; “Tools Such as ChatGPT Threaten Transparent Science,” 2023; U.S. Copyright Office, 2023a; “World Scientific’s Position Statement on Authorship and AI Tools,” n.d.; Zielinski et al., 2023). Among these entities are organizations specifically responsible for the creation of writing guidelines (e.g., COPE), publications that are recognized and read across disciplines (e.g., Nature), and organizations that are field-specific (e.g., AGU). Therefore, these guidelines represent the perspective of a wide range of academics regarding the usage of AI tools. These guidelines, however, apply largely to the use of AI tools to summarize and generate text for different sections of a scientific article (e.g., the literature review). Next, we offer recommendations on how to ethically use the text generated by ChatGPT as experimental stimuli.
Table 5. Guidelines for the Usage of ChatGPT According to Various Entities
Proposed guidelines for the usage of experimental stimuli generated by ChatGPT
Referencing/citing
The most effective way to avoid plagiarism is to properly credit the original authors for their work. Unfortunately, ChatGPT cannot disclose its sources because its training data are considered proprietary knowledge, and when it does provide sources, it often does so inaccurately. Future iterations of this technology should aim to make ChatGPT able to provide accurate references/citations for the corpora of text it uses in its output. Until then, although ChatGPT cannot be credited as an author for the materials it creates, ChatGPT can and should be cited as a tool (see Table 5). Furthermore, researchers should cite any other work that might have been used to inspire the materials created with ChatGPT. For instance, to create the online dating profiles in Example 1, ChatGPT was asked to answer six question prompts. Those question prompts were previously published by Aron et al. (1997). Therefore, Aron et al. should be credited along with ChatGPT as a tool.
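To illustrate, under the APA Style guidance available at the time of writing, a reference entry for the tool itself might take the following form (the version and date shown are illustrative, not a record of our own usage):

OpenAI. (2023). ChatGPT (Mar 14 version) [Large language model]. https://chat.openai.com/chat

In text, the generated materials can then be attributed to the tool (OpenAI, 2023) alongside the scholarly sources that inspired them, such as Aron et al. (1997) in Example 1.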
Limit the usage of AI tools to low-complexity, low-creativity tasks
One of the main problems with the proposed uses of AI tools is that some tasks (e.g., writing scientific articles) require a level of sophistication that AI tools cannot completely fulfill. For instance, it has been established that ChatGPT’s output can include false and inaccurate information (Abdelhafiz et al., 2024; Guleria et al., 2023). It has also been established that producing content of the quality required to accomplish specific goals (e.g., creating accurate medical cases for educational purposes) requires human intervention (Beghetto et al., 2025; Ghaffari et al., 2025). Nonetheless, asking AI tools to generate materials that are low in complexity and creativity is a better option. Low-complexity, low-creativity tasks are tasks that do not demand accuracy, advanced problem-solving, or original thinking. For instance, in Example 1, the responses generated to the question prompts are not based on information that can be evaluated as correct or incorrect. Furthermore, these responses are broad descriptions of experiences or thoughts that many people undergo and do not necessarily infringe on other people’s ideas or work. A cautionary note for researchers is that ChatGPT learns from users’ input, and this information is considered public knowledge. Therefore, any information included in a prompt is stored and used to produce better responses for that specific user and others. Because this information is stored and potentially shared with others, we advise against entering sensitive information in prompts or using ChatGPT to generate text materials intended to be copyrighted (e.g., a psychometric assessment).
Transparency and accessibility
If materials are created using AI tools, such as ChatGPT, researchers should disclose this information. This is no different from what has been suggested when using AI tools to write sections of a scientific article (see Table 5). Lin (2025) proposed specific guidelines for documenting the usage of AI in research, with the three (out of seven) most important attributes being to include the exact version of the AI tool used, to indicate which parts were generated or influenced by AI, and to specify the prompt entered or additional training data used. The AI-generated materials should also be made accessible for others to review and use. These materials can be made accessible through personal websites or platforms such as GitHub or OSF, which are already being used to increase transparency in psychological research. Furthermore, when posting these materials online, researchers should select a license that provides full access and gives others permission to use them. Researchers should not claim ownership of output generated by AI tools unless requirements are met and proper steps are taken to copyright the changes made to these materials (U.S. Copyright Office, 2023a). In those cases, researchers can opt to select a license that restricts usage without permission. However, it is highly encouraged that, whenever reasonable, no restrictions are placed on AI-generated materials. Researchers who then make use of these available AI-generated materials should cite both the original author and the AI tool (i.e., as cited by the original author). Finally, transparency also involves making research participants aware that they will be or have been exposed to AI-generated materials during the informed-consent process (e.g., “If you agree to be part of this study, you will be exposed to content generated using an artificial intelligence tool”) and/or the debriefing process (e.g., “Some [or all] of the content used in this study was generated using an artificial intelligence tool”). Depending on the research design, these suggested statements may be expanded to address any nuances. For instance, if there was a manipulation, the nature of the manipulation can be explained (e.g., “This artificial intelligence tool was used to [describe the nature of the manipulation]”), or if deception was used about being exposed to AI content, then an explanation should be provided (e.g., “The usage of content generated by artificial intelligence was not disclosed to maintain the integrity of this research study. Specifically, [describe the reason why this information was not previously disclosed]”). The debriefing process can also be taken as an opportunity to promote AI literacy by suggesting sources of information or workshops (e.g., Google AI skills program).
Make an intellectual contribution
There are multiple ways in which researchers can add their own intellectual contribution to materials created using AI tools. An intellectual contribution can be made by creating the criteria used to write the prompt entered into the AI tool to generate a specific set of materials. For instance, in Example 1, deciding which question prompts to include and manipulating the level of self-disclosure is an example of this type of contribution. An intellectual contribution can also be made by editing or building on the AI tool’s generated work. Finally, an intellectual contribution can be made after the generated work has been edited. For instance, researchers can test the materials to obtain relevant descriptive information or, if the materials are used as a manipulation, to verify that the manipulation is effective. Any of these contributions should be made explicit when writing an article, and the resulting products should be made available to others. The usage of AI tools and predetermined intellectual contributions can be included as part of open-science practices (e.g., preregistration) when relevant.
General Discussion
We proposed using ChatGPT to reduce researchers’ cognitive load and time spent creating text materials for psychological studies. As expected, across two studies, we found evidence that AI-generated materials (i.e., created by GPT-4) can successfully manipulate (Study 1) or imply (Study 2) psychological constructs. This is consistent with other work that underscores the ability of ChatGPT to understand psychological constructs, as demonstrated by its use to assist in the development of psychological instruments (Beghetto et al., 2025; Schlegel et al., 2025), its ability to predict human emotion (Santavirta et al., 2025), and its ability to generate visual stimuli that show emotions (Lu et al., 2025). Furthermore, we found in our studies that participants lacked the confidence to distinguish between human-generated and AI-generated materials, which also supported our predictions. Similar findings have been noted in prior work demonstrating that people make an appreciable number of errors when asked to distinguish between human-generated and AI-generated scientific abstracts (Gao et al., 2023). In most cases, our data were insufficient to properly test equivalence between the quality of human-generated and AI-generated materials because most of these results were considered inconclusive. In the cases in which our data were sufficient to test for equivalence, human-generated materials outperformed AI-generated materials. This is consistent with prior work that has also found human-generated materials to have an advantage over AI-generated materials (Grassini & Koivisto, 2025; Schlegel et al., 2025). Yet others have provided support for the superiority of AI-generated materials (Gherheş et al., 2024), and still others have provided support for comparable performance (Alzahrani, 2025). Although our data are not well positioned to support our hypotheses on equivalence, they do provide evidence that irrespective of which set of materials performed better (i.e., human-generated vs. AI-generated), both were effective in achieving their intended purpose. Overall, ChatGPT indeed facilitated the creation of experimental text stimuli for psychological research and replicated the effects (albeit not necessarily the strength of the effects) of human-generated stimuli.
Implications
We are among the first to provide empirical evidence that AI’s generation of human-like text can be used to facilitate the study of psychological phenomena. Although prior work has provided support for this, most of these studies have focused on linguistic stimuli (Alzahrani, 2025; Duan et al., 2025) or psychometric-item generation (Beghetto et al., 2025; Schlegel et al., 2025). Our work is innovative in showing the utility of AI tools, specifically ChatGPT, in the creation of stimuli for other domains of psychology, such as relationship science and social cognition. This approach has multiple strengths worth emphasizing. First, it can speed up the development of new research studies that advance different fields or fill gaps in them. Although human expertise is fundamental for the effectiveness and quality of AI-generated content, AI tools facilitate this process by generating content instantly (Beghetto et al., 2025; Ghaffari et al., 2025). Second, it can help avoid the overuse of the same stimuli across psychological studies. For instance, the Chicago Face Database (Ma et al., 2015) is a photography database widely used in psychological research; at the most recent count, it has been cited approximately 2,300 times. This is particularly important for recruitment platforms (e.g., MTurk, Prolific) that tend to have a fixed pool of participants who might be repeatedly exposed to the same stimuli, which could bias the results of those studies. Finally, carefully developed prompts for AI-generated stimuli can eliminate biases related to preconceived expectations about what the findings should be when testing specific hypotheses (Tomaino et al., 2025).
We are also among the first to recommend guidelines for using the output generated by AI tools. Efforts have been put forward to guide researchers in how best to develop high-quality experimental stimuli using AI tools (Duan et al., 2025; Li et al., 2025; Lu et al., 2025; van Berlo et al., 2024) and how to integrate AI tools into the research-design process (Behrend & Landers, 2025; El-Bassel et al., 2025; Gong et al., 2024; Lehr et al., 2024). However, there is still a need for recommendations on how to move forward once those materials have been created. A lack of guiding principles for using AI tools creates fear and hesitation among researchers, even among those who recognize the value of such tools (Abdelhafiz et al., 2024). Likewise, perceived social acceptance of the usage of AI has been associated with a greater intention to use AI tools (Gado et al., 2022). Ease of use is a predictor of willingness to use AI tools among faculty (Cambra-Fierro et al., 2025) and students (Rodriguez-Saavedra et al., 2025). However, the anxiety that comes with uncertainty about how best to approach these tools can reduce that perceived ease (Cambra-Fierro et al., 2025; Mohamed et al., 2025). Therefore, the existence of guidelines can decrease the hesitation to use AI and popularize the overall usage of these tools. Given the growing interest in capitalizing on this technology, paired with a lack of guiding principles to adhere to, our recommendations are valuable in providing structure and direction. Nonetheless, our recommendations are not intended to be definitive.
Limitations and future directions
We have shown ChatGPT’s ability to create effective materials for research studies on different topics (e.g., interpersonal relationships, social cognition). One limitation of this article is that we focused on text materials because GPT-4 was best equipped to generate text at the time this article was prepared. Nonetheless, ChatGPT and other AI tools can generate visual stimuli. Other AI modalities can be capitalized on in similar ways to advance psychological research. However, researchers opting to use those tools will have to investigate how best to use them to generate visual materials and the ethics of using the materials generated with those tools. The ethical considerations discussed in this article for ChatGPT are not necessarily applicable to all AI tools or modalities. Future directions include testing the effectiveness of generating stimuli using other modalities and initiating a conversation about ethics specific to those modalities. For instance, van Berlo et al. (2024) recently put forward a summary of best practices for creating visual experimental stimuli, and Li et al. (2025) put forward a summary of strategies for creating multimodal stimuli. Closely related to this limitation, our human-generated materials do not represent the “gold standard” (i.e., widely accepted and used by others) in their respective fields. Although these sets of stimuli have been successfully used in multiple replications, their success is still limited to their respective research labs. Future directions include comparing AI-generated materials with human-generated materials that have been thoroughly validated.
Another limitation of using AI tools to generate experimental stimuli is that they will generate only what they are asked to generate. AI tools will be helpful to the extent that researchers can think of creative ways to use AI-generated materials to test their research questions and hypotheses. AI–human collaboration is likely to become the norm in the near future. Guidelines on how to approach AI–human collaborations for idea generation (Gong et al., 2024), stimuli generation (Behrend & Landers, 2025; Duan et al., 2025), and research-design implementation (Behrend & Landers, 2025; El-Bassel et al., 2025) have been proposed. However, it is important to consider the nuances of generating experimental stimuli. For instance, the level of specificity in the prompt entered into an AI tool to generate experimental stimuli can influence the generalizability of findings by introducing or eliminating biases (Tomaino et al., 2025). Furthermore, depending on the research question, AI-generated content can be perceived as untrustworthy compared with human-generated content, such as when asking for recommendations related to experience-based products (Jin & Zhang, 2025). Finally, the effectiveness of AI-generated stimuli has also been shown to depend on the language in which the stimuli were created (Alzahrani, 2025). Future directions should include identifying the contexts in which AI-generated stimuli are the best fit and most effective.
Finally, another limitation of using AI tools to generate experimental stimuli is that they are proprietary technology and can carry a subscription cost (e.g., a GPT-4 subscription cost approximately $20 per month) that might increase with future iterations. Depending on the cost, this may lead to disparities in access to this technology. Similar concerns have been raised by others who have used AI platforms to develop a clinical-psychology chatbot and fear that the platform will become unavailable or require numerous updates to keep up with iterations of this technology (Siddig & Hines, 2019; Stahl & Eke, 2024). Lack of access to AI technology can be problematic given that others have underscored the importance of integrating this technology into daily tasks (Haque & Li, 2025; Kooli, 2023; Salah, Abdelfattah, et al., 2023). For this reason, developing AI literacy and competency should be objectives integrated into the curriculum of future scholars (Gong et al., 2024; Mohamed et al., 2025; Rupp, 2024). AI literacy is expected to become a desirable skill in the job market (Rupp, 2024), much like a strong quantitative background. Future directions should include identifying the best methods to encourage AI–human collaboration and effective methods to teach these skills. Current efforts to address this gap include identifying predictors of the intention to use AI tools (Cambra-Fierro et al., 2025; Gado et al., 2022; Rodriguez-Saavedra et al., 2025) and ways to decrease AI anxiety (i.e., fear associated with the rapid advancement of AI technologies; Kim et al., 2025; Mohamed et al., 2025).
Conclusion
In this article, we proposed how ChatGPT can be used to advance psychological research. Specifically, we proposed and provided examples of how to use GPT-4 to reduce the cognitive load and time invested in creating materials used in research studies. Furthermore, we made recommendations on how to move forward when using materials created with or inspired by ChatGPT-generated text. We hope to start a thorough discussion in the field about the potential ways in which ChatGPT and similar AI tools can be used to streamline psychological research without risking the quality and integrity of researchers’ work.
Acknowledgements
We gratefully acknowledge Jessica R. Bray for providing the human-generated spontaneous-trait-inference sentences.
Transparency
Action Editor: David A. Sbarra
Editor: David A. Sbarra
Author Contributions
