Abstract
Creative problem solving (CPS), a cognitive and affective system, generates innovative solutions. The integration of generative artificial intelligence (AI) tools, such as ChatGPT, with CPS processes has transformed the field, particularly through prompt engineering. Despite the proliferation of generative AI and its accompanying prompts, previous research has been limited to defining effective prompts and exploring their implementation. This study is the first to investigate the influence of prompt elements on CPS performance and user experience, as measured by user satisfaction and emotional arousal, in interactions with generative AI. We measured five prompt elements: contextual assignment, number of prompts, assigned persona, output format designation, and critical attitude. Data from a ChatGPT prompting contest at a South Korean university were analyzed using a mixed-methods approach to examine the impact of these elements on problem-solving performance and user experience. The results show that contextual assignments and the number of prompts positively affect CPS performance, whereas assigned personas influence both user satisfaction and dissatisfaction. Additionally, the satisfied user group performed CPS at a higher level than the dissatisfied group. These findings provide insights into the use of ChatGPT prompts to improve CPS performance and facilitate human–AI collaboration.
Introduction
Since Osborn’s (1963) seminal work, creative problem solving (CPS) has been widely studied as a structured approach for fostering innovative thinking. CPS, a comprehensive cognitive and affective system, deliberately stimulates creative thinking by leveraging natural creative processes, ultimately leading to novel solutions and changes (Puccio et al., 2010). The benefits of CPS have been recognized in the educational, industrial, and government sectors, with evidence suggesting that CPS can enhance organizational performance. Recently, workplace emphasis on CPS has intensified in response to the growing complexity and uncertainty of business, both locally and globally (World Economic Forum [WEF], 2020).
The rapid adoption of generative artificial intelligence (GAI) tools, particularly large language models (LLMs), such as ChatGPT, has introduced new dynamics into the CPS process (Dwivedi & Banerjee, 2024). These tools are increasingly used not only for information retrieval but also to support complex cognitive tasks such as ideation, reflection, and co-construction. They actively assist users in key CPS stages, including idea generation and problem framing, and often engage in iterative interactions to expand creative possibilities (Cress & Kimmerle, 2023; Urban et al., 2024). As GAI becomes a participant in open-ended tasks, it challenges the traditional notion of CPS and provides new opportunities for human–AI collaboration (Memmert & Bittner, 2022). For example, GAI tools are already being adopted in the educational, creative, and research domains, in which CPS-like tasks are performed (Adiguzel et al., 2023; Kasneci et al., 2023; Lo, 2023). Moreover, GAI tools can theoretically support different stages of the CPS process by offering diverse perspectives, generating alternatives, and enabling iterative refinement (Cress & Kimmerle, 2023; Kasneci et al., 2023). Vernon et al. (2016) emphasized the importance of matching appropriate tools to each CPS stage to improve learning outcomes and execution.
Recent research has emphasized the role of prompts in shaping the interaction between users and GAI tools. Prompts do not merely function as instructions to trigger AI responses, but serve as cognitive and affective mediators that articulate user intent and influence the depth and direction of problem-solving (Henrickson & Meroño-Peñuela, 2023; Oppenlaender, 2023; Short & Short, 2023; Wang et al., 2024; Zamfirescu-Pereira et al., 2023). Prompts are thus both technically functional and procedurally significant, particularly within the CPS process. Previous studies have suggested guidelines for composing prompts; however, empirical evidence on the effect of prompt characteristics on problem-solving performance is scant (Clayton, 2023; Cook, 2023). Furthermore, existing approaches to prompts are often overly standardized and fail to account for the diversity of user goals and contexts (Gupta et al., 2024; Han & Choi, 2023). Such rigid practices risk simplifying complex user–GAI interactions and limiting the problem-solving potential of users and technology.
To address this gap, this study investigated how specific prompt elements—contextual assignment, number of prompts, assigned persona, output format designation, and critical attitude—affect CPS performance and user experience in problem-solving tasks facilitated by ChatGPT. These elements were selected based on previous studies on structured prompt designs (Clayton, 2023; Cook, 2023; Reynolds & McDonell, 2021; Velásquez-Henao et al., 2023; White et al., 2023). Data were collected from a ChatGPT prompting contest held at a South Korean university in which undergraduate participants engaged in open-ended CPS tasks under time and goal constraints. In this high-engagement, user-driven environment, participants were free to construct, revise, and optimize their prompts in response to the unfolding task requirements.
This research article contributes to the field of human–AI collaboration by offering a focused analysis of prompt engineering within the context of CPS. By examining the influence of prompting strategies on performance and emotional engagement, this paper moves beyond guideline-based approaches and provides insights into the dynamics of user–GAI interaction. The results of this study offer implications for fostering user-centered and context-sensitive CPS practices in GAI environments.
Literature Review and Research Questions
ChatGPT and Prompt Engineering
Effectively leveraging ChatGPT in educational contexts or workplaces depends on the strategic use of prompts (Dai et al., 2023; Reynolds & McDonell, 2021). They are carefully crafted guides that steer the chatbot’s dialogue toward achieving specific objectives and handling complex problem-solving tasks. Prompt engineering, a sophisticated conversational strategy that goes beyond guiding responses (Henrickson & Meroño-Peñuela, 2023; Short & Short, 2023; Wang et al., 2024), establishes a conversational context by setting specific rules and guidelines, influencing how the dialogue unfolds (White et al., 2023). Designing and refining prompts to mirror user intentions accurately is crucial, particularly for improving student engagement and learning outcomes in educational environments and enhancing work performance in workplaces.
Ongoing academic research spanning diverse applications of GAI has highlighted the significance of meticulously developed prompts (Dai et al., 2023; Han et al., 2022; Liu & Chilton, 2022). These range from structured prompts in image generation (Liu & Chilton, 2022) to document categorization improvements (Han et al., 2022), to enhanced reasoning skills via chain-of-thought prompting (Dai et al., 2023). The evolution of this field has led to the development of new methodologies and frameworks that aim to refine interactions with language models and create adaptable prompt patterns for various domains (Lo, 2023; White et al., 2023).
OpenAI has outlined a methodology for optimizing outcomes from ChatGPT, providing specific suggestions for formulating prompts that can serve as criteria for delineating the characteristics of proficient prompts (OpenAI, 2023). Strategies such as writing clear instructions, providing reference text, and splitting complex tasks into simpler subtasks (OpenAI, 2023) serve as instructional aids for utilizing ChatGPT and as benchmarks for appraising prompts. These strategies highlight specific aspects of prompting—adopting a persona, methodically delineating problem-solving steps (quantifying prompts), and specifying response methodologies such as summarization or enumeration. Ouyang et al. (2022) identified toxicity, bias, and honesty as primary evaluation parameters in a natural language processing experiment using ChatGPT. They suggested that, to be effective, prompts should yield ethical and impartial outcomes. Moreover, providing additional context or background and delineating the conversational scope are imperative when eliciting the desired response from ChatGPT. These supplementary elements can be facilitated using prompt formulas and templates. Clayton (2023) introduced the concept of a “power prompt” that amalgamates diverse prompt formulas. Power prompts advocate integrating personas, roles, activities, and methods; judiciously applying suitable methodologies and expertise aligned with the objective; and fostering coherence and synergy to address problems effectively. Furthermore, to enable non-expert users to formulate effective prompts, the behaviors, intuitions, preferences, and capabilities of users should be considered in prompt engineering (Zamfirescu-Pereira et al., 2023).
We selected the following criteria reported in previous studies as suggestions for prompt engineering: query, user activity, and user attitude toward ChatGPT. Queries are formed using contextual or situational information to obtain the desired responses from ChatGPT (Cook, 2023; Gewirtz, 2023). Users can assign ChatGPT a role and request responses based on the assigned personas (Clayton, 2023). Although several studies have indicated that persona settings do not improve ChatGPT results (Y. Chen, Wong, et al., 2023; Gupta et al., 2024), we included persona assignments to verify the extent of their impact. Additionally, users can designate output formats (Clayton, 2023). User activity refers to the number of times a user interacts with ChatGPT. Finally, a user’s attitude toward ChatGPT indicates how critically the user accepts the answers provided by ChatGPT or controls ChatGPT to obtain the desired results (Cook, 2023; Ouyang et al., 2022).
CPS and ChatGPT
CPS involves the effective integration of problem understanding, contextual awareness, critical thinking, domain knowledge, and the application of appropriate processes. Osborn (1963) originally conceptualized the CPS process as comprising three key stages: fact, idea, and solution finding. Isaksen and Treffinger (1985) refined these stages to understanding problems, generating ideas, and planning actions. Subsequent researchers further developed this model (e.g., Puccio et al., 2005). Vernon et al. (2016) reviewed tools applied in different stages of CPS and noted that some positively influence problem definition and ideation. However, they also observed that empirical support for tools related to problem construction and evaluation remains limited, warranting further research.
CPS provides a structured yet flexible framework that emphasizes iterative ideation and the balance between divergent and convergent thinking (Puccio et al., 2005; Vernon et al., 2016), which aligns well with the unfolding of prompt-based interactions with generative AI tools. White et al. (2023) and Oppenlaender (2023) show that prompt engineering shapes the flow of interaction and that iterative experimentation and refinement significantly affect the quality and functionality of GAI outputs. These parallels offer a compelling basis for considering a CPS perspective in examining prompt-based human–AI interactions in this study.
Recently, the CPS domain has entered a new phase with the advent of GAI, particularly LLMs, which are increasingly integrated into creative and educational contexts (Adiguzel et al., 2023; Kasneci et al., 2023). GAI tools assist users in problem solving by providing new ideas and divergent perspectives (Rick et al., 2023). Although the theoretical connection between CPS and GAI is developing, recent research has provided promising insights into their interaction. For example, GAI can perform on par with humans in certain problem-solving contexts (Orrù et al., 2023), and human–AI collaboration often yields more practical and impactful solutions, especially when humans iteratively refine AI-generated ideas (Boussioux et al., 2024; Memmert & Bittner, 2022). Moreover, effective outcomes are achieved through complementary collaboration, in which humans contribute to originality and critical evaluation, whereas GAI provides diverse perspectives and practical alternatives (Urban et al., 2024).
Several studies on education have examined the implementation and impact of GAI on CPS. Students using ChatGPT for programming instruction significantly improve their computational thinking skills (Yilmaz et al., 2023). Tools such as ChatGPT and BingChat enhance reflective thinking and concept comprehension in STEM learning settings (Vasconcelos & Santos, 2023). Thus, GAI significantly affects domains in which complex problem-solving is essential. As these examples show, recent studies have begun to explore how the structure and characteristics of human–AI interactions, particularly prompt design, may influence the quality and outcomes of CPS tasks (Cress & Kimmerle, 2023; Oppenlaender, 2023).
Despite this growing body of research on the interaction between CPS and GAI, a critical gap in the literature remains regarding empirical studies that examine how prompting—the key mechanism shaping human–AI collaboration—influences CPS performance (Cook, 2023; Reynolds & McDonell, 2021; White et al., 2023). To address this gap, this study used ChatGPT as a CPS tool and analyzed the effects of specific prompt elements on CPS performance and user experience. It employed the approach of Mumford et al. (1997), which measures CPS engagement by having participants complete problem-solving tasks. Therefore, we propose the first research question:
RQ1. Which elements of prompting are associated with higher CPS performances?
User Satisfaction and Emotions
Emotions are crucial for understanding the interactions of users with their external environment and involve an intricate combination of contextual factors that induce affective experiences (Dubé & Menon, 2000). Users’ emotional responses have been characterized as psychological phenomena (Thüring & Mahlke, 2007) with two dimensions: valence and arousal (Lang et al., 1993). Valence is the extent to which positive (toward) or negative (away from) emotions are elicited by a stimulus. Arousal is the level of emotional activation induced by a stimulus. These two dimensions are orthogonal, indicating their independence (Lang et al., 1993). This study focused on emotional arousal because previous studies have not explored it sufficiently compared with emotional valence (Zsidó, 2024). As emotions generated during interactions with information systems can affect user satisfaction, they have been widely adopted in assessing information system effectiveness. Moreover, as satisfied users are more likely to continue using these systems (Tsai et al., 2014), several studies have highlighted the importance of satisfaction. It can be defined as the affective and cognitive evaluations users obtain from a pleasant experience of using information systems (Au et al., 2002); we adopted this definition for using ChatGPT in this study.
Prompting with ChatGPT is similar to communication involvement (Muncy & Hunt, 1984), which is associated with user information processing (e.g., information search) and is likely to result in goal-directed behavior (Hoffman & Novak, 1996). A higher level of communication involvement stimulates users to concentrate more on the information presented to them, resulting in satisfaction with their information-seeking activities (Mahmood et al., 2000; Santosa et al., 2005). Similarly, prompting involves a high level of interaction and concentration to seek the appropriate information, leading to our second research question:
RQ2. Which elements of prompting are associated with higher user satisfaction?
In this study, emotional arousal—the degree of emotional activation experienced by a user—was examined only within users’ interactions with ChatGPT. Users assess the appropriateness of the ChatGPT output based on their objectives, which may or may not trigger user emotions. Thus, we arrive at our third research question:
RQ3. Which elements of prompting are associated with higher emotional arousal in users?
Interactions with applications during the search for information and the achievement of specific goals are known to affect user satisfaction (Mahmood et al., 2000; Santosa et al., 2005). When searching, users have pleasant experiences and are satisfied when they receive answers that exceed their expectations or results that fulfill their goals. Applying these research results to interactions with ChatGPT, we devised our fourth research question:
RQ4. Does the user group that is satisfied with the prompt outputs exhibit better CPS performance than the dissatisfied group?
Research Methods
Sample and Data Collection
To explore the relationship between prompt engineering and CPS performance, we analyzed data collected from a ChatGPT prompting contest held at a university in South Korea in May 2023. Participants were students with a diverse range of majors, including engineering (n = 16, 25.81%), software (n = 14, 22.58%), business and economics (n = 12, 19.35%), social sciences (n = 7, 11.29%), humanities (n = 6, 9.68%), natural sciences (n = 3, 4.84%), pharmacy (n = 2, 3.23%), and education (n = 2, 3.23%).
All participants were current students. The distribution of admission years was fairly even: admitted before 2017 (n = 6, 10.17%), in 2018 (n = 11, 18.64%), in 2019 (n = 10, 16.95%), in 2020 (n = 11, 18.64%), in 2021 (n = 6, 10.17%), in 2022 (n = 10, 16.95%), and in 2023 (n = 5, 8.47%). This study included 29 males (49.15%) and 30 females (50.85%). The age groups included teenagers (n = 1, 1.69%), individuals in their twenties (n = 57, 96.61%), and those in their thirties (n = 1, 1.69%). Although the participants had experience using ChatGPT, almost none had used it for school assignments. They were tasked with solving problems presented in the competition using ChatGPT. Table 1 lists the two test tasks. Task 1 involved writing a trip report based on given information, and Task 2 involved writing a fairy tale that required a unique composition and narration. The participants interacted with ChatGPT to complete the tasks within a specified timeframe and submitted a link to the final output without further editing.
Description of the Tasks Provided in the ChatGPT Prompting Contest.
Design of the Research Method
We employed a mixed-methods approach (qualitative and quantitative) that can address exploratory inquiries and validate research objectives, fostering the emergence of new theoretical perspectives (Cheng et al., 2022; Venkatesh et al., 2016). Qualitative analysis ensures effective research that develops an understanding of a phenomenon and constructs propositions (Hua et al., 2020). Conversely, quantitative studies facilitate theoretical testing and causal confirmation (Hua et al., 2020; Venkatesh et al., 2013). Figure 1 illustrates the overall structure and process of this research methodology.

Mixed-Methods Approach Employed in This Study.
First, through text-based qualitative analysis, the prompting elements (number of contextual assignments, number of prompts, assigned persona, output format designation, and critical attitude) were refined for clarity. Subsequently, these elements, along with measures of emotional arousal and satisfaction, were systematically coded according to established construct scales. Second, we employed Stata 17 to test the research model using hierarchical regression analysis for RQ1, multinomial logistic regression (MNLogit) analysis for RQ2, binomial logistic regression analysis for RQ3, and analysis of variance (ANOVA) for RQ4.
Development of Measurements
The evaluation criteria and corresponding scales were established to quantitatively analyze the prompts collected from the contest, as shown in Table 2.
Measurements and Their Criteria.
Scores representing the participants’ CPS performance were assessed through contest evaluations conducted by experts in liberal arts education, as recommended by the competition organizers. The evaluation was grounded in three foundational frameworks—the three-step CPS model (Osborn, 1963), the four dimensions of evaluating ideas (novelty, feasibility, relevance, and specificity; Dean et al., 2006), and the experimental framework (Weber et al., 2023)—which analyzed the effects of domain knowledge and prompt engineering strategies on the quality of creative outputs. Based on these foundations, the evaluation rubric included four key criteria—prompt clarity, prompt originality, prompt refinement, and output creativity—each of which corresponds to a critical phase in the CPS process.
Output creativity was designed to capture the final outcome of the process and to qualitatively assess the creative value of the output in terms of originality, feasibility, contextual appropriateness, and specificity (Dean et al., 2006). This criterion was included as a separate dimension to address a key insight from recent research: a well-crafted prompt does not always guarantee creative or valuable output, especially in human–AI collaboration settings (Memmert et al., 2024). In our contest, the participants could not directly revise the AI-generated output; the final results were entirely determined by their prompts. Thus, the output creativity score reflects both the participant’s ability to guide and refine the AI’s behavior and the creative quality of the resulting output. By evaluating output creativity independently (Zhang et al., 2025), the assessment structure acknowledges the complex and sometimes unpredictable relationship between user strategies and AI-generated results. This approach enables a more nuanced interpretation of user–AI interaction throughout the CPS process and offers deeper insight into CPS in GAI environments. This expert-based evaluation approach was selected to reflect the contextual and iterative nature of real-world CPS performance, which may not be fully captured by traditional psychometric instruments that focus solely on divergent or convergent thinking. Working with GAI involves recursive and iterative interactions that allow users to explore multiple solution paths, necessitating an alternative approach to CPS assessment beyond conventional task-based measures (Marozzo, 2025).
We identified five prompt elements: “number of contextual assignments (context),” “assigned persona (persona),” “number of prompts (prompt),” “output format designation (format),” and “critical attitude (critic).” The context metric evaluates the extent to which specific scenarios are provided to improve outcome quality. For instance, in Task 1, context could entail supplementation with additional information regarding the visited location or specifying relevant local events for inclusion in the itinerary. Moreover, the itinerary specifics, purpose of the visit, and structure of the report were examined. In Task 2, the evaluation of context involved verifying the direct assignment of the fairy tale’s characters, predefinition of the genre, clarity of the background and purpose, and inclusion of a synopsis, encompassing five distinct aspects. Prompt measures the number of prompts entered to produce an outcome; it serves as a quantitative scale for the fine-tuning process through interaction with ChatGPT. Persona evaluates whether a persona, such as the subject of the visit, is explicitly defined within the prompts. In Task 1, this choice could result in variations in the scope and level of report writing based on the role of the participant, such as a graduate student conducting research, an undergraduate attending a general education course, or a guide for foreign tourists. In Task 2, we determined whether roles such as fairy tale writer or counselor were assigned to ChatGPT. For format, we assessed whether the prompts specified details such as paragraph structure, word count, or table layout. The critic metric examined whether prompts were dynamically adjusted to build on ChatGPT’s previous responses or restructured to reflect user intentions after evaluating feedback.
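To make this coding scheme concrete, the following is a minimal sketch of how one participant’s session could be reduced to the study’s variables. The field names mirror the paper’s variables, but the types, value ranges, and example values are illustrative assumptions rather than the authors’ actual codebook.

```python
from dataclasses import dataclass

# Hypothetical coded record for one participant on one task. The field names
# mirror the paper's variables; the types, ranges, and example values are our
# illustrative reading of the measurement description, not the authors' codebook.
@dataclass
class PromptCoding:
    context: int   # number of contextual assignments (e.g., 0-5 aspects per task)
    prompt: int    # number of prompts entered during the session
    persona: int   # 1 if a persona/role was assigned, else 0
    format: int    # 1 if an output format (structure, word count, ...) was designated
    critic: int    # 1 if the user critically adjusted prompts based on feedback
    score: float   # expert-rated CPS performance

# Example: a Task 2 (fairy tale) session with a persona and iterative refinement.
record = PromptCoding(context=3, prompt=12, persona=1, format=1, critic=1, score=8.5)
```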
Participants shared their feedback on interacting with ChatGPT after completing Task 2. Two researchers, who majored in liberal arts education and participated as judges in the contest, evaluated the participants’ emotions and satisfaction levels regarding the ChatGPT prompts. Emotional arousal was categorized as neutral or emotionally aroused, and satisfaction was categorized as positive, negative, or neutral. The experts manually annotated emotional arousal and satisfaction because the participants’ feedback included subtle meanings and expressions. The categorizations were finalized after the sentiments and satisfaction levels expressed in the contest feedback were fully annotated.
Analyses and Results
Descriptive Statistics
As indicated in Table 3, the variables persona, format, and critic take the value 0 or 1 depending on the presence of each feature within the prompt. The variables context, prompt, and score take integer values over different ranges. The proportion of prompts with assigned personas (persona) in Task 1 (Mean (M) = 0.34, Standard deviation (SD) = 0.48) was higher than that in Task 2 (M = 0.15, SD = 0.36). By contrast, the proportion of prompts with format designation (format) in Task 2 (M = 0.81, SD = 0.39) was higher than that in Task 1 (M = 0.42, SD = 0.50). The critic levels for Task 1 (M = 0.37, SD = 0.49) and Task 2 (M = 0.31, SD = 0.46) were similar, indicating that the prompting for each task exhibited a similar level of critical attitude. Therefore, we infer that different prompt features can be used to resolve different types of problems while interacting with ChatGPT.
Data Collected from the ChatGPT Prompting Contest.
Obs: Total observations; M: Mean; SD: Standard deviation.
Prompt Engineering and Problem-Solving Performance
Hierarchical regression was implemented to evaluate the prompt factors predicting CPS performance. Tables 4 and 5 show the correlations among the variables and the results of the hierarchical regression for Task 1 (Trip Report), respectively. The variance inflation factor (VIF) values for all variables were below 3, suggesting no multicollinearity (Table 4).
Correlations for Task 1 (Trip report) Variables.
*p < 0.1; **p < 0.05; ***p < 0.01.
Results of Hierarchical Regression for Task 1 (Trip report).
*p < 0.1; **p < 0.05; ***p < 0.01.
SE: Standard Error.
For the trip report task, the prompt characteristics context, persona, format, and critic were entered into Model 1, which explained 50.5% of the variance in CPS performance. Both context (β = 1.162; p < 0.01) and critic (β = 2.392; p < 0.01) had positive effects on CPS performance. After entering the number of prompts in Model 2, the total variance explained by the model was 53.6%, with the number of prompts (β = 0.206; p < 0.01) also showing a positive effect on the CPS score. In Model 3, we introduced gender and academic discipline (AcademicBG) as control variables. Their addition did not result in a significant improvement over Model 2.
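As a point of reference, the three-block hierarchical regression and the VIF screen described above can be sketched as follows. The analysis was run in Stata 17; this Python/statsmodels translation uses synthetic stand-in data (the contest data are restricted) and is our reading of the setup, not the authors’ code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic stand-in for the 62 Task 1 observations; effect sizes loosely
# echo Table 5 but are arbitrary.
rng = np.random.default_rng(0)
n = 62
df = pd.DataFrame({
    "context": rng.integers(0, 6, n),
    "persona": rng.integers(0, 2, n),
    "format": rng.integers(0, 2, n),
    "critic": rng.integers(0, 2, n),
    "prompt": rng.integers(1, 20, n),
    "gender": rng.integers(0, 2, n),
})
df["score"] = (1.2 * df["context"] + 2.4 * df["critic"]
               + 0.2 * df["prompt"] + rng.normal(0, 2, n))

# Model 1: prompt characteristics; Model 2: + number of prompts;
# Model 3: + control variables.
blocks = [
    ["context", "persona", "format", "critic"],
    ["context", "persona", "format", "critic", "prompt"],
    ["context", "persona", "format", "critic", "prompt", "gender"],
]
for i, cols in enumerate(blocks, 1):
    res = sm.OLS(df["score"], sm.add_constant(df[cols])).fit()
    print(f"Model {i}: R^2 = {res.rsquared:.3f}")

# Multicollinearity screen analogous to the paper's check (all VIFs < 3).
X = sm.add_constant(df[blocks[-1]])
for j, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, j), 2))
```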
Tables 6 and 7 show the correlations between the variables and the results of the hierarchical regression for Task 2 (fairy tale writing), respectively. VIF values for all variables were below 3, suggesting no multicollinearity.
Correlations for Task 2 (Fairy tale writing) Variables.
*p < 0.1; **p < 0.05; ***p < 0.01; VIF: Variance Inflation Factor.
Results of Hierarchical Regression for Task 2.
*p < 0.1; **p < 0.05; ***p < 0.01.
For the fairy tale writing task, the prompt characteristics context, persona, format, and critic were entered into Model 1, which explained 47.3% of the variance in CPS performance. Context (β = 1.190; p < 0.01), persona (β = 1.677; p < 0.1), and critic (β = 2.412; p < 0.01) had positive effects on CPS performance. After entering the number of prompts in Model 2, the explained total variance was 63.7%, with the number of prompts (β = 0.754; p < 0.01) also positively affecting the CPS score. In Model 3, we introduced gender and academic discipline as control variables. Their addition did not result in a significant improvement over Model 2.
In both Task 1 and Task 2, the variables critic, context, and prompt each exhibited a significant positive effect on CPS performance. This answers RQ1: contextual assignment, the number of prompts, and a critical attitude are the prompt elements associated with higher CPS performance.
Prompt Engineering and User Satisfaction
We used multinomial logistic regression to examine the influence of the five independent variables: context, persona, format, critic, and prompt. The dependent variable was satisfaction, categorized as neutral, satisfied, and dissatisfied. As delineated in Table 8, only persona exhibits a significant effect (β = 16.775; p < 0.01) for the category labeled “satisfied,” denoting a strong positive influence on the likelihood of this satisfaction level. In the category labeled “dissatisfied,” persona again exhibits a positive effect (β = 17.140; p < 0.01). Thus, RQ2 is answered: the assigned persona is the only prompt element associated with user satisfaction (and dissatisfaction). The normal-based 95% confidence intervals for each predictor across both categories offer insight into the precision of the estimates; the intervals for the significant persona coefficients do not straddle zero, supporting their significance. Model adequacy was evaluated using the Wald chi-square test, with a statistic of 23.43 (p < 0.01), indicating statistical significance at the conventional level.
Results of MNLogit Analysis for Satisfaction.
Note. Logistic regression chi2(10) = 23.43, Prob > chi2 = 0.009, Pseudo R2 = 0.199.
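A minimal sketch of the multinomial logistic model, again in Python/statsmodels with synthetic stand-in data, might look as follows; the original analysis was run in Stata 17, so the details here are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 59
df = pd.DataFrame({
    "context": rng.integers(0, 6, n),
    "persona": rng.integers(0, 2, n),
    "format": rng.integers(0, 2, n),
    "critic": rng.integers(0, 2, n),
    "prompt": rng.integers(1, 20, n),
})
# 0 = neutral (base outcome), 1 = satisfied, 2 = dissatisfied.
satisfaction = rng.integers(0, 3, n)

res = sm.MNLogit(satisfaction, sm.add_constant(df)).fit(disp=False)
print(res.summary())    # one coefficient set each for categories 1 and 2 vs. base 0
print(res.conf_int())   # normal-based 95% CIs, as reported in Table 8
```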
Prompt Engineering and User Emotional Arousal
We performed a logistic regression analysis to examine the relationship between the prompt elements and a binary emotional outcome: neutral (0) or emotionally aroused (1). The predictors were context, persona, format, critic, and prompt. As indicated in Table 9, the normal-based 95% confidence intervals for each predictor suggest a wide range of possible values for the true odds ratios, with none of the intervals excluding the value of one, reinforcing the lack of statistical significance of the predictors. This observation indicates that the statistical evidence is insufficient to conclusively assert the impact of these predictors on emotional arousal. Therefore, the answer to RQ3 is that none of the prompt elements significantly affected the participants’ emotional arousal.
Results of the Logistic Regression Analysis for Emotional Arousal.
Note. Logistic Regression chi2(10) = 4.40, Prob > chi2 = 0.493, Pseudo R2 = 0.054.
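The corresponding binary logistic model, including the odds-ratio confidence intervals discussed above, could be sketched as follows (synthetic data; our translation of the Stata analysis, not the authors’ code).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 59
df = pd.DataFrame({
    "context": rng.integers(0, 6, n),
    "persona": rng.integers(0, 2, n),
    "format": rng.integers(0, 2, n),
    "critic": rng.integers(0, 2, n),
    "prompt": rng.integers(1, 20, n),
})
arousal = rng.integers(0, 2, n)  # 0 = neutral, 1 = emotionally aroused

res = sm.Logit(arousal, sm.add_constant(df)).fit(disp=False)
odds = np.exp(res.params)         # odds ratios
odds_ci = np.exp(res.conf_int())  # a predictor is non-significant when its
print(odds, odds_ci, sep="\n")    # OR interval straddles 1, as in Table 9
```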
User Satisfaction Toward ChatGPT and Problem-Solving Performance
We examined the impact of user satisfaction, categorized into three groups (0: neutral, 1: satisfied, and 2: dissatisfied), on the CPS score. An ANOVA was used to evaluate the statistical significance of satisfaction levels with respect to CPS performance. Table 10 presents the results, which compare score across the satisfied (1) and dissatisfied (2) groups.
Analysis of Variance Test for Score and Satisfaction for Task 2 (Fairy Tale Writing).
Bartlett’s equal-variances test: chi2(1) = 0.0949, Prob > chi2 = 0.758.
The observed F-statistic was 3.52 (p = 0.061), indicating a marginally significant difference in mean scores between the groups defined by satisfaction. Thus, the ANOVA results suggest that the user group satisfied with the outputs of the prompts exhibited better CPS performance than the dissatisfied group, albeit at marginal significance.
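For reference, the ANOVA and its equal-variances precondition can be reproduced in a few lines; the group sizes and scores below are synthetic placeholders, not the contest data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic CPS scores for the two compared groups.
satisfied = rng.normal(8.0, 2.0, 20)
dissatisfied = rng.normal(6.5, 2.0, 15)

# Bartlett's test checks the equal-variances assumption underlying the ANOVA.
chi2, p_var = stats.bartlett(satisfied, dissatisfied)
# With two groups, a one-way ANOVA is equivalent to a two-sample t-test (F = t^2).
f_stat, p_anova = stats.f_oneway(satisfied, dissatisfied)
print(f"Bartlett chi2 = {chi2:.3f} (p = {p_var:.3f}); F = {f_stat:.2f} (p = {p_anova:.3f})")
```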
Discussion and Implications
Research Observations
This study examined the effects of prompt elements on CPS performance and user experience. Its findings elucidate the interactions between users and ChatGPT and provide clarity on the use of ChatGPT prompts to improve CPS performance. The results indicate that prompts with heightened contextual information are more likely to achieve outputs that align with objectives. The number of contextual assignments significantly affected CPS performance regardless of the task type, even though the two tasks differed notably in character: the trip report requires straightforward and focused strategies, whereas fairy tale writing requires unrestricted, innovative thinking. Furthermore, our analysis indicated that the significant effect of the number of prompts on CPS performance was evident regardless of the type of task. In addition, a critical attitude toward ChatGPT responses significantly improved CPS performance. Finally, the assigned personas positively affected both user satisfaction and dissatisfaction. The former is consistent with previously reported results; however, we did not expect assigned personas to affect dissatisfaction positively.
This unexpected finding may be explained by the expectation–disconfirmation perspective (Oliver, 1977). Disconfirmation is a psychological state caused by a gap between expectations and perceived performance (Lin et al., 2018). The participants who assigned a persona to a prompt tended to have more specific expectations of ChatGPT’s responses. For instance, when participants choose the persona of a child storywriter, they implicitly expect that the story’s content and tone will be suitable for young readers. If ChatGPT’s output fails to meet these expectations, the participants may feel dissatisfied. This finding was supported by participants’ feedback. For example, those who chose a children’s storybook writer as their persona noted limitations such as “Some parts of the story had repetitive or awkward expressions” and “The mixture of formal and informal tone was uncomfortable.” Consequently, such participants may articulate their discontent and fall into the category of dissatisfied users.
Contrary to expectations, persona assignments and output format designations did not affect CPS performance in our study. While earlier studies considered persona settings a key element in prompting (Clayton, 2023; OpenAI, 2023), other research has questioned their practical value, pointing to implementation issues (N. Chen, Wang, et al., 2023; Y. Chen, Wong, et al., 2023) and possible reasoning biases (Gupta et al., 2024). Taken together with previous research, our findings suggest that persona assignments could have a limited effect on CPS performance. Two possible explanations can be inferred. First, the primary functions of persona design are representativeness and empathy (Dreamson et al., 2023). For persona assignments to work, users’ needs in areas such as cognition and emotions must be considered. Nevertheless, as ChatGPT does not assume the representativeness of the user or express empathy (Das & Ghoshal, 2023), its ability to enhance CPS performance through the manifestation of empathy is constrained. Second, contextual information is embedded in problem-solving prompts. Therefore, even in the absence of persona assignments, prompts can be construed as inherently embodying personas to a certain extent.
The lack of impact of output format designation on CPS performance can be attributed to the clarity of the problem format stipulated by the contest. In the dataset used in this study, the practical task required the creation of a trip report. The structure of the task did not influence the assessment of CPS performance because the prescribed report format was inherently integrated into the task, and most users adhered to it in their prompts. The creative task inherently specified the genre (fairy tale) and word count, leading most users to adhere to this format in their prompts. Therefore, output format designation may not function as a distinguishing factor in the evaluation.
Moreover, despite our attempts to incorporate the emotional aspects of the interaction between users and ChatGPT in this study, we did not find significant effects. Users who provided contextual assignments and those who displayed critical attitudes did not exhibit emotional arousal. Similarly, regardless of the increase in prompt frequency, the users maintained an emotionally neutral state similar to that experienced when conducting a simple task, such as using a calculator. Lee et al. (2019) showed that different routes exist for inducing emotional arousal in human–computer interaction design depending on the design goals, such as sense-making or exploration. Thus, emotional arousal may vary depending on the goal for which the user employs ChatGPT or the user’s perception of ChatGPT. Nevertheless, this inference warrants further research into users’ conceptualizations of and relational orientations toward ChatGPT. Furthermore, this study employed only one measure of emotion derived from text analysis. Future studies can adopt multiple components of emotion measurement to analyze emotions in human–computer interactions (Mahlke et al., 2006).
The association between user satisfaction and CPS performance was marginally significant, answering RQ4. Specifically, a significant difference was observed in CPS performance between the satisfied and dissatisfied groups. Thus, user satisfaction in interacting with ChatGPT was positively correlated with CPS performance. However, this association requires further investigation to clarify causality.
Implications
Our study enhanced the understanding of prompt engineering by examining its effects on problem-solving and user satisfaction. Furthermore, our findings contribute significantly to the academic literature. First, the CPS model addresses complex issues through nonlinear processes that include understanding the problem, generating ideas, planning, and execution. As GAI advances rapidly, a review of GAI applicability within the CPS model has become increasingly warranted. Therefore, this study connects CPS with the prompting process to ensure that complex challenges can be effectively addressed. While previous research defined effective prompts and investigated their implementation, this study is the first to assess each prompt element in the context of CPS. Our findings reveal common influential characteristics across task domains, as well as domain-specific effects on CPS performance. Second, our findings suggest that prompt engineering significantly affects CPS performance, with some influential prompt characteristics common across task types. Additionally, this study empirically investigated prompting strategies previously suggested in the literature (Clayton, 2023; Cook, 2023; Liu & Chilton, 2022). Building on this foundation, the study further explored text-based prompting elements that influence CPS.
Third, while previous studies have focused on cognitive evaluations of ChatGPT prompting, we investigated the effects of prompting on problem-solving, satisfaction, and emotional arousal. By examining affective aspects, this study extends prior research that has mostly concentrated on prompting methods to improve learning outcomes or work performance. If more refined data collection and analysis methods are employed to meticulously quantify and analyze the emotional arousal of users engaging in problem-solving via prompts and its impact on satisfaction, the results can contribute to further research on interactions with ChatGPT.
The findings of this study also make a practical contribution to ChatGPT prompting in CPS. First, sufficient contextual information improves the effectiveness of the prompt. We found that the specificity of contextual elements has a positive effect on CPS performance consistently across task types. Therefore, ChatGPT prompts should clearly present the background, objectives, and constraints of the task, guiding the GAI to generate more goal-oriented responses. For example, a prompt such as “Write a field report from the perspective of a university student who participated in a three-day cultural heritage tour in Gyeongju” will likely lead to higher-quality problem-solving outcomes than the simpler instruction, “Write a travel report.” Second, increasing the number of prompt entries is a strategy to enhance user engagement iteratively and refine outputs. A higher number of prompts improved CPS performance—users continuously redefined and elaborated on problems through interactions with ChatGPT. Thus, rather than attempting to derive complete answers from a single query, a conversational approach involving iterative feedback and follow-up prompts is recommended. Finally, a critical attitude toward AI responses reflects the user’s active involvement in the problem-solving process and contributes to performance improvements. When users evaluate and request revisions rather than passively accepting responses, the role of the GAI evolves from a simple information provider to a collaborative problem-solving partner. These prompt strategies encourage users to sustain contextual awareness and adopt a critical perspective throughout the interaction process.
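The sketch below illustrates how these three strategies (context-rich prompts, iterative follow-ups, and critical evaluation) could be combined in practice. Contest participants used the ChatGPT interface directly; the OpenAI Python client and the model name here are illustrative assumptions intended only to make the conversational structure explicit.

```python
from openai import OpenAI  # openai>=1.0 client; the model name below is a placeholder

client = OpenAI()
messages = [
    # Context-rich opening prompt: persona, background, objective, and output format.
    {"role": "user", "content": (
        "You are a university student who joined a three-day cultural heritage "
        "tour in Gyeongju. Write a field report covering the itinerary, the "
        "purpose of the visit, and key observations, in three sections of "
        "about 200 words each."
    )},
]
reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
messages.append({"role": "assistant", "content": reply.choices[0].message.content})

# Critical, iterative follow-up instead of accepting the first answer as final.
messages.append({"role": "user", "content": (
    "The second section merely repeats the itinerary. Rewrite it to analyze "
    "what the heritage sites reveal about Silla culture."
)})
reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(reply.choices[0].message.content)
```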
Limitations and Future Research
This analysis has certain limitations. First, the cross-sectional dataset was collected from a single university, and the sample was small. Furthermore, the study’s contest-based setting may limit its generalizability. Task constraints and performance incentives could have influenced participant behavior, and the self-selected sample may not reflect the broader population. Future work should examine prompt engineering using larger datasets in various contexts, and longitudinal research is necessary to capture prompting dynamics.
Second, although the relationship between emotional arousal and prompting was not confirmed statistically, it was not fully explored. Recent studies have focused on understanding the role of user emotions in the adoption of applications because they often affect user satisfaction and continued usage. The positive valence of emotional responses is associated with user satisfaction, resulting in users’ intent to use products or services (Thüring & Mahlke, 2007). Further research is required to better understand the arousal and valence of emotional responses during the ChatGPT prompting process. This research model should be expanded by incorporating user intentions for future use.
Third, this study employed binary measures for key prompt variables (e.g., persona, format, and critic) and a three-level scale for user satisfaction. Although these simplified measurements facilitated the analysis, they may have overlooked the more nuanced dimensions of prompt design and user experience. Future research could adopt more fine-grained or continuous measures to capture subtle variations. Additionally, it could employ qualitative or mixed-method approaches to better understand how prompt features interact with user perceptions and outcomes.
Fourth, although gender and academic discipline were included as control variables, other potentially influential factors, such as prior experience with ChatGPT, were not considered. Future research should include a broader range of individual variables to enhance explanatory power. Furthermore, while the regression assumed independence among the prompt elements, possible interaction effects were not tested. Exploring these interactions in future studies could better elucidate the joint influence of prompt components on user outcomes.
Conclusions
This study used a mixed-methods approach to empirically examine the effects of specific prompt elements in ChatGPT interactions on CPS performance and user experience (satisfaction and emotional arousal). The results show that adding contextual information, using multiple prompts iteratively, and maintaining a critical attitude toward AI responses significantly enhance CPS outcomes. Conversely, assigning personas did not directly improve performance but influenced user experience, specifically user satisfaction. These findings suggest that well-designed prompts should include sufficient context, allow iterative refinement, and encourage users to critically evaluate AI outputs rather than accept them passively. This study contributes new empirical evidence to the field of GAI use in CPS by clarifying which prompt elements influence CPS performance and user satisfaction. By going beyond general prompt guidelines, this study offers a practical understanding of prompt engineering as an active user strategy that shapes the quality of human–AI collaboration.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability
The data that support the findings of this study are available from Chung-Ang University’s AI Humanities Research Institute (CAU-AIHRI), but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of CAU-AIHRI.
