Sage Journals: Discover world-class research

Abstract

Objective:

In-person counseling faces limitations in timing and geographical accessibility, often causing adolescents and young adults (AYAs) to miss timely psychological support. With the advent of large language models (LLMs), the mental health care industry has increasingly focused on developing chat counseling services as supplementary tools to reduce these barriers. However, existing services have two primary limitations: tendency toward generic advice and absence of human-like dialogue. To overcome these limitations, this article proposes BetterMood, a human-like AI counseling service specifically for Korean-speaking AYAs.

Methods:

Our design for BetterMood separately addressed the content and delivery of counseling dialogue. For content, we develop a concern-aware counseling LLM refined through prompt-engineering with a novel prompt derived from collected counseling data. For delivery, we create a human-like AI counselor that employs a chunk-based streaming methodology to enable human-like dialogue. We then conducted a user study with 10 adolescents, 110 young adults, and 8 professional clinicians to assess the feasibility and user experience across four domains: (i) interaction capability, (ii) perceived support, (iii) usability, and (iv) ethical safety.

Results:

Our user study indicates that BetterMood’s interactive capabilities, particularly its ability to suggest appropriate responses, received positive feedback from 90.0% of adolescents, 90.9% of young adults, and 75.0% of professional clinicians. Stratified analysis revealed that outcomes regarding perceived support and usability of the service differed across cohorts and initial screening status. Furthermore, independent evaluations by eight professional clinicians demonstrated moderate agreement for individual ratings but excellent reliability for the aggregated assessment.

Conclusion:

Positive user experience and high inter-rater reliability among clinicians support BetterMood’s potential as an accessible supplementary tool for initial psychological support.

Keywords

Adolescents’ and young adults’ mental health care AI counseling concern-aware counseling LLM human-like AI counselor chunk-based streaming

Introduction

Timely access to effective mental health care remains limited for many adolescents and young adults (AYAs).^1,2 Multiple barriers, including clinician shortage, geographic inequalities, and social stigma, contribute to this problem.^3–5 Meta-analysis indicate that fewer than 50% of AYAs who need psychological support utilize in-person mental health services, highlighting substantial gaps in access.^6,7

To address these challenges, digital mental health interventions have emerged as scalable alternatives to improve accessibility.^8,9 Recent studies further highlight AYAs’ distinctive digital engagement needs, with confidentiality, interactivity, and personalized support emerging as primary considerations.^10–12 When interventions are tailored to these needs, AYAs’ familiarity with digital platforms can facilitate uptake.^13,14 Systematic reviews report that adolescent-focused digital tools, particularly those using conversational AI, demonstrate clinical benefits and higher retention.^15–17

Building on this evidence base, researchers and industry have expanded AI-enabled mental health solutions.¹⁸ Among these, large language model (LLM)-powered conversational agents (e.g. Woebot, Wysa) have gained traction for offering on-demand, stigma-reducing support.^19,20 Scoping reviews identify more than 50 mental health chatbots, over 20 explicitly using LLMs, across academic pilots and commercial products.²¹ Randomized and quasi-experimental trials in AYAs demonstrate that LLM-powered conversational agents produce small-to-moderate reductions in internalizing symptoms (Hedges’ $g = 0.28$ – $0.55$ ).^9,15,16

Despite these advances, leading reviews and empirical studies highlight two limitations: (a) tendency toward generic advice, and (b) absence of human-like dialogue. These limitations particularly weaken support for AYAs, who prioritize context-sensitive, emotionally attuned, interactive care. They are not minor stylistic issues; rather, theory and evidence indicate that they can diminish therapeutic impact.^22–25

(a) Tendency toward generic advice. This refers to interactions that ignore users’ specific contexts and emotions.^22,23 Instead, they deliver broad, formulaic recommendations (e.g. “talk to someone you trust”). Theoretically, such genericity clashes with core counseling frameworks. Person-centered therapy,²⁶ for example, emphasizes empathic understanding, unconditional positive regard, and genuineness.^27,28 Generic advice misses the individualized attunement that those mechanisms require. Empirically, qualitative studies with AYAs show that lack of contextualization lowers perceived support and speeds disengagement.^23,29 Consistent with these findings, as Figure 1 illustrates, the open-source LLM-based chatbot ChatCounselor³⁰ acknowledges frustration yet still defaults to sweeping suggestions (e.g. “seek help from teachers or classmates”).

Figure 1.

Sample dialogue from ChatCounselor.

(b) Absence of human-like dialogue.^24,25 We define human-like dialogue as (i) multimodal social cues that convey empathy and social presence, and (ii) low-latency turn-taking that sustains conversational flow and rapport.^31,32 Theoretically, Social Presence Theory³³ and Media Richness³⁴ perspectives posit that richer, more immediate communication channels enhance interpersonal understanding. In digital mental health, the construct of a digital therapeutic alliance extends this logic by linking perceived alliance to adherence and outcomes.^35,36 Empirically, interventions that integrate multimodal, human-like feedback yield higher alliance scores and retention than text-only agents.^37–39 Yet most LLM chatbots remain text-based and asynchronous, limiting nonverbal expression and real-time responsiveness. Reflecting this limitation, empirical studies show that only a minority of AYAs perceives these chatbots as emotionally attuned or supportive when compared with human counselors.^40–42 Likewise, prior work shows that even brief (e.g. several-second) pauses in voice-based automated agents can disrupt rapport and dampen engagement.^43–45

To overcome these two limitations, we propose BetterMood, a human-like AI counseling service specifically designed for Korean-speaking AYAs, as illustrated in Figure 2. Our main contributions are twofold: (a) a concern-aware counseling LLM (From now on, by counseling LLM, we mean a concern-aware counseling LLM whenever there is no ambiguity.), and (b) a human-like AI counselor (By AI counselor, we mean a human-like AI counselor whenever there is no ambiguity.). For the first contribution, the main challenge is integrating effective psychotherapeutic counseling knowledge into a general-purpose LLM so that it can move beyond generic advice. To achieve this, we collect counseling data from both real counseling scripts and synthetic counseling scripts. Using these data, we devise a novel prompt based on psychotherapy and counseling techniques, then perform prompt-engineering. Additionally, we implement per-client session-level management to stores contextual information and each client’s concerns, ensuring continuity across visits. For the second contribution, the main challenge is creating seamless human-like counseling experience. To this end, we develop the AI counselor that delivers text responses generated by our counseling LLM through visual feature, voice, and gestures. To further support human-like dialogue with the AI counselor, we introduce a chunk-based streaming methodology that ensures minimal latency. This approach processes the counseling LLM’s text responses on a sentence-by-sentence basis. Once a sentence is finalized, the TTS module converts it into audio. Then, the audio is lip-synced with the AI counselor to enable streaming of responses.

Figure 2.

BetterMood counseling flow.

Through this design, BetterMood integrates a human-like AI counselor with a concern-aware LLM, providing counseling services to Korean-speaking AYAs without constraints of time or location. It is expected to serve as a supplementary tool that helps users overcome psychological barriers.

Methods

As illustrated in Figure 2, each conversation cycle between the client and the AI counselor in BetterMood proceeds as follows. The counseling begins by capturing the client’s audio and transmitting it to the BetterMood server. The captured audio is immediately forwarded to the STT module, which leverages OpenAI’s Whisper API.⁴⁶ Once the audio is transcribed, the resulting text is routed to the counseling LLM, which generates a sequence of responding sentences. As soon as each sentence is generated, it is passed to the TTS module, which uses ElevenLabs API⁴⁷ for audio synthesis. Then, the synthesized audio is fed into the visual response generation module, processing AI counselor’s response. Finally, the AI counselor’s response is played back to the client as a set of short video segments, completing a single conversation cycle. Through this process, BetterMood aims to provide a counseling experience that closely simulates real human interaction. To evaluate the practicality and immediate user perceptions of BetterMood, we conduct a user study with three cohorts: 10 adolescents, 110 young adults, and 8 professional clinicians. The adolescent and young adult participants are screened and divided into screen positive and screen negative groups based on their screening results. After each counseling session, participants complete a satisfaction survey assessing various aspects of the service. Additionally, the professional clinicians participate in counseling sessions using BetterMood and provide evaluations from an expert perspective. Prior to the study, written informed consent is obtained from all participants. For participants over 18, written informed consent is obtained from themselves. For participants under 18, written informed consent is obtained from their legally authorized representatives (parents or legal guardians), with simultaneous assent from the minors. The technical details of concern-aware counseling LLM, and human-like AI counselor will be discussed more extensively in the subsequent sections.

Concern-aware counseling LLM

Current chat counseling services often provide general solutions immediately without properly exploring and discussing clients’ concerns. To address this problem, we collect counseling data and analyze how counseling sessions are conducted. Based on insights into counseling practices, we devise a novel prompt grounded in psychotherapy and counseling techniques. Then we conduct prompt-engineering to ensure our counseling LLM provides responses closely tailored to clients’ concerns. Counseling data collection. Our counseling data consist of two types of scripts: (a) real counseling scripts and (b) synthetic counseling scripts. The real counseling scripts are collected from actual counseling sessions between a counselor and students. We recruit a highly credentialed Level 1 Licensed Professional Counselor employed at the Student Counseling Center of Seoul National University, as well as students who score above the cut-off (Participants are classified using the following cut-off scores: STAI-X-1 (≥52), PHQ-9 (≥10), BIS-15 (≥39), and endorsement of 4 or more items on ASRS-V1.1 Part A.) on at least two of the four mental health self-report questionnaires: STAI-X-1 (anxiety),^48–50 PHQ-9 (depression),^51,52 BIS-15 (impulsivity),^53–56 and ASRS-V1.1 Part A (ADHD).^57–59 These questionnaires are selected based on their relevance to prevalent mental-health concerns among Korean AYAs.^60,61 Prior to the study, we confirmed the usage rights for all questionnaires. The PHQ-9, BIS-15, and ASRS-V1.1 Part A are publicly available for research purposes. For the STAI-X-1 (Korean standardized version), we obtained written permission via email from the copyright holder prior to data collection. Across the sessions, discussions typically focus on academic stress, career choices, familial conflicts, and interpersonal relationship difficulties. Counseling sessions are conducted remotely via Zoom, last approximately 30 minutes, and are recorded only after participants provide written informed consent, including clear information on potential risks, confidentiality boundaries, and participants’ right to withdraw. Recordings are transcribed verbatim using the Naver ClovaNote,⁶² which demonstrates high transcription accuracy for Korean speech.⁶³ Transcripts undergo a quality-control process in which an independent reviewer compares each transcript against the original audio recording and corrects identified discrepancies. Through this thorough procedure, 5 high-quality real counseling scripts are generated.

Before participation, all individuals receive a detailed explanation of the study’s goals, data handling procedures, potential risks, privacy measures, and their rights, both through written materials and verbal briefing. Recording of counseling sessions occurs only after explicit consent, with all participants reminded that they can withdraw at any time without consequence. To ensure confidentiality and ethical standards, all transcripts undergo an anonymization process: personal identifiers, sensitive references, and any information that might reveal participant identity are systematically removed. Fully de-identified transcripts are stored on encrypted institutional servers with access restricted to authorized research personnel only. No identifiable or sensitive participant information is shared outside the research team or used in any publication or tool development.

For synthetic counseling scripts, we collaborate with experts from the Department of Psychology at Korea University. Graduate psychology researchers author these scripts, explicitly basing their dialogues on established psychotherapeutic approaches such as cognitive behavioral therapy (CBT). Scripts address common psychological concerns among AYAs, including academic stress, career choices, familial conflicts, and interpersonal relationship difficulties. Each synthetic script represents a counseling dialogue of approximately 20 to 25 minutes and is systematically structured to reflect realistic therapeutic interactions. To maintain high data quality and clinical realism, we apply a two-step validation procedure. Initially, each script undergoes peer review, allowing authors to receive mutual feedback and enhance script fidelity; subsequently, a psychology professor comprehensively reviews each script, conducting practical role-play scenarios to test therapeutic appropriateness, psychological plausibility, and ethical sensitivity. Throughout script creation, explicit measures are taken to minimize biases, avoid reinforcement of stereotypes, and sensitively handle potentially distressing topics. This multi-layered approach yields 100 robust, clinically realistic synthetic counseling scripts.

Prompt-engineering. To ensure that our counseling LLM delivers responses closely tailored to clients’ concerns, we developed a novel prompt comprising two core components: (a) psychotherapeutic techniques derived from CBT and SP, and (b) structured counseling techniques for effective interaction. For the first component, we analyze our counseling dataset and identify the four most frequently used psychotherapeutic techniques derived from CBT and SP. These are well-established counseling approaches known for their efficacy in addressing various mental health conditions.^64–66 Figure 3 illustrates samples of our counseling data, highlighting these four techniques. Table 1 presents a segment of our prompt that incorporates these techniques, and the following paragraphs provide explanations of each technique and its application.

Problem solving skills training

In order to provide meaningful and personalized support, it is essential to engage in a more deepened conversation about the issue rather than immediately listing solutions to clients’ problems. To achieve this, our prompt is designed to ask clients for more details of their situations and difficulties while helping them to identify the underlying causes of their concerns (Table 1(a)). Through collaborative discussions, clients gain clarity and explore possible solutions.⁶⁴

Normalization and acceptance

Figure 3.

Counseling data samples.

Table 1.

Examples of our prompt with psychotherapeutic techniques.

	Technique	Prompt
(a)	Problem solving skillstraining	Ask the client to elaborate more on the problem. Try to find the root cause of the client’s trouble, rather than provide general solutions.
(b)	Normalization andacceptance	Here are some expressions. Refer to them and modify appropriately. - Normalization: “This is not just your problem; others are going through similar issues. It’s completely natural to feel these emotions in this situation.”
(c)	Summarization andrestatement	You should rephrase what the client said, and empathize with it.
(d)	Praise, encouragement, and reassurance	Here are some expressions. Refer to them and modify appropriately. - Praise: “Just coming here today and sharing your thoughts is already a significant step forward.”

Clients may suffer from the belief that their problems are unique to them, which can lead to further distress. Normalization alleviates this by reassuring clients that others face similar challenges, reducing self-blame and feelings of abnormality.⁶⁷ As shown in Table 1(b), we incorporate few-shot examples into the prompt to normalize clients’ concerns. By applying this, our counseling LLM helps clients accept their emotions as natural and valid, thereby reducing their anxiety and making the counseling session more effective.⁶⁷

Summarization and restatement

By summarizing or restating clients’ statements, rather than offering brief reactions such as “Yes” or “I see,” counselors demonstrate a higher level of attentiveness.⁶⁸ Furthermore, this approach can assist clients in clarifying their problems. We design our prompt to rephrase clients’ concerns and demonstrate empathy related to those issues (Table 1(c)). This approach enables our counseling LLM to show that it understands clients’ concerns, fostering an empathetic interaction.⁶⁶

Praise, encouragement, and reassurance

Counselors frequently offer praise to their clients, since clients can greatly benefit simply from receiving encouragement. We include examples of praise in the prompt, as shown in Table 1(d), to provide generous encouragement to clients. Such praise and support enhance their self-esteem, which is crucial for successful counseling.⁶⁶

For the second component, counseling techniques, we establish a clear counselor persona within our prompt, articulated as: “You are a kind counselor for teenagers.” To maintain an appropriately conversational and empathetic tone, we include explicit instructions such as “Maintain a conversational and empathetic tone.” Additionally, we provide structured guidance tailored to each phase of the counseling process—including initial greetings, exploring client concerns, facilitating problem-solving discussions, and effectively concluding the session—thereby enabling our counseling LLM to generate suitable and contextually relevant responses at every stage. Furthermore, we provide few-shot examples to address various client conditions such as depression, anxiety, and extreme situations like suicidal thoughts. Our prompt is refined with feedback from psychology experts to guarantee the appropriateness of responses.

Utilizing the prompt, we develop our counseling LLM based on the GPT-4o model with temperature=0.5, top_p=1.0. In comparative analysis with the sample dialogue from ChatCounselor shown in Figure 1, we conduct a full counseling session on identical topics using our counseling LLM. As shown in Supplementary Material S1, while ChatCounselor offers general solutions, our counseling LLM engages in deeper explorations of the client’s concerns. Context saving and long-term counseling. In real-world counseling, most clients require more than 10 sessions to show meaningful improvement, since problems are rarely resolved in a single session.⁶⁹ Similarly, clients using AI counseling service may seek continued guidance over multiple sessions. To support this, we maintain a database and generate Json Web Tokens (JWTs) to securely encode the client’s name and phone number as identifiers. At the end of each counseling session, the conversation context is stored based on the identifiers. When the client returns for another session, the counseling LLM retrieves the individually stored conversation context and delivers a re-engagement message (e.g. “Welcome back! I’m glad to see you again”). This ensures continuity in the conversation, allowing for a constructive consultation and a sense of personalized care, by providing individualized responses based on the client’s past experiences and previous discussions.

Human-like AI counselor

AI counselor construction. Building rapport between clients and AI counselors is essential for creating a comfortable and engaging experience. Research on virtual rapport states that clients communicate more effectively when they feel connected to their conversational partners.^70,71 Other research points out that familiar avatars not only enhance user comfort but also foster greater collaboration and interaction in virtual counseling environments.⁷² However, existing chat counseling services fail to support human-like dialogue, making it difficult for clients to establish a virtual rapport. In order to overcome this limitation, BetterMood deploys AI counselor as an utterance medium of counseling LLM, which consists of three components: visual feature, voice, and gestures.

To maximize virtual rapport, we first aim to make user-friendly AI counselor for AYAs. A face-swapping technology is applied to generate user-friendly visual feature, resembling celebrities, athletes, or friendly figures so that clients feel comfort and relaxed.⁷³ Then the visual feature is integrated with the corresponding voice using ElevenLabs. Lastly, we enable our AI counselor to mimic gestures that help counseling sessions. These gestures include actions indicative of careful listening and empathetic understanding, which help the client feel heard and understood.

The text response generated by the counseling LLM is synthesized into audio that reflects the intended voice characteristics. BetterMood then integrates this audio with the selected visual feature and gestures. Subsequently, MuseTalk,⁷⁴ a lip-sync generation model, synchronizes visual feature and gesture elements with the synthesized audio to produce the AI counselor’s response. Human-like dialogue. As illustrated in Figure 2, delivering the AI counselor’s response to the client requires sequential execution of four modules, namely STT, concern-aware counseling LLM, TTS, and visual response generation. This sequential processing approach, hereafter referred to as the conventional method, leads to a considerable overall response time, which hinders real-time human-like dialogue. To address this challenge, we propose a chunk-based streaming methodology that significantly reduces total response time by delivering the AI counselor’s response in 1-second chunks. Figure 4 provides a direct comparison between the conventional method and our proposed approach by illustrating the entire process from the client’s speech to the AI counselor’s response delivery. To clearly demonstrate the differences, we present an example in which the AI counselor’s response consists of 3 sentences, with a total audio duration of 12 seconds.

Figure 4.

Comparison of the conventional and our methods for AI counselor response delivery.

Figure 4(a) shows the conventional method’s process of delivering the AI counselor’s response. Initially, once the client finishes speaking, the STT module transcribes the spoken audio into text. This text is then provided to the counseling LLM, which generates a complete text response. Subsequently, the generated text is passed to the TTS module, where this response is synthesized into a 12-second audio. Finally, this audio is forwarded to the visual response generation module, where it is lip-synced with the AI counselor using MuseTalk. Throughout this sequential process, the client must wait approximately 15 seconds for all stages to be completed before hearing or seeing any part of the AI counselor’s response.

In contrast, Figure 4(b) illustrates our proposed method. Since the modules involved in delivering the AI counselor’s response operate independently, we adopt a pipelining approach. Rather than waiting for the counseling LLM to generate the entire text response, each sentence is immediately forwarded to the TTS module as soon as it is generated. Furthermore, we divide the synthesized audio into 1-second chunks, effectively breaking down the large lip-sync process into smaller tasks. As a result, the client waits approximately 5 seconds before hearing or seeing the first part of the AI counselor’s response, while subsequent response chunks continue to be generated. As denoted by $R$ in Figure 4, our proposed method achieves a substantial reduction in response time compared to the conventional approach. Overall, our proposed method achieves approximately a 64.6% reduction in response time compared to the conventional method. Detailed comparative results are provided in Supplementary Material S2.Enhancing natural dialogue Beyond improving response delivery latency, BetterMood’s interactive UI/UX design significantly enhances the naturalness of dialogues between the client and the AI counselor. To achieve human-level realism, we implement automated speech detection that identifies speech and silence intervals, removing the need for manual microphone activation. As shown in Figure 5, we compute the root mean square (RMS) amplitude of incoming audio and enforce a 1-second debouncing period to distinguish brief pauses from speech termination.⁷⁵ Real-time visual indicators let clients know when their speech is being monitored, encouraging open and deeper engagement.

Figure 5.

Automated silence detection.

Moreover, AI counselor responses are rendered immediately to the client via MediaSource buffering,⁷⁶ minimizing perceptible latency. Alongside this continuous rendering, the AI counselor performs subtle behavioral gestures, such as acknowledging nods or note-taking actions, to emulate the dynamics of human counselors. These combined designs effectively hide residual latency and ensure an authentic counseling experience.

User study

This study aimed to evaluate the practicality and initial user perceptions of the BetterMood. A cross-sectional user study was conducted remotely via the BetterMood platform in South Korea from September to December 2024, involving three distinct cohorts: adolescents, young adults, and professional clinicians. The study was explicitly designed to focus on capturing short-term user experience. Importantly, this study does not assess clinical efficacy or sustained psychological change, but rather participants’ first impressions and satisfaction after a single session. Adolescents and young adults. The process of the study for AYAs is as follows.

Participant recruitment

A total of 120 individuals participated in the user study, comprising 10 adolescents (aged 13–17) and 110 young adults (aged 18–24). Adolescents were recruited through online announcements posted on middle- and high-school bulletin boards within Seoul, whereas young adults were recruited via nationwide university online community boards and social media services. Prior to the study, all participants received comprehensive written guidelines.

Screening

Participants were instructed to complete four self-report questionnaires designed to assess mental health disorders: STAI-X-1 (anxiety),^48–50 PHQ-9 (depression),^51,52 BIS-15 (impulsivity),^53–56 and ASRS-V1.1 Part A (ADHD).^57–59 The following cut-off scores were applied to identify participants as screen positive: STAI-X-1 scores of 52 or higher (elevated anxiety), PHQ-9 scores of 10 or higher (moderate-to-severe depressive symptoms), BIS-15 scores of 39 or higher (significant impulsivity), and endorsement of four or more items on the ASRS-V1.1 Part A (probable ADHD). Participants who met or exceeded at least one of these cut-offs were classified as screen positive; all others were classified as screen negative.

AI counseling on BetterMood

Participants freely conducted a single 10- to 15-minute counseling session on BetterMood addressing their psychological concerns, such as academic pressure, career decisions, family issues, and interpersonal concerns.

User satisfaction survey

After counseling sessions, participants responded to a satisfaction survey designed to assess various aspects of the service. The survey consisted of 9 questions: 7 closed-ended, which were measured on a 4-point Likert scale, and 2 open-ended. The closed-ended questions were designed to evaluate four aspects of BetterMood, with one or two questions for each aspect: (i) Interaction Capability, (ii) Perceived Support, (iii) Usability and (iv) Ethical Safety.⁷⁷ The open-ended questions required participants to freely describe both the strengths and limitations. Detailed information on the survey items is shown in Supplementary Material S4-1.

Among the 10 adolescents, 3 (30.0%) were male and 7 (70.0%) were female. Screening classified 4 adolescents (40.0%) as the screen positive group and 6 (60.0%) as the screen negative group. In the group of 110 young adults, 41 (37.3%) were male and 69 (62.7%) were female. Within this group, 49 (44.5%) were categorized as screen positive, while the remaining 61 (55.5%) were screen negative. Table 2 summarizes these demographic details, and Figure 6 illustrates the distribution of screen positive cases across anxiety, depression, impulsivity, and ADHD for each age group.

Figure 6.

Screening results.

Table 2.

Demographics of adolescents and young adults.

Group	Variable	Category	N	%
Adolescent	Gender	Male	3	30.0
		Female	7	70.0
	Screening Status	Screen Positive	4	40.0
		Screen Negative	6	60.0
Young Adult	Gender	Male	41	37.3
		Female	69	62.7
	Screening Status	Screen Positive	49	44.5
		Screen Negative	61	55.5

As shown in Table 3, two-sided Fisher’s exact test revealed no significant differences in either gender distribution ( $p = 0.744$ , $ϕ$ =0.042) or screening status ( $p = 1.000$ , $ϕ = 0.025$ ) between AYAs. Both $ϕ$ values are well below the 0.10 threshold, indicating negligible demographic imbalance across the two groups.

Professional clinicians. We recruited 8 qualified professional clinicians, certified by the South Korea Ministry of Health and Welfare (MOHW) and the Korean Clinical Psychology Association (KCPA). They engaged in counseling sessions on BetterMood and completed a satisfaction survey. The survey consisted of 14 questions: 12 closed-ended, which were measured on a 4-point Likert scale, and 2 open-ended. The closed-ended questions covered the same four evaluation aspects as those in the adolescents’ and young adults’ survey,that is, (i) interaction capability, (ii) perceived support, (iii) usability, and (iv) ethical safety. Additionally, the open-ended questions required professional clinicians to freely describe both the strengths and limitations of BetterMood. Detailed information on the survey items is shown in Supplementary Material S5-1. Ethical considerations The study was approved by the Korea University Institutional Review Board (KUIRB-2024-0458-01). Participants were provided with detailed information on the purpose and procedures of the user study, and written informed consent was obtained from all participants. For those under 18, written informed consent was obtained from legally authorized representatives (parents or legal guardians), with simultaneous assent from minors before study initiation. The data collected during the user study, including participants’ screening results and satisfaction survey responses, were strictly de-identified and securely stored on an encrypted server. Data analysis Responses to the closed-ended survey items administered to AYAs were analyzed with respect to two independent variables: age group and screening status. Because direct group comparisons are challenging when both variables are considered simultaneously, we applied the Breslow–Day (BD) test for homogeneity and the Cochran–Mantel–Haenszel (CMH) $χ^{2}$ test, controlling for one variable while examining the association between the other variable and the response outcomes. This analytic approach allows us to quantify the relationships between each survey item and the two variables, thereby yielding meaningful statistical insights. On the other hand, inter-rater reliability among clinicians was estimated using the intraclass correlation coefficient (ICC), both for single raters [ICC(2,1)] and for the average of all raters [ICC(2,8)], to assess evaluator consistency.

Results

This section presents the findings from the closed-ended items of the user study, organized into four evaluation aspects: interaction capability, perceived support, usability, and ethical safety. Interaction capability reflects the AI counselor’s ability to understand conversational context and respond appropriately. Perceived support denotes participants’ immediate, self-reported sense of concern being addressed and mood change after a single session. Usability captures the convenience and ease of using BetterMood. Ethical safety refers to any ethical concerns or discomfort reported while interacting with the system. Detailed item-level results are provided in Tables 4 and 5, reporting the proportion of positive responses (“agree” and “strongly agree”). Table 4 presents responses from AYAs, and Table 5 from professional clinicians.

Table 3.

Baseline demographic equivalence between AYAs.

	$p$ -Value	$ϕ$
Gender	0.744	0.042
Screening status	1.000	0.025

Note. $p$ value was derived from Fisher’s exact test

$ϕ$ was computed as $ϕ = (a d - b c) / (\sqrt{(a + b) (c + d) (a + c) (b + d)})$ .

Table 4.

Closed-ended results of user satisfaction survey for AYAs (“agree” and “strongly agree”).

		Interaction capability		Perceived support		Usability		Ethical safety
		$Q_{1}^{AYA}$	$Q_{2}^{AYA}$	$Q_{3}^{AYA}$	$Q_{4}^{AYA}$	$Q_{5}^{AYA}$	$Q_{6}^{AYA}$	$Q_{7}^{AYA}$
Adolescents	Screen Positive	75.0	75.0	75.0	25.0	75.0	75.0	75.0
	Screen Negative	100.0	100.0	100.0	100.0	83.3	100.0	100.0
	Total	90.0	90.0	90.0	70.0	80.0	90.0	90.0
Young Adults	Screen Positive	89.8	87.8	44.9	73.5	71.4	63.3	75.5
	Screen Negative	91.8	90.2	59.0	80.3	83.6	80.3	83.6
	Total	90.9	89.1	52.7	77.3	78.2	72.7	80.0

Note. This table presents the proportion of positive responses(“agree” and “strongly agree”) among AYAs. All values are reported as percentages.

Detailed information on the survey items for AYAs is shown in Supplementary Material S4-1.

Table 5.

Closed-ended results of user satisfaction survey for professional clinicians (“agree” and “strongly agree”).

	Interaction capability			Perceived support			Usability			Ethical safety
	$Q_{1}^{PC}$	$Q_{2}^{PC}$	$Q_{3}^{PC}$	$Q_{4}^{PC}$	$Q_{5}^{PC}$	$Q_{6}^{PC}$	$Q_{7}^{PC}$	$Q_{8}^{PC}$	$Q_{9}^{PC}$	$Q_{10}^{PC}$	$Q_{11}^{PC}$	$Q_{12}^{PC}$
Professional clinicians	75.0	37.5	50.0	25.0	100.0	87.5	75.0	100.0	100.0	0.0	12.5	0.0

Note. This table presents the proportion of positive responses (“agree” and “strongly agree”) among professional clinicians.

All values are reported as percentages. Detailed information on the survey items for professional clinicians is shown in Supplementary Material S5-1.

For ethical safety, lower agreement indicates a more positive outcome, as the statements reflect potential ethical concerns.

The user satisfaction survey items were combined into a single satisfaction scale that showed good internal consistency. To evaluate this, we computed Cronbach’s $α$ and McDonald’s $ω$ for each AYA subgroup (adolescents, young adults) and for the overall sample. Both coefficients assess how consistently the items tap a common construct; values around 0.70–0.79 are generally deemed acceptable, 0.80–0.89 good, and $\geq 0.90$ excellent. The coefficients were high across all groups ( $α = 0.896$ , $ω = 0.914$ for adolescents(n=10); $α = 0.868$ , $ω = 0.875$ for young adults( $n = 110$ ); $α = 0.871$ , $ω = 0.878$ for overall AYAs( $n = 120$ )), indicating strong item-to-item consistency and supporting the scale’s coherence in assessing BetterMood’s core functionality. To minimize recall and social-desirability bias, the questionnaire was administered immediately after each counseling session within the BetterMood service.

BetterMood received high ratings for interaction capability across all participant groups. Among adolescents, 90.0% agreed that the AI counselor’s responses were contextually appropriate ( $Q_{1}^{AYA}$ ), and an equal proportion felt that the counselor listened attentively ( $Q_{2}^{AYA}$ ). Young adults showed similarly high agreement, with 90.9% affirming contextual appropriateness ( $Q_{1}^{AYA}$ ) and 89.1% indicating attentive listening ( $Q_{2}^{AYA}$ ). Professional clinicians also evaluated positively, with 75.0% finding responses contextually appropriate( $Q_{1}^{PC}$ ). These results suggest that users across cohorts generally perceived BetterMood’s ability to engage in meaningful and contextually relevant interactions.

Perceived support received moderate ratings, with notable differences across cohorts. Among adolescents, 90.0% reported perceived concern resolution ( $Q_{3}^{AYA}$ ), and 70.0% indicated self-reported mood change ( $Q_{4}^{AYA}$ ). In contrast, only 52.7% of young adults indicated perceived concern resolution ( $Q_{3}^{AYA}$ ), while 77.3% noted self-reported mood change ( $Q_{4}^{AYA}$ ). Just 25.0% of the professional clinicians perceived immediate support from a single session ( $Q_{4}^{PC}$ ). Nevertheless, all clinicians expressed support for continued research into AI-based counseling ( $Q_{5}^{PC}$ ), and 87.5% acknowledged its potential to enhance access to mental health services ( $Q_{6}^{PC}$ ).

Usability was rated highly positive across all cohorts. Among adolescents, 80.0% indicated willingness to engage with the AI counselor again ( $Q_{5}^{AYA}$ ), and 90.0% would recommend the service to their peers ( $Q_{6}^{AYA}$ ). Young adults also reported high usability satisfaction, with 78.2% expressing willingness to reuse the service ( $Q_{5}^{AYA}$ ) and 72.7% recommending it ( $Q_{6}^{AYA}$ ). Professional clinicians positively evaluate usability as well, with 75.0% finding the system intuitive ( $Q_{7}^{PC}$ ) and 100% confident in independently navigating ( $Q_{8}^{PC}$ ) and using the system ( $Q_{9}^{PC}$ ).

Ethical safety concerns were minimal. Adolescents reported feeling comfortable during sessions ( $Q_{7}^{AYA}$ ) in 90.0% of responses, with young adults indicating comfort ( $Q_{7}^{AYA}$ ) at 80.0%. All professional clinicians reported no unethical experiences or fears of blame ( $Q_{10}^{PC}$ , $Q_{12}^{PC}$ ), and 87.5% explicitly dismissed the possibility of harm from the AI counselor ( $Q_{11}^{PC}$ ).

Stratified analysis

Using age group as a stratification variable, we conducted the Breslow–Day (BD) homogeneity test and the Cochran–Mantel–Haenszel (CMH) $χ^{2}$ test (Table 6). For self-reported mood change ( $Q_{4}^{AYA}$ ), the BD test yielded $p = 0.033$ , indicating that the odds ratios differed across age groups; thus, estimating a common odds ratio was inappropriate. To minimize small-sample bias, we instead fitted a Firth-penalized logistic regression model that included age group, screening status, and their interaction. The model coefficients and age-specific adjusted odds ratios are presented in Supplementary Material S3. For likelihood of recommending( $Q_{6}^{AYA}$ ), the BD test supported the homogeneity assumption ( $p = 0.387$ ). The CMH test produced $χ^{2} (1) = 4.828$ , $p = 0.028$ , with a common odds ratio of $0.392$ (95% CI = 0.169–0.913), demonstrating a significant association between screening status and response. This result showed that the screen positive group reported lower endorsement of the item than the screen negative group. No significant associations were observed for the remaining items ( $Q_{1}^{AYA}$ - $Q_{3}^{AYA}$ , $Q_{5}^{AYA}$ , and $Q_{7}^{AYA}$ ), all of which had CMH $p$ -values $> 0.05$ .

Table 6.

Age-stratified CMH and breslow-day tests for screening status effects (only AYAs).

Question	BD $p$ -value	CMH $p$ -value	Common OR	OR 95% CI
$Q_{1}^{AYA}$ . Contextual Appropriateness	0.270	0.469	0.636	[0.185, 2.184]
$Q_{2}^{AYA}$ . Attentive Listening	0.268	0.462	0.652	[0.207, 2.050]
$Q_{3}^{AYA}$ . Perceived Concern Resolution	0.323	0.095	0.530	[0.251, 1.120]
$Q_{4}^{AYA}$ . Self-reported Mood Change	0.033	0.114	0.517	[0.225, 1.190]
$Q_{5}^{AYA}$ . Likelihood of Re-Engaging	0.903	0.120	0.498	[0.206, 1.202]
$Q_{6}^{AYA}$ . Likelihood of Recommending	0.387	0.028	0.392	[0.169, 0.913]
$Q_{7}^{AYA}$ . Comfortable Level During Conversation	0.313	0.193	0.546	[0.218, 1.366]

Note. BD = Breslow–Day test for homogeneity of odds ratios CMH = Cochran–Mantel–Haenszel test stratified by age group OR = odds ratio

CI = confidence interval. All p-values are two-tailed. Underlined BD $p$ -values and Bold CMH $p$ -values indicate $p < 0.05$ .

Sample sizes: Adolescents $n = 10$ Young Adults $n = 110$ .

Using screening status as the stratification variable, we again applied the BD homogeneity test and the CMH $χ^{2}$ test (Table 7). For self-reported mood change ( $Q_{4}^{AYA}$ ), the BD test yielded $p = 0.033$ , indicating a violation of the homogeneity assumption; detailed results are therefore provided in the previously reported Firth-penalized logistic regression (Supplementary Material S3). In this stratification, perceived concern resolution ( $Q_{3}^{AYA}$ ) satisfied the homogeneity assumption (BD $p = 0.301$ ) and showed a significant association (CMH $χ^{2} (1) = 5.024$ , $p = 0.025$ ), with a common odds ratio of $9.075$ (95% CI $= 1.004$ – $82.020$ ). In other words, after controlling for the screening status, the adolescents differed from the young adults in $Q_{3}^{AYA}$ responses by a factor of more than nine. No statistically significant associations were observed for the remaining items ( $Q_{1}^{AYA}$ , $Q_{2}^{AYA}$ , $Q_{5}^{AYA}$ - $Q_{7}^{AYA}$ ), all of which had CMH $p$ -values $> 0.05$ .

Table 7.

Screening status-stratified CMH and breslow-day tests for age effects (only AYAs).

Question	BD $p$ -value	CMH $p$ -value	Common OR	OR 95% CI
$Q_{1}^{AYA}$ . Contextual Appropriateness	0.270	0.909	0.880	[0.100, 7.738]
$Q_{2}^{AYA}$ . Attentive Listening	0.267	0.944	1.081	[0.124, 9.390]
$Q_{3}^{AYA}$ . Perceived Concern Resolution	0.301	0.025	9.075	[1.004, 82.020]
$Q_{4}^{AYA}$ . Self-reported Mood Change	0.033	0.572	0.648	[0.149, 2.814]
$Q_{5}^{AYA}$ . Likelihood of Re-Engaging	0.903	0.924	1.082	[0.213, 5.507]
$Q_{6}^{AYA}$ . Likelihood of Recommending	0.374	0.245	3.579	[0.396, 32.357]
$Q_{7}^{AYA}$ . Comfortable Level During Conversation	0.309	0.459	2.256	[0.260, 19.599]

Note. BD = Breslow–Day test for homogeneity of odds ratios; CMH = Cochran–Mantel–Haenszel test stratified by screening status; OR = odds ratio

CI = confidence interval All p-values are two-tailed. Underlined BD $p$ -values and Bold CMH $p$ -values indicate $p < 0.05$ .

Sample sizes: Screen Positive $n = 53$ Screen Negative $n = 67$ .

Clinician inter-rater reliability

Eight professional clinicians independently evaluated each counseling session conducted on BetterMood. Inter-rater agreement was assessed with the ICC calculated from a two-way random-effects model. The single-measure ICC (ICC (2,1)) was 0.56, indicating moderate agreement among individual raters, whereas the average-measure ICC (ICC (2,8)) was 0.90, reflecting excellent reliability when scores are aggregated across raters. Thus, while individual ratings vary to some extent, the pooled scores exhibit very high consistency. Full statistics are presented in Table 8.

Table 8.

Inter-rater reliability for clinician ratings (two-way random, absolute agreement).

Model	Unit of analysis	ICC	95% CI	Interpretation
ICC(2,1)	Single rater	0.56	0.44–0.71	Moderate
ICC(2,8)	Average of 8 raters	0.90	0.86–0.98	Excellent

In summary, the primary aim of this study was to assess the AI counselor’s feasibility and user experience as a supplementary tool. Accordingly, all findings are drawn solely from participants’ immediate, self-reported impressions after a single session. The results indicate strong interactive capability, usability, and ethical safety across adolescents, young adults, and professional clinicians, although perceived support varied notably between cohorts.

Discussion

BetterMood serves as an accessible supplementary tool for AYAs. Because our conclusions rest solely on participants’ immediate self-reports after a single session, BetterMood should be viewed as a gateway or adjunct resource, not a replacement for professional care. Acknowledging this limitation, the remainder of the Discussion reviews our quantitative and qualitative findings (expanded in the Supplementary Materials S4-2 and S5-2) and explains how each insight informs system refinements and future research.

Design implications

Interaction capability. Approximately 90.0% of AYAs agreed that the AI counselor’s replies were contextually appropriate ( $Q_{1}^{AYA}$ ) and empathic ( $Q_{2}^{AYA}$ ). A young adult reflected, “When the AI counselor nodded, made eye contact, and moved its mouth in sync with words, I felt deeply immersed and understood, realizing that my struggles were not mine alone”. This comment shows that our approach to building virtual rapport through the AI counselor successfully created empathic counseling experiences. Also, 75.0% of clinicians reported that the AI counselor delivered timely and relevant responses ( $Q_{1}^{PC}$ ). A clinician added, “It was nice to be asked questions by the AI counselor to get me to think for myself and explore my own mind”. This comment illustrates how our prompt design, which integrated psychotherapy and counseling techniques, enabled the counseling LLM to explore user concerns more deeply. Perceived support. AYAs exhibited a pronounced difference in perceived concern resolution ( $Q_{3}^{AYA}$ ). While 90.0% of adolescents self-reported that their concerns were resolved following counseling, only 52.7% of young adults did so. This between-group discrepancy is explored further in the “Interpretation ofsubgroup differences” section. Regarding self-reported mood change measure ( $Q_{4}^{AYA}$ ), both cohorts showed positive responses starting at 70.0%. A representative comment was: “Overall, I felt like the AI counselor listened to me very well, which helped me feel comfortable during the counseling session”. These findings suggest that our emphasis on active listening and empathic responses successfully fostered clients’ psychological comfort. Conversely, professional clinicians took a more cautious stance: only 25% perceived immediate support from a single session ( $Q_{4}^{PC}$ ). One clinician noted: “The AI counselor’s solutions are sometimes broad, which could feel less specialized from a professional perspective”. Nonetheless, all clinicians agreed that continued research into AI counselor development is worthwhile in the clinical domain ( $Q_{5}^{PC}$ ). In summary, while the majority of clients perceived their mood change with the AI counselor, further validation and refinement are necessary to establish its clinical efficacy. Usability. Over 80% of adolescents and 70% of young adults rated BetterMood as easy to use ( $Q_{5}^{AYA}$ and $Q_{6}^{AYA}$ ). Comments highlighted natural language flow and realistic embodiment, for example, “This service demonstrates excellent performance in recognizing and processing speech, and the natural responses of the AI counselor allow the conversation to flow smoothly, creating a sense of comfort during the session.” Clinicians echoed these sentiments ( $Q_{7}^{PC}$ – $Q_{9}^{PC}$ ), emphasizing non-judgmental disclosure: “I felt comfortable sharing my thoughts, knowing that the AI counselor would not judge or evaluate me thoughtlessly.” These comments suggest that our interactive UI/UX design plays a crucial role in creating a natural counseling experience. This observation shows the service’s ease of use and its supportive environment, which encourages open communication. Ethical safety. 90% of adolescents and 80% of young adults felt comfortable with the system’s privacy safeguards ( $Q_{7}^{AYA}$ ). One adolescent pointed out, “There appears to be an advantage in terms of confidentiality $\dots$ at least there’s no chance of a person directly passing my information on.” Clinicians likewise judged the service ethically safe ( $Q_{10}^{PC}$ – $Q_{12}^{PC}$ ). These results indicate that BetterMood provided an ethically safe and confidential environment, and professional clinicians similarly perceived the service as ethically safe.

Interpretation of subgroup differences

Screening status-related difference (age-stratified). According to Table 6, after adjusting for age, the screen negative group tends to endorse likelihood of recommending( $Q_{6}^{AYA}$ ) than the screen positive group. To interpret this pattern, we analyzed the open-ended comments that followed the closed-ended survey. The screen negative group cited the ability to engage in counseling without burden as a primary advantage. One remarked: “It is very positive that counseling can be accessed anytime when needed $\dots$ . For clients who want to receive counseling but still find it difficult to share their stories with others, this could be very helpful $\dots$ .” Conversely, the screen positive group, who were often already aware of their own concerns, tended to prefer more in-depth counseling. One participant noted: “People who already understand their own problems probably wouldn’t turn to an AI counselor $\dots$ . offering perspectives that some users may already find familiar.”

These contrasting perspectives demonstrate a clear difference in how the two cohorts perceive the AI counselor based on screening status. The screen negative group viewed the session as a quick, low-barrier outlet for emotional expression, valuing its accessibility and expressing willingness to recommend it to peers. In contrast, the screen positive group judged that more immediate helpfulness and tailored interaction was necessary, leading them to be more conservative in their likelihood of recommending the service. Based solely on these short-term perceptions gathered immediately after a single 10- to 15-minute interaction, the AI counselor appears most immediately helpful as an emotional outlet for screen negative group, whereas screen positive may still require richer conversation or supplemental human care. Age-related difference (screening status-stratified). After adjusting for screening status in the CMH analysis (Table 7), we identified a single significant age-related difference: perceived concern resolution( $Q_{3}^{AYA}$ ). Adolescents were more likely than young adults to report that the session had solved their problem. As one adolescent remarked, “It helped when the AI said my worries are normal for anyone my age.” This reaction aligns with Erikson’s identity-versus-role-confusion stage, in which reassurance that one’s feelings are normal satisfies the developmental need for a stable sense of self.⁷⁸ Young adults, by contrast, are already negotiating autonomy-driven decisions about work, finances, and relationships. Unsurprisingly, many judged the same dialogue “not that useful,” a response consistent with self-determination theory, which argues that motivation in emerging adulthood depends less on validation and more on opportunities to exercise agency and competence.

Together, these two frameworks clarify why an empathy-first design feels curative to adolescents yet incomplete to young adults. Retaining empathic normalization for adolescents while adding goal-setting prompts, decision aids, and other solution-focused tools for young adults would tailor the chatbot to the distinct motivational priorities of each developmental stage.

Comparative analysis

This section compares BetterMood with open-source LLM-based chatbots (ChatCounselor and MentaLLaMA), focusing on psychotherapeutic methodology that we used for our concern-aware LLM and perceived user experience.

For this comparison, each model (BetterMood, ChatCounselor, MentaLLaMA) served as the counselor role. The client agent was implemented with OpenAI GPT-4o, presenting scenarios across four key youth concerns: academic stress, career choices, familial conflicts, and interpersonal relationship difficulties. For each topic, 25 counseling sessions were conducted, yielding 100 sessions per model. Each session comprised 10 multi-turn interactions (one counselor and one client message exchanged per turn, 10 total per session), resulting in 1000 dialog turns per model.

Each counseling dialogue was then rated across seven core competency domains:

Psychotherapeutic Techniques: –

Problem Solving/Skills Training

–

Normalization and Acceptance

–

Summarization and Restatement

–

Praise, Encouragement, and Reassurance

Perceived User Experience: –

Interaction Capability

–

Perceived Support

–

Ethical Safety

An evaluation agent (OpenAI GPT-4o) assigned scores from 1 to 5 for each domain per dialogue. Table 9 below presents the average scores:

Table 9.

Quantitative comparison results.

Domain	BetterMood	ChatCounselor	MentaLLaMA
Problem Solving/Skills Training	3.06	3.00	2.93
Normalization & Acceptance	4.30	4.02	4.01
Summarization & Restatement	4.00	3.75	3.35
Praise/Encouragement/Reassurance	4.24	3.99	3.90
Interaction Capability	4.02	4.00	3.99
Perceived Support	4.23	4.07	3.98
Ethical Safety	3.99	4.00	3.93

Note. Average expert ratings (Scale: 1–5) for each counseling domain.

BetterMood demonstrated superior performance compared to open-source models across most evaluation domains, particularly excelling in emotional support and empathy, where it achieved the highest scores in normalization, perceived support, and encouragement. This advantage likely stems from targeted prompt engineering specifically designed for affective communication. Additionally, BetterMood showed superior capabilities in summarization and reflection, effectively restating and reflecting user input, which contributed to stronger rapport and enhanced interaction coherence. However, all three models exhibited relatively low performance in problem solving and skills training, indicating that current LLM-based chatbots convey empathy rather than deliver structured behavioral change techniques.

Note: All results from open-source model comparisons are based on automated, multi-session agent-based simulations followed by domain-specific evaluation by OpenAI GPT-4o, and should be interpreted within the context of simulated data.

Policy implications

Recent prohibitions on AI counseling platforms in the United States highlight the need for stronger evidence to guide regulation. Our study contributes to this evidence base by providing the user perspective. While our findings do not address therapeutic outcomes, the high user satisfaction and favorable first impressions we observed suggest that these platforms hold promise and that a categorical ban could preclude a potentially valuable resource for mental health support. Based on our findings, we suggest that policymakers in South Korea and elsewhere might consider a framework that positions AI counselors as adjunctive or introductory tools rather than substitutes for licensed therapists. Policy could focus on standards for usability, safety, ethics, and responsible integration with existing healthcare. For example, regulators could require platforms to clearly label that the service is a support tool, not a substitute for therapy. Additionally, they could mandate the implementation of safety guardrails that effectively triage crisis situations and escalate them to human professionals. This approach could help de-stigmatize and broaden access to care, while ensuring these platforms operate as a safe bridge to professional clinical services.

Limitation

Before outlining the study’s specific limitations, it is worth underscoring that our aim was an early probe into BetterMood’s practicality and the immediate user perception. The results are therefore a foundation, not a final clinical assessment. Viewed in this light, the limitations that follow are not simply shortcomings; they outline the work that lies ahead.

(a) Post-session measures: Although we used self-report screening tools to group participants at the outset, no follow-up assessments with the same instruments were carried out. Therefore, we cannot speak to objective psychological change or therapeutic efficacy; the results reflect only participants’ immediate impressions after a single interaction with the AI counselor.

(b) Single, brief session: Each participant engaged with the system for only 10 to 15 minutes. Such brief, one-time encounters do not capture the broader dynamics of ongoing mental health support or sustained engagement. Accordingly, the study should be viewed as an initial feasibility and user-experience exploration rather than a clinical validation.

(c) User satisfaction questionnaire: The user satisfaction survey employed in this study was a self-developed instrument. To ensure its relevance and content validity, the questionnaire was systematically constructed. Key domains for evaluating AI counselors, including (i) interaction capability, (ii) perceived support, (iii) usability, and (iv) ethical safety, were drawn from the evaluation framework for conversational agents in health interventions proposed by Ding et al.⁷⁷ Based on these domains, specific items were drafted and subsequently reviewed for clarity and relevance by three independent experts in the fields of clinical psychology. However, the instrument has not undergone formal psychometric validation. Therefore, while the items are grounded in established literature and expert consensus, the reliability and validity of the satisfaction scores have not been statistically confirmed, and the findings should be interpreted within this context.

(d) Novelty effects: Our results based on self-reported surveys may be skewed by biases and the novelty effect, since participants’ limited prior exposure to AI mental-health tools could prompt transient curiosity or optimism. Future research should use repeated or longitudinal sessions to control for these effects and capture more stable user perceptions.

(e) Sample size and composition: Although adolescents, young adults, and professional clinicians were included, subgroup sizes, particularly for adolescents and clinicians, were small. Larger and more demographically balanced samples are needed to improve generalizability.

(f) Cultural and linguistic scope: BetterMood was developed in Korean and evaluated mainly with Korean-speaking users. Because its underlying model is GPT-4o, the system can also provide counseling in English (and potentially other languages) when appropriately prompted, and we used this capability for a preliminary English-language comparative analysis. Even so, a full international rollout will still demand careful localization beyond literal translation to reflect diverse expressions, values, and mental-health frameworks. Future work should therefore test usability and perceived helpfulness across a wider range of linguistic and cultural settings to confirm the model’s cross-cultural applicability.

Conclusion

This study presents BetterMood, a human-like AI counseling service. It is designed to address two key limitations of existing AI-driven chat counseling services: (a) tendency toward generic advice, and (b) absence of human-like dialogue. Our first contribution is the development of a concern-aware counseling LLM. This model incorporates psychotherapeutic knowledge into a general-purpose LLM using targeted prompt engineering. Our second contribution is an interactive AI counselor that employs a chunk-based streaming approach, delivering responses synchronized with visual expressions, speech, and gestures in one-second segments, thus enabling more human-like interactions.

Future research needs to incorporate comparative studies with existing chat counseling services to accurately assess BetterMood’s efficacy. Additionally, long-term, multi-session evaluations are essential for understanding sustained mental health outcomes. Recruiting larger and more diverse groups of professional clinicians will further enhance insights into clinical expectations, guiding targeted improvements. Through continuous model refinement, direct comparative analysis, and comprehensive longitudinal evaluations, we aim to advance BetterMood into an effective and reliable supplementary tool for mental health care.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076251392294 - Supplemental material for BetterMood: A human-like AI counseling service for adolescents and young adults

Supplemental material, sj-docx-1-dhj-10.1177_20552076251392294 for BetterMood: A human-like AI counseling service for adolescents and young adults by Do Hyung Kim, Soeun Baek, Joonsung Lee, Taehwi Lee, Soyeon Park, Beomchan You, Ji-Won Hur, Minah Kim and Chang-Gun Lee in DIGITAL HEALTH

Footnotes

Acknowledgments

We would like to thank all the participants who took part in this study. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Grant No. 2023R1A2C3003007).

ORCID iDs

Do Hyung Kim

Soeun Baek

Joonsung Lee

Taehwi Lee

Soyeon Park

Beomchan You

Ji-Won Hur

Minah Kim

Chang-Gun Lee

Ethical approval

The study was approved by the Korea University Institutional Review Board (KUIRB) and the approval number was KUIRB-2024-0458-01.

Consent to participate

Informed consent to participate was written.

Contributorship

DHK and CGL conceived the presented idea. DHK, SB, and SP contributed to managing the collection of real counseling data. JWH supervised the generation of synthetic counseling data. All collected counseling data were reviewed by JWH and MK. DHK and SB developed the concern-aware counseling LLM. JL and TL created the human-like AI counselor. DHK and BY contributed to developing the chunk-based streaming methodology. DHK, SB, JL, TL, SP, and BY supervised the user study. All authors participated in drafting, revising, and approving the final manuscript for submission. CGL supervised the overall research project.

Funding

The author(s) disclosedreceipt of the following financial support for the research, authorship, and/orpublication of this article: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Grant No. 2023R1A2C3003007).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental materials for this article are available online.

References

Valentine

Hall

Sayal

, et al. Waiting-list interventions for children and young people using child and adolescent mental health services: a systematic review. BMJ Mental Health 2024; 27: e300844.

Iyer

Boksa

Joober

, et al. An approach to providing timely mental health services to diverse youth populations. JAMA Psychiatry 2025; 82: 470–480.

Hoffmann

Attridge

Carroll

, et al. Association of youth suicides and county-level mental health professional shortage areas in the US. JAMA Pediatr 2023; 177: 71–80.

McBain

Cantor

Kofner

, et al. Ongoing disparities in digital and in-person access to child psychiatric services in the United States. J Am Acad Child Adolesc Psychiatry 2022; 61: 926–933.

Sheikhan

Henderson

Halsall

, et al. Stigma as a barrier to early intervention among youth seeking mental health services in Ontario, Canada: a qualitative study. BMC Health Serv Res 2023; 23: 1–12.

Wang

, et al. Treatment rates for mental disorders among children and adolescents: a systematic review and meta-analysis. JAMA Netw Open 2023; 6: e2338174.

Ghafari

Nadi

Bahadivand-Chegini

, et al. Global prevalence of unmet need for mental health care among adolescents: a systematic review and meta-analysis. Arch Psychiatr Nurs 2022; 36: 1–6.

Potts

Kealy

McNulty

, et al. Digital mental health interventions for young people aged 16–25 years: scoping review. J Med Internet Res 2025; 27: e72892.

Walder

Frey

Berger

, et al. Digital mental health interventions for the prevention and treatment of social anxiety disorder in children, adolescents, and young adults: systematic review and meta-analysis of randomized controlled trials. J Med Internet Res 2025; 27: e67067.

10.

Knapp

Cohen

Kruzan

, et al. Teen perspectives on integrating digital mental health programs for teens into public libraries (“i was always at the library”): qualitative interview study. JMIR Form Res 2025; 9: e67454.

11.

Daniel

Volcko

Bassi

, et al. Exploring youth perspectives on digital mental health platforms: qualitative descriptive study. JMIR Hum Factors 2025; 12: e69907.

12.

Wanniarachchi

Greenhalgh

Choi

, et al. Personalization variables in digital mental health interventions for depression and anxiety in adolescents and youth: a scoping review. Front Digital Health 2025; 7: 1500220.

13.

Stiles-Shields

Ramos

Ortega

, et al. Increasing digital mental health reach and uptake via youth partnerships. NPJ Mental Health Res 2023; 2: 9.

14.

Adjei-Boateng

Ikoh

. Digital mental health interventions for adolescents and young people: evaluating efficacy and accessibility. Cureus 2025; 17: e85943.

15.

Feng

Hang

, et al. Effectiveness of AI-driven conversational agents in improving mental health among young people: systematic review and meta-analysis. J Med Internet Res 2025; 27: e69639.

16.

, et al. Chatbot-delivered interventions for improving mental health among young people: a systematic review and meta-analysis. Worldviews Evidence Based Nurs 2025; 22: e70059.

17.

Jabir

Lin

Martinengo

, et al. Attrition in conversational agent-delivered mental health interventions: systematic review and meta-analysis. J Med Internet Res 2024; 26: e48168.

18.

Jia

. A scoping review of AI-driven digital interventions in mental health care: mapping applications across screening, support, monitoring, prevention, and clinical education. Healthcare 2025; 13: 1205.

19.

Zhang

Lee

, et al. Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being. NPJ Digital Med 2023; 6: 236.

20.

Suharwardy

Ramachandran

Leonard

, et al. Feasibility and impact of a mental health chatbot on postpartum mental health: a randomized controlled trial. AJOG Global Rep 2023; 3: 100165.

21.

Yuan

Garcia Colato

Pescosolido

, et al. Improving workplace well-being in modern organizations: a review of large language model-based mental health chatbots. ACM Trans Manag Inf Syst 2024; 16: 1–26.

22.

Siddals

Torous

Coxon

. “t happened to be the perfect thing”: experiences of generative ai chatbots for mental health. NPJ Mental Health Res 2024; 3: 48.

23.

Dosovitsky

Bunge

. Development of a chatbot for depression: adolescent perceptions and recommendations. Child Adolesc Ment Health 2023; 28: 124–127.

24.

Brandtzaeg

Skjuve

Dysthe

, et al. When the social becomes non-human: young people’s perception of social support in chatbots. In: CHI’21: proceedings of the 2021 CHI conference on human factors in computing systems, pp.1–13.

25.

Kuhlmeier

Bauch

Gnewuch

, et al. Designing chatbots to treat depression in youth: qualitative study. JMIR Hum Factors 2025; 12: e66632.

26.

Rogers

. The necessary and sufficient conditions of therapeutic personality change. J Consult Psychol 1957; 21: 95–103.

27.

Yao

Kabir

. Person-centered therapy (Rogerian therapy). Treasure Island, FL: StatPearls Publishing, 2023.

28.

Ort

Moore

Farber

. Therapists’ perspectives on positive regard. Person Center Exp Psychother 2022; 22: 1–15.

29.

Kostenius

Lindström

Potts

, et al. Young peoples’ reflections about using a chatbot to promote their mental well-being in northern periphery areas: a qualitative study. Int J Circumpolar Health 2024; 83: 2369349.

30.

Liu

Cao

, et al. Chatcounselor: a large language model for mental health support. arXiv preprint arXiv:230915461, 2023.

31.

ter Stal

Jongbloed

Tabak

. Embodied conversational agents in ehealth: how facial and textual expressions of positive and neutral emotions influence perceptions of mutual understanding. Interact Comput 2021; 33: 173–187.

32.

Skantze

. Turn-taking in conversational AI systems: challenges and directions. Comput Speech Lang 2021; 67: 101178.

33.

Short

Williams

Christie

. The social psychology of telecommunications. London, UK: John Wiley & Sons, 1976.

34.

Daft

Lengel

. Organizational information requirements, media richness and structural design. Manage Sci 1986; 32: 554–571.

35.

Malouin-Lachance

Capolupo

Laplante

, et al. Does the digital therapeutic alliance exist? Integrative review. JMIR Ment Health 2025; 12: e69294.

36.

Haaf

Schefft

Krämer

, et al. Working alliance and its link to guidance in an internet-based intervention for depressive disorders: a secondary analysis of a randomized controlled trial. Front Psychiatry 2024; 15: 1448823.

37.

. Depression intervention using AI chatbots with social cues: a randomized trial of effectiveness. J Affect Disord 2025; 389: 119760.

38.

Wei

Freeman

Rovira

. A randomised controlled test of emotional attributes of a virtual coach within a virtual reality mental health treatment. Sci Rep 2023; 13: 11517.

39.

Osmanovic Thunström

Carlsen

Ali

, et al. Usability comparison among healthy participants of an anthropomorphic digital human and a text-based chatbot as a responder to questions on mental health: randomized controlled trial. JMIR Hum Factors 2024; 11: e54581.

40.

Liu

Giorgi

Aich

, et al. The illusion of empathy: how AI chatbots shape conversation perception. arXiv preprint arXiv:241112877, 2024. Version 4, last revised 2025-03-06.

41.

Young

Jawara

Nguyen

, et al. The role of AI in peer support for young people: a study of preferences for human- and AI-generated responses. In: Proceedings of the 2024 CHI conference on human factors in computing systems (CHI’24), pp.1–18. Honolulu, HI: Association for Computing Machinery. https://dl.acm.org/doi/10.1145/3613904.3642574.

42.

Sanjeewa

Iyer

Apputhurai

, et al. Perception of empathy in mental health care through voice-based conversational agent prototypes: experimental study. JMIR Form Res 2025; 9: e69329.

43.

Kum

Lee

. Can gestural filler reduce user-perceived latency in conversation with digital humans? Appl Sci 2022; 12: 10972.

44.

Zhang

Tsiakas

Schneegass

. Explaining the wait: how justifying chatbot response delays impact user trust. In: Proceedings of the 6th ACM conference on conversational user interfaces (CUI’24). New York, NY: Association for Computing Machinery. https://doi.org/10.1145/3640794.3665550.

45.

Mahmood

Wang

Yao

, et al. User interaction patterns and breakdowns in conversing with LLM-powered voice assistants. Int J Hum Comput Stud 2024; 182: 103406.

46.

OpenAI. Whisper speech-to-text API. https://platform.openai.com/docs/guides/speech-to-text.

47.

ElevenLabs. Elevenlabs text-to-speech API. https://elevenlabs.io/.

48.

Spielberger

Gorsuch

Lushene

, et al. Manual for the state-trait anxiety inventory (form Y1–Y2), Vol. IV. Palo Alto, CA: Consulting Psychologists Press, 1983.

49.

Kim

. The relation between trait anxiety and social abilities: focusing on Spielberger’s STAI (unpublished master’s thesis). Korea University, Seoul, South Korea, 1978.

50.

Lee

, et al. Acute emotional impact of peer suicide and student-related factors. Psychiatry Investig 2024; 21: 1094–1101.

51.

Kroenke

Spitzer

Williams

. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001; 16: 606–613.

52.

Fonseca-Pedrero

Díez-Gómez

Pérez-Albéniz

, et al. Youth screening depression: validation of the patient health questionnaire-9 (PHQ-9) in a representative sample of adolescents. Psychiatry Res 2023; 328: 115486.

53.

Meule

. Cut-off scores for the Barratt impulsiveness scale-short form (BIS-15): sense and nonsense. Int J Neurosci 2024; 134: 1149–1152.

54.

Meule

Mayerhofer

Gründel

, et al. Half-year retest-reliability of the Barratt impulsiveness scale-short form (BIS-15). Sage Open 2015; 5: 2158244015576548.

55.

Lee

Park

, et al. The study on reliability and validity of Korean version of the Barratt impulsiveness scale-11-revised in nonclinical adult subjects. J Kor Neuropsychiatr Assoc 2012; 51: 378–386.

56.

Kattein

Schmidt

Brandt

, et al. Association of increased impulsiveness and internet use disorder in adolescents and young adults with different main activities on the internet. Z Kinder- Jugendpsychiatr Psychother 2021; 50: 17–24.

57.

Kim

Lee

Joung

. The WHO adult ADHD self-report scale: reliability and validity of the Korean version. Psychiatry Investig 2013; 10: 41.

58.

Kessler

Adler

Ames

, et al. The World Health Organization adult adhd self-report scale (ASRS): a short screening scale for use in the general population. Psychol Med 2005; 35: 245–256.

59.

Adler

Newcorn

. Administering and evaluating the results of the adult adhd self-report scale (ASRS) in adolescents. J Clin Psychiatry 2011; 72: e20.

60.

Kim

Lee

. Introduction of child and adolescent mental health services in Korea and their role during the COVID-19 pandemic: focusing on the ministry of education policy. Soa Chongsonyon Chongsin Uihak (J Child Adolesc Psychiatry) 2023; 34: 4–14.

61.

Kim

Lim

, et al. Korean adolescents’ coping strategies on self-harm, ADHD, insomnia during COVID-19: text mining of social media big data. Front Psychiatry 2023; 14: 1192123.

62.

Naver clova note api. https://clovanote.naver.com/ .

63.

Kwon

. A case study on pronunciation self-assessment for Korean learners using speech recognition technology, ‘naver clova note’. Teach Korean Foreign Lang 2023; 69: 1–35.

64.

Beck

. Cognitive behavior therapy: basics and beyond. 2nd ed. New York, NY: Guilford Press, 2011.

65.

Klein

Jacobs

Reinecke

. Cognitive-behavioral therapy for adolescent depression: a meta-analytic investigation of changes in effect-size estimates. J Am Acad Child Adolesc Psychiatry 2007; 46: 1403–1413.

66.

Winston

Lujack

. Supportive psychotherapy, chapter 92. Hoboken: John Wiley & Sons, Ltd., 2015.

67.

Dudley

Bryant

Hammond

, et al. Techniques in cognitive behavioural therapy: using normalising in schizophrenia. J Norw Psychol Assoc 2007; 44: 562–571.

68.

Pinsker

. A primer of supportive psychotherapy. Hillsdale, NJ: Analytic Press, 1997.

69.

Lambert

. Effectiveness of psychotherapeutic treatment. Resonanzen–E-Journal für biopsychosoziale Dialoge in Psychosomatischer Medizin, Psychotherapie, Supervision und Beratung 2015; 3: 87–100.

70.

Gratch

Okhmatovskaia

Lamothe

, et al. Virtual rapport. In:

Gratch

Young

Aylett

et al. (eds) Intelligent virtual agents. Berlin/Heidelberg: Springer; 2006: pp.14–27.

71.

Huang

Morency

Gratch

. Virtual rapport 2.0. In:

Vilhjálmsson

Kopp

Marsella

et al. (eds) Intelligent virtual agents. Berlin/Heidelberg: Springer; 2011: pp.68–79.

72.

Türkgeldi

Özden

Aydoğan

. The effect of appearance of virtual agents in human–agent negotiation. In: AI, volume 3. MDPI; 2022: pp.683–701. DOI: 10.3390/ai3030039. https://www.mdpi.com/2673-2688/3/3/39.

73.

FaceFusion. Facefusion: industry leading face manipulation platform, 2025. https://github.com/facefusion/facefusion. GitHub repository (accessed 19 May 2025).

74.

Zhang

Liu

Chen

, et al. Musetalk: real-time high quality lip synchronization with latent space inpainting. arXiv preprint, 2024. https://arxiv.org/abs/2410.10122v2.

75.

Maas

Rastrow

Goehner

, et al. Domain-specific utterance end-point detection for speech recognition, 2017. https://www.amazon.science/publications/domain-specific-utterance-end-point-detection-for-speech-recognition.

76.

W3C. Media source extensions. https://www.w3.org/TR/media-source/.

77.

Ding

Simmich

Vaezipour

, et al. Evaluation framework for conversational agents with artificial intelligence in health interventions: a systematic scoping review. J Am Med Inform Assoc 2024; 31: 746–761.

78.

Erikson

. Identity: youth and crisis. New York: W.W. Norton & Company, 1969.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

1.12 MB