Abstract
The present study offers an exploratory analysis of the creative processes and products at the heart of song lyric writing. Drawing on insights from creative metacognition research and work that integrates individual and collective aspects of creativity, we focus on instances where individuals generate artistic outcomes both for themselves and for others. Sixty-three participants (35 musical experts and 28 novices) were invited to complete a series of questionnaires and craft new lyrics for two distinct musical excerpts. They were instructed to write for either themselves (envisioning they would sing the final product) or for another individual (considering someone else would perform). The creativity of the lyrics was assessed using three complementary approaches: (i) self-ratings by participants, (ii) qualitative evaluations by six expert raters, and (iii) algorithmic analysis through distributional semantic modeling. Although we did not find significant differences in creativity between lyrics created for oneself and lyrics created for others, our analysis revealed that participants’ behaviors and states are more predictive of self-assessment and computational ratings of creativity than of expert ratings. Furthermore, we identified a robust correlation between expert ratings and computational evaluations of the lyrics’ creativity.
Introduction
Creativity is a fundamental aspect of human cognition and one that has long been a subject of inquiry across a range of scientific and artistic domains. Much research on creativity gravitates around two interconnected issues, namely the evaluation of what makes a product “creative” and the understanding of the cognitive and neural processes involved in creative effort (Abraham, 2018; Benedek et al., 2019). As definitions and understandings of creativity exhibit different properties across cultures and historical periods (Niu & Sternberg, 2006), both orientations necessarily involve a focus on the relationships between single individuals and the society in which they are embedded (Glăveanu, 2013). Additionally, they entail the study of creative metacognition – that is, thinking about one's (creative) thinking that involves the monitoring and regulation of the ongoing creative process (see Jia et al., 2019). One way to investigate the interplay among these key factors of creative thought and action is to examine how the metacognitive dimension of self-perceived creativity influences the relationship between creative evaluation and production, and how this relationship may be affected when the social context (i.e., the spectrum of individuality and collectivity) is manipulated. In other words, research should investigate how social expectations and external judgments shape metacognitive evaluation processes, with particular attention to the alignment or discrepancy between external assessments – such as those made by experts or through computational methods (e.g., automated creativity scoring) – and individuals’ self-evaluations. This comparison between evaluative perspectives can also illuminate how these forms of assessment may rely on different levels of expertise.
To address these desiderata, the present work reports on an original empirical study carried out in a musical context – as we shall see, an ideal context to explore creativity from a range of complementary perspectives (e.g., Antović et al., 2023; Kempf et al., 2024a; 2024b; Schiavio et al., 2024). Arguably, no musical activity can be considered separate from the creative processes that underpin it – whether it involves learning to play an instrument, improvising, composing, or even listening to music (Cook, 2018; Schiavio et al., 2022c). Lyric generation is a particularly interesting case of artistic creativity. Throughout history, individuals with diverse musical backgrounds have demonstrated an ability to blend instrumental music with lyrical content that is original and contextually appropriate – two fundamental aspects of creativity (see Runco & Jaeger, 2012). From an empirical standpoint, this setting offers significant advantages, as participants may feel less intimidated by the process of creating lyrics compared to performing music, which can often evoke heightened levels of anxiety or self-consciousness in both novice and expert performers. However, while the distinction between the products and processes of creativity in music has been increasingly addressed in recent literature (see van der Schyff et al., 2018), framing individuality, collectivity, and metacognition in relation to creativity and musical creativity may require deeper analysis – especially when these concepts are tied to the notion of “assessment.” Therefore, in the following section we will explore individual and collective perspectives on creativity and metacognition in greater depth before introducing the main study.
As we will see, framing an empirical task in which participants consider either themselves or someone else as the central actor in a creative behavior may offer valuable insights into how these two aspects compare. It also provides an opportunity to explore whether forms of creativity rooted in intersubjectivity can manifest even within a single individual.
Understanding Creativity
Understanding human creativity requires navigating its complex experiential landscape, including examining the overlaps and distinctions between individual and collective aspects of creative thought and action. This distinction has been the focus of previous research on musical improvisation and creativity (Schiavio et al., 2022a; 2022b; 2024) as well as studies in skill acquisition, which increasingly suggest that even seemingly individual activities often involve a “hidden” intersubjective dimension (Schiavio et al., 2019; see also Høffding & Satne, 2021). The core idea is that musical skills are developed and refined within a “community of practice” (Lave & Wenger, 1991), making social and cultural engagement fundamental to their emergence and expression, even in individual contexts. Accordingly, musicians, performers, lyricists, or composers should not be viewed as isolated creators. This challenges the traditional myth of the lone genius, which has historically placed undue emphasis on individual agency in creativity research (see Cook, 2018; Hill, 2018). Instead, creative ideas and choices arise through continuous interaction with cultural and social environments. However, capturing this dynamic is complex, as it unfolds at multiple levels of awareness: While some creators consciously recognize the influences shaping their work, others may be less aware of these external factors or less able to articulate them.
It should also be said that individual and collective understandings of creative phenomena have provided important advances to our comprehension of creativity via empirical and theoretical contributions (e.g., Benedek et al., 2020; Feist, 1998; Simonton, 1999). Seminal accounts of individual creativity have focused on the role of divergent and convergent thinking (Guilford, 1950; Mednick, 1968; Razumnikova, 2013), as well as attentional capacity (Mendelsohn, 1976; Kasof, 1997), aiming to offer systematic characterizations of how creative ideas emerge via processes of problem solving and (re-)combination of existing concepts. Conversely, scholarship that explores the networked realization of creative products and ideas has highlighted the communicative, ecological, organizational, and participatory resources of collective thought and action (Amabile, 1996; Perry-Smith & Shalley, 2003; Sawyer & De Zutter, 2009). For both orientations, a wide range of factors is carefully considered to articulate generalizations that inform how different cognitive mechanisms operate and support creative ideation.
Divergent thinking tasks, for example, are often associated with combinations of ideas, whose properties (e.g., coherence, originality, fluency, and flexibility) can be tested with well-known assessment batteries and scoring systems (Runco & Acar, 2012; Saretzki et al., 2024). Research on individual creativity has also adopted a neuroscientific approach; this can offer important insights into the main brain networks associated with various creative abilities, including divergent thinking (Runco & Yoruk, 2014). Studies on collective, or distributed, forms of creativity, instead, often assume that crucial aspects of creative thought and action may be missed in individual accounts (Mockros & Csikszentmihalyi, 1999; Sawyer, 2006, 2012) and therefore tend to emphasize the importance of collaborative processes in understanding the full scope of creative dynamics. Empirical investigations driven by a similar rationale often employ various methodologies, including the use of scales (Silvia et al., 2012), focus group interviews (e.g., Bissola & Imperatori, 2011), and quantitative analyses of behavioral parameters at the group level (e.g., the observed changes in movement coordination between pairs of pianists improvising together reported in Walton et al., 2017).
This constellation of methodologies and empirical strategies offers a preliminary insight into how challenging empirical and conceptual analyses of creative cognition can be. And indeed, “assessment has been a vexing problem for creativity researchers over the decades, in part because creativity research aspires to observe and measure things that are atypical, novel, innovative, and unusual, be they products, ideas, or people” (Silvia et al., 2012). Despite this difficulty, sometimes referred to as the “criterion problem” (Brown, 1989), the field has produced an array of analytical tools that permeate research and theory, often integrating quantitative and qualitative methodologies (e.g., Gaggioli et al., 2020). And while this combination has been more often adopted in contexts of individual creative production, it has also been applied to social contexts, where panels of experts are asked to judge various products and their creative properties (Amabile, 1982).
Along these lines, a number of studies concerned with the creative aspects of art and education have focused on how expressive qualities such as the experience of flow (e.g., Łucznik et al., 2021) and compositional activity (e.g., Biasutti, 2015; 2018) can play out at the level of single agents as well as groups (see also Daikoku et al., 2021; Deliège & Wiggins, 2006; Reybrouck, 2006; Wiggins, 2016 for more general approaches to musical creativity and inspiration). So, while classifications of individual and multi-agent creativities are expressed by different vocabularies and theoretical assumptions, they share similar empirical approaches and objects of investigation. The shift of unit of analysis, in other words, does not translate into new heuristics. This might give the impression that the separation between solo and collective creativity relies on a somewhat artificial move: That is, although conceptually useful, one may question whether this distinction genuinely helps improve our understanding of creative cognition. Recent research and theory have indeed moved beyond this dichotomy, exploring the overlap between individual and collective creativity by looking at both products and processes (Schiavio & Benedek, 2020).
The present work, as anticipated at the start of this section, is in direct alignment with this goal, as it focusses on assessing creativity when individuals generate a new artistic outcome tailored explicitly for themselves while also considering scenarios where they envision another person who could potentially derive value from it. As a result, this research explores a “social” scenario while maintaining its “individual” essence concerning the creative task (see Høffding & Satne, 2021; Schiavio et al., 2022a; 2022b). Also, it offers an in-depth examination of the qualities of creative outputs using participant self-assessments, expert ratings, and algorithmic evaluations. This multifaceted approach bridges the study of processes and products, as well as individuality and collectivity, aiming to engage with the complexity of creative cognition in songwriting, specifically lyric generation. We have chosen this setting for our investigation as throughout history many individuals with diverse musical backgrounds have demonstrated the ability to blend instrumental music with original and appropriate lyrical content, resulting in a tapestry of narratives interwoven into a cohesive artistic whole.
Creative Metacognition
This focus on writing lyrics allows us to delve into the interplay between creativity and metacognition, illuminating the role of reflective and regulatory cognitive processes in the context of artistic production. Metacognition has been a recurring theme in various branches of psychology, with a primary focus on learning, memory, and educational perspectives (Norman et al., 2019). However, as noted by Lebuda and Benedek (2023), there is now a growing interest in exploring metacognition's role in more complex cognitive activities such as reasoning, decision making, and problem solving (see e.g., Jia et al., 2019; Rivas et al., 2022). That said, conceptualizing metacognition remains a challenge, with ongoing debates around its structure, processes, and definition. Again following Lebuda and Benedek (2023), we observe that one of the most established proposals on the structure of metacognition involves two main components: metacognitive knowledge and metacognitive experience. The former contains explicit information related to tasks and strategies stored in long-term memory, while the latter reflects dynamic cognitive activities concerned with the monitoring and control of task performance.
Rhodes (2019) mentioned that a classic way to explore metacognition entails engaging participants in specific tasks, such as dart throwing, text reading, foreign language vocabulary learning, or general knowledge question answering. During these tasks, researchers would gather performance judgments from the participants. These judgments can be obtained through two methods: prospectively, before the criterion task, where participants assess their likelihood of success (e.g., “What is the probability of hitting the target?”), or retrospectively, after completing the criterion task, where participants evaluate the probability of their response being correct (e.g., “What is the likelihood that my answer to this question is correct?”). Through the juxtaposition of these assessments against the actual outcomes of the criterion task, researchers can discern two complementary indicators, one focused on the average distance to criterion measures and the other on correlations with criterion measures. An additional avenue for comparison emerges when participants’ appraisals are contrasted with those of a panel of experts (also known as Consensual Assessment Technique, see Amabile, 1982), thereby facilitating an analysis of both internal and external insight – a method that has worked efficiently in investigations into performance creativity in sport (Gesbert et al., 2022).
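The two indicators mentioned above can be illustrated with a minimal sketch. It assumes that judgments are expressed as probabilities and outcomes as 0/1 scores; the function names and the toy data are hypothetical and are not taken from Rhodes (2019) or from the present study.

```python
from math import sqrt


def mean_absolute_distance(judgments, outcomes):
    """Average distance between judgments and criterion outcomes (lower = better calibrated)."""
    return sum(abs(j - o) for j, o in zip(judgments, outcomes)) / len(judgments)


def pearson_r(xs, ys):
    """Pearson correlation between judgments and criterion outcomes."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: four retrospective confidence judgments and whether
# the corresponding answers were actually correct (1.0) or not (0.0).
judgments = [0.9, 0.6, 0.8, 0.3]
outcomes = [1.0, 0.0, 1.0, 0.0]

distance = mean_absolute_distance(judgments, outcomes)
association = pearson_r(judgments, outcomes)
print(f"distance to criterion: {distance:.2f}, correlation with criterion: {association:.2f}")
```

The first value captures how far judgments lie from the criterion on average, whereas the second captures whether higher judgments tend to accompany better outcomes, regardless of their absolute level.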
However, in domains such as music, the establishment of unequivocal criteria remains elusive. This is particularly evident in the scrutiny of lyrics, where recent advancements in computational assessment, especially the divergent semantic integration (DSI) approach by Johnson et al. (2023), may offer novel avenues for analysis. The DSI approach evaluates how frequently words in a text co-occur next to each other based on a large corpus of text (e.g., all Wikipedia entries). Each word is represented in a multidimensional vector space, where frequently co-occurring words are situated closer to each other. To calculate the DSI score, the average of the distances between all pairs of words in a text is computed. Unusual combinations of words are reflected in a higher DSI score, which suggests a more creative text (Johnson et al., 2023). By employing DSI scores extracted from computational models, we could aspire to unveil more potentially “objective” benchmarks for gauging creativity, comparing them with self-assessments and external judgments. This convergence of heterogeneous viewpoints could in turn illuminate the cognitive processes underlying the creation of lyrics. Given the multifaceted significance of lyrics, encompassing social, artistic, emotional, and cognitive dimensions, such research not only enriches the realm of music but also affords insights into the broader panorama of human creativity.
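The averaging step described above can be sketched as follows. This is only an illustration of the general idea, not the implementation by Johnson et al. (2023): the three-dimensional vectors in `TOY_EMBEDDINGS` are invented for the example, cosine distance is assumed as the distance measure, and the function names are hypothetical. In practice the word vectors would come from a trained language model.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dsi_score(words, embeddings):
    """Average pairwise distance between the vectors of all known words in a text."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("at least two known words are required")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Invented toy vectors (assumption): semantically related words point in
# similar directions, unrelated words in different directions.
TOY_EMBEDDINGS = {
    "rain": (0.90, 0.10, 0.00),
    "storm": (0.80, 0.20, 0.10),
    "cloud": (0.85, 0.15, 0.05),
    "violin": (0.10, 0.90, 0.20),
    "ledger": (0.10, 0.20, 0.90),
}

related = dsi_score(["rain", "storm", "cloud"], TOY_EMBEDDINGS)
unusual = dsi_score(["rain", "violin", "ledger"], TOY_EMBEDDINGS)
print(f"related words: {related:.3f}, unusual combination: {unusual:.3f}")
```

With these toy vectors, the unusual combination of words yields the higher score, which mirrors the interpretation of DSI given above.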
Rationale for the Study
The present exploratory research builds on, and expands, the insights presented in the previous sections aiming (i) to increase our comprehension of both individual and collective creativity across the spectrum of processes and outcomes, and (ii) to examine how metacognitive insights can exert an influence on the dynamics of creative expression. To address this double and deliberately broad objective, the present study reports on an original empirical investigation that delves into the nuanced impact of various self-assessment metrics on novice and expert musicians while also examining correlations between these self-evaluations, computational creativity measures (DSI scores), and external judgments. We devised an experiment wherein participants were invited to generate novel song lyrics under two conditions: (a) envisioning themselves as the performers of the song, or (b) imagining someone else as the performer. After completing the lyric-creation task, participants engaged in reflective self-assessment concerning their emotional state (e.g., their enjoyment of the task), and behavior (e.g., if they took notes when writing lyrics during the trial), alongside providing self-rated scores for the creativity of their lyrics. Complementing this self-report data, we then enlisted the expertise of a number of songwriters to evaluate the creativity of the participants’ lyrics, and we also employed a computational assessment using DSI scores.
This experimental framework serves a twofold purpose. First, it enables us to examine the factors underlying participants’ self-perceived creativity in multiple ways. These factors could relate to the metacognitive dimension that dynamically guides the interaction between creative evaluation and production during the task. Additionally, it could help to explore whether altering the social orientation of the task influences changes in these factors. Ultimately, such work may lead to insights into how individuals incorporate social expectations and judgments into their metacognitive evaluation processes. Second, by juxtaposing computational and expert panel ratings with participants’ self-ratings, we can determine whether the factors influencing self-perceived creativity align with those shaping external evaluations. This comparison can illuminate potential congruence or divergence between these assessments. It should also be noted that our study encompasses both individuals with musical expertise and novices to capture any variances influenced by their level of proficiency. This deliberate inclusion acknowledges the importance of expertise in potential outcome disparities.
Methods
Participants
In the first phase of the study, participants were invited to write and self-assess the lyrics they wrote for two short instrumental pieces of music under two conditions. To do this, we initially recruited 95 participants via a dedicated platform (Prolific, www.prolific.co), social media, and large-group emails. We aimed for a sample size of a maximum of 100 participants to allow for a thorough evaluation of the lyrics’ creativity by our expert raters. Thirty-two participants were eventually removed from our data set as they had not followed the instructions properly. Excluded participants included those who uploaded PDF files without any lyrics (either intentionally or because they forgot to save the file properly), individuals who typed random letters instead of creating lyrics, or participants who typed their lyrics into a single text-field. It should be noted that such high exclusion rates are not uncommon in online experiments (Thomas & Clifford, 2017). Participants in online experiments might aim to complete experiments as fast as possible to increase their earnings, leading to missing and low-quality responses. The final data set consisted of 63 participants (23 men, 39 women, 1 non-binary; age: M = 29.59, SD = 11.41, Range = 18–68). These were divided into two groups: musical experts (n = 35; 13 men, 21 women, 1 non-binary; age: M = 27.06, SD = 11.05, Range = 18–68) and musical novices (n = 28; 10 men, 18 women; age: M = 32.75, SD = 11.26, Range = 20–62). Participants were considered musical experts based on all the following criteria: (i) currently playing a musical instrument (which could include voice); (ii) having at least three years of formal training on a musical instrument (including voice); (iii) engaging in regular, daily practice of a musical instrument (including voice) for at least three years.
For the second phase of the study, in which we evaluated the lyrics created by our participants, we recruited six expert raters (3 men, 3 women; age: M = 30.67, SD = 11.54, Range = 21–49). They had at least 5 years of experience in songwriting (M = 12, SD = 9.04) and a minimum of 7 years of experience in performing music (M = 15.17, SD = 5.19). Expert raters individually rated all the lyrics via an online interface on a six-item questionnaire (see Table A3). The inter-rater agreement across all rated lyrics amounted to at least 70% for each rating item, except for the question asking to assess how well the lyrics go with the music (i.e., can be sung efficiently). For this question, the inter-rater agreement could not be estimated as the fitted linear mixed model was singular, that is, no unique solution could be found for the model's parameters. Inter-rater agreement was assessed by calculating the intraclass correlation coefficient (ICC) as described by Koo and Li (2016). ICC estimates and 95% confidence intervals were calculated using the R package “psych” (Revelle, 2025) based on a mean rating (k = 6), absolute agreement, two-way mixed-effects model. All participants gave informed consent, and the study was approved by the ethics committee of the University of Graz.
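For readers unfamiliar with this ICC form, the following sketch computes the mean-rating, absolute-agreement coefficient from classical two-way ANOVA mean squares, using the formula (MSR − MSE) / (MSR + (MSC − MSE) / n) given by Koo and Li (2016). It is an illustration only: the study itself relied on the “psych” package and a fitted mixed model, and the function name and the toy ratings below are hypothetical.

```python
def icc_absolute_agreement_mean(ratings):
    """ICC for the mean of k raters, absolute agreement, two-way model.

    `ratings` is a list of rows, one per rated item, each holding k rater scores.
    """
    n = len(ratings)        # number of rated items (e.g., lyrics)
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-item mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (msc - mse) / n)


# Hypothetical ratings: three items, two raters.
identical = [[1, 1], [3, 3], [5, 5]]   # raters agree exactly
shifted = [[1, 2], [3, 4], [5, 6]]     # second rater is always one point higher

print(round(icc_absolute_agreement_mean(identical), 3))
print(round(icc_absolute_agreement_mean(shifted), 3))
```

Because the coefficient targets absolute agreement, a constant offset between raters lowers the estimate even though the rank order of the items is identical.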
Tasks and Materials
Stimuli for the study included an instructional video and two short instrumental musical excerpts. As described below, the latter were administered through videos providing a visual representation of the melodic lines. The instructional video had a duration of 4 min and offered a briefing on the various tasks of the experiment (how to fill out the forms, where to write the lyrics, etc.), showing concrete examples. The two short musical excerpts were composed by the first author for the purpose of the study (available upon request). The first excerpt (“song1”) has a duration of 41 s and a melodic line played by the flute, accompanied by piano, strings, and drums. The tempo is moderate (quarter = 71), and the piece is in D minor with a modulation into C major towards the end. The second stimulus (“song2”) has a duration of 53 s and also has its melodic line played by the flute, accompanied by drums and other instruments. The “character” of this latter piece is different from “song1” in that it has a livelier tempo (quarter = 98) and a major key (it is in D major with a modulation into B minor towards the end). In both cases, participants were invited to write their own lyrics thinking of the melodic line of the flute as a vocal line on which the created lyrics can be sung. The musical excerpts were created as MIDI files with the software MuseScore (www.musescore.org, version 2). They were then exported as .wav files and presented to participants along with a video offering a piano-roll visualization of the melodic line (see the videos provided in the repository in the data availability statement). The piano-roll visualization was generated by the software Audacity (Audacity, audacityteam.org). In a piano-roll visualization, notes are displayed as horizontal bars on a grid, where the vertical axis represents pitch and the horizontal axis represents time. The melody and rhythm of a musical piece are, hence, clearly depicted.
The videos of the songs, including the piano roll visualization, were particularly useful for non-musicians as they did not show notation; instead of the notes, they showed a succession of a series of boxes at different heights representing the notes of the melodic line (see Fig. 1). We chose this option because, as outlined below in more detail, in the subsequent phase of the experiment participants were invited to download a PDF file displaying the same visual representation of the songs (including boxes) they saw on video and write their lyrics directly into these boxes. By doing so, we wanted to ensure that participants had a clear sense of the different notes that made up the vocal melodies, so that they would be aware that words must often be broken down into syllables to be sung properly.

Figure 1. Part of the lyrics and the piano-roll visualization of song1 as submitted by one participant.
Procedure
Participants took part in the study remotely, with their own computer. They either received a link via email or accessed the experiment portal through the Prolific participant recruitment platform (Prolific, prolific.co) directly. The study was implemented using the jsPsych JavaScript framework (de Leeuw, 2015); it was administered online using Pavlovia (www.pavlovia.org) and was carried out in two phases. While the first phase involved musical experts and musical novices, the second one only involved expert raters.

Figure 2. Experimental procedure of the first phase of the study. Participants started with a background questionnaire and afterwards ran through two conditions (fully counterbalanced) in which they were asked to create lyrics for two musical excerpts, or songs (i.e., S1 and S2). They had to either write a song for themselves, imagining they would be the singer (SELF), or for someone else, imagining any other person being the singer (OTHER). Each condition featured a different song.
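One way to read “fully counterbalanced” in this design is that both the order of the conditions and the pairing of conditions with songs are crossed, which yields four possible sequences. The sketch below enumerates them under that assumption; the rotation-based assignment rule and the function names are hypothetical and are not described in the source.

```python
from itertools import permutations

CONDITIONS = ("SELF", "OTHER")
SONGS = ("S1", "S2")


def counterbalanced_sequences():
    """Cross every condition order with every song order."""
    sequences = []
    for condition_order in permutations(CONDITIONS):
        for song_order in permutations(SONGS):
            # Pair the first condition with the first song, the second with the second.
            sequences.append(tuple(zip(condition_order, song_order)))
    return sequences


def assign_sequence(participant_index):
    """Hypothetical assignment rule: rotate through the sequences in order."""
    sequences = counterbalanced_sequences()
    return sequences[participant_index % len(sequences)]


for index, sequence in enumerate(counterbalanced_sequences()):
    print(index, sequence)
```

Each sequence contains both conditions and both songs exactly once, so every participant writes one lyric per condition and never hears the same excerpt twice.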
Data Analysis
Our analysis was carried out using R (R Core Team, 2025). For the computation of linear mixed models, we used the package “lme4” (Bates et al., 2015) and applied a sum contrast coding scheme to all categorical variables. Due to the sum contrast scheme, the linear regression tested the difference between the mean across a categorical variable and each value of this variable. In the case of a dichotomous variable, the regression simply tested the difference between the two values of this variable. The marginal R²m and conditional R²c values were calculated by the “sjPlot” (Lüdecke, 2024) package using the method proposed by Nakagawa and colleagues (2017). All reported regression coefficients are standardized. We decided to use linear mixed models in our analysis as we had a twofold aim: to inspect differences in creativity regarding participants’ behavior and state during the task and to estimate whether these factors (measurements of behaviors and states) predict participants’ creativity in writing lyrics. Other measurements (e.g., everyday creativity (AEC) or openness to experience (BFI-O)) were not included in the final analysis as we aimed to explore solely the influence of the participants’ experience during the task. Yet, the experimental design made it necessary to include the condition and the musical expertise (expert/novice) as confounding factors in our analysis. By using linear mixed models and carrying out post-hoc comparisons we could complete both aims using a single model. The linear mixed models and their coefficients are reported in detail in Tables 1–3, while post-hoc comparisons are reported within the running text.
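The effect of sum contrast coding on a dichotomous predictor can be illustrated with a minimal sketch. It uses ordinary least squares with a single predictor rather than a mixed model, so it is a simplification of the models reported here; the toy scores and all names are hypothetical. With codes of +1 and −1, the intercept equals the unweighted mean of the two group means and the slope equals half the difference between them, so testing the slope against zero is equivalent to testing the group difference.

```python
def ols_simple(x, y):
    """Closed-form ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical creativity scores for three experts and two novices.
groups = ["expert", "expert", "expert", "novice", "novice"]
scores = [4.0, 5.0, 6.0, 2.0, 3.0]

# Sum contrast coding for a dichotomous variable (assumed codes: +1 / -1).
SUM_CODES = {"expert": 1.0, "novice": -1.0}
coded = [SUM_CODES[g] for g in groups]

intercept, slope = ols_simple(coded, scores)

expert_mean = sum(s for g, s in zip(groups, scores) if g == "expert") / 3
novice_mean = sum(s for g, s in zip(groups, scores) if g == "novice") / 2
print(f"intercept = {intercept:.2f}, mean of group means = {(expert_mean + novice_mean) / 2:.2f}")
print(f"slope = {slope:.2f}, half the group difference = {(expert_mean - novice_mean) / 2:.2f}")
```

Note that the intercept reflects the mean of the group means rather than the overall sample mean, which differ here because the toy groups are unequal in size.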
The analysis focused on three different levels of creativity assessments: 1) Self-assessment of the creativity of the lyrics, that is, how creative the participants rated their own lyrics for each song; 2) Expert ratings of the lyrics’ creativity, that is, how creative our expert songwriters rated the lyrics of each participant; 3) Algorithmic assessment of the creativity of the lyrics, that is, a computational assessment of the lyrics’ creative features using distributional semantic modelling (Johnson et al., 2023).
Table 1. Linear mixed model (M.1) to evaluate the association between participants’ creativity self-assessments of their lyrics and their reported behaviors and states during the lyric writing task.
Note. σ² (Residual variance); τ₀₀ (Between-group variance); ICC (Intraclass correlation coefficient); Marginal R²m (Explained variance by fixed effects); Conditional R²c (Explained variance by fixed and random effects).
Table 2. Linear mixed model (M.2) to evaluate the association between the expert creativity ratings of the lyrics and the participants’ reported behaviors and states during the lyric writing task.
Note. σ² (Residual variance); τ₀₀ (Between-group variance); ICC (Intraclass correlation coefficient); Marginal R²m (Explained variance by fixed effects); Conditional R²c (Explained variance by fixed and random effects).
Table 3. Linear mixed model (M.3) to evaluate the association between the algorithmic assessments of the lyrics (divergent semantic integration; DSI) and the participants’ reported behaviors and states during the lyric writing task.
Note. σ² (Residual variance); τ₀₀ (Between-group variance); ICC (Intraclass correlation coefficient); Marginal R²m (Explained variance by fixed effects); Conditional R²c (Explained variance by fixed and random effects).
Results
Self-Assessments
Our analysis revealed that participants’ behavior and state during the songwriting task are highly predictive of their creativity self-assessment ratings of their lyrics. As illustrated in Table 1, the independent variables of our model (M.1) explain a substantial proportion (R²m = .17) of the variation in participants’ creativity ratings. Inspecting the factors at play in detail, we found a significant (β = .29, 95% CI [.10, .47]) main effect of “enjoyment.” Hence, people who enjoyed the music more also rated their lyrics as more creative.
Expert Ratings
We then tried to predict the creativity of the lyrics, as judged by our expert raters (M.2), considering the same predictors we used in model (M.1). As illustrated in Table 2, the independent variables explained only a small proportion (R²m = .05) of the variation of the dependent variable in model (M.2). We found a main effect of “time,” indicating that the time participants took to complete the task was significantly and positively associated (β = .12, 95% CI [.01, .22]) with expert creativity ratings. Moreover, our results indicate that musical experts show a significantly weaker correlation (β = −.13, 95% CI [−.23, −.03]) between the self-rated difficulty of the task and the experts’ creativity ratings when compared to musical novices. A post-hoc comparison revealed that for novices, self-rated difficulty is significantly correlated with experts’ creativity ratings (Novices: Mdifficulty = 0.13, 95% CI [0.01, 0.24]), while for musical experts the correlation was not significant (Experts: Mdifficulty = −0.08, 95% CI [−0.18, 0.02]).
Algorithmic Assessment
Finally, we aimed to explore the performance of a BERT model in rating the creativity of lyrics. Similar to the work by Johnson et al. (2023), we calculated a Pearson correlation between the average expert ratings and the DSI score for each of our participants’ lyrics. We found the correlation between these variables to reach r = .44 (95% CI [.29, .57]). Out of interest we also calculated the correlation between the DSI score and participants’ self-assessment ratings of the creativity of their lyrics, which only amounted to r = .25 (95% CI [.08, .41]). The correlation between participants’ self-assessment ratings and the experts’ creativity ratings was not significant (r = .13, 95% CI [−.05, .30]).
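The confidence intervals for these correlations can be approximated with the Fisher z-transformation, as sketched below. The source does not state how its intervals were obtained, so both the method and the sample size are assumptions: n = 126 corresponds to 63 participants with two lyrics each. The function name is hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation via Fisher's z."""
    z = atanh(r)                                   # transform r to the z scale
    se = 1.0 / sqrt(n - 3)                         # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return tanh(z - crit * se), tanh(z + crit * se)


# Assumed sample size: 63 participants x 2 lyrics = 126 rated lyrics.
N_LYRICS = 126

for label, r in [("expert vs. DSI", 0.44), ("self vs. DSI", 0.25), ("self vs. expert", 0.13)]:
    lower, upper = fisher_ci(r, N_LYRICS)
    print(f"{label}: r = {r:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions the bounds for r = .44 round to .29 and .57, in line with the interval reported above, and the interval for r = .13 includes zero.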
We computed the linear mixed model (M.3) to estimate to what degree the participants’ reported states and behaviors predicted the DSI scores of their lyrics. In model (M.3) the independent variables were found to explain an R2m = .21 of the variation of the DSI scores. We found a significant main effect of musical expertise (ß = .24, 95% CI [.03, .46]), with a post-hoc comparison showing that musical experts exhibited significantly higher DSI scores than musical novices (Experts – Novices: M = 0.49, t(61.7) = 2.286, p = .026). Furthermore, our analysis revealed that self-rated difficulty correlated significantly (ß = .20, 95% CI [.02, .39]) with the DSI score. We also found a significant interaction between musical expertise and self-rated difficulty (ß = −.20, 95% CI [−.38, −.01]). Post-hoc comparisons show that the correlation between self-rated difficulty and DSI scores is significantly different from zero only for novices (Novices: Mdifficulty = 0.22, 95% CI [0.07, 0.38]) but not for experts (Experts: Mdifficulty = 0.003, 95% CI [−0.14, 0.15]). We performed additional post-hoc comparisons to inspect in detail the significant interaction (ß = .22, 95% CI [.01, .42]) between musical expertise and taking notes. These analyses showed that among the participants who did not take notes, experts performed significantly better (Experts – Novices: Mnotes(No) = 0.92, t(69.9) = 3.764, p < .001) than novices. Such a difference between musical experts and novices could not be found for the participants taking notes (Experts – Novices: Mnotes(Yes) = 0.05, t(82.4) = 0.158, p = .875).
Figure 3. Correlation between the DSI (divergent semantic integration) scores, as computed using the BERT model, and (a) participants’ creativity self-ratings and (b) the expert creativity ratings.
Discussion and Conclusion
This study has examined the creative output and experience of musical novices and experts in a song lyric writing task. The central aim was to investigate the factors underlying perceptions of creativity, and this was achieved by comparing self-, expert, and computational assessments of song lyric creativity. Our intention was to study participants’ experiences of the writing task, exploring differences relating to demands of either writing lyrics as though for oneself to sing, or for an imagined other.
A main finding was the positive relationship between participants’ enjoyment of the task and their positive rating of the self-assessed creativity of their song lyrics in model (M.1). This is consistent with research on creative metacognitive feelings (Puente-Diaz & Cavazos-Arroyo, 2018) and creative self-efficacy, defined as belief in one's own creative ability (Puente-Diaz, 2015), and more generally with enjoyment being the strongest motive for creative behavior (Benedek et al., 2020). For example, Puente-Diaz and Cavazos-Arroyo (2018) found evidence that metacognitive feelings such as perceived ease of generating ideas positively influenced task enjoyment via increased creative self-efficacy. It would be interesting to explore creative self-efficacy as a mediator between task enjoyment and confidence in the creative outcome in musical tasks. Also, enjoyment predicted self-assessed creativity but not expert- or algorithmically-scored creativity, highlighting the self-relevant yet partly subjective nature of creative engagement. Of interest here is the possible interrelationship between metacognitive decisions such as whether to engage in notetaking as a tool to support creativity (Lebuda & Benedek, 2023) and creative beliefs such as the extent to which the nature of creativity is, or should be, intuitive or spontaneous. For example, expert participants may have felt as though more spontaneous processes (without the need to make notes) correspond to more intuitive and inspired creative practice. Conversely, musical novices may be less inclined to trust their intuition, having potentially lower degrees of creative self-efficacy; this could mean that they perceived notetaking to have facilitated their creative work, serving as a scaffolding to their outputs.
A further distinction between musical experts and novices that emerges from our findings is the lower correlation between self-perceived task difficulty and creative success for experts than for novices, as found in model (M.2). Although the musical expert participants were not recruited as song writers, their broader musical knowledge of melodic and rhythmic structures may have improved their confidence in setting lyrics to the stimuli and supported their ability to evaluate their resulting creative efforts. In contrast, novice participants without these foundational levels of domain-relevant knowledge may be more influenced by metacognitive factors such as their assessment of task difficulty when reflecting on their creative outcomes. Interestingly, in none of our analyses did we find musical expertise to interact with the social perspective that participants were asked to adopt when writing their song lyrics, that is, as though writing for themselves to sing or as though someone else would sing them. This contrasts with work by Lu and colleagues (2023), which suggests that experts may inhibit social information in creative tasks to focus on more self-centered processes. Instead, we found that experts and novices did not show any difference in self-ratings of creativity when writing lyrics for themselves or someone else. Of interest here is the perspectival framework outlined by Glãveanu (2015), which emphasizes that creativity is a sociocultural process, involving a dialogue between self/other perspective-taking and self-reflection. He argues that “creative action emerges […] when we are able to ‘move’ (imaginatively and/or practically)” (p. 171) between perspectives. This ability is arguably equally developed in both novices and experts, suggesting that musical creativity can be nurtured from the very beginning of one's musical journey and arguably does not fully depend on musical skills and knowledge (see Schiavio & Nijs, 2022).
Although the role of musical expertise was not relevant for differentiating between groups in self- and expert-ratings, our computational analysis in model (M.3) indicates that musical experts exhibit higher DSI scores compared to non-experts. This observation is particularly intriguing as DSI scores pertain exclusively to the verbal domain, whereas raters also considered how lyrics were embedded within the musical material. This raises the question of whether it is verbal aptitude – such as the ability to manipulate language and compose lyrics – rather than musical proficiency that may account for the observed differences. An alternative explanation could be that effective word setting, such as writing lyrics, inherently involves a sensitivity to various musical elements such as rhythm, timbre, and contour. Musical experts might be more aware of these musical elements (Carey et al., 2015). These musical considerations might lead lyricists with musical expertise to select less common word combinations that better fulfil the musical demands of word setting, which could, in turn, be reflected in the DSI scores.
The results of the computational analysis in model (M.3) reveal that novices who find the task more challenging tend to produce lyrics that are rated as more creative by our algorithm. In contrast, experts exhibit a significantly weaker correlation between self-rated difficulty and computational creativity ratings; for them, the correlation does not differ significantly from zero. Lastly, it is not surprising that experts who did not take notes performed significantly better than novices who also refrained from notetaking (see the results in model (M.3)). Our analyses show how the different assessments of creativity complement each other by picking up different aspects of the creative process: Task enjoyment is linked to increased self-ratings; time spent on the task is positively correlated with expert ratings; and for novices, self-rated task difficulty is associated with higher expert ratings and DSI scores. Overall, there is a greater alignment between self-perceived creativity and participants’ subjective experiences of the task in model (M.1) than with external judgements of lyrical creativity in model (M.2). Even more interestingly, comparing model (M.3) to the other models shows that participants’ subjective experience of the task is better at predicting DSI scores than self-ratings and especially expert ratings of creativity. Yet the assessments of creative success from the expert songwriters are informative (see model (M.2) for the analysis). For example, it is interesting to note that more objective task behaviors, such as time taken to write the lyrics, positively predicted the lyrical creativity judgements of the external songwriters. Expert creativity ratings diverge in their alignment with more subjective measures of perceived task difficulty experienced by the novice and expert musicians: Higher creativity ratings were associated with higher levels of perceived task difficulty for the musical novices, whereas no such relationship was found for the musical experts.
Future research may benefit from the insights afforded by employing the Consensual Assessment Technique (CAT) (Amabile, 1982). Here, a panel of experts is invited to judge the creativity of the musical (in this case lyrical) product, through a process of refining their collective judgement of what may constitute creative success. Qualitative data regarding what is subjectively important for song lyric raters can only enrich our understanding.
In the current study, we instead took a further step toward objectivity through the addition of the calculation of DSI scores. Our main finding was a moderately strong correlation with the expert raters’ evaluations of lyrical creativity and substantially less agreement with participants’ own assessment of creativity. As discussed, it seems as though participants’ assessment of their creative success is strongly influenced by their subjective experience of performing the task. A range of metacognitive factors can be expected to dynamically shape the experience of task success from an initial evaluation of the task characteristics and anticipation of one's own performance, through the on-task evaluation of success, to the retrospective evaluation of creative success (see Lebuda & Benedek, 2023).
To conclude, our study demonstrates how different measures of creativity reflect different aspects of individuals’ experience in creative song lyric writing. An important finding is the relevance of metacognitive experience in the self-assessment of creativity, while participant experience is less related to expert assessments of creative performance. This finding again demonstrates that expert- and self-assessments of creativity should not be seen as exchangeable but rather as complementary measures reflecting different aspects (Kaufman, 2019). Interestingly, the computational evaluation of lyrics using DSI scores is more strongly associated with expert ratings than with self-ratings. Future research may wish to further probe the parallels between the DSI scores of song lyrics and the expert songwriters’ views of lyrical creativity. Setting words to music is a musical task that extends beyond integrating semantically divergent concepts (as assessed by the DSI scores), and there is scope to apply a wider range of computational methods to investigate different dimensions of melodic, harmonic, and rhythmic creativity (see Pearce, 2005).
Footnotes
Acknowledgments
This research was funded in whole, or in part, by the Austrian Science Fund (FWF) (10.55776/P32460). For the purpose of open access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.
Action Editor
Emily Payne, University of Leeds, School of Music.
Peer Review
Mark Reybrouck, KU Leuven, Department of Musicology.
Mihailo Antović, University of Niš, Faculty of Philosophy.
Contributorship
A.S. and A.K. were responsible for conceptualization, methodology, validation, resources, project administration, writing – original draft, and writing – review & editing. A.K. was responsible for software, formal analysis, investigation, data curation, and visualization. A.S. was responsible for supervision and funding acquisition. M.B. and F.B. were responsible for conceptualization and writing – review & editing.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
The study was approved by the ethics committee of the University of Graz.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Austrian Science Fund (grant number 10.55776/P32460).
Data Availability Statement
Notes
Appendix
Questionnaire on which expert songwriters evaluated lyrics.
| Question | Response |
|---|---|
| The lyrics go well with the music (i.e., can be sung efficiently). | Likert – [1 (Strongly disagree), 7 (Strongly agree)] |
| The lyrics are coherent with the music. | Likert – [1 (Strongly disagree), 7 (Strongly agree)] |
| Overall, I find the lyrics creative. | Likert – [1 (Strongly disagree), 7 (Strongly agree)] |
| The lyrics are original. | Likert – [1 (Strongly disagree), 7 (Strongly agree)] |
| The lyrics are surprising. | Likert – [1 (Strongly disagree), 7 (Strongly agree)] |
| Overall, I like the lyrics. | Likert – [1 (Strongly disagree), 7 (Strongly agree)] |
