Abstract
Music psychological research has either focused on individual differences of music listening behavior or investigated situational influences. The present study addresses the question of how much of people's listening behavior in daily life is due to individual differences and how much is attributable to situational effects. We aimed to identify the most important factors of both levels (i.e., person-related and situational) driving people's music selection behavior. Five hundred eighty-seven participants reported three self-selected typical music listening situations. For each situation, they answered questions on situational characteristics, functions of music listening, and characteristics of the music selected in the specific situation (e.g., fast - slow, simple - complex). Participants also reported on several person-related variables (e.g., musical taste, Big Five personality dimensions). Due to the large number of variables measured, we implemented a statistical learning method, percentile-Lasso, for variable selection, which prevents overfitting and optimizes models for the prediction of unseen data. Most of the variance in music selection behavior was attributable to differences between situations, while individual differences accounted for much less variance. Situation-specific functions of music listening most consistently explained which kind of music people selected, followed by the degree of attention paid to the music. Individual differences in musical taste most consistently accounted for person-related differences in music selection behavior, whereas the influence of Big Five personality was very weak. These results show a detailed pattern of factors influencing the selection of music with specific characteristics. They clearly emphasize the importance of situational effects on music listening behavior and suggest shifts in widely-used experimental designs in laboratory-based research on music listening behavior.
“What music does to people at different times, why they choose to listen to it so much, and why they choose a particular type of music while engaged in a particular activity – all of these are important unanswered questions” (Konečni, 1982, p. 500)
Recent technical innovations allow the listener to listen to any kind of music in almost any situation, transforming music-listening behavior on two levels. First, engagement with music has become highly individual, and second, people now have the opportunity to listen to music in almost any everyday situation. These developments provide new opportunities for studying individual differences and situational influences of music-listening behavior, reflecting the major questions of the person-situation debate in personality psychology (see Fleeson & Noftle, 2008 for review). Following a synthesis approach, research on human behavior in daily life, including music listening, can potentially provide more reliable results and models by considering both levels of influence.
In music psychology, few studies on music-listening behavior to date have integrated both person-related and situational levels of influence. The following paragraph outlines the findings of those studies that did consider both levels. Krause and North (2017) have used person-related (e.g., sex, age, importance of music) and situational variables (e.g., time of day, activity) to predict music listening in a certain situation, how much choice people had in what they heard, how participants liked the music they were listening to, how engaged they were, and how arousing they perceived the music to be. Randall and Rickard (2017) developed a two-level model of personal music listening (i.e., listening via headphones) with regard to affective changes attributable to music listening. They found that affective changes due to music are almost entirely determined by the situation, whereas individual differences have only marginal effects. Furthermore, Greb, Schlotz, and Steffens (2017) explored the most important person-related and situational variables predicting functions of music listening (i.e., why a person listens to music in a certain situation). By quantifying the relative weight of individual and situational influences, they showed that music-listening functions are primarily attributable to characteristics of the situation. This predominance of situational influences on the goals and effects of music listening gives rise to a number of new questions. For example, what music do people select in order to accomplish their goals in a specific situation? What are the key variables ultimately driving individuals’ music choices? Randall and Rickard (2017) shed some light on these questions by predicting the perceived emotional qualities of music using situational and person-related variables, but their characterization of music chosen by individuals was limited to the affective dimensions of valence and arousal. 
However, music perception comprises more characteristics, and these might be differentially influenced by situational and person-related variables (e.g., the tempo of a piece of music might be differentially perceived based on situational characteristics). Consequently, the present study focused on predicting a broader variety of subjective characteristics of music selected in daily life situations, such as tempo, melody, and complexity, by integrating variables related to listener, situation, and function of music listening.
Person-related variables
Previous research has found that demographic characteristics of listeners, their personality, musical taste, strength of music preference, and musical training are all potentially relevant variables contributing to music-listening behaviors. Demographic variables such as sex or age have consistently been shown to relate to music-listening behavior in daily life. For example, males under 34 years of age were found to visit live music events more often than females (Eventbrite & Media Insight Consulting, 2016) and also to purchase and download music more often (Aguiar & Martens, 2013). With regard to the functions of music listening, research has consistently revealed that females tend to use music for affective functions (e.g., expressing feelings and emotions), coping, and enhancement (Boer et al., 2012; Chamorro-Premuzic, Swami, & Cermakova, 2012; Kuntsche, Le Mevel, & Berson, 2016), while men tend to use music for cognitive or intellectual reasons (Chamorro-Premuzic et al., 2012). Young people (10–34 years old) show a clear tendency to access recorded music via digital channels such as YouTube, digital streaming, downloads, or online radio (Eventbrite & Media Insight Consulting, 2016) and are more likely to access copyright-infringing music (Avdeef, 2012; International Federation of the Phonographic Industry, 2016). In contrast, people older than 30 years of age are more likely to use legal download sources, to buy CDs, and to listen to music on a CD player or via radio (Avdeef, 2012).
Ferwerda, Yang, Schedl, and Tkalcic (2015) demonstrated several relationships between personality and the way individuals browse and select music from streaming services. For example, individuals scoring high on Openness to experience are more likely to choose mood taxonomies offered by streaming services to browse through music collections, while individuals scoring high on Conscientiousness are more likely to use activity taxonomies. In addition, numerous studies linking personality dimensions (Big Five) with musical taste and preferences for certain musical styles indicate an indirect relation between personality dimensions and music-selection behavior (e.g., Greenberg, Baron-Cohen, Stillwell, Kosinski, & Rentfrow, 2015; Rentfrow, Goldberg, & Levitin, 2011; Rentfrow & Gosling, 2003). This indirect relation is supported by Dunn, de Ruyter, and Bouwhuis (2012), who found positive correlations between individuals’ musical taste and their actual listening behavior in daily life. Also, Greb et al. (2017) showed that fans of blues and jazz music tend to listen to music for intellectual stimulation, while fans of techno and electronic dance music tend to listen to music to move and enhance their well-being. Individuals who consider music to be an important part of their life tend to seek situations that involve music and are also more engaged with music when listening to it (Krause & North, 2017). Furthermore, Elpus (2017) showed that people who received school-based musical training and education are more likely to engage in musical activities such as playing an instrument or singing, while Stratton and Zalanowski (2003) found students majoring in music listened to a greater diversity of music than non-music majors.
Situational variables
Conceptualizing a situation is notoriously difficult; definitions and terminologies consequently vary between different research fields and even within the same field (for reviews see Rauthmann, 2015 or Rauthmann, Sherman, & Funder, 2015). Rauthmann et al. (2015) proposed a taxonomy that differentiates between situational cues (i.e., measurable situational properties such as time or weather), situational characteristics (i.e., the individual perception and experience of situational cues), and situational classes, which are abstract groups or types of situations based on similar cues or characteristics. In terms of this taxonomy, music psychology research on situational influences has mostly focused on cues such as location, activity, presence of others, or time of day.
Previous research has shown that the listening location influences goals and functions of music listening (North, Hargreaves, & Hargreaves, 2004). In addition, the effects of music listening and the experience of music vary by location type (Krause & North, 2017; Krause, North, & Hewitt, 2014). Furthermore, Krause and North (2017) found that type of location predicts the presence of music as well as perceived arousal of the music. Recent research has highlighted a person’s activity while listening to music as the most influential situational variable for explaining how people use music in a specific situation (Greb et al., 2017). In addition, activity has been shown to be an important predictor of the presence of music, a person’s engagement with music, and a person’s experience of the arousing qualities of music in a given situation (Krause & North, 2017). Finally, Randall and Rickard (2017) found a negative association between traveling and perceived valence as well as a positive association between housework and the perceived arousal of the music heard. Research has consistently shown that the functions of music listening vary depending on the presence of others (Greb et al., 2017; North et al., 2004; Rana & North, 2007). For example, people tend to use music to pass the time or to support concentration when they are alone, but they use music to create a particular atmosphere when together with friends (Greb et al., 2017; North et al., 2004). These findings suggest that the presence of others also has an influence on the music chosen in a specific situation. Moreover, several studies have suggested that functions of music listening vary by time of day (Krause et al., 2014; North et al., 2004). For example, North et al. (2004) indicated that music is more likely to be used to help pass time during the workday (8:00 a.m. to 4:59 p.m.) than during the evening (5:00 p.m. to 11:00 p.m.). 
In another study by Krause and North (2017), participants were less likely to encounter music as the day progressed from morning to evening. It remains unclear whether these variations in the functions of music listening are also associated with specific musical choices, thus prompting the current study.
Besides the above-mentioned situational cues, there are also several concomitant person-related variables influenced by situations. For example, current mood as well as goals and functions of music listening have been shown to vary strongly by situation and also to impact musical choices. Recent daily life research has found a positive association between initial affective state at the moment a person decides to listen to music and perceived affective characteristics of the music selected, while controlling for a broad set of potential covariates (Randall & Rickard, 2017). While these results are supported by findings of several studies that reported similar mood-congruent music selection effects (Skånland, 2013; Thoma, Ryf, Mohiyeddini, Ehlert, & Nater, 2012), they challenge several theories and an enormous body of research. This research states either that music is selected to moderate arousal to an optimal level (Konečni, Crozier, & Doob, 1976; Konečni & Sargent-Pollock, 1976) or that it is used to reach certain arousal-state goals, such as becoming energized during exercise (North & Hargreaves, 2000; for an overview of these opposing theories see Hargreaves & North, 2010). In general, further research is required to clarify the relationship between momentary mood and the music selected in daily life.
Music listening serves a number of functions beyond mood regulation (for an overview, see Schäfer, Sedlmeier, Städtler, & Huron, 2013). These functions have been shown to predominantly vary between situations (Greb et al., 2017) and to be associated with specific music styles (North et al., 2004). Randall and Rickard (2017) found that functions can be used to make predictions about the affective qualities of music selected at a certain time. More specifically, they found a negative association between the use of cognitive functions of music listening and the perceived (positive) valence of the music selected.
In order to understand the music selected to fulfill the various functions of music listening, the present study aimed to predict the characteristics of the music selected by considering the above-discussed listener and situation variables. We had three specific objectives: (1) to investigate the relative influence of person-related and situational factors on music-selection behavior (i.e., estimating between- and within-person variance); (2) to control for a broad multivariate set of potentially influencing factors (i.e., the variables discussed above; for an overview see Figure 1) as they occur in reality, in contrast to previous studies that have predominantly focused on bivariate relations between specific variables and music-listening behavior; and (3) to identify key person-related and situational variables that reliably predict music-selection behavior in daily life using a statistical-learning approach that avoids overfitting of the statistical model.

Figure 1. Variables measured in the online survey.
To this end, we conducted an online survey asking participants to sequentially report three self-chosen listening situations typically occurring in their daily lives. For each listening situation, participants answered questions related to the situation, the music heard, and the functions of music listening. In addition, we measured multiple person-related variables (e.g., personality, musical taste).
Using statistical learning methods for variable selection
Given the numerous potentially relevant variables discussed above, we were faced with several challenges. Research has consistently shown that common model selection procedures such as stepwise procedures (including forward, backward, combined forward-backward, and all-possible-subsets selection) lead to overestimation of regression coefficients (Chatfield, 1995; Steyerberg, Eijkemans, & Habbema, 1999) and to selection of irrelevant predictors (Derksen & Keselman, 1992). These problems, known as overfitting, are more likely to occur as the ratio of sample size (n) to number of predictors (p) decreases (Babyak, 2004; Derksen & Keselman, 1992). In general, as the number of predictor variables included in a model grows, so does the likelihood of finding relationships in sampled observations which are not present in the actual population (Babyak, 2004). Overfitting relates to the tendency of statistical models to mistakenly fit sample-specific noise (for reviews see Babyak, 2004; Hawkins, 2004) and might be one of the factors underlying the replication crisis in psychology (Yarkoni & Westfall, 2017). An overfitted model will not produce reliable predictions on unseen data, as it contains relations which are only present in the sample used to estimate the model and not in the general population. Therefore, avoiding overfitting when estimating statistical models was one of our core aims and is one of the primary objectives of the field of statistical learning. In recent years, statistical learning theory has developed several techniques to optimize models for the prediction of unseen data and to reduce overfitting. More specifically, regression regularization methods (also referred to as shrinkage methods) are often used to address this problem (Gareth, Witten, Hastie, & Tibshirani, 2015). The Lasso, originally proposed by Tibshirani (1996), has become a popular approach to variable selection in regression.
It places a penalty on the regression coefficients, shrinking them all towards zero and setting some coefficients exactly to zero. The Lasso features a tuning parameter λ that controls the amount of shrinkage applied to the coefficients. The value of this tuning parameter is chosen using K-fold cross-validation, a technique of randomly splitting the set of observations into K folds of approximately the same size. Subsequently, K − 1 folds (the training set) are used to estimate a statistical model, while the remaining fold (the validation set) is used to compute the mean squared error (MSE). In the regression setting, the MSE is given by
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{f}(x_i)\right)^2,$$
where $n$ is the number of observations in the validation set, $y_i$ is the observed response for the $i$-th observation, and $\hat{f}(x_i)$ is the prediction of the fitted model for that observation.
The selection of the optimal tuning parameter λopt via cross-validation is based on a sequence of candidate λ values (a grid). This grid should cover a range from zero, indicating no shrinkage and all predictors included in the final model, to λmax, a value of λ for which all coefficients are set to zero and the model is empty. During the cross-validation process, a K-fold cross-validation error is calculated for each λ-value of the grid. Finally, the λ-value that yielded the smallest cross-validation error is chosen as λopt. The Lasso can therefore be used for variable selection and does not impose the limitations of stepwise selection methods (Tibshirani, 1996; Whittingham, Stephens, Bradbury, & Freckleton, 2006).
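The cross-validation procedure described above can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: it assumes a plain (single-level) Lasso without intercept, fitted by coordinate descent on the objective (1/(2n)) Σ(y − Xβ)² + λ Σ|β|, and all function names, the toy data, and the grid values are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * sum((y - X b)^2) + lam * sum(|b|), no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for xi, yi in zip(X, y):
                # prediction from all predictors except j
                partial = sum(xi[k] * beta[k] for k in range(p) if k != j)
                rho += xi[j] * (yi - partial)
                z += xi[j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted responses."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def cv_error(X, y, lam, n_folds, seed):
    """Mean validation MSE over n_folds random folds for one value of lam."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
        preds = [sum(b * v for b, v in zip(beta, X[i])) for i in held_out]
        errors.append(mse([y[i] for i in held_out], preds))
    return sum(errors) / n_folds

def select_lambda(X, y, grid, n_folds=5, seed=0):
    """Return the grid value with the smallest K-fold cross-validation error."""
    errors = [cv_error(X, y, lam, n_folds, seed) for lam in grid]
    return grid[errors.index(min(errors))]

# Hypothetical toy data: the first predictor matters, the second is pure noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
y = [2.0 * x[0] + rng.gauss(0, 0.5) for x in X]
grid = [0.0, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
lam_opt = select_lambda(X, y, grid)
```

Using the same seed for every λ keeps the fold assignment fixed across the grid, so differences in cross-validation error reflect the amount of shrinkage rather than a different split.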
As we needed to include numerous specific potentially relevant variables to predict an outcome, we had to address a high-dimensional regression problem (Chapman et al., 2016). In addition, we were not basing hypotheses on specific predictor-outcome associations. Therefore, we used a specific Lasso regression procedure that is suitable for this application as it is robust against overfitting, optimized to make predictions on unseen data, and has been specifically developed for multiple observations within clusters.
Method
Sample
Participants were recruited via mailing lists of German universities, posters at Goethe University Frankfurt, and Facebook. Respondents could enter a lottery to win a 15 Euro voucher for Amazon (chance of winning 1 in 10) as an incentive.
In total, 945 people began the study. Subsequently, 176 participants discontinued participation during the description of the first situation, 133 while describing the second situation, and nine while reporting the third and last situation. Additionally, 40 respondents did not follow the instructions, reporting multiple situations in the first text field. Consequently, we excluded these participants (N = 358; 38% of those who started the study) from the analyses. This exclusion rate is comparable to that of other online studies (e.g., Egermann & McAdams, 2013). The remaining 587 participants (58% female) included in the study had a mean age of 25.4 years (SD = 7.0). This final sample was characterized by rather minor deviations within one SD from age-specific average T-values based on a norm sample using a short version of the Big Five Inventory (Rammstedt, 2007). Despite being statistically significant (one-sample t-tests: all ps < .01), deviations of sample means were minor for Agreeableness (T = 51) and Extraversion (T = 49), while average Conscientiousness (T = 44) and Neuroticism (T = 44) scores were moderately lower, and Openness scores moderately higher (T = 56) than the norm-based average.
Design and measures
The questionnaire covered four areas: the situation, the functions of music listening in the specific situation, music characteristics, and personal information (see Supplemental material online).
The situation section asked several questions about the participants’ ability to choose the music, presence of others, and time of day (see Supplement Section A).
The music individuals listened to in specific situations was characterized via seven-step bipolar rating scales. Specifically, we asked for familiarity (unknown–known), liking (I do not like–I like a lot), and seven musical characteristics, namely: calming–exciting, less melodic–very melodic, less rhythmic–very rhythmic, slow–fast, sad–happy, simple–complex, peaceful–aggressive. These musical characteristics were compiled by a group of experts, including musicologists, music psychologists, and audio engineers, with the objective of easily describing music in daily life. For the purpose of avoiding unsystematic variance in the data, participants alternatively could check unspecific/I do not know for each of these items (see Supplement Section B).
Functions of music listening were measured by factor scores on five factors described by Greb et al. (2017). These factors are based on 22 items capturing a wide range of functions of music listening that could vary across different situations (see Supplement Section C), labeled Intellectual Stimulation, Mind Wandering & Emotional Involvement, Motor Synchronization & Enhanced Well-Being, Updating One’s Musical Knowledge, and Killing Time & Overcoming Loneliness. As previous research has indicated that a listening experience might involve multiple functions (e.g., Greasley & Lamont, 2011), we assessed all functions for each situation.
In addition, we gathered the following person-related information: gender, age, Big Five personality traits using the BFI-10 (Rammstedt, Kemper, Klein, Beierlein, & Kovaleva, 2013), and intensity of music preference measured by a six-item inventory (Schäfer & Sedlmeier, 2009). We also assessed musical training using the third scale of the Gold-MSI consisting of seven items (Schaal, Bauer, & Müllensiefen, 2014) and musical taste via an inventory described in Greb et al. (2017) that captures six taste dimensions: Blues & Jazz (blues, jazz, funk, soul, reggae), Techno & EDM (techno, EDM, house, rap/hip-hop), Other Cultures & Latin (other cultures, Latin, world music, classical), Volksmusik & Schlager (German “Volksmusik” and German “Schlager”), Pop (pop), and Rock & Metal (rock, metal). This inventory also allows participants to indicate if they are not familiar with a certain style of music. For these styles, no liking ratings were collected (see Supplement Section D). For a schematic overview of all variables reported in the present study, see Figure 1.
Procedure
The data were collected through the same survey used by Greb et al. (2017). While Greb et al. (2017) investigated the effect of personal and situational factors on why people listen to music in a specific situation, the current investigation is focused on the effect of situational and personal factors on the actual music that is selected in a specific situation. Therefore, the present study uses another subset of situations and additional variables (i.e., music selected in a specific situation) that were not analyzed by Greb et al. (2017).
Data were collected online (browser-based) through Unipark/EFS Survey software (Questback GmbH). After clicking the participation link or scanning a QR code from a poster, participants were redirected to the online survey. The welcome page informed participants about the general procedure and focus of the study, the voluntariness of participation, their ability to discontinue the study at any time, and the opportunity to take part in a lottery to win a voucher. Thereafter, the task of the survey – to sequentially describe three self-selected situations in which participants typically listen to music – was explained. First, participants were asked to describe the specific situation in a concise sentence with as much detail as necessary. Then, participants answered questions regarding the situation, the music, and functions of music listening in that specific situation (see Supplement Sections A to C). These three sections were successively answered for each of the three situations. Subsequently, participants reported on person-level variables (see Supplement Section D). Finally, if desired, they could provide their email address to take part in the raffle to win the Amazon voucher.
Data analysis
As our aim was to analyze music-selection behavior, we excluded all situations in which participants indicated that they did not have any control over the music present in a given situation (excluded categories: possibility of choice “no” [85 situations] and “unspecific” [94 situations]). The final data included 1,582 situations from 586 participants.
As reported in Greb et al. (2017), each individual situation description was classified into one of 11 activity categories, and listening location was discarded due to high correlations between activity and location categories. Table 1 provides the activity category labels, descriptions, and relative frequencies.
Table 1. Explanation and descriptive statistics of the 11 activity categories.
Note. Each situation described in free response format (N = 1,582) was classified into one of the activity categories.
Because of the high number of missing values resulting from the response option unspecific/I don’t know, we excluded valence (400 missing values, 25% of total data) and arousal (342 missing values, 22% of total data) from the major analysis. We ran separate analyses investigating the effects of valence and arousal because we expected them to be important variables. The results are reported separately. In addition, we excluded familiarity, liking, and calming–exciting from the analysis due to skewed distributions. This resulted in six outcome variables considered in the present analysis: less melodic–very melodic, less rhythmic–very rhythmic, slow–fast, sad–happy, simple–complex, peaceful–aggressive. For each outcome variable, we excluded all cases in which participants selected unspecific/I don’t know.
Situational cues, functions of music listening, and characteristics of the music heard were measured three times per person, creating a two-level structure of measures (situations) nested within persons. We therefore used multilevel linear regression modeling, as it allows the inclusion of time-varying (i.e., situation-related) predictors and the analysis of unbalanced designs, while at the same time accounting for non-independence of observations within subjects. Categorical variables were included as dummy variables (coded as 0, 1). All within-person predictors (i.e., all responses that were measured separately for each situation) were centered at each person’s mean to avoid any confounding effects with between-person variability (Enders & Tofighi, 2007).
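Person-mean centering of a situation-level predictor can be illustrated with a short sketch. This is only an assumed minimal version of the centering step: the record layout, the column names (`person`, `attention`), and the `_cwc` suffix are hypothetical and not taken from the study's data.

```python
from collections import defaultdict

def center_within_person(rows, person_key, var):
    """Subtract each person's own mean of `var` from that person's observations."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[person_key]] += row[var]
        counts[row[person_key]] += 1
    means = {person: sums[person] / counts[person] for person in sums}
    # Return copies of the rows with the centered value added under a new key.
    return [dict(row, **{var + "_cwc": row[var] - means[row[person_key]]})
            for row in rows]

# Hypothetical example: two persons, with three and two reported situations.
situations = [
    {"person": 1, "attention": 2.0},
    {"person": 1, "attention": 4.0},
    {"person": 1, "attention": 6.0},
    {"person": 2, "attention": 5.0},
    {"person": 2, "attention": 5.0},
]
centered = center_within_person(situations, "person", "attention")
```

After centering, each person's values sum to zero, so the predictor carries only within-person (situational) variation and cannot absorb stable differences between persons.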
As one of our aims was to identify the most important variables predicting music-listening behavior (i.e., musical characteristics people choose to listen to) and due to the high number of independent variables (Figure 1) we used a percentile-Lasso regression method for generalized linear mixed models. Recent research has shown that the optimal value of the tuning parameter λ (λopt) chosen by cross-validation (and therefore also the final model) is extremely sensitive to the fold assignment of the cross-validation procedure (Krstajic, Buturovic, Leahy, & Thomas, 2014; Roberts & Nowak, 2014). To overcome these limitations, we implemented the percentile-Lasso method proposed by Roberts and Nowak (2014). This method deals with the problem of fold sensitivity by using repeated cross-validation, leading to less variation in λopt. In detail, the percentile-Lasso selects λopt from a set of optimal values (derived from each cross-validation cycle) by calculating the θ-percentile of this set. In most circumstances, θ = 0.95 produces good and reliable results (Roberts & Nowak, 2014). In addition, the percentile-Lasso allows the implementation of the “one-standard-error” (1-SE) rule to select λopt. The main purpose of the 1-SE rule, as proposed by Hastie, Tibshirani, and Friedman (2009), is to choose the most parsimonious model whose accuracy is comparable with the best model. The 1-SE rule is applied by selecting the largest value of λ whose corresponding cross-validation error is within one standard error of the minimum cross-validation error as λopt.
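The two selection steps just described (the 1-SE rule within one cross-validation cycle, then a percentile across repeated cycles) can be sketched as follows. This is a hedged illustration rather than the implementation of Roberts and Nowak (2014): the function names are hypothetical, the standard error is assumed to be the sample standard deviation of the per-fold errors divided by the square root of the number of folds, and the percentile uses a simple nearest-rank definition.

```python
import math
from statistics import mean, stdev

def one_se_lambda(grid, fold_errors):
    """Largest lambda whose mean CV error is within one SE of the minimum.

    grid[k] is a candidate lambda; fold_errors[k] holds its per-fold errors.
    """
    means = [mean(e) for e in fold_errors]
    ses = [stdev(e) / math.sqrt(len(e)) for e in fold_errors]
    k_min = means.index(min(means))
    threshold = means[k_min] + ses[k_min]
    return max(lam for lam, m in zip(grid, means) if m <= threshold)

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (assumed definition), pct given as an integer 1..100."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceil(pct * n / 100)
    return ordered[rank - 1]

def percentile_lasso_lambda(grid, repeated_fold_errors, pct=95):
    """Apply the 1-SE rule in every CV repetition, then take the pct-th percentile."""
    candidates = [one_se_lambda(grid, fe) for fe in repeated_fold_errors]
    return nearest_rank_percentile(candidates, pct)
```

Because larger values of λ mean more shrinkage, taking a high percentile of the candidate values favors the sparser models among those suggested by the individual cross-validation cycles.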
In our data analysis, we repeated ten-fold cross-validation 100 times. For each cross-validation cycle, the optimal value of λ according to the 1-SE rule was calculated. From this set of 100 potentially optimal values, the 95th percentile was selected as the final λopt. For each outcome variable, we determined the value of λ for which all coefficients were set to zero (λmax) by successively increasing λ by 1 until the condition was met. Then, the individual λmax value was taken as the maximum grid value for each model. We used a grid length of K = 100 and an exponential form for the grid to achieve higher resolution of values towards 0. More specifically, we used the following grid for all models:
where λk denotes the k-th element of the grid, K is the grid length, and λmax the value of λ where all predictors were set to zero. As suggested by Tibshirani (2013), we calculated the null space of each predictor matrix and found the null vector for all matrices. This ensured that the Lasso solutions were unique.
We applied this procedure to each outcome variable separately, leading to six final models. All calculations were performed using the glmmLasso package (Groll, 2017) in R 3.0.2 (R Core Team, 2015) within the RStudio development environment (RStudio Team, 2015). For our categorical variables (which were entered as dummy-coded variables), we used a group Lasso estimator as proposed by Groll and Tutz (2014). It applies the same amount of shrinkage to all dummy variables that constitute one categorical variable (e.g., the variable time of day is constituted by early morning, morning, noon, afternoon, evening, and night). Therefore, the Lasso either completely includes a categorical variable (i.e., all constituting dummy variables) or completely excludes it from the final model (for more detailed information see Meier, Van De Geer, & Bühlmann, 2008; Yuan & Lin, 2006). Estimation of p-values for non-zero coefficients was based on re-estimation and Fisher scoring as implemented in glmmLasso (Groll, 2017).
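The all-or-nothing behavior of a group penalty can be illustrated with the group soft-thresholding operator, which shrinks a whole coefficient vector by its Euclidean norm. The sketch below shows the general group-Lasso idea only; it is an assumption-laden illustration and not the internal algorithm of glmmLasso, and the function name and example values are hypothetical.

```python
import math

def group_soft_threshold(z, lam):
    """Shrink the coefficient group z jointly; drop the whole group if its norm <= lam."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        # The entire group (e.g., all dummies of one categorical variable) is removed.
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    # Every member of the group is scaled by the same factor.
    return [scale * v for v in z]

# Hypothetical group of two dummy-variable coefficients.
kept = group_soft_threshold([3.0, 4.0], 2.5)     # norm 5 > 2.5: group kept, shrunk
dropped = group_soft_threshold([3.0, 4.0], 6.0)  # norm 5 <= 6: group set to zero
```

Since the decision depends on the norm of the whole group, no single dummy variable of a categorical predictor can be removed while the others remain in the model.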
In accordance with Roberts et al. (2016), we took the nested structure and the number of data points per participant into account when randomly splitting the data into 10 folds (i.e., into training and validation sets) for cross-validation. We decided to randomly split our data at the level of the individual (Level 2). Therefore, no training set shared measurements from the same person with its validation set, and the models were optimized to predict values of unseen individuals. This approach does not allow the inclusion of random effects of Level 1 predictors but should lead to highly reliable fixed effects. We calculated the repeated cross-validation error, that is, the mean of the cross-validation error across 100 repetitions, as a fit index. This index is small if the predicted responses are close to the true responses. In addition, we calculated marginal R2 as proposed by Nakagawa, Schielzeth, and O’Hara (2013) after re-estimating the final model using the lme4 (Bates, Maechler, Bolker, & Walker, 2015) and the MuMIn (Barton, 2016) packages. Marginal R2 indicates the proportion of variance explained by the fixed effects.
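Splitting at the level of the individual means that folds are assigned to persons, and every situation inherits the fold of the person who reported it. The following minimal sketch shows this idea under simplifying assumptions: the function name is hypothetical, and unlike the procedure described above it does not balance folds by the number of data points per participant.

```python
import random

def person_level_folds(person_ids, n_folds=10, seed=0):
    """Return one fold index per observation, constant within each person."""
    persons = sorted(set(person_ids))
    random.Random(seed).shuffle(persons)
    # Deal persons out to folds in turn so that fold sizes differ by at most one person.
    fold_of_person = {person: i % n_folds for i, person in enumerate(persons)}
    return [fold_of_person[person] for person in person_ids]

# Hypothetical data: 20 persons with three reported situations each.
ids = [person for person in range(20) for _ in range(3)]
folds = person_level_folds(ids, n_folds=10)
```

Holding out fold f then removes all situations of the persons assigned to f from the training set, so the validation error estimates how well the model predicts for individuals it has not seen.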
Results
Situational vs. person-related influences on characteristics of music selected
Intra-class correlation coefficients (ICCs) based on an intercept-only model for each musical characteristic are shown in Table 2. Intra-class correlation coefficients indicate the amount of variance attributable to person-related and situational levels. For the six musical characteristics studied here, ICCs varied between .09 for fast–slow and .32 for peaceful–aggressive. The ICC for fast–slow indicates that between-person differences accounted for 9% of the variance, while within-person differences between situations accounted for 91% of the variance. Across all models, between-person differences on average accounted for 23% and within-person differences between situations for 77% of the variance, signifying high variability within individuals and the potentially important role of situational characteristics in the music selections of individuals.
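For readers who want the arithmetic behind these percentages, the ICC of an intercept-only model can be written as a ratio of variance components. The notation below is assumed for illustration and is not taken from the original text: \sigma^2_u denotes the between-person (intercept) variance and \sigma^2_e the within-person (residual) variance.

```latex
\[
\mathrm{ICC} = \frac{\sigma^2_{u}}{\sigma^2_{u} + \sigma^2_{e}},
\qquad
1 - \mathrm{ICC} = \frac{\sigma^2_{e}}{\sigma^2_{u} + \sigma^2_{e}}
\]
% Example with the value reported for fast--slow:
% ICC = .09, hence 1 - ICC = .91 is the within-person share of the variance.
```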
Table 2. Multilevel estimations of within- and between-subject effects for musical characteristics. Predictors selected by the percentile-Lasso with repeated 10-fold cross-validation (CV) (see text for details).
Note. SE = standard error; ICC = intra-class correlation coefficient; CV = cross-validation; EDM = electronic dance music; SD = standard deviation.
a n = 1,318 observations within 547 persons. b n = 1,330 observations within 547 persons. c n = 1,270 observations within 537 persons. d n = 1,196 observations within 525 persons. e n = 1,210 observations within 524 persons. f n = 1,262 observations within 536 persons. g 0 = female; 1 = male.
*p < .05. **p < .01. ***p < .001
Predicting characteristics of music selected
Figure 2 shows the coefficient paths of the percentile-Lasso and λopt based on repeated cross-validation for the six musical characteristics, illustrating how coefficients of predictors tend towards zero with a growing amount of shrinkage (i.e., with growing λ). When the coefficient of a predictor is set to zero, the predictor is eliminated from the model. When λmax is reached, all coefficients are set to zero. For the musical characteristics melodic and rhythmic, only one predictor was selected, while multiple predictors were included for the other models. The development of regression coefficients also illustrates their interdependence. More specifically, some coefficients rise when other coefficients are set to zero.

Figure 2. Coefficient paths of the percentile-Lasso models for six musical characteristics.
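How coefficients shrink to zero along a grid of λ values can be illustrated with a simplified Python sketch. It assumes uncorrelated (orthonormal) predictors, for which the Lasso solution reduces to soft-thresholding of the unpenalized coefficients; the coefficient values and the grid are hypothetical. Because of this assumption, the sketch does not reproduce the interdependence of coefficients seen with correlated predictors.

```python
def soft_threshold(b, lam):
    """Lasso solution for one coefficient under an orthonormal design."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def coefficient_path(unpenalized, lambdas):
    """Return one row of shrunken coefficients per value of lambda."""
    return [[soft_threshold(b, lam) for b in unpenalized] for lam in lambdas]

# Hypothetical unpenalized coefficients of three predictors.
unpenalized = [0.40, -0.25, 0.10]

# Under this simplification, the smallest lambda that zeroes every
# coefficient equals the largest absolute unpenalized coefficient.
lam_max = max(abs(b) for b in unpenalized)
lambdas = [lam_max * k / 4 for k in range(5)]

path = coefficient_path(unpenalized, lambdas)

# Number of predictors still in the model at each grid value.
n_selected = [sum(1 for b in row if b != 0.0) for row in path]
```

In the sketch, the number of predictors remaining in the model can only decrease as λ grows, and none remain once λ reaches its maximal grid value.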
Table 2 shows the maximal grid values (λmax), the optimal tuning parameter λopt, the repeated cross-validation error, marginal R2, and the estimations of regression parameters for predictor variables included in the six models. The repeated cross-validation error varied between 1.45 for sad–happy and 1.97 for simple–complex, and marginal R2 ranged from .04 for melodic to .35 for slow–fast. Whereas the cross-validation error of sad–happy indicates the best model in terms of predictions on unseen data, the slow–fast model had the highest proportion of explained variance, with the largest marginal R2. The number of selected variables ranged from 1 for melodic and rhythmic to 13 for complex. On the level of situational variables, functions of music listening were included in all six models, degree of attention in four models, and activity and presence of others in three models. Variables most often included on the person-related level were musical taste (included in three models) and intensity of music preference (included in two models). In contrast, personality traits and gender were only present in one model each, while age and musical training were not included in any model. The following sections provide a more detailed overview of the predictors included in each of the six models separately for situational and person-related levels.
Situational variables
The five factors of functions of music listening were the only group of variables included in all six models. When participants reported listening to music for intellectual stimulation, they tended to listen to more melodic, less fast, less happy, more complex, and less aggressive music. Mind wandering and emotional involvement was related to less happy and more complex music. Participants tended to choose more rhythmic, faster, happier, and more aggressive music when wanting to move and enhance their well-being. Updating one’s musical knowledge led to faster, happier, less complex, and more aggressive music choices. Slower and less aggressive music was used to pass the time and overcome loneliness.
With regard to the activities included in the six models, the analyses revealed several findings. Music reported for working or studying was less fast, less happy, and more peaceful. For relaxing and falling asleep, participants reported listening to slower, less happy, and less aggressive music. While exercise was associated with faster and more aggressive music, coping with emotions was related to less fast, less happy, but also more aggressive music.
Participants reported a tendency to listen to slower, less happy, and more peaceful music when alone. Situations in which others were present (without communication) showed a similar pattern, differing only in a faster tempo of the music in comparison to that chosen when alone.
Given freedom of choice, participants were likely to select more complex music. In contrast, listening to the radio was associated with less complex music choices.
Moreover, the degree of attention participants reported paying to the music was related to faster, less happy, more complex, and more aggressive music. However, the relationship between the degree of attention and the happiness of the music did not reach significance in the re-estimation step.
The time of day was only included in the predictive model of peaceful–aggressive, indicating that listening to music in the afternoon was related to more aggressive music choices, whereas music listening in the evening was associated with less aggressive music.
As mentioned in the data analysis section, we repeated the complete analyses with the data set that included valence and arousal to determine whether they would be selected by the percentile-Lasso. This analysis revealed valence and arousal to be included in two models each. Reported valence (positive mood) at the moment of the decision to listen to music was associated with happier (β = .21, p < .001) and more complex music (β = .08, p = .02). When participants reported relatively high arousal when deciding to listen to music, they tended to select faster (β = .10, p < .001) or more aggressive music (β = .07, p = .02).
Person-related variables
Musical taste factors were included in three out of the six models, revealing several individual differences. In detail, participants who endorsed enjoying Blues and Jazz tended to listen to slower music, while fans of Techno and EDM reported a tendency to listen to faster and less complex music. Whereas fans of Pop and Volksmusik and Schlager tended to listen to less complex music, participants who reported liking Rock and Metal were disposed to listen to music with increased tempo, higher complexity and more aggressiveness. Participants with high intensity of music preference reported listening to faster and more complex music. The personality traits of Openness to experience, Agreeableness, and Neuroticism remained in one model only, predicting the selection of simple versus complex music. Specifically, participants scoring high on Openness to experience tended to listen to more complex music, while those with high Agreeableness and Neuroticism scores leaned towards less complex music. Finally, men reported listening to more aggressive music than women.
Discussion
This study investigated the relative influence of person-related and situational factors on music-selection behavior in daily life by integrating a broad set of potentially important variables in comprehensive models. A statistical learning procedure (percentile-Lasso) optimized for predicting unseen data was used to identify the key variables of both levels influencing the selection of music with defined characteristics by individuals within specific, comprehensively characterized situations. Findings demonstrated that the characteristics of music selections predominantly varied within persons, that is, between situations. However, the relative contribution of situational and individual effects and the number of predictor variables contributing to music selection both varied across the musical characteristics, indicating that some characteristics mainly vary between situations while others are more affected by individual differences. Notably, functions of music listening were the only group of variables included in every model and hence can be seen as the most important situational variables with regard to a broad set of characteristics of music selected in specific situations. Although less broadly represented, musical taste factors were also found to be an important group of variables explaining individual differences in music-selection behavior in three out of six models. Taken together, 29 situational and 14 person-related predictors were found to contribute to the prediction of unseen data, clearly reflecting the importance of variance attributable to situational differences. Because all models were optimized to make predictions on unseen persons, the effects found should be highly reliable.
The significance of situational factors found in the present study is consistent with current research showing that functions of music listening and affective changes in response to music are mainly influenced by the listening situation (Greb et al., 2017; Randall & Rickard, 2017). For example, the ICC of .18 we found for the sad–happy outcome variable is close to findings from a recent experience sampling study by Randall and Rickard (2017), who reported an ICC of .14 for valence of music selected (negative–positive). This highly situational selection behavior might be explained in part by recent technological developments that provide music listeners with high degrees of freedom for listening to all kinds of music in almost any situation.
The detailed patterns uncovered by the present investigation suggest that people’s music-selection behavior is mainly driven by the functions of music listening, degree of attention a person pays to the music, current activity, and the presence of others while listening. These findings are partly consistent with Randall and Rickard (2017), who demonstrated strong associations between functions of music listening, activity, and the actual music selected. Randall and Rickard (2017) also found cognitive reasons for listening – which are broadly comparable to our intellectual stimulation factor – to be associated with the selection of less positive/happy music.
Our finding that musical taste was an important variable explaining individual differences of music-selection behavior complements findings by Dunn et al. (2012) who reported positive correlations between liking for musical styles and listening durations for these styles. Our results indicate that musical taste (measured via liking for musical styles) is also related to preferences for certain characteristics of music listened to in daily life. Nevertheless, the amount of variance attributable to between-person differences for all musical characteristics was lower than the amount of variance attributable to situational differences. This contradicts the common belief that individuals’ music-selection behavior is mainly driven by musical taste.
The fact that Big Five personality traits were only selected in one out of six models indicates a rather weak association between personality traits and music-selection behavior in daily life. This finding is in line with a recently conducted meta-analysis by Schäfer and Mehlhorn (2017) showing that Big Five personality traits cannot substantially account for variance between individuals in musical taste and preferences. We found associations only between personality traits and the selection of complex music. Our finding that Openness to experience is positively associated with the selection of complex music is consistent with Schäfer and Mehlhorn (2017) who demonstrated a positive correlation between Openness and the liking for more complex musical styles.
The current study focused on musical characteristics selected in specific situations. Hence, we could not determine which style of music people selected in everyday life, so further research is needed in this area. This would aid in examining how people differ in their selection with regard to different styles and also check for within-style variability (e.g., Rentfrow et al., 2012). It may be that a person constantly listens to a favorite style of music but selects music with different musical characteristics within that style based on the situation. Nevertheless, Rentfrow et al. (2012) conclude that individual differences in musical preferences are largely based on sonic characteristics of the music. From this, one would also expect large individual differences with regard to musical characteristics selected in daily life. This is contrary to our findings, which show rather small individual variations.
Results from our separate analysis of the role of current mood on music-selection behavior complement the findings by Randall and Rickard (2017), who demonstrated that people generally tend to select mood-congruent music. We found positive associations between valence (positive mood) and the selection of happier and more complex music, as well as between arousal and the selection of faster and more aggressive music. These four musical characteristics go beyond the analysis of music selection by Randall and Rickard (2017) that limited its measurement to perceived valence and arousal of the music. Nevertheless, the characteristics found to be associated with current mood in our study can be interpreted in the framework of valence and arousal: happier music is likely to be perceived as more positive, while faster, more aggressive, and more complex music is likely to be perceived as more arousing. From this perspective, our results reflect mood-congruent selection of music. In contrast to Randall and Rickard (2017), however, not all of our outcome variables were associated with current mood. For example, current mood was not related to the selection of more melodic or more rhythmic music in our analysis. This might be due to our more differentiated measurement of characteristics of music selected (six musical characteristics) compared to perceived valence and arousal of the music as used by Randall and Rickard (2017). In general, our findings provide a detailed picture of the relationship between current mood and music selected and largely support the notion that people select mood-congruent music. This conclusion is also supported by the finding of a negative association between coping with emotions and the selection of happy music in our study.
Interestingly, person-related variables were included in just three models (slow–fast, simple–complex, peaceful–aggressive). As demonstrated by ICCs, the models of music complexity and aggressiveness showed the strongest associations with individual differences, and the model predicting selection of fast music showed the highest amount of variance within individuals (i.e., a minimum of between-person variance). This raises the question as to why no person-related predictors were selected in the remaining models (less melodic–very melodic, less rhythmic–very rhythmic, sad–happy) despite considerable between-person variance in these outcomes. It is likely that highly relevant traits for these outcome variables were not represented by our measures of individual differences. For example, there is some evidence that trait empathy is associated with the selection of sad music (e.g., Vuoskoski, Thompson, McIlwain, & Eerola, 2012) and that alexithymia may explain individual differences in the perception of emotions expressed by music (Taruffi, Allen, Downing, & Heaton, 2017).
Another remarkable result was the varying number of predictor variables included in each model. The extreme parsimoniousness of the models predicting the selection of very melodic or very rhythmic music might indicate an important role of individual differences. Some situational associations for those two variables might vary between individuals, which could be accounted for by including random slope parameters in the mixed-effects regression models. These individual deviations from the overall slope means might be best explained by cross-level interactions (i.e., person x situation interaction effects). For instance, individuals scoring high on Extraversion might tend to listen to more complex music while working and studying, while persons scoring low on Extraversion might tend to select simpler music (Furnham & Allass, 1999). We decided against the inclusion of random slopes and interaction effects on the basis of very limited numbers of observations within participants in our sample (max. three data points per participant), which would make model estimation unstable and potentially unreliable. Hence, future research could benefit from the inclusion of random slopes, implying that a larger number of situations should be sampled per individual.
The variation of repeated cross-validation errors and marginal R2 values across the different models clearly shows that high R2 values are not necessarily associated with small repeated cross-validation errors (i.e., good predictions on unseen individuals). For example, while the model predicting the selection of slow–fast music revealed the highest marginal R2 of .35, the model showing the best prediction on unseen individuals (sad–happy) revealed a marginal R2 value of .23. In addition, the two models melodic and rhythmic, both of which contained only a single predictor, yielded comparable or even slightly better repeated cross-validation errors than the two models predicting complex and aggressive music (both containing several predictors). On one hand, this highlights the importance and reliability of the single predictors in the models melodic and rhythmic. On the other hand, it might indicate slightly overfitted models for complex and aggressive, despite our use of the 1-SE rule that protects against overfitting.
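The 1-SE rule mentioned above can be sketched as follows in its textbook form: among all candidate values of λ, choose the largest one whose cross-validation error lies within one standard error of the minimum error. The sketch is illustrative only and does not reproduce the percentile-Lasso procedure; the function name `one_se_rule` and all numbers are hypothetical.

```python
def one_se_rule(lambdas, cv_means, cv_ses):
    """Largest lambda whose CV error is within one SE of the minimum error."""
    best = min(range(len(lambdas)), key=lambda i: cv_means[i])
    threshold = cv_means[best] + cv_ses[best]
    eligible = [lam for lam, err in zip(lambdas, cv_means) if err <= threshold]
    return max(eligible)

# Hypothetical grid of tuning parameters with mean CV errors and their SEs.
lambdas = [1, 5, 10, 20, 40]
cv_means = [1.60, 1.50, 1.47, 1.52, 1.80]
cv_ses = [0.05, 0.05, 0.06, 0.06, 0.07]

# Lambda that minimizes the CV error.
lam_min = lambdas[cv_means.index(min(cv_means))]

# More strongly penalized (sparser) choice under the 1-SE rule.
lam_1se = one_se_rule(lambdas, cv_means, cv_ses)
```

With these toy numbers the rule selects a larger λ than the error-minimizing one, trading a small increase in cross-validation error for a more parsimonious model.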
In addition, the present investigation demonstrated that innovative statistical learning techniques can effectively be used to inform psychological research. We believe that the analysis of intensive longitudinal data from studies of daily life that include large numbers of potentially interacting variables would strongly benefit from such techniques. For example, using cross-validation methods could lead to higher reliability of variable selection due to avoidance of overfitting. The concept of optimizing models by predicting unseen data is a core strength of statistical learning procedures. Such methods guard against the overfitting that results from maximizing R2 on the observed data and are therefore likely to yield more precise estimates of effects. In addition, the resulting R2 values are better estimates of the true values in the general population of interest (for an overview, see Yarkoni & Westfall, 2017). This characteristic of statistical learning procedures partially explains the rather low marginal R2 values of some of our models, which are likely to be a consequence of more precise estimations.
As mentioned in the introduction, defining what constitutes a situation is a difficult endeavor. Following the taxonomy proposed by Rauthmann et al. (2015), current research clearly shows the significance of situational characteristics (i.e., the individual perception and experience of situational cues) for the prediction of human behavior (Sherman, Rauthmann, Brown, Serfass, & Jones, 2015). On a higher level, situational classes form abstract groups or types of situations based on similar cues or characteristics. This study, as well as most of the other studies dealing with situational influences on music listening, used measurements of situational cues and characteristics to investigate situational effects. However, it might be more beneficial to attempt to cluster situational cues and characteristics into situational classes. By combining several situational cues and characteristics, such classes could provide a more abstract and condensed form of situational variable. These could then be used to make predictions about music-listening behavior, thereby saving the researcher from interpreting seemingly endless single associations between certain situational variables and behavioral outcome variables of interest. In addition, some situations are normatively related to specific functions of music listening and to specific music characteristics. For example, music in a dance club is intended to evoke movement, and it is very likely to be rhythmic and fast. From this perspective, a more abstract level of situation, as given by situational classes, would provide an opportunity to clearly differentiate such normative situations from situations in which people have greater freedom to choose music.
Our study comes with a number of limitations. First, our data result from retrospective self-report and are therefore vulnerable to memory effects, social desirability, and other biasing factors. This also implies that ecological validity might be limited, even though the reports were based on daily life situations. As mentioned earlier, we collected a maximum of three data points per participant. While this allowed us to estimate within-subject effects (i.e., situational effects), additional data points would have led to more precise estimations with potentially higher representativeness for participants’ daily lives. This limitation was deliberate in order to minimize the time required to complete the online survey and avoid threats to data quality. Although we asked participants to describe listening situations that typically occur in their daily lives, we do not know how representative the three situations were of a participant’s actual behavior. Hence, future research should replicate our findings using methods with higher ecological validity and better representativeness of situations, such as ambulatory assessment or related methods (Hektner, Schmidt, & Csikszentmihalyi, 2007; Randall & Rickard, 2013; Shiffman, Stone, & Hufford, 2008; Trull & Ebner-Priemer, 2014). Such methods usually collect momentary data in participants’ daily lives; momentary reports are virtually unaffected by memory effects and provide intensive longitudinal data with potentially high representativeness (Mehl & Conner, 2012). In addition, the use of such methods will provide more complete situational data compared to our approach of measuring recollections of typical situations, as we had to offer an unspecific response option for some variables, which resulted in a relatively high proportion of missing values.
Second, a further limitation relates to the measurement of music characteristics, which was based on participants’ reports. As the perception of these characteristics might vary between individuals (e.g., Taruffi et al., 2017), future research should broaden the measurement of music selected by supplementary measures, such as objective musical features obtained by music-information retrieval (e.g., loudness, tempo) or musical styles selected. This could offer further insights and would provide answers to additional questions, such as: Do subjectively reported characteristics correlate with objectively derived characteristics of music selected? Do fans of certain styles of music predominantly listen to their favorite styles in everyday life? However, individual music selection is based on individual perception. Therefore, subjective measurements such as those applied in our study should be complemented by, but not replaced with, objective measures in future studies investigating music-selection behavior.
Third, because, to the best of our knowledge, no package or software solution exists that is able to perform a Lasso regression on a multivariate multilevel model, our approach does not account for covariations between our six outcome variables. Hence, it is important to mention that our results of modeling predictors of different musical characteristics are based on independent models. A single multivariate model might lead to slightly different results.
Taken together, the present study demonstrates that music-selection behavior strongly varies between situations within individuals. This situational variability was best explained by situation-specific functions of music listening, while musical taste was found to be the most important variable explaining differences on the individual level. In general, a better understanding of which music people listen to in different situations to accomplish certain listening goals might help experimental researchers to properly select music for the investigation of specific functions or effects of music listening. Future research should integrate situational variables into research design in order to provide optimal conditions for investigating specific effects of music as well as to increase the reliability and external validity of results.
Supplemental material
Supplemental Material for Understanding music-selection behavior via statistical learning: Using the percentile-Lasso to identify the most important factors by Fabian Greb, Jochen Steffens and Wolff Schlotz in Music & Science
Footnotes
Acknowledgements
We would like to thank Andreas Groll for his advice in using his glmmLasso package. We also thank Melanie Wald-Fuhrmann and the three reviewers for their critical reading of earlier versions of the manuscript. Finally, we are thankful to Claudia Lehr, Ingeborg Lorenz, and Mia Kuch whose support has been of value for the realization of the study.
Contributorship
FG and JS designed the study and collected data. FG developed and performed the statistical analysis in conjunction with WS. FG wrote the first draft of the manuscript. All authors interpreted data, reviewed and edited the manuscript, and approved the final version of the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
All experimental procedures were ethically approved by the Ethics Council of the Max Planck Society, and were undertaken with informed consent of each participant.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Peer review
Amanda Krause, University of Melbourne Faculty of VCA and MCM.
David Greenberg, City University of New York.
Philippe-Aubert Gauthier, Groupe d’Acoustique de l’Université de Sherbrooke, Mechanical Engineering, Université de Sherbrooke.
Supplemental material
The supplemental material is available online with the article.