Abstract
Natural environments have positive effects on mental health, but the nature of this relationship requires further understanding. There may be psychosocial aspects of this relationship that are reflected in the semantic structure of language. Natural language processing methods provide a tool to explore this possibility. In this study, machine learning-derived vector representations of words were provided by a neural network-based language model. This was combined with statistical analyses to test whether nature-related words have particularly strong positive versus negative connections to mental health. Statistically significant associations were indeed found between a range of nature-related words and positive-versus-negative word pairs related to mental health. The results thus confirm a semantic connection between nature and mental health as represented using computational methods trained on language usage. This raises the possibility that semantic associations could play a role in nature's influence on mental health, for instance through appraisal processes. The results provide a proof of principle of a methodological approach that could be used to further probe hypotheses on nature and well-being. Finally, the current results provide information on relationships between specific aspects of natural environments and mental health that may be of use to future research.
Introduction
The experience of nature clearly affects us emotionally, for instance by inspiring awe or igniting our imagination, and it would be no wonder if it could impact our mental health. Research has indeed provided increasing overall evidence that time spent in green and blue natural spaces (e.g., rivers, lakes, forests, botanical gardens, or fields) is associated with improved mental health and life satisfaction (Bratman, Hamilton, & Daily, 2012; Bratman et al., 2019; Chang et al., 2020; Lackey et al., 2021; Mygind et al., 2019; Panno et al., 2020; Pouso et al., 2021; Theodorou et al., 2021).
In this vein, compelling evidence indicates that nature-based therapy, for example, Shinrin-Yoku, shows a positive effect on both mental and physical health outcomes (Kotera, Richardson, & Sheffield, 2022; Markwell & Gladwin, 2020) and, moreover, some recent studies are finding that this beneficial effect can be observed even when exposing people to nature through simulated natural environments, such as virtual reality (Pasca et al., 2021; Spano et al., 2022; White et al., 2018; Yeo et al., 2020).
Such results are in line with several theoretical frameworks. One is the biophilia hypothesis (Barbiero & Berto, 2021; Kellert & Wilson, 1993), which is grounded in the evolutionary relationship between humans and their natural environment. The hypothesis proposes that humans have a genetic predisposition to learn to develop an interest in and emotional connection with nature, in particular, forms of nature that provide an environment with individual and evolutionary advantages. Despite our increasing separation from nature in the modern world, researchers argue that the imprint of wilderness has remained deep within the human psyche (Estes, 1992).
Other possible explanations for mental health benefits of nature are provided by stress recovery theory (Ulrich, 1979; Ulrich et al., 1991) and attention restoration theory (Kaplan, 1995; Ohly et al., 2016). These hypotheses and frameworks are not necessarily incompatible, but it is not certain which of them, if any, or which combination of them, is a complete and valid explanation of the connection between nature and well-being.
However, overall, they provide a clear expected direction of associations between nature and mental health. This is in line with a “least controversial” formulation of biophilia, which posits that humans have “a need and propensity to affiliate with nature” while remaining equivocal about the proximate or ultimate (Joye & van den Berg, 2011) cause of this relationship (Kahn, 1997). In this article, we use the term “biophilia” in this general descriptive sense.
One route to furthering our understanding of the benefits of connectedness to nature is to consider the psychological associations that could be involved in biophilia, and which could potentially mediate predisposing biological and varying environmental factors (Bruni & Schultz, 2010). It has been argued that the benefits of nature derive from the meaning the individual assigns to it based on their experience and emotional associations (Barbiero & Berto, 2021). Appraisal is a well-known determinant of the emotional impact of experiences (Kappas, 2006; Lazarus, 1991).
Processes underlying appraisal may lie outside our conscious awareness; we may be biased to appraise or interpret stimuli in certain ways, which subsequently affects our responses. Such processes can be studied using so-called implicit measures, which assess cognitive and affective processes through tools such as the implicit association test. Research using this kind of measure has provided empirical support for an implicit connectedness to nature in terms of individuals' automatic associations as expressed behaviorally (Bruni & Schultz, 2010; Olivos & Aragonés, 2013; Schiebel, Gallinat, & Kühn, 2022; Schultz, Shriver, Tabanico, & Khazian, 2004).
A complementary way to explore the nature of associations between human well-being and nature is through semantic relationships expressed in language usage. Language is key to the human experience and words are tools that express humanity's relationship with the external world and provide access to a human's inner world (Brencio & Bauer, 2020). It has been previously noted that nature is widely represented in metaphors (Kahn, 1997; Lawrence, 1993). A different question, however, is whether nature-related words are semantically related to positive mental health-related words.
One way to explore and define this relationship is through natural language processing, that is, machine learning methods that represent the meaning of words based on how they are used in everyday language (Mikolov, Chen, Corrado, & Dean, 2013a; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013b). We briefly sketch how neural network-based language models provide a mathematical representation of meaning. The core idea is as follows: For a given vocabulary of words and a very large number of sentences using those words, train a neural network to take a word as its input and provide probabilities of which words surround it in sentences. The input to the network represents each word as a distinct input node; the input nodes connect to a hidden layer of, for example, 300 hidden nodes; and the hidden layer finally connects to output nodes representing each word in the vocabulary at the preceding position in the sentence, at the subsequent position, the position after that, and so on. If the network can be successfully trained, the weights of the connections from the (many) input nodes to the (relatively lower dimensional) hidden layer must represent meaning in this predictive sense. If different words tend to occur in the same position in similar sentences, for example, as X in “I went out for dinner and ate X at a restaurant,” then those words should have similar patterns of weights.
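The architecture just described can be sketched in a few lines. This is a minimal illustration, assuming a toy five-word vocabulary, three hidden nodes, and random untrained weights; it also simplifies by sharing one output layer across all context positions. None of these choices are taken from the model used in this study.

```python
import numpy as np

# Toy setup: the vocabulary, sizes, and weights are illustrative assumptions,
# not the settings of any trained model.
vocab = ["forest", "river", "stress", "calm", "walk"]
vocab_size = len(vocab)
hidden_size = 3  # the example in the text uses 300 hidden nodes
rng = np.random.default_rng(0)

# Input-to-hidden weights: row i is the vector representation of word i.
W_in = rng.normal(scale=0.1, size=(vocab_size, hidden_size))
# Hidden-to-output weights (simplification: shared across context positions).
W_out = rng.normal(scale=0.1, size=(hidden_size, vocab_size))


def context_probabilities(word):
    """Predicted probability of each vocabulary word occurring near `word`."""
    one_hot = np.zeros(vocab_size)
    one_hot[vocab.index(word)] = 1.0
    hidden = one_hot @ W_in  # selects the row of W_in belonging to `word`
    scores = hidden @ W_out
    exp_scores = np.exp(scores - scores.max())  # numerically stable softmax
    return exp_scores / exp_scores.sum()


probs = context_probabilities("forest")
embedding = W_in[vocab.index("forest")]  # the "semantic vector" for "forest"
```

Training would adjust `W_in` and `W_out` so that `probs` matches the words actually observed around the input word; the rows of `W_in` are what is then used as word vectors.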
It turns out, perhaps surprisingly, that treating the patterns as a vector and applying standard vector algebraic operations provide meaningful results. Similarity between words can be calculated from the angle between the directions of their vectors. Some examples are the vector sum of “Russian”+“River” lying close to “Volga River”; or “Madrid”−“Spain”+“France” lying close to “Paris” (Mikolov et al., 2013b). Such methods have shown psychological uses, for example, to identify the emotional context of text (Tausczik & Pennebaker, 2010) or to study mental health (Yin, Sulieman, & Malin, 2019).
Furthermore, such computational models have been shown to reflect known human biases and prejudices (Caliskan, Bryson, & Narayanan, 2017). This presents a risk for societal applications of machine learning, but also scientific opportunities. If there is a prejudiced ghost in the machine, perhaps there is also a biophilic one.
The aim of this study was, therefore, to apply this kind of computational model of semantics to the relationship between nature and mental health. We hypothesized that there would be an association between words related to nature and to mental health and well-being. More specifically, the prediction is that the strongest overall effect should be toward well-being, even though some negative associations would fit within existing theory, especially involving threat and arousal. The approach thus allows a critical test of this hypothesized semantic reflection of the (broadly defined) biophilia hypothesis: the hypothesis would be undermined if the null hypothesis could not be rejected, or if the preponderance of associations did not make substantive sense in well-being terms.
Positive results would have a range of implications. They would reveal a novel domain in which biophilic associations exist, and thus provide a complementary line of evidence for nature–well-being associations. They would also provide a proof of principle that semantic similarity encodes psychologically relevant associations related to nature and mental health. This would provide methodological opportunities for future studies focused on specific theoretical questions. Finally, if the hypothesized semantic connections of interest are found, this would support a potential causal mechanism in connecting nature and well-being through biasing of appraisal.
Materials and Methods
The files used for the analyses are available at https://doi.org/10.6084/m9.figshare.19351940.v1.
As the study involved only analyses of an existing, publicly available language model, IRB approval was not sought.
Modeling of contrasts and associations
Measurements of associations were based on the vector representation of words provided by a neural network language model, the word2vec algorithm (Mikolov et al., 2013a, 2013b). These representations consist of the weights from words to 300 hidden neurons of a pretrained neural network. The vector representations encode the context, in terms of surrounding words in sentences, in which a given word tends to be used, providing a measure of relatedness or interchangeability. The GenSim toolbox was used (Řehůřek & Sojka, 2010), with a pretrained model based on the Google News text corpus. The corpus consists of around 100 billion words, and the model provides vectors for 3 million words and phrases that were used in Google News stories.
Associations between a pair of words were defined to be the cosine similarity between their word2vec vectors. That is, vectors with the exact same weights would be perfectly aligned and have a similarity of 1, vectors pointing in opposite directions would have a similarity of −1, and orthogonal vectors would have a similarity of 0.
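As a minimal sketch of this definition, the function below computes cosine similarity with NumPy. The function name and the small example vectors are illustrative assumptions, not part of the study's scripts.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot product over the norms."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


a = np.array([1.0, 2.0, 3.0])  # made-up vector for illustration
same = cosine_similarity(a, a)                          # aligned: approximately 1
opposite = cosine_similarity(a, -a)                     # opposed: approximately -1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # orthogonal: 0
```

Because the measure depends only on direction, scaling either vector by a positive constant leaves the similarity unchanged.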
Contrasts between pairs of positive and negative words related to mental health, for example, relaxation minus stress, were defined to be their difference vector, labeled as relaxation-versus-stress. This is an application of the ability to use vector algebra with semantic vectors already described, to focus on the specific valenced concept of interest. Note that using only the positive word could capture merely generic associations between nature and emotion. The difference score represents the specific relative meaning captured by the contrast.
Using these contrasts, a similarity can be calculated between a nature word vector, for example, “forest,” and a contrast vector representing, for example, the difference between “relaxation” and “stress.” Such associations are described as, in the mentioned example, “the association between forest and relaxation-versus-stress.” This association would reflect the degree to which “forest” is associated with relaxation relative to stress.
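The contrast-based association can be sketched as follows. The three-dimensional vectors are made up purely for illustration (real word2vec vectors have 300 dimensions and would be read from the pretrained model), and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def contrast_association(word_vec, positive_vec, negative_vec):
    """Similarity between a word and the positive-minus-negative difference vector."""
    contrast = positive_vec - negative_vec
    return cosine_similarity(word_vec, contrast)


# Made-up toy vectors; these are NOT values from the Google News model.
vectors = {
    "forest": np.array([0.9, 0.1, 0.3]),
    "relaxation": np.array([0.8, 0.2, 0.5]),
    "stress": np.array([-0.6, 0.3, 0.5]),
}

# Association between forest and relaxation-versus-stress.
assoc = contrast_association(
    vectors["forest"], vectors["relaxation"], vectors["stress"]
)
# Swapping the pair gives stress-versus-relaxation, which flips the sign.
reverse = contrast_association(
    vectors["forest"], vectors["stress"], vectors["relaxation"]
)
```

A positive value indicates that the word lies closer to the positive member of the pair; reversing the pair only changes the sign, which is why negative associations can be read as associations with the reversed contrast.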
Word sets
The following sets of words were used in the analyses: nature words (nature, fields, forest, woods, greenery, river, lake, sea, landscape, hills, mountain, and wildflower), positive mental health words (relaxation, happiness, joy, confidence, calm, restoration, and safety), and negative mental health words opposite to the positive words (stress, sadness, distress, doubt, anxiety, burnout, and danger). The positive and negative mental health words provide lists of pairs of contrasts, such as relaxation-versus-stress.
These words were selected to cover a range of environments and psychological states relevant to nature and well-being, although we acknowledge that all such selections are to some extent arbitrary. However, the statistical tests described hereunder are not biased by the selections—these are simply a cross section of words relevant to the hypothesized relationship, but whether computational semantic similarities are in fact found is precisely what will be tested. Furthermore, by considering a range of words, we reduce the level of arbitrariness by assessing the overall pattern of relationships. We will also be able to identify any specific relationships between particular combinations of nature words and well-being pairs.
Statistical testing of associations between nature and mental health
Associations were tested between each of the nature words and each of the positive-versus-negative mental health pairs. Monte Carlo simulations were run to test whether each association was significantly stronger than would be expected for nouns in general, as follows. For each association, 10,000 iterations were run. In each iteration, a random word was selected from the vocabulary, subject to the condition that it was tagged as a noun when placed at the end of any of the sentences “Walking in,” “Walking in a,” or “Walking near a.” The similarity of this noun with the positive-minus-negative contrast was calculated. This provided a null hypothesis distribution of the associations of nouns in general with the positive-versus-negative mental health pair.
The p-value of the nature word being tested was defined to be the proportion of associations in the null hypothesis distribution that were more positive than the observed association for that word. Associations are considered significant if the p-value survives Bonferroni correction for multiple testing over the total number of associations being tested (all nature words × all positive-minus-negative mental health word pairs); otherwise, if the p-value is <0.05, the association is labeled “nominal.”
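The testing procedure can be sketched as below. This is an illustration under stated assumptions rather than the study's script: random vectors stand in for the vectors of randomly drawn nouns (the part-of-speech filtering step is omitted), the observed association of 0.25 is a hypothetical value, and the function name is invented for the example. The number of tests follows from the word sets listed above (12 nature words × 7 word pairs).

```python
import numpy as np


def monte_carlo_p_value(observed, contrast, null_vectors):
    """Proportion of null associations that are more positive than `observed`."""
    norms = np.linalg.norm(null_vectors, axis=1) * np.linalg.norm(contrast)
    null_similarities = (null_vectors @ contrast) / norms
    return float(np.mean(null_similarities > observed))


# Placeholder data: random vectors stand in for randomly selected nouns.
rng = np.random.default_rng(0)
n_iterations = 10_000
dim = 300
contrast = rng.normal(size=dim)
null_vectors = rng.normal(size=(n_iterations, dim))

observed = 0.25  # hypothetical observed association, for illustration only
p_value = monte_carlo_p_value(observed, contrast, null_vectors)

# Bonferroni correction over all nature word x word pair combinations.
n_tests = 12 * 7
bonferroni_alpha = 0.05 / n_tests

if p_value < bonferroni_alpha:
    label = "significant"
elif p_value < 0.05:
    label = "nominal"
else:
    label = "not significant"
```

With 84 tests the corrected threshold is about 0.0006, so with 10,000 iterations an association survives correction only if at most five of the null associations exceed it.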
Results
Table 1 gives the results of the statistical tests of associations between nature words and positive-versus-negative mental health word contrasts. Supplementary Appendix SA1 provides unipolar associations, that is, between nature words and positive and negative words separately.
Table 1. Connections Between Nature Words and Positive–Negative Mental Health Word Pairs
The table provides, for each mental health word pair defining a positive–negative contrast, the vector similarity between that contrast and a range of nature-related words. Statistical significance, based on Monte Carlo testing, is provided, together with an indication of whether the association was only nominally significant at an alpha of 0.05 (* or - for positive and negative associations, respectively) or survived correction for multiple testing (*** or ---).
Three nature words were significantly associated with relaxation-versus-stress: “forest,” “greenery,” and “wildflower”; additionally, “woods,” “lake,” “landscape,” and “hills” showed a nominal association. There were no significant associations with happiness-versus-sadness; nominal associations were found for “nature,” “fields,” “greenery,” “river,” and “lake.” Joy-versus-distress was significantly associated with “greenery” and “hills,” and nominally associated with “fields” and “mountain.” Confidence-versus-doubt showed no significant or nominal associations. Calm-versus-anxiety showed only nominal associations, with “nature,” “forest,” “river,” and “lake.” Restoration-versus-burnout was significantly associated with “greenery,” “river,” “lake,” “landscape,” and “wildflower,” and nominally associated with “forest” and “woods.” Safety-versus-danger showed no significant or nominal associations.
In additional analyses, for completeness, it was considered whether there were negative, that is, reversed, associations. This was the case for safety-versus-danger, which showed significant associations with danger-versus-safety for landscape, hills, and mountain, and nominal associations for all other nature words except for fields and wildflower.
Discussion
This study aimed to test the hypothesis that “biophilic” associations would be found between words related to nature and mental health. Such associations were indeed found, indicating a connection embedded in the semantic structure of language. The results revealed particular relationships between certain nature words and aspects of mental health.
Associations that remained significant after correction for multiple testing were found between relaxation-versus-stress and “forest,” “greenery,” and “wildflower”; between joy-versus-distress and “greenery” and “hills”; and between restoration-versus-burnout and “greenery,” “river,” and “lake.” Weaker or no associations were found for happiness-versus-sadness, confidence-versus-doubt, calm-versus-anxiety, and safety-versus-danger, although happiness-versus-sadness and calm-versus-anxiety were involved in a number of nominally significant associations that did not survive the Bonferroni correction. Following the same testing procedure but considering negative rather than positive associations, some nature words were more strongly associated with danger than with safety.
Both positive and negative associations could be of importance to policy interventions—for instance, strategies that do not consider the association between nature and danger might fail to improve people's contact with nature. The current method and results may help to further specify theoretical relationships, that is, they provide a way to consider which specific kinds of nature and aspects of mental health are most closely connected. Furthermore, they may help guide methodological choices, for example, for implicit association test stimulus sets.
What could it mean that relaxation-versus-stress, joy-versus-distress, and restoration-versus-burnout showed particularly strong associations? These contrasts fit the claim that humans have evolved to exist in nature and will suffer from the need to compensate for aspects of “non-natural” living. However, this is not to say that experiences in nature are necessarily easier or nonchallenging. Indeed, the association of some nature words with danger rather than safety would appear likely to reflect the fact that contact with nature also involves various kinds of hazards. We note that, as discussed previously (Kahn, 1997), there also exists a concept of “biophobia,” or negative relationships with nature: we fear certain natural stimuli (e.g., snakes) and may act in ways that harm nature or animals.
In terms of mental health, we note that such threatening elements are not necessarily harmful in all cases. To the extent humans are (e.g., biologically) prepared to deal with challenges, they may serve to provide a more optimal level of challenge and arousal (Xie, 2016; Yerkes & Dodson, 1908) and hence, still, an affiliation with nature in the sense of biophilia (Kahn, 1997). The overall pattern of associations may thus be due to fitness for certain kinds of potentially rewarding challenges that are more likely to occur in nature than in industrialized society.
Regardless of what caused the semantic associations, given that they exist they could represent a cultural influence on individuals, by playing a role in biasing our appraisals of stimuli. As noted, biasing appraisals would be expected to form a mechanism of influencing mental health effects. For instance, the appraisal of a forest an individual is walking through may be biased toward interpretations in line with relaxation, calm, and restoration.
This study has limitations that we acknowledge here. In particular, the semantic vectors are based on one particular corpus of texts, in one particular language. Further research could explore vectors trained on different texts, and even test differences between different corpora. The clear challenge here will be the availability of such sets of texts. Furthermore, the selection of words used for the current analyses was relatively general and, although we believe the sets were appropriate to the hypotheses, must be acknowledged to have been based on subjective choices. Future studies could focus on different word sets based on different considerations; the scripts we have made available can be easily applied to different word sets.
For instance, specific theoretical questions could suggest comparing whether words predicted by one theoretical framework are more strongly related to natural environments than those predicted by a competing framework. Furthermore, future research could also derive concepts and word sets of interest from surveys or interviews concerning specific issues around health and well-being. We do note the need to always select words carefully where their meaning could be ambiguous; for example, “grass” or “field” will have associations related to their alternative meanings, and spelling errors could affect associations, for example, associations with “heath” could be contaminated by the phrase “health and safety.”
Conclusions
The methodological approach of combining computational semantic vectors with psychological queries appears potentially fruitful and, at least, could be interesting from the perspective of hypothesis generation. Once such semantic associations have been found, these can be further tested empirically in behavioral studies. The current analytical approach may have methodological uses in studies using implicit measures in particular, when the selection of appropriate verbal stimuli may be an essential part of the research design. Information on the strengths of semantic associations such as those found in the current results could be used to attempt to select optimal stimuli for use in implicit measures such as the implicit association test, as there is a relationship between such behavioral measures and semantic measures (Lynott, Kansal, Connell, & O'Brien, 2012).
Note that word selections could be tailored to optimize either effect size or reliability, in line with research aims (Hedge, Powell, & Sumner, 2017). That is, if a study design depends on having a clear overall association, over all participants, between nature and positive-versus-negative attitudes, then a design strategy could be to choose words with relatively strong semantic associations in the desired directions. However, in individual differences research, it might be less desirable for all participants to show a strong association in the same direction than to have a lot of variability over participants. In that case, words could be chosen where the association with the positive-versus-negative contrast is actually weak.
Such methodological approaches might be of help in trials and mediation analyses of applied interventions fostering contact with nature. Furthermore, knowing which particular words best reflect biophilic associations could suggest more effective communications supporting proenvironmental behavior and climate change action, by selecting words likely to resonate with existing associations. Finally, the current results provide an additional line of support for an association between nature and mental health, embedded in language itself, and a potential causal mechanism that could play a role in nature–well-being associations.
Footnotes
Authors' Contributions
T.E.G. contributed to conceptualization, methodology, software, formal analysis, and writing—original draft. N.M. was involved in conceptualization, methodology, and writing—original draft. A.P. carried out conceptualization and writing—review and editing.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This work was part of the project named “Establishing Urban FORest based solutions in Changing Cities” (EUFORICC), financially supported by the Ministry of Education, University and Research (MIUR) of Italy (PRIN 20173RRN2S).
References
