Abstract
Heidi, a quintessential Swiss fictional character, has left an enduring imprint on global culture, surpassing the confines of mere literature to become a cultural phenomenon. Our study delves into the timeless allure of Spyri’s novel by examining its portrayal of spatial and emotional dimensions. Using a mixed-method approach that combines computational methods and human annotations, this paper aims to provide a comprehensive understanding of the relationship between emotional content and landscape representation in Heidi, emphasizing the narrative’s reverential treatment of nature’s influence and reaffirming the novel’s dichotomous depiction of nature versus urbanity. Our investigation also exposes, however, disparities between computational sentiment analysis and human interpretations, underscoring some of the limitations of lexicon-based sentiment analysis methods. By proposing a holistic approach that combines computational techniques with human insight, we advocate a nuanced understanding of sentiment analysis in literary works, one that acknowledges the subtleties and complexities woven into the narrative. We call for continued exploration to refine sentiment lexicons, explore sentiment variation across diverse literary genres and cultural contexts, and delve deeper into the interplay between sentiment and fictional space.
Keywords
Introduction: Why Heidi?
Wie das Vögelein, das zum ersten Mal in seinem schön glänzenden Gefängnis sitzt, hin und her schießt und bei allen Stäben probiert, ob es nicht dazwischen durchschlüpfen und in die Freiheit hinausfliegen könne, so lief Heidi immer von dem einen Fenster zum anderen, um zu probieren, ob es nicht aufgemacht werden könne, denn dann musste man doch etwas anderes sehen als Mauern und Fenster, da musste doch unten der Erdboden, das grüne Gras und der letzte schmelzende Schnee an den Abhängen zum Vorschein kommen, und Heidi sehnte sich, das zu sehen. (Spyri, 1880: 103)
As a bird, when it first finds itself in its bright new cage, darts hither and thither, trying the bars in turn to see if it cannot get through them and fly again into the open, so Heidi continued to run backwards and forwards, trying to open first one and then the other of the windows, for she felt she could not bear to see nothing but walls and windows, and somewhere outside there must be the green grass, and the last unmelted snows on the mountain slopes, which Heidi so longed to see. (Spyri, 1946: 98)
Heidi, undeniably one of the most celebrated Swiss fictional characters, has etched its mark on cultural consciousness with the novel’s sales alone exceeding 50 million copies worldwide. 1 Its indelible impact is further underscored by its presence among the thirty best-selling books of all time, a testament to the popular charm of the red-cheeked protagonist. The orphaned heroine from the children’s books first published in 1880 (Heidis Lehr- und Wanderjahre) and 1881 (Heidi kann brauchen, was es gelernt hat) by Johanna Spyri (1827–1901) has transcended the confines of ink and paper to emerge in over 25 television adaptations, two video games, and even two musicals (Leimgruber, 2001). Such cultural ‘immersion’ has metamorphosed the environs of Maienfeld, the fictionalized but real village central to the novel’s narrative, into ‘Heidiland,’ a ‘real-world’ Swiss holiday region. Such is its impact that an asteroid was christened in honour of this fictional character (Swissinfo.ch, 2007).
In a revealing snapshot, the national survey ‘Gulliver’ – revived in 2014 from its 1964 incarnation (Hedinger and Gossolt, 2015) – showcased the deep-rooted place Heidi occupies in the construction of identity for the inhabitants of Switzerland. Notably, 62.3% of the participants from a representative sample (n = 625 out of N = 1004) and 34.4% from a public sample (n = 222 out of N = 645) pinpointed the character of Heidi when asked about pivotal events or personalities contributing to Switzerland’s sense of self. But what propelled Heidi to such success?
One facet that contributes to the enduring appeal of the novel is its ability to artfully simplify reality, creating a world divided into ‘good’ and ‘bad’. Accordingly, the narrative realm is inhabited by relatively typified characters, but even more central is the representation of the clearly differentiated fictional spaces. The ‘mountain’ (nature) is symbolically opposed to an urban milieu: with its sublime expanses, open horizons, and yet idyllic corners, the mountain finds itself at odds with the ‘city’ that is marked by enclosed and stifling confinements, both spatially and socially. The novel presents the alpine space as that of a ‘good’ and ‘pure’ rural life, which is a foil to the ‘bad’ and ‘compromised’ encroachment of Frankfurt’s urbanization. Within Spyri’s narrative, the mountain takes on a role of therapeutic solace (Williams, 2017), which builds upon a romanticising notion of the Swiss mountains as an ‘enchanted’, unspoiled sanctuary – an idea whose origins can be traced back at least as far as Albrecht von Haller’s poem Die Alpen (1751) that links enlightened ideas (‘culture’) with a pastoral idyll (‘nature’), suggesting alpine Switzerland as the space where re-integration of the disparity of modern life can take place, and which is opposed to an ‘urban-courtly vice’ (Hentschel, 2002). In the context of a developed modern society at the time of Spyri’s publication of Heidi, the notion of a ‘return’ to rural existence takes on a distinctly utopian character, rendering the novel a representation of a utopian yet hybrid vision—an enlightened, Christian counterbalance to the overly controlled urban lifestyle of the ‘educational bourgeoisie’ (Halter, 2001: 17). Simultaneously, the book draws on a tradition of associating alpine, ‘authentic’ imagery with Swiss ‘national identity’, thereby contributing to the construction of the ‘Swiss Myth’ as both a self-image and an external perception (Im Hof, 1991).
The heart of our investigation lies in the exploration of the nature/urban dualism known to be represented in the first of the Heidi novels, as well as some of the complications arising with such dualism. Our study employs a distant reading approach (Moretti, 2013; Primorac et al., 2023) to illuminate two key aspects: the prominence of clearly distinct natural and rural landscapes in Heidi, using rule-based and word list-based spatial text mining; and the polarization of sentiments – a theme deeply ingrained in the narrative – using a lexicon-based sentiment analysis (SA) approach (Kim and Klinger, 2019; Lee and Kim, 2021; Taboada et al., 2011), which we present in more detail in the Methods section. Through this systematic examination of ‘space’ and ‘affect’ at a formal, textual level, we also contribute to understanding the extent to which computationally detected spaces and emotions align with the perceptions of readers and literary scholars. Finally, this endeavour serves as a case study to evaluate the computational method used, by comparing human-annotated sentences from Heidi to the output of the lexicon-based SA (henceforward SA). We also engage in a qualitative exploration of selected text passages where discrepancies arise between the output of ‘automated’ SA and the ‘manual’ judgement of human annotators.
Methods
Heidi, like most literature written for children, is a comparatively simple literary text. Children’s literature tends to use simpler and more frequent vocabulary (Berman, 2008; Puroila et al., 2012), typically features straightforward plots, simpler syntax, shorter sentences, and less parataxis (McNamara, 2013); more dialogue and action-oriented events, a conventional and schematic plot structure, and clearly depicted emotions (McDowell, 1973). These characteristics align well with the strengths of lexicon-based SA, a technique that assesses sentiment by matching words in a text with predefined entries in a sentiment lexicon – lists of tokens associated with emotional values such as ‘positive’, ‘negative’ or ‘neutral’, or numerical values that represent these (for example, values on a continuous scale between +3 and −3). Because these lexicons are usually composed of common, high-frequency words, they are relatively likely to effectively cover the vocabulary found in children’s texts. Moreover, the reduced syntactic complexity and shorter sentence length increase the likelihood that emotional cues are clearly associated with specific elements in the sentence – such as characters or spatial entities – making sentiment easier to detect and attribute. As a result, the stylistic features of children’s literature decrease ambiguity and enhance the precision of sentiment analysis.
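To make the matching step concrete, the following Python sketch shows how a lexicon-based score and its coverage could be computed for a single sentence. It is an illustration only and not the analysis code of this study (which was written in R): the lexicon entries, their values, the tokenizer and the example sentence are all invented for the purpose, and the function names are hypothetical.

```python
import re

# Hypothetical toy lexicon: words and values are invented for illustration
# and are NOT taken from SentiArt or any other published resource.
TOY_LEXICON = {
    "sonne": 1.2,
    "freute": 1.5,
    "schön": 1.4,
    "mauern": -0.6,
    "weinte": -1.3,
}

def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())

def score_sentence(sentence, lexicon):
    """Return (mean value of the matched tokens, share of tokens matched)."""
    tokens = tokenize(sentence)
    values = [lexicon[t] for t in tokens if t in lexicon]
    if not tokens or not values:
        return None, 0.0  # nothing to score
    mean_value = sum(values) / len(values)
    coverage = len(values) / len(tokens)
    return mean_value, coverage

# Invented example sentence (not a quotation from the novel).
score, coverage = score_sentence("Heidi freute sich über die schöne Sonne", TOY_LEXICON)
print(score, coverage)
```

In this toy run only freute and Sonne are matched: the inflected form schöne is missed because the lexicon lists only schön. The example therefore also illustrates why a lexicon with broad coverage of word forms matters for the reliability of the sentence-level mean.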
To investigate the contrast between rural/natural and urban spaces in Heidi, 2 we used the environment-related spatial term lists developed by Grisot and Herrmann (2023). These included words associated with rural or natural settings on the one hand and urban areas on the other (details on the lists’ content are provided in the following sections). After locating the occurrences of these spatial terms within the text, we analysed the sentiment of the surrounding sentences using the SentiArt sentiment lexicon (Jacobs and Kinder, 2019). This lexicon – particularly its Affective-Aesthetic Potential (AAPz) metric – offers a comprehensive and reliable measure of valence (i.e., the degree of positivity or negativity of a word) for German-language texts (Jacobs et al., 2020).
To assess the reliability of this computational approach, we complemented the lexicon-based analysis with human annotation: six independent readers evaluated the sentiment and emotional tone of the same set of sentences from the novel Heidi. We then compared the computational sentiment scores with the human judgments to evaluate the extent of agreement between the two methods. Finally, we examined the cases where the lexicon-based results diverged from human interpretation to better understand the limitations of automated sentiment detection in literary contexts.
Our analyses were performed using the programming language R (R Core Team, 2017), the graphical user interface (GUI) RStudio (Posit Team, 2024), and a number of R packages that facilitate – among other things – data handling, statistical analyses and visualisations. 3 The data used in this paper is publicly available on Zenodo, an open-access repository for research outputs (see Grisot and Herrmann, 2024).
Space
The line ‘Heidi, Heidi, deine Welt sind die Berge’ (“Heidi, Heidi, your world is the mountains”) comes from the opening song of the widely popular animated television series Heidi, performed by Gitti and Erika (1977). Even for those unfamiliar with the show, this lyric conveys the central theme of the story: Heidi’s deep connection to the alpine landscape, an adventure-filled but comforting world, which stands in stark contrast to the unfamiliar and often alienating urban environment. 4 We explore in this section the spatial contrast between rural/natural and urban space in Heidi because this opposition lies at the heart of the narrative’s structure, emotional patterning and ideological message. The story consistently associates the rural alpine environment with freedom, health, emotional authenticity and moral goodness, while the urban setting is portrayed as restrictive, artificial and alienating. By examining how these spaces are emotionally encoded in the text, we gain insight into the broader values the narrative communicates – particularly concerning childhood, education, identity, and the idealised relationship between humans and nature.
To carry out this analysis, we first needed a way to systematically identify and classify spatial references in the novel, distinguishing between urban and rural environments. Grisot and Herrmann (2023) categorised spatial entities to examine the representation of literary space in Swiss novels written between 1840 and 1940, using a framework inspired by Wartmann et al. (2018). The latter distinguish between biophysical landscape elements – such as geology, landforms, flora, fauna, and climate – and cultural landscape elements – including land use, settlements, infrastructure and human-made objects. While this taxonomy effectively separates natural from human-made elements, it does not differentiate between rural and urban environments, both of which may fall under the category of human-made. To address this limitation and better reflect the spatial dynamics in literature, Grisot and Herrmann (2023) developed a new taxonomy that represents both RURAL and URBAN spatial categories and created a list of terms for several categories pertaining to these overarching landscape types.
The RURAL category encompasses spatial references associated with the countryside, including both general terms and specific place names. The original annotation scheme includes (1) rural entities – terms that describe features and infrastructure typical of rural areas (e.g., Feld ‘field’, Hütte ‘hut’) and (2) natural entities – terms referring to elements found in nature without human influence (e.g., Baum ‘tree’, Felsen ‘rock’). Additionally, the RURAL category covers proper names of specific locations, such as (3) rural geolocations – villages and small towns with fewer than 5000 inhabitants across Switzerland, Austria, Germany, France, and Italy (e.g., Huttwil, Enggwil) – and (4) natural geolocations, including the names of mountains, rivers, valleys, and lakes (e.g., Mont Blanc, Donau ‘Danube’). These correspond respectively to the original annotation labels: rural, natural, geoloc_rur, and geoloc_nat.
In contrast, the URBAN category includes (1) urban entities – terms denoting city-specific features such as buildings and infrastructure (e.g., Bahnhof ‘station’, Palast ‘palace’) – and (2) urban geolocations, comprising the names of cities and towns in the same five countries (e.g., Berlin, Paris, Rom ‘Rome’) – labelled respectively as urban and geoloc_urb.
Six spatial entity categories identified in Grisot and Herrmann (2023).
Using the spatial data developed by Grisot and Herrmann (2023), we focused on two main categories of spatial entities: RURAL and URBAN. These categories are based on the lists of terms summarised above, with a total of N = 168,302 terms for RURAL entities and N = 5841 terms for URBAN entities. When we applied these spatial entity lists to Heidi, we identified a total of N = 536 spatial entities, distributed as shown in Figure 1(a) and 1(b).
Figure 1. (a) Raw count and proportion of spatial entities by space type in the 1880 novel ‘Heidi’s Lehr- und Wanderjahre’. (b) 30 most frequent terms by space type in ‘Heidi’s Lehr- und Wanderjahre’: RURAL on the left and URBAN on the right.
The results of this initial ‘mapping’ of the spatial entity lists onto Heidi appear in line with our perception of the role of landscape in the novel – at least on this first level of analysis. RURAL entities are substantially more present in Heidi than URBAN ones, both in terms of token frequency (RURAL = 405 vs URBAN = 131), and with regard to the number of unique terms detected (RURAL = 84 vs URBAN = 21). 5
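The distinction between token frequency and number of unique terms can be illustrated with a short Python sketch. It is a simplified stand-in for the word list-based matching described above, not the procedure actually used: the two mini word lists contain only a handful of the terms mentioned in this section, the sample text is invented, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini word lists standing in for the full RURAL and URBAN lists.
SPATIAL_TERMS = {
    "RURAL": {"alm", "hütte", "tannen", "berg", "felsen"},
    "URBAN": {"frankfurt", "straße", "turm", "kirche"},
}

def count_spatial_entities(text, term_lists):
    """Count how often each listed term occurs, separately per category."""
    tokens = re.findall(r"\w+", text.lower())
    counts = {category: Counter() for category in term_lists}
    for token in tokens:
        for category, terms in term_lists.items():
            if token in terms:
                counts[category][token] += 1
    return counts

def summarise(counts):
    """Return {category: (token frequency, number of unique terms)}."""
    return {c: (sum(k.values()), len(k)) for c, k in counts.items()}

# Invented sample text (not a quotation from the novel).
sample = (
    "Die Hütte auf der Alm stand unter den Tannen. "
    "In Frankfurt sah Heidi nur die Straße. Die Alm fehlte ihr."
)
summary = summarise(count_spatial_entities(sample, SPATIAL_TERMS))
print(summary)  # {'RURAL': (4, 3), 'URBAN': (2, 2)}
```

A plain string match of this kind cannot tell a place name from a homographic common noun, which is one reason why some detected terms require manual disambiguation.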
An overview of which spatial terms were detected in the novel shows to some extent that the list of spatial entities used is effective; the relatively small number of distinct spatial terms detected in Heidi is consistent with the overall word count of N = 51,112. We can see that the most frequent terms found in the two categories are, perhaps unsurprisingly, distinctively characteristic of the landscape of Heidi. This is particularly true for the RURAL category, where the top 10 most frequent terms (Alm, Dörfli, Hütte, Tannen, Berg, Felsen, Weide, Tal, Berge, Höhe) are highly representative of the alpine pasture.
Interestingly, while none of the 10 most frequent RURAL terms are geolocations (named entities), the 10 most frequent URBAN terms contain at least two cities (Frankfurt, Straße, Turm, Kirche, Platz, Schule, Bank, Laden, Brunnen, Basel). Among the full list, we see Frankfurt, Paris, Basel and Neapel (somewhat surprisingly all cities located in different European countries, rather than predominantly Swiss), accompanied by spatial terms that are indeed not necessarily typical of the alpine imagery, but that represent the themes of the novel (see Schule ‘school’ or Kirche ‘church’). A few of the urban terms detected, however, need disambiguation. We see for example Stößen and Brunnen – both words feature in our list as geolocations (Brunnen is situated on the shores of Lake Lucerne, while Stößen could refer to a town in Saxony-Anhalt, Germany) but also exist as common nouns (Brunnen ‘spring(s)’ and Stößen ‘bumps’/‘shocks’). It can also be noted that the list shows a double occurrence of the non-named spatial entity Straße/Straßen (‘street/streets’), which may be generally associated with cities, but can be found also in contexts describing rural environments. 6
Given the apparently low occurrence of URBAN terms in Heidi, we decided to compare the spatial entity distribution of the novel with a reference corpus. This allowed us to determine whether the prominence of RURAL in the spatial distributions is indeed specific to Spyri’s novel, or whether this simply reflects a general trend in the Swiss literary production of the time.
For this purpose, we compiled a Comparison Corpus, composed of 184 Swiss literary narrative texts written in German between 1822 and 1940 by 69 Swiss authors (see Grisot and Herrmann, 2024). As we wanted the corpus to be as representative 7 as possible of the historical literary period Spyri belonged to, we decided to include a variety of literary texts, including Swiss canonical authors and authors of popular novels and ‘Dorfgeschichten’, such as Gottfried Keller or Jeremias Gotthelf, as well as examples of children’s literature, such as Ida Frohnmeyer’s novels.
We then used the same method on the Comparison Corpus, looking for RURAL and URBAN spatial entities and their distribution. The result can be observed in Figure 2(a) and 2(b).
Figure 2. (a) Raw count and proportion of spatial entities by space type in the comparison corpus. (b) 30 most frequent terms by space type in the Comparison Corpus: RURAL on the left and URBAN on the right.
Similarly to Heidi, and in line with previous research (Grisot and Herrmann, 2023), we can observe in the Comparison Corpus a higher presence of RURAL spatial entities in comparison to URBAN ones. This predominance, we argued before, can be explained both socially and culturally: firstly, at least in part, it relates to an idealising view of the rural and natural landscapes, which play a crucial role in the self-identification of the Swiss nation and its people. Secondly, it is in line with the poetics of successful realistic literary texts in those years 8 – for instance Dorf- and Bergromane – depicting more often the local, everyday life of the Swiss countryside and of smaller Swiss urban centres than the flamboyant chaotic life of the European city – much more present in the work of many non-Swiss authors during the examined period.
We can also observe, however, that Heidi pushes this dualism slightly further than the Comparison Corpus, with an accentuated difference between the presence of RURAL and URBAN entities, respectively 75.56% [N = 405] and 24.44% [N = 131] in Heidi, and 74.71% [N = 80,737] and 25.29% [N = 27,336] in the Comparison Corpus.
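The proportions just reported follow directly from the raw counts, as the short Python check below shows. It is a sketch added for transparency; the helper name `shares` is hypothetical, and only the four counts come from the text.

```python
def shares(rural, urban):
    """Return the RURAL and URBAN percentages of all detected spatial entities."""
    total = rural + urban
    return round(100 * rural / total, 2), round(100 * urban / total, 2)

heidi = shares(405, 131)        # counts reported for Heidi
corpus = shares(80737, 27336)   # counts reported for the Comparison Corpus

print(heidi)   # (75.56, 24.44)
print(corpus)  # (74.71, 25.29)
```

Expressed this way, the difference in the RURAL share between the novel and the Comparison Corpus is under one percentage point, which is why the comparison of per-sentence frequencies is needed to assess whether it is meaningful.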
Average relative frequency of entities by sentence detected in Heidi and in the comparison corpus by space type, and p value resulting from the Wilcoxon rank sum test.
These results indicate that Heidi features a generally higher presence of spatial terms, and that the probability of encountering a RURAL spatial term in a sentence in Heidi is significantly higher than that of finding one in the Comparison Corpus. In other words, the Comparison Corpus has generally fewer spatial entities (proportionally) and fewer RURAL ones, relative to Heidi.
Given the plot development of Heidi – which sees the child first arriving in the (Swiss) mountains, then moving to the city of Frankfurt am Main (in Germany) and eventually returning to the Alps – we decided to explore whether this ‘movement’ could be detected by the distribution of spatial entities across the novel – in a way verifying the effectiveness of our entities list. We did find a clear pattern, with RURAL entities dominating the first and last part of the novel, and URBAN entities concentrated in the middle.
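One simple way to make such a pattern visible is to divide the novel into equal-sized segments and count the entities of each type per segment. The Python sketch below assumes, for simplicity, that each sentence carries at most one label (RURAL, URBAN or none); the label sequence is invented, the number of segments is arbitrary, and the function name is hypothetical.

```python
def segment_counts(labels, n_segments):
    """Count RURAL and URBAN labels in consecutive, equal-sized text segments.

    labels: one entry per sentence, either 'RURAL', 'URBAN' or None.
    """
    counts = [{"RURAL": 0, "URBAN": 0} for _ in range(n_segments)]
    n = len(labels)
    for i, label in enumerate(labels):
        if label is not None:
            # Map the sentence position to its segment index.
            segment = min(i * n_segments // n, n_segments - 1)
            counts[segment][label] += 1
    return counts

# Invented label sequence mimicking a 'mountain-city-mountain' structure.
labels = ["RURAL", "RURAL", None, "URBAN", "URBAN", None, "RURAL", None, "RURAL"]
for segment in segment_counts(labels, 3):
    print(segment)
```

With these toy labels the first and last segments contain only RURAL entities and the middle segment only URBAN ones.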
Sentiment
Our second aim in this paper is to analyse the sentiment (positive/negative) encoded in Heidi, examining the ‘affective’ difference between sentences containing RURAL and URBAN spatial terms. For this purpose, we used the AAPz (Affective-Aesthetic Potential) measure provided by the lexicon version of SentiArt (Jacobs and Kinder, 2019), which has been shown to be a reliable predictor of valence (Jacobs, 2019; Jacobs et al., 2020) – i.e. how positive (AAPz value >0) or negative (AAPz value <0) a word is. The choice of SentiArt has several motivations: instead of using word lists based on human ratings for the valence of single words – a procedure which presents a number of problems, such as individual, societal or contextual biases – SentiArt is based on vector space models (VSM), which have been shown to be more accurate and reliable while providing much more comprehensive lexicons; it is based on the computation of a multivariate sentiment analysis of several affective semantic features by means of word embeddings using distributional semantics on vast language data (Jacobs, 2019). Moreover, compared to most other lexicons – which feature roughly between 1000 and 15,000 tokens – SentiArt contains 116,313 tokens. This extensive size significantly enhances the lexicon’s coverage, i.e., it increases the likelihood that words in the corpus will be matched with an emotional value. As a result, the analysis can yield a more nuanced and precise representation of emotional tone. Similarly to Jacobs et al. (2020), we decided to compute the mean AAPz value for each sentence, calculated on tokens (words) found by the SentiArt lexicon.
Considering the spatial distribution across the novel, we wanted to observe (1) whether the AAPz value would follow the space pattern ‘mountain-city-mountain’ in the form of a general ‘positive-negative-positive’ emotional arc – one of the most common ones in narrative texts according to Reagan et al. (2016), and (2) whether the presence of RURAL spatial terms would affect the AAPz value. We were also interested in observing whether Heidi differed in any way from the Comparison Corpus described above in terms of the relation between sentiment and space. We thus performed a Wilcoxon rank sum test to evaluate whether the AAPz values for each sentence differ between sentences with RURAL entities and with URBAN entities in Heidi and in the Comparison Corpus. 10
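For readers unfamiliar with the test, the Python sketch below shows the logic of a two-sided Wilcoxon rank sum test on two groups of sentence scores. It is a didactic approximation and not the routine used for the results reported here: it relies on the normal approximation without tie or continuity correction, the input values are invented, and the function names are hypothetical.

```python
import math

def rank_with_ties(values):
    """Assign 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided rank sum test using the normal approximation.

    Returns (U statistic of the first sample, z score, p value).
    No tie or continuity correction is applied in this sketch.
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    w = sum(ranks[:n1])                    # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2              # Mann-Whitney U of the first sample
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p value
    return u, z, p

# Invented mean sentence scores for two groups of sentences.
rural_scores = [0.8, 1.1, 0.9, 1.4, 1.0]
urban_scores = [0.2, 0.5, 0.1, 0.7, 0.3]
u, z, p = rank_sum_test(rural_scores, urban_scores)
print(u, round(z, 2), p < 0.05)
```

Because the test compares ranks rather than raw values, it does not assume that the sentence scores are normally distributed, which makes it a reasonable choice for skewed sentence-level data.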
In line with our expectations, we found that the beginning and end of the novel appear to embed a more positive content in comparison with the central sections of the book (in which we have a lower presence of RURAL spatial entities). This is shown in Figure 3.
Figure 3. Sentiment arc across the novel.
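A sentiment arc of this kind is typically obtained by smoothing the sentence-level scores along the text. The Python sketch below uses a centred moving average as one possible smoothing choice; this is an assumption for illustration, since the smoothing behind the figure is not specified here, and both the score sequence and the window size are invented.

```python
def rolling_mean(values, window):
    """Centred moving average; the window shrinks at both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Invented sentence scores following a 'positive-negative-positive' course.
scores = [1.0, 0.8, 0.9, -0.2, -0.4, -0.3, 0.7, 0.9, 1.1]
arc = rolling_mean(scores, window=3)
print([round(v, 2) for v in arc])
```

Larger windows produce a smoother arc at the cost of hiding local shifts in tone, so the window size is itself an analytical decision.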
We also found a significant difference in AAPz values between sentences containing RURAL and URBAN spatial terms, both in Heidi (p < 0.0001) and in the Comparison Corpus (p < 0.0001), as shown in Figures 4 and 5. Notably, the direction of this effect differed between the two corpora: in the Comparison Corpus, sentences with URBAN terms had higher average AAPz values, indicating more positive sentiment, whereas in Heidi, RURAL sentences were associated with more positive sentiment. Interestingly, sentences without any spatial terms exhibited significantly lower AAPz values in both corpora, suggesting that spatial references tend to co-occur with more emotionally charged language (Table 3).
Figure 4. Comparison of AAPz value in sentences with RURAL versus URBAN entities in Heidi.
Figure 5. Comparison of AAPz value in sentences with RURAL versus URBAN entities in the comparison corpus.
Table 3. Mean and median AAPz values in sentences containing RURAL versus URBAN spatial terms in Heidi and in the comparison corpus (highest respective values in bold).

To explore this contrast more closely for Heidi, we examined sentences containing specific spatial entities that we deem to be most representative of the dualism nature/city, and that represent the two main and ‘opposite’ settings of the novel: Alm on the one hand (for the alpine pasture), and the city of Frankfurt on the other – each at the top of the most frequently detected RURAL and URBAN spatial terms, respectively, in Spyri’s novel. By comparing the affective value of sentences containing these two tokens in Heidi and in the Comparison Corpus, we wanted to verify once again whether the emotional value of certain specific spatial entities is somewhat unique in Spyri’s novel. There is no reason to believe, in fact, that Frankfurt would generally have a negative connotation in the Comparison Corpus, or Alm a more positive one (Figures 6 and 7).
Figure 6. Comparison of AAPz value in sentences containing the entities ‘Alm’ and ‘Frankfurt’ in Heidi.
Figure 7. Comparison of AAPz value in sentences containing the entities ‘Alm’ and ‘Frankfurt’ in the comparison corpus.

We thus selected sentences containing these two terms only, and compared their average AAPz value. What emerges is shown in Figures 6 and 7.
In line with our prediction, while a superficial observation of Figure 7 might seem to suggest that Alm retains a more positive encoding, no statistical difference was found in the Comparison Corpus for AAPz in sentences containing the two terms. In Heidi, on the other hand, the presence of these spatial terms appeared to have a highly significant effect on AAPz, with Alm featuring a significantly more positive encoding than Frankfurt, and vice versa, Frankfurt featuring a significantly more negative encoding than Alm.
Sentiment lexicon versus human annotations
ICC values for sections of Heidi annotated by different annotator groups.
The results demonstrate that inter-rater reliability (IRR) improved as the number of annotators increased, with intraclass correlation coefficient (ICC) values progressing from moderate reliability (e.g., ICC = 0.424 for three annotators) to excellent reliability (e.g., ICC = 0.834 for six annotators). 11 Lower ICC values, particularly for smaller groups (e.g., three annotators), may reflect variability in individual judgment and are likely influenced by the smaller sample size of sentences annotated (n = 56). Conversely, the higher ICC values for larger groups (e.g., five or six annotators) indicate stronger agreement among annotators, suggesting reduced measurement error. Overall, these results suggest that as more annotators were included, the reliability of the annotations increased, supporting the robustness of the dataset for subsequent analyses.
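As background on how such a coefficient is obtained, the Python sketch below computes one common form of the ICC from a complete sentences-by-annotators matrix. The choice of form is an assumption made for illustration: the sketch uses the two-way random-effects, absolute-agreement, average-measures coefficient, often written ICC(2,k), which may differ from the form underlying the values reported above. The ratings are invented and the function name is hypothetical.

```python
def icc_2k(ratings):
    """ICC(2,k) for a complete matrix: one row per sentence, one column per annotator."""
    n = len(ratings)       # number of rated sentences
    k = len(ratings[0])    # number of annotators
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)

# Invented valence ratings: four sentences rated by three hypothetical annotators.
ratings = [
    [1, 2, 1],
    [3, 3, 4],
    [-2, -1, -2],
    [0, 0, 1],
]
print(round(icc_2k(ratings), 3))
```

Because an average-measures coefficient describes the reliability of the mean rating across annotators, it tends to rise as annotators are added even when pairwise agreement stays the same, which is worth keeping in mind when comparing groups of different sizes.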
We then examined the similarities and discrepancies between the average AAPz value and the manually annotated valence, looking in particular at the specific cases (sentences) where these two values differ the most. As we have seen, using the SentiArt lexicon the overall result of the sentiment distribution (AAPz) across the novel matched our expectations of a specific affective curve, broadly going from positive (Heidi in the mountains) to negative (Heidi’s stay in Frankfurt) to positive again (Heidi’s return to the Alps). This was also true for the valence annotated manually, where a remarkably similar pattern was found, as shown in Figure 8.
Figure 8. AAPz versus annotated sentiment arc across the novel.
A Pearson’s product-moment correlation – used to measure the strength and direction of the relationship between two continuous variables – showed a positive and statistically significant correlation between the AAPz values and the human-annotated valence scores (both centred and normalised): r = 0.33, 95% CI [0.29, 0.37], t(2074) = 16.07, p < .001. This means that, overall, the tool’s sentiment scores followed human judgments in direction, with more positive sentences also tending to receive higher AAPz values, although a correlation of this size indicates only moderate agreement.
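The relation between the correlation coefficient and the reported test statistic can be retraced with a few lines of Python. This is a sketch for illustration rather than the original analysis code; the function names are hypothetical, and only the values r = 0.33 and 2074 degrees of freedom are taken from the text.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, df):
    """t value for testing r against zero, with df = n - 2 degrees of freedom."""
    return r * math.sqrt(df / (1 - r ** 2))

t_from_rounded_r = t_statistic(0.33, 2074)
shared_variance = 0.33 ** 2
print(round(t_from_rounded_r, 2), round(shared_variance, 2))
```

Plugging in the rounded coefficient gives a t value of roughly 15.9, close to the reported 16.07; the small gap is consistent with r having been rounded to two decimals. The squared coefficient shows that the two measures share about 11% of their variance.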
Sentences from Heidi where manual annotations for valence and SentiArt AAPz values differed the most.
The first element worth noting is that the sentences where the SentiArt lexicon does not match the human annotators are mainly single words or short phrases. These are mostly exclamations or short questions (only three out of 20 are not) and often share one or more tokens. One aspect that was somewhat unexpected is that the SentiArt lexicon assigns an affective value to certain proper names of people (e.g., Sebastian), which, in isolation, might not typically be seen as carrying a specific emotional connotation. In these cases, SentiArt attributed a more positive value than the manually annotated valence. Since SentiArt was trained on the SdeWaC corpus, 12 we can infer that this dataset likely contains more instances of this particular name associated with positive sentiment than others. Whether this type of encoding is genuinely meaningful or merely a result of sampling is an empirical question. However, the increased negativity of the manual annotations might also depend on two plausible behaviours: on the one hand, the annotators – asked to consider the sentiment of the sentence – could have interpreted the exclamation mark as a marker of negativity, or enhanced the value of a negative sentence as a consequence of the presence of an exclamation mark (Hancock et al., 2007; Lee et al., 2015). On the other, since they were asked to read the novel sequentially, annotators, building a text base and a situation model (Zwaan and Radvansky, 1998), would have retained information about the previous sentence/the context of the exclamation and used that information in their evaluation.
An interesting case is represented by sentence number 975: Sind wir im Wald? (‘Are we in the forest?’). Here, the SentiArt lexicon has 100% coverage, with every word affectively encoded as mildly positive to positive, thus resulting in a positive mean AAPz. Human annotators, on the other hand, read this sentence as a mildly negative one. They did so because they were able to consider the situational context – a scene set in the city of Frankfurt, where Heidi is chided by the housekeeper Fräulein Rottenmeier because she left the house, following a sound that she mistook for the rustling of fir trees (but which was in fact the sound of cartwheels on the town’s pavement). The unfriendly sarcasm of the housekeeper’s sentence becomes visible in context: ‘Tannen! Sind wir im Wald? Was sind das für Einfälle! Komm’ herauf und sieh’, was du angerichtet hast!’ (‘Fir trees! Are we in the forest? What kind of ideas are these! Come up here and see what you’ve done!’).
Furthermore, cases involving (friendly) sarcasm present unique challenges. For example, in sentence 1141: “Warum nicht gar!”, lachte die Großmama (‘Why not indeed!’, laughed Grandma), the average SentiArt AAPz value is skewed by the strong negative connotations of warum, gar, and nicht, which outweigh the mild positivity of lachte and Großmama. It is evident that while the descriptive phrase might change the interpretation of the sentence by a human annotator, this is not accounted for when using a lexicon method. Similarly, in sentence 1716: Richtig! (‘That’s right!’), spoken by the doctor visiting the homesick child in Frankfurt, the AAPz value and the annotated valence are strikingly opposed. Context provided by surrounding sentences plays again a crucial role in shaping the annotators’ responses: “Dann schluckst du’s herunter zum Andern, nicht wahr, so? Richtig!” (‘Then you swallow it down along with the rest, don’t you? That’s right!’). Here, the doctor’s Richtig! reflects an ambiguous tone, commenting on the child’s expected behaviour within a social convention or her actual actions in that moment.
Complicating things: Transformative spaces
When exploring why Heidi has resonated so strongly with readers (and viewers), it becomes evident that its core contrast between ‘pure’ nature and ‘stifling’ educated-bourgeois civilization holds significant appeal – it reaffirms societal beliefs, invites alternative perspectives and provides engaging entertainment. However, as has been pointed out, Heidi is not an entirely stereotypical narrative. Instead, it functions as a kind of Bildungsroman, presenting a utopian vision of modern society with the potential to transform both the individual and the social order.
Let us look in more depth at sentence number 1141, which is taken from a scene in the study (Studierzimmer). We know from close reading that this study is a place with a positive affective encoding. Heidi here interacts with Clara’s grandmother Sesemann, an educated and benevolent character who appreciates the girl’s authenticity and innocence while being quite aware of her clear and quick mind. When Heidi opens the study door, she enters a space in which she feels welcome and unjudged: As she opened the study door she heard a kind voice say, “Ah, here comes the child! Come along and let me have a good look at you.” Heidi walked up to her and said very distinctly in her clear voice, “Good-evening,” and then wishing to follow her instructions called her what would be in English “Mrs. Madam.” “Well!” said the grandmother laughing, “is that how they address people in your home on the mountain?” “No,” replied Heidi gravely, “I never knew any one with that name before.” “Nor I either,” laughed the grandmother again as she patted Heidi’s cheek. (Spyri, 1946: 137) Wie es die Thüre zum Studierzimmer aufmachte, rief ihm die Großmama mit freundlicher Stimme entgegen: “Ach, da kommt ja das Kind! Komm mal her zu mir und laß dich recht ansehen.” Heidi trat heran, und mit seiner klaren Stimme sagte es sehr deutlich: “Guten Tag, Frau Gnädige.” “Warum nicht gar!” lachte die Großmama. “Sagt man so bei euch? Hast du das daheim auf der Alp gehört?” “Nein, bei uns heißt Niemand so”, erklärte Heidi ernsthaft. “So, bei uns auch nicht”, lachte die Großmama wieder und klopfte Heidi freundlich auf die Wange. (Spyri, 1880: 107)
A close stylistic reading of the narrator’s voice and characters’ speech reveals an interplay of affect and speaking style that is not captured by the large-scale analysis reported above. It reveals the foregrounding of proximity through situation-dependent language use: in contrast to the distanced discursive habitus typical of the educated environment, grandmother Sesemann invites physical and emotional proximity, using a friendly raised voice (rief ihm die Großmama entgegen/the grandmother called out towards her) and colloquial style (komm mal her zu mir/come along (to me)). She then assesses the child explicitly ‘as a whole’ (lass dich recht ansehen/let me have a good look at you) and treats her seriously, appreciating Heidi’s age-conforming, intelligent, yet clumsy attempt to mimic the urban register (Guten Tag, Frau Gnädige/Good-evening, Mrs. Madam). She acknowledges Heidi’s ‘otherness’ by differentiating between how things are said ‘on the alp’ and in Frankfurt (bei euch vs. bei uns/‘in your home’ vs. ‘nor I either’). The grandmother thus acts as a good educator, taking the child seriously in a resource-oriented way and communicating that the child is not ‘deficient’ but brings along her own experiences and skills, which form the basis for adaptation. Her appreciation frees a potential for the social-linguistic development of Heidi, who is now able to correct herself by switching around the word order (Gnädige Frau/Madam (literal translation: Gracious Lady)). Thus, when more of the information the text gives about the study is factored in, it appears as a ‘friendly space’ within the URBAN environment, an environment for learning in the sense of Wilhelm von Humboldt’s ‘Bildung’. Further events in the narrative show that the transformational significance of the study goes both ways: Heidi can open up to transform herself, but she is also welcome to transform her environment.
She does so by inviting in authentic folk music (a barrel organ player) and animals (cats), and by behaving overall in an authentic, affective and physical way (shouting, laughing, romping around) that is an antidote to the high control exercised in Clara’s bourgeois family, which struggles to allow a free development of human potential and in fact leaves Clara ill. Thus, while the overall affective-topographical narrative schema is stable, as was demonstrated above, the close reading shows oscillations within the dualistic ‘alp versus city’ pattern. Although a comprehensive close reading of the novel is beyond the scope of this paper, the ‘study’ example points to a kind of transformative space that in fact exists on both sides of the divide (in the mountains, Heidi eventually joins the school in the village). These narrated spaces resist the main narrative and give the story complexity and realism, allowing it to sketch out a vision of a more integrated modern society, in which the natural can become part of the cultural and the cultural part of the natural (Halter, 2001).
Conclusion
This paper aimed to formally examine the spatial-affective dimensions of the first Heidi novel while also evaluating the performance of the SentiArt sentiment lexicon through manual annotations performed by trained annotators. The analysis sought to shed light on the presence and representation of fictional space, the distribution of sentiment, and the discrepancies between the lexicon-based approach and human annotations.
The spatial analysis reveals a noteworthy distinction in the use of spatial terms between Heidi and the Comparison Corpus. Heidi exhibits a significantly higher frequency of rural spatial terms, underscoring the novel’s emphasis on rural settings and the profound connection between the central character, Heidi, and her natural surroundings.
In terms of sentiment analysis, we explored the differences in sentiment between sentences containing RURAL and URBAN spatial terms. Our findings indicate that there is significant variation in sentiment between these two types of sentences in both Heidi and the Comparison Corpus. Notably, the Comparison Corpus shows higher average sentiment in sentences with URBAN terms, while Heidi exhibits a more positive sentiment in sentences containing RURAL terms. This discrepancy suggests that Heidi’s portrayal of spatial entities evokes even more stereotypical emotional responses compared to other literary works in the Comparison Corpus.
The SentiArt lexicon also yields a sentiment distribution (valence, measured by Affective-Aesthetic Potential, AAPz) that aligns with the expected affective trajectory of the novel. The sentiment moves from predominantly positive during Heidi’s sojourn in the mountains to negative during her stay in Frankfurt, and regains a positive orientation upon her return to the Alps.
The testing of the SentiArt lexicon against human annotations provides further insights. An analysis using Pearson’s correlation coefficient shows a positive and statistically significant correlation between the AAPz values generated by the SentiArt lexicon and the manually annotated valence. However, it is important to acknowledge instances where disparities emerged between the lexicon-based approach and human annotators. Notably, the lexicon often ascribes sentiment to isolated words or brief phrases, including proper names of individuals, which conventionally lack inherent affective connotations. In-depth examination of specific examples offered insights into the nature of these disparities. Instances where the lexicon contradicted human annotators predominantly comprised single words or concise phrases, frequently featuring exclamations or brief interrogatives. Challenges were apparent in capturing the nuances of sarcasm and ambiguity and in accounting for the impact of surrounding sentences. The lexicon-based approach, being contextually unaware and reliant on individual word encoding, does not accommodate the contextual factors and the tone conveyed in these instances, which were, by contrast, picked up by human annotators who read the novel as a whole. Disagreement between human and machine annotation pointed us to some elements of the novel that are foregrounded not on a statistical but on an interpretative level. A stylistically informed hermeneutic close reading revealed the ‘study’ as a friendly space within the overall hostile environment of ‘Frankfurt’. It points to transformative spaces on both sides of the urban/nature divide (in the mountains, Heidi eventually joins the school in the village), which resist the main narrative and lend the story complexity and realism, allowing it to sketch out a vision of a more integrated modern society.
To conclude, we believe this study enriches our understanding of the spatial and sentiment dimensions of the novel Heidi and shows the potential of tailored lexicon-based spatial and sentiment approaches. The prevalence of rural spatial terms underscores the novel’s thematic emphasis on the protagonist’s harmonious connection with nature. The sentiment analysis utilising the SentiArt lexicon aligns with the expected affective trajectory within the narrative.
The disparities between the lexicon-based approach and manual annotations (as well as the ‘contradictory’ study scene) also highlight the necessity of integrating computational methods with human insights for a more comprehensive sentiment analysis of literary texts. There are inherent limitations to relying solely on computational methods. Human annotators bring contextual awareness, cultural understanding and interpretive skills that enable them to capture the nuances, irony, sarcasm and subtleties embedded within literary texts. Integrating human insights and interpretations with computational methods can lead to a more comprehensive sentiment analysis, enhancing our understanding of the intricate emotional dynamics at play in a narrative.
Future research could explore hybrid approaches that combine computational sentiment analysis techniques with expert human annotations. This would involve refining sentiment lexicons by incorporating human judgements and considering the context in which sentiment is expressed. By leveraging the strengths of both computational methods and human interpretation, researchers can gain a deeper understanding of the emotional dimensions of literary works and enrich literary analysis and interpretation. Furthermore, extending this research to other literary texts should provide more generalizable insights into how sentiment and spatial representations vary across different genres, time periods or cultural contexts. Investigating the interplay between sentiment and other narrative elements, such as character development, plot structure or thematic motifs, can deepen our understanding of how emotions are evoked and expressed in literature.
Overall, integrating computational methods with human insights holds exciting potential for advancing sentiment analysis in literary texts. By acknowledging the complementary nature of computational and human approaches, researchers can enhance the accuracy and depth of sentiment analysis, contributing to the fields of literary analysis, literary theory, and computational linguistics.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was carried out within the project High Mountains Low Arousal? Distant Reading Topographies of Sentiment in German Swiss Novels in the early 20th Century, and funded by the Swiss National Science Foundation (SNSF), with additional funds from the German Research Foundation (DFG) within the framework of the Collaborative Research Centre ‘Practices of Comparing: Ordering and Changing of the World’ (SFB 1288). Our thanks go also to The University of Bielefeld (Germany), and Cambridge Digital Humanities (CDH) at The University of Cambridge (UK) for their support.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
