From word clouds to Word Rain: Revisiting the classic word cloud to visualize climate change texts

Abstract

Word Rain is a development of the classic word cloud. It addresses some of the limitations of word clouds, in particular the lack of a semantically motivated positioning of the words, and the use of font size as a sole indicator of word prominence. Word Rain uses the semantic information encoded in a distributional semantics-based language model – reduced into one dimension – to position the words along the x-axis. Thereby, the horizontal positioning of the words reflects semantic similarity. Font size is still used to signal word prominence, but this signal is supplemented with a bar chart, as well as with the position of the words on the y-axis. We exemplify the use of Word Rain with three concrete visualization tasks, applied to different real-world texts and document collections on climate change. In these case studies, word2vec models, reduced to one dimension with t-SNE, are used to encode semantic similarity, and TF-IDF is used for measuring word prominence. We evaluate the technique further by carrying out domain expert reviews.

Keywords

Word cloud, tag cloud, text visualization, digital humanities, climate change data, text and document data

Introduction

The word cloud is a frequently used method for visualizing text content.¹ In its arguably most common form, words are arranged with a layout that packs the words to minimize the overall area used, that is, with a layout that lacks semantic meaning. Words are typically displayed with a larger font the more frequently they occur in the text, see Figure 1 for an example.

Figure 1.

An example of a classic word cloud generated at wordclouds.com for the IPCC report “Climate Change 2021: The Physical Science Basis, Summary for Policymakers.”

One of the reasons for the popularity of this visualization technique could be that there are several easy-to-use services for automatically generating word clouds from a text. Another reason for the popularity of the word cloud might be that the need to create a compact and static visualization of a text often arises, and a word cloud can provide such a visualization.

However, criticism has been raised against some of the visualization principles underlying the classic word clouds. For instance, it is easy to mistakenly infer a semantic interpretation of the positioning of the words, when no such meaningful interpretation exists.² To use font size as the sole indication of a word’s significance is also problematic. Longer words might then be perceived as more important, since their length makes them take up more of the space in the word cloud.¹ Additionally, the lack of a semantically motivated word positioning makes it difficult to compare the content of two word clouds. Corpus comparison is a task that often arises within digital humanities, for example, when comparing the content of two corpora of different genres, or when visualizing differences over time in corpora that span multiple years or even decades.

Therefore, based on previous criticism of word clouds, as well as on previous research in which the classic word cloud has been developed with new features, we here present, exemplify and evaluate a refinement of the classic word cloud representation, which we call Word Rain.

We exemplify the Word Rain technique by applying it to three separate tasks: (i) Showing differences between texts belonging to different text sub-genres. (ii) Detecting temporal differences in a corpus that spans several years. (iii) Generating candidates for new words to add to a specialized glossary. The motivation and the main application scenario for this work are rooted in digital humanities, and explorations of different types of texts on the topic of climate change are used in all three case studies.

We have chosen climate change as the example topic since it is both highly discussed and current, with texts readily available from a number of different genres and eras. We also perform a review of the Word Rain technique with researchers who have different kinds of expertise within this domain.

Background and related work

The word cloud is a widely used text visualization^3–5 technique that can be traced back to the “psychological maps” by Milgram and Jodelet,⁶ as discussed by Viégas and Wattenberg.¹ Another prominent early example of word clouds can be found on the cover of the 1992 German edition of “Mille Plateaux” by Deleuze and Guattari.⁷ Interest in this visual representation technique and its practical applications can be observed both outside and within the academic environment, including the information visualization, visual analytics, and human-computer interaction research fields, where various usability concerns associated with the technique are often raised. In the rest of this section, we discuss the prior work on the design, evaluation, and application of word clouds that provides the foundation for our proposed technique.

General design considerations

The history of the word cloud technique – and, in particular, its application as tag clouds for web page tags and search terms – was described by Viégas and Wattenberg,¹ who highlighted several issues with this technique from the point of view of information visualization, namely, (i) term length affecting the perceived emphasis/prominence, (ii) difficulty with looking up a specific term, (iii) difficulty with comparing different font sizes, and (iv) layout not representing semantic similarity. At the same time, Viégas and Wattenberg suggested that a different, “vernacular” visualization technique perspective may be more relevant when considering the applications of this approach rather than a traditional analytical one. Similar considerations were discussed by Hearst and Rosner,⁸ who considered the use of word/tag clouds as a signal of individual or collaborative activity rather than a tool of precise inquiry. The later work by Viégas et al. that discussed the use of the web-based Wordle tool for word cloud generation⁹ used the term “participatory visualization” to describe this phenomenon.

Still, interest in empirical evidence regarding the use of word clouds and related visual representations for data analysis tasks has led to a number of research studies. Rivadeneira et al.¹⁰ mentioned several possible user tasks that may be supported by word/tag clouds (searching for a particular term, browsing, and matching the complete tag cloud), but focused specifically on the visual attributes that might affect the task of impression formation (i.e. overview of the main topics/contents of the data represented by the tag cloud). Some of the most important findings from their study include (i) the strong positive effect of larger font sizes on memory/recall; (ii) the lack of effect of the layout on recognition and recollection of the tags; and (iii) the suggestion to use a simple list ordered by tag frequency to facilitate the overview of categories/topics present in the data. Felix et al.¹¹ conducted user studies focusing on various dimensions of the design space of keyword summary representations (including word clouds) in the context of tasks such as magnitude judgment (e.g. for values encoded with font size), keyword search, topic matching, and topic discovery. The main outcome of their investigation was that no particular design clearly outperformed the others across all tasks, and thus, the choice of a particular design (e.g. a spatial layout vs an ordered list of terms) should depend on the task; furthermore, for the magnitude judgment task, the use of additional marks (such as bars) can improve accuracy. On the other hand, relying on font size for magnitude judgment might lead to lower accuracy, but be beneficial for the completion time of keyword search.
Alexander et al.¹² studied additional factors that might affect magnitude judgment, such as the length of the word as well as the width and height of the respective text label, and reached the conclusion that such biases can have a consistently negative effect – but typically with a small effect size. Finally, we should mention the study by Hearst et al.¹³ that focused on the particular case of word clouds for semantically grouped data, for example, particular topics or categories assigned to individual terms. Their results suggest that separating the respective groups of terms spatially or by distinctive color assignment is beneficial for the category/topic matching task.

Advanced techniques and applications

A number of improvements and specialized layout techniques have been proposed for word/tag clouds over the years. Hassan-Montero and Herrero-Solana¹⁴ described the steps for improving tag clouds that included (i) a variation of the TF-IDF weighting scheme for tags and (ii) tag clustering with a hierarchical version of the k-means algorithm. The resulting layout consists of a single column with multiple rows of tags, with similar tags included in the same cluster (row), and similar clusters located close to each other vertically. This approach also makes use of the font size to encode the weight of individual tags. Wu et al.¹⁵ proposed a different approach for improving word cloud representations by (i) computing the word co-occurrence matrix, (ii) using dimensionality reduction (namely, multidimensional scaling¹⁶) to compute the preliminary layout, (iii) removing white space by seam carving and thus achieving a more compact layout, and (iv) explicitly representing the groups/clusters of words with bubble sets. The resulting approach thus preserves the semantic similarity between the terms as present in the data. Barth et al.² proposed further heuristic- and graph-based algorithms for improving semantic-aware layouts of word clouds, and Schubert et al.¹⁷ combined t-SNE with additional postprocessing steps to generate semantic-aware word clouds. In order to address the lack of semantic distance representation, Gambette and Véronis described the Tree Cloud approach that makes use of the semantic distance matrix and produces a visual representation similar to a phylogenetic tree¹⁸; the resulting tree layout potentially introduces a lot of white space, however. Besides the actual semantic similarity, support for varying lexical forms is provided by some approaches, for example, Prefix Tag Clouds by Burch et al.¹⁹ laid out prefix trees to indicate the use of terms such as “visual”+“-isation”, “-ize,” etc. in the provided data.

Another direction of research has focused on custom layouts and novel representations related to word clouds. Some of the earlier work on this problem includes the studies by Seifert et al.²⁰ that compared several alternative strategies for fitting a word cloud layout into an arbitrary polygon shape (e.g. an octagon). One of the main findings from the respective user studies is that a more intrusive strategy of initially truncating text labels led to better results in terms of user task completion, but was rated the lowest with respect to the esthetics, thus contributing to the discussion of design trade-offs for word clouds. Chi et al.²¹ proposed an approach that combined shape morphing with word clouds, which allowed the users to specify custom shape outlines for the word cloud layouts (e.g. a word cloud could be generated within a shape of an apple or a human outline). Furthermore, this approach would allow a series of word clouds with morphed layouts to be generated for a temporal document collection. Feng et al.²² took this idea even further, with their approach of automatically searching and deciding on an appropriate image that would represent the concept corresponding to the most prominent term in the input text data; the foreground of the respective image then serves as the shape for a custom word cloud layout (e.g. resulting in a word cloud in the shape of a chicken for restaurant menu texts). Besides the automatic layout customization, further options to customize, enrich, and (hopefully) improve word cloud representations could include the use of user-driven customizations,^23,24 typographic attributes,²⁵ word-scale/sparkline representations,^26,27 or even node-link representations, such as word webs by Baumer et al.²⁸

Finally, besides the techniques focusing on the word cloud as the final result in itself, the prior work in information visualization and visual analytics has proposed various approaches for the actual use of word clouds for analytical tasks (mainly for text analysis). For example, WordBridges by Kim et al.²⁹ used word clouds to represent details for particular network nodes and edges within a node-link diagram: here, a co-authorship network for several researchers could be represented while showing word clouds based on the publication keywords for an individual researcher (nodes) and co-authored publications (edges). Besides bibliographic data, applications of this approach were demonstrated for intelligence reports and works of fiction. ProjCloud by Paulovich et al.³⁰ combined the ideas of applying dimensionality reduction¹⁶ for preserving the semantics while allowing for complex non-convex polygonal shapes to be used for word clouds, if demanded by the user. One interesting advantage of their approach lies in the ability to start with a more compact projection plot based on dimensionality reduction for documents and then compute and lay out a word cloud for the respective document contents within the polygonal selection area chosen by the user (e.g. with a lasso selection tool). The use of dimensionality reduction to reveal clusters of terms (i.e. topics) in large document collections was also adopted by TexTonic by Paul et al.,³¹ which combined a spatial metaphor to represent the overall landscape of clusters with a force-directed layout of term labels within the clusters. Compare Clouds by Diakopoulos et al.³² assigned the horizontal positions of individual term labels based on their use in one or both of the specified input corpora in order to reveal, for instance, the patterns of term use in mainstream media versus blogs.

Several approaches have focused on word clouds / keyword summaries for temporal (or at least ordered) document collections, such as Parallel Tag Clouds by Collins et al.,³³ which used an ordered series of column layouts; tag clouds with time-varying co-occurrence highlighting by Lohmann et al.,³⁴ which combined a row layout and sparkline stacked bar charts; or SparkClouds by Lee et al.,³⁵ which used a generic word cloud layout (the alphabetically ordered cloud was demonstrated in the respective article) with sparkline line plots attached to each individual term label. While these approaches aim to provide a data overview over the complete time range, the Fisheye Word Cloud technique by Wang et al.³⁶ made use of the focus + context strategy instead. Their implementation took the position of the mouse pointer (controlled by the user) over the temporal axis into account and generated a spiral word cloud layout around that point; thus, the users could navigate over the horizontal axis to observe the changes in the data over time, as presented by the updated word cloud contents. Focus + context was also used by Liu et al.³⁷ in order to update the word cloud representation based on the user-selected data facet, and by Heimerl et al.³⁸ in their text- and topic-exploration tool Word Cloud Explorer, which supported interactive user control for a number of text pre-processing steps (e.g. filtering by a stop word list and part of speech). Attention to the use of terms both within a particular document and across the document collection was also given in techniques such as RadCloud by Burch et al.³⁹ and MultiCloud by John et al.,⁴⁰ which typically indicated the use of terms in particular groups of documents while producing a layout with an elliptical or circular boundary. 
Finally, PyramidTags by Knittel et al.⁴¹ provided context-, time-, and word order-aware layouts that resembled pyramids or upwards-pointing triangles aligned (to some extent) with a timeline displayed beneath the main representation.

Further examples of the use of more or less traditional tag/word clouds as additional views within visualization tools (e.g. word clouds used as a supplementary view for metro map-inspired representations⁴²) or customization of node/link label attributes within dimensionality reduction or network visualization approaches (as used by FacetAtlas,⁴³ for instance) can be found in the literature, but these topics are beyond the scope of our study.

Word clouds in the wild

The popularity of tag clouds during the rise of Web 2.0, together with the availability of both offline and online word cloud generators such as Wordle,⁹ which gave end users considerable freedom to customize font families, colors, etc., led to wide adoption of this visualization technique among the general public, but also across various academic environments (word clouds can support, for instance, investigations of language use and lexical semantic change over time⁴⁴). As mentioned above, prior work has discussed this phenomenon and proposed several plausible explanations of, and perspectives on, the use of word clouds^1,8; such perspectives could be related to existing discussions of the need to consider more than the analytical perspective in information visualization,⁴⁵ and to the preference of even technically advanced users for more straightforward techniques (“simple is good”).⁴⁶ Nevertheless, it remains largely unanswered whether the target audience actually understands the encoding of word clouds, whether created by themselves or by others, rather than merely appreciating their esthetics without attending to the content. In relation to this, ongoing discussions of the concept of “visualization literacy”⁴⁷ are also relevant, in addition to the word cloud-specific work discussed above.

The existing authoring tools for word clouds include, among others, a number of freely available tools for automatically generating standard word clouds, both web pages and programming packages (e.g. http://amueller.github.io/word_cloud/). Figure 1 shows an example word cloud generated with one of the many available services (https://www.wordclouds.com/). These services often provide several configuration parameters, but still typically adhere to the main principles of the standard word cloud: word prominence (based on word frequency) is indicated by font size, and no semantic justification is provided for how the words are positioned. Additionally, some of the more advanced tools developed within the information visualization community are made available in the form of source code, offline applications, or online tools (e.g. http://wordcloud.cs.arizona.edu/²).

The applications of word/tag clouds “in the wild” include the roles of (i) graphical user interface elements, allowing for the discovery of particular web pages or data items; (ii) esthetic infographic representations of data; and (iii) actual tools for providing an overview of potentially large text documents/corpora. The latter case can also be related to interactive exploration of social media data⁴⁸ and close/distant reading methodologies in digital humanities⁴⁹; here, the use of word clouds is supported by a number of popular existing software suites, including Voyant Tools by Sinclair and Rockwell,^50,51 for instance.

Word Rain: Data processing and visual encoding steps

We mainly focused on addressing two of the problems identified in previous research, which were also mentioned in the introduction. These are: (i) that word positions lack semantic meaning (and the resulting difficulties in comparing word clouds), and (ii) that font size as the sole prominence indicator results in longer words being perceived as more important than shorter ones.

Any new version of the word cloud should retain as much as possible of the simplicity of the traditional word cloud. In particular, it should be possible to provide all of its visualization features in the form of a clear static image, that is, an image that can be distributed digitally or printed on paper or a poster, and that fills the same role of providing an overview of textual content as the traditional word cloud does today. In addition to addressing the two limitations mentioned above, this has been our main aim when designing Word Rain.

In Figure 2, the new design is summarized, and the difference between the classic word cloud (left) and the Word Rain representation (right) is shown. (I) Words in the word cloud are positioned without taking semantics into account, while Word Rain positions similar words close to each other on the horizontal axis. The semantically similar words A1–A3 are positioned close to each other, as are the semantically similar words B1–B2. This is achieved by using the information encoded in a multidimensional language model⁵² built on distributional semantics, and projecting the model into one dimension⁵³ to use as the word’s position on the x-axis. (II) In the typical word cloud to the left, word prominence is only indicated by font size. In word rains, font size is still used as a prominence indicator. Prominence is, however, also indicated by the height of bars connected to the word, and by the vertical position of the word.

Figure 2.

Illustrative examples of a typical word cloud (left) and our proposed Word Rain representation (right). In these examples, terms A1–A3 are semantically close, while being distant from terms B1–B3 and somewhat distant from terms AB1–AB3. Numerical weight (25–100 in these examples) indicates the prominence of the respective term.

The vertical structure of the visualization, with words of lower prominence being positioned below those of higher prominence, gave us the impression of words falling downwards, as if they were raining, hence the name “Word Rain”.

Figure 3 shows an example of a word rain generated from an IPCC report summary. The following sections describe the technique for generating word rains in more detail.

Figure 3.

A word rain for the IPCC report “Climate Change 2021: The Physical Science Basis, Summary for Policymakers.” English stop words from NLTK were used, as well as a manually compiled stop word list, which, for example, included the words “likely” and “confidence.” The lines 1–3 and arrows a–d are added for explanatory purposes and are not part of the graph. The lines exemplify groups of similar words, that is, words signifying increase/decrease (1), “co₂”/“emissions”/“carbon”/“greenhouse gases” (2), and “sea”/“ocean” (3). The letters indicate: the bar connected to the most prominent word (a), a highly populated semantic region (b), the words “decade”/“century”/“surface” which exemplify when words have (and have not) been moved downwards to avoid collision (c), and finally the connecting line to “policymakers”/“summary_policymakers” (d).

Determining which words to display

The most basic principle of word clouds is retained in the Word Rain technique, that is, to extract the most prominent words of the text collection, and include these in the visualization.

As the default method for creating a word rain, we determine a word’s prominence by its TF-IDF value (term frequency–inverse document frequency),⁵⁴ that is, by the number of times the word occurs in a text, down-weighted by the number of texts in which the word is present. The inverse document frequency can be configured to be computed either (i) from the presence of the word in the set of documents that are each visualized by a word rain, or (ii) from the presence of the word in a background corpus that is not visualized, but has the sole purpose of providing inverse document frequency statistics. It is also possible to configure the visualization functionality to use raw term frequency (without inverse document frequency) as the measure of word prominence.

The user can configure how many (i.e. the top $n$) words to include in the word rain. It is also possible to specify a list of stop words, that is, (typically frequent) words that are to be excluded from the visualization.
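As an illustration of the prominence step (steps 3a–3c of Algorithm 1), the following Python sketch scores the words of one tokenized document and keeps the top $n$. It is a minimal sketch, not the Word Rain implementation: the exact IDF formula is not specified above, so the smoothed variant $\log\frac{1+N}{1+\mathit{df}}+1$ used here is an assumption, as are the function name `top_prominent_words` and its parameters. Passing the visualized documents as `idf_docs` corresponds to option (i), passing a background corpus corresponds to option (ii), and `use_idf=False` falls back to raw term frequency.

```python
import math
from collections import Counter


def top_prominent_words(doc_tokens, idf_docs, n=10, stop_words=frozenset(), use_idf=True):
    """Return the n most prominent (word, score) pairs of one tokenized document.

    idf_docs: the tokenized texts used for document-frequency statistics, i.e.
    either the visualized documents or a separate background corpus.
    """
    # Term frequency, ignoring stop words.
    tf = Counter(token for token in doc_tokens if token not in stop_words)
    if use_idf:
        doc_sets = [set(doc) for doc in idf_docs]
        n_docs = len(doc_sets)
        scores = {}
        for word, count in tf.items():
            df = sum(word in doc for doc in doc_sets)
            # Assumed smoothed IDF: stays positive even when df == n_docs or df == 0.
            idf = math.log((1 + n_docs) / (1 + df)) + 1
            scores[word] = count * idf
    else:
        scores = dict(tf)  # raw term frequency as the prominence measure
    # Highest score first; ties broken alphabetically for a deterministic result.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]


land = "forest soil land forest climate emissions".split()
ocean = "ocean ice ice sea climate emissions".split()
print([w for w, _ in top_prominent_words(ocean, [land, ocean], n=3)])  # ['ice', 'ocean', 'sea']
```

Words shared by both texts (“climate,” “emissions”) are pushed down the ranking, which is the intended effect of the inverse document frequency.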

Horizontal word positioning

The type of model used to achieve a semantically motivated word positioning is a language model built on distributional semantics. That is, on the notion that words with a similar meaning, for example, “tea” and “coffee,” often occur in similar contexts.⁵⁵ The distributional semantics data representation is captured by multidimensional vectors.⁵⁶ By applying dimensionality reduction,¹⁶ it is possible to retain some of the information of the model when projecting it into a one-dimensional space. The scalar representing the word in one-dimensional space is used for positioning the word on the x-axis. Thereby, semantic closeness of words will be represented by horizontal closeness in the Word Rain visualization.

We use a word2vec model⁵² to encode the distributional semantics information, and a t-SNE dimensionality reduction⁵³ to reduce the information to one dimension. How the horizontal position of the words is determined is described in more detail in steps 1–5 in Algorithm 1.
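To make the one-dimensional projection concrete, here is a deliberately bare-bones, pure-Python t-SNE that maps each vector to a single scalar. It is a sketch under stated assumptions, not the configuration used for Word Rain: it computes exact gradients and uses plain gradient descent without the early exaggeration and momentum that library implementations (for example scikit-learn's `sklearn.manifold.TSNE` with `n_components=1`) rely on. The helper names `_row_affinities` and `tsne_1d` are hypothetical, and the toy three-dimensional vectors merely stand in for word2vec embeddings.

```python
import math
import random


def _row_affinities(sq_dists, i, perplexity, tol=1e-5, max_iter=100):
    """Conditional probabilities p_{j|i} whose perplexity matches the target.

    The Gaussian precision beta is found by bisection, as in standard t-SNE.
    """
    target = math.log(perplexity)  # entropy (in nats) of the wanted perplexity
    beta, lo, hi = 1.0, 0.0, math.inf
    probs = []
    for _ in range(max_iter):
        weights = [0.0 if j == i else math.exp(-beta * d) for j, d in enumerate(sq_dists)]
        total = sum(weights) or 1e-12
        probs = [w / total for w in weights]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # too spread out: sharpen the Gaussian
            lo = beta
            beta = beta * 2 if hi == math.inf else (lo + hi) / 2
        else:                 # too peaked: widen it
            hi = beta
            beta = (lo + hi) / 2
    return probs


def tsne_1d(vectors, perplexity=2.0, iterations=300, learning_rate=1.0, seed=0):
    """Embed high-dimensional vectors on a line with a minimal exact t-SNE."""
    n = len(vectors)
    sq = [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in vectors] for u in vectors]
    cond = [_row_affinities(sq[i], i, perplexity) for i in range(n)]
    # Symmetrized joint probabilities p_ij = (p_{j|i} + p_{i|j}) / 2n.
    P = [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1e-2) for _ in range(n)]  # small random start
    for _ in range(iterations):
        # Student-t (Cauchy) kernel between the embedded points.
        k = [[0.0 if i == j else 1.0 / (1.0 + (y[i] - y[j]) ** 2) for j in range(n)]
             for i in range(n)]
        z = sum(map(sum, k))
        # Exact KL-divergence gradient: 4 * sum_j (p_ij - q_ij)(y_i - y_j) k_ij.
        grad = [4.0 * sum((P[i][j] - k[i][j] / z) * (y[i] - y[j]) * k[i][j]
                          for j in range(n)) for i in range(n)]
        y = [yi - learning_rate * g for yi, g in zip(y, grad)]
    return y


# Hypothetical embeddings: two tight groups (think "sea"/"ocean"/"ice" vs "co2"/"carbon"/"emissions").
toy = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.1, 0.1],
       [0.0, 0.0, 1.0], [0.0, 0.1, 0.9], [0.1, 0.0, 1.0]]
xs = tsne_1d(toy)
```

With two tight groups of vectors, the groups end up on opposite sides of the line, which is the property Word Rain relies on for its x-axis. Since t-SNE preserves local neighborhoods rather than global distances, mainly the closeness of nearby words should be read from such a projection.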

Algorithm 1 Word rain algorithm. Here, “document” is used to refer to the text(s) for which the user wants to plot one word rain. The word rains generated during the same run of the algorithm will share the same $\bar{x}$, i.e. a word occurring in several word rains will have the same position on the x-axis in all generated word rains.

1. From the user, collect:
   • ${\bar{d}}_{all}$, a list of documents to visualize
   • $n$, the maximum number of words to include from each document
2. Let ${\bar{w}}_{all}$ be an ordered set (all words that are to be visualized for all documents)
3. For each document $d$ in ${\bar{d}}_{all}$:
   (a) Calculate prominence for each unique word in $d$ (e.g. with TF-IDF)
   (b) Let ${\bar{v}}_{d}$ be a list of tuples
   (c) For each word $w$ in the top $n$ most prominent words:
       i. Add the tuple $(w, p)$ to ${\bar{v}}_{d}$, where $p$ is the prominence of $w$
       ii. Add $w$ to ${\bar{w}}_{all}$
4. In a matrix $A$, store vectors (from a word2vec model) that correspond to the words in ${\bar{w}}_{all}$
5. Reduce the dimensionality of the vectors in $A$ to one dimension using t-SNE dimensionality reduction, and name the result $\bar{x}$ ($\bar{x}$ now contains a scalar for each word in ${\bar{w}}_{all}$)
6. For each document $d$ in ${\bar{d}}_{all}$:
   (a) Create an empty canvas on which to plot the word rain for $d$
   (b) Sort the tuples in ${\bar{v}}_{d}$ according to decreasing prominence value
   (c) For each word $w$ and prominence $p$ in ${\bar{v}}_{d}$:
       i. $x \leftarrow {\bar{x}}_{i}$, where $i$ is the position of $w$ in ${\bar{w}}_{all}$
       ii. $y \leftarrow 0$
       iii. While plotting $w$ at $(x, y)$ would collide with a previously plotted word:
            • Decrease $y$ by the height of the colliding word
       iv. Plot the word at $(x, y)$ in a font size corresponding to $p$
       v. Plot a bar in the bar chart at position $x$, with a height corresponding to $p$
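Steps 2–5 are what make several word rains comparable: the top words of all documents are pooled into one ordered vocabulary and projected once. The following is a minimal Python sketch, assuming every word has a vector in the embedding lookup (a real word2vec vocabulary may lack some words). `shared_x_positions` is a hypothetical name, and `project_1d` stands for whatever one-dimensional reduction is used (t-SNE in this paper); the demo passes a trivial hypothetical stand-in projection purely for illustration.

```python
def shared_x_positions(top_words_per_doc, embeddings, project_1d):
    """Map every word shown in any word rain to one shared x coordinate.

    top_words_per_doc: one list of top-n words per document (step 3).
    embeddings: word -> vector lookup (assumed to contain every word).
    project_1d: function turning a list of vectors into a list of scalars.
    """
    # Step 2: an ordered set of all words; dict.fromkeys keeps first-seen order.
    vocabulary = list(dict.fromkeys(w for words in top_words_per_doc for w in words))
    # Step 4: matrix A, one row per vocabulary word.
    matrix = [embeddings[w] for w in vocabulary]
    # Step 5: one projection for all documents, so shared words share an x value.
    x_bar = project_1d(matrix)
    return dict(zip(vocabulary, x_bar))


def first_component(rows):
    """Hypothetical stand-in for t-SNE: use the first vector component as position."""
    return [row[0] for row in rows]


vectors = {"ice": [0.9, 0.1], "climate": [0.5, 0.5], "forest": [0.1, 0.9]}
x_of = shared_x_positions([["ice", "climate"], ["forest", "climate"]], vectors, first_component)
print(x_of)  # {'ice': 0.9, 'climate': 0.5, 'forest': 0.1}
```

Because “climate” appears in both documents but is projected only once, it is drawn at the same horizontal position in both word rains.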


Examples of horizontal positioning can be seen in Figure 3. Words signifying increase/decrease are positioned around horizontal position 1, the words “co₂,” “emissions,” “carbon,” and “greenhouse gases” around position 2, and the two words “sea” and “ocean” very close to each other at position 3. When the same t-SNE projection is used for a number of texts that are to be compared, the position on the x-axis can be used for comparing the texts. For the three texts in Figure 4, the same t-SNE projection was used, and the word positioning can therefore be used for comparing the text content. This is also the case for the two texts in Figure 6.

Figure 4.

Word rains for Swedish translations of two IPCC technical reports from 2019: (a) Special Report on Climate Change and Land,⁶⁵ (b) Special Report on the Ocean and Cryosphere in a Changing Climate,⁶⁶ and for (c) a report on thought structures that hinder climate change mitigation.⁶⁷ The lines indicate clusters of prominent words for panels (a) and (b) (2, 4, 5 and 6), and for panel (c) (1 and 3), respectively.

Figure 5.

Word rain for the most frequent words (stop words excluded) in the entire corpus of climate change editorials from Nature.

Vertical word positioning

The vertical position of a word is adapted to the positions of other, more prominent words in the cloud. That is, if the x-coordinates of two words are such that the words overlap, the less prominent of them is moved downwards (i.e. “raining down” in the figure) until a position is found where the word does not collide with a more prominent one. Thereby, the more prominent a word, the higher up on the y-axis it is typically positioned.

For instance, in the area at point c in Figure 3, the word “century” has been moved down so as not to collide with the more prominent word “surface.” The word (or more specifically bigram) “per_decade,” on the other hand, does not collide with a more prominent word (since “century” has been moved downwards), and retains a high position in the word rain.

How the vertical word position is determined is described in more detail in steps 6b–6c of Algorithm 1.
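Steps 6b–6c can be sketched as a simple top-down placement loop. This is a hedged Python sketch rather than the actual implementation: real code would measure rendered text extents (e.g. with a plotting library), whereas here the font size is assumed to scale linearly with prominence and the text width is crudely estimated from the number of characters. The names `Box`, `layout_word_rain`, `max_font_size` and `char_width` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box of a plotted word (y grows upwards)."""
    left: float
    bottom: float
    right: float
    top: float

    def overlaps(self, other: "Box") -> bool:
        # Strict inequalities: boxes that merely touch do not collide.
        return (self.left < other.right and other.left < self.right
                and self.bottom < other.top and other.bottom < self.top)


def layout_word_rain(prominence, x_of, max_font_size=20.0, char_width=0.6):
    """Return {word: (x, y, font_size)}, placing words in decreasing prominence.

    prominence: word -> prominence value p (assumed positive).
    x_of: word -> x from the shared one-dimensional projection.
    """
    p_max = max(prominence.values())
    placed = []  # boxes of the words plotted so far
    layout = {}
    # Step 6b: most prominent words are placed first.
    for word, p in sorted(prominence.items(), key=lambda item: -item[1]):
        size = max_font_size * p / p_max       # assumed: linear font scaling
        width = len(word) * size * char_width  # assumed: crude width estimate
        x, y = x_of[word], 0.0                 # steps 6c-i and 6c-ii
        # Step 6c-iii: while colliding, drop by the colliding word's height.
        while True:
            box = Box(x, y - size, x + width, y)  # the word hangs below its top edge y
            hit = next((b for b in placed if box.overlaps(b)), None)
            if hit is None:
                break
            y -= hit.top - hit.bottom
        placed.append(box)
        layout[word] = (x, y, size)            # step 6c-iv
    return layout


rain = layout_word_rain({"global": 1.0, "ocean": 0.8, "warming": 0.5},
                        {"global": 0.0, "ocean": 100.0, "warming": 10.0})
print(rain["warming"])  # (10.0, -20.0, 10.0)
```

In the demo, “warming” overlaps the more prominent “global” horizontally and is pushed down by the height of “global,” while “ocean,” which collides with nothing, stays at the top. The loop always terminates, since each step lowers the word by a positive height and there are only finitely many placed boxes.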

Despite the criticism raised against using font size as an indication of prominence, we decided to keep it as one of the prominence indicators. We did so partly because the prevalent use of font-size-based word clouds has made font size something of a standard indicator of prominence, but also, and more importantly, because the use of different font sizes together with a semantically meaningful word positioning makes it possible to support the standard “overview first, then details on demand” workflow.⁵⁷ When a user starts looking at a word rain, words displayed in a large font stand out and provide a quick overview of the most important content. The user can thereby search for areas of the word rain with semantically interesting content and focus attention on those areas. This functionality is particularly powerful when a word rain is viewed on a computer screen rather than on paper. The less prominent words might then be displayed in a font size too small to read when the image is shown in its original size, and details in semantically interesting areas can be explored by zooming in on and enlarging these areas. (To keep the examples here from becoming too cluttered when printed on paper, fewer words are included than would be the case for word rains intended solely for screen viewing.) For instance, the user might find “co₂” and “carbon” at position 2 in Figure 3 interesting, zoom in on the region around position 2, and find, for example, “aerosols” and “greenhouse_gases.” The usefulness of the semantic positioning is particularly evident when comparing the word rain to the type of classic word cloud shown in Figure 1. This word cloud similarly includes words displayed in a very small font. However, since their positions in the cloud carry no meaning, there is no way to explore them by zooming in on interesting areas.

The bar chart and connecting lines

Since the vertical position of a word does not follow directly from its prominence value, but also depends on its interaction with other words in the cloud, the y-coordinate of a word does not suffice for showing its importance. A bar chart with vertical bars above the words, each with a height proportional to the word’s prominence, is therefore also used as a word prominence indicator. Each word is linked to its corresponding bar in the bar chart with a semi-transparent line from the upper-left corner of the word. In Figure 3, point a is the highest bar, which is connected to the word “global,” and point b is a cluster of bars indicating that there are a number of words with high prominence in that semantic region, as well as many non-prominent words. Finally, point d exemplifies a connecting line to the words “policymakers” and “summary_policymakers.” To emphasize the semantic x-axis, the bar chart (as well as the lines connecting the words to the bar chart) is colored with a gradient in which the color corresponds exactly to the x value; the color coding is thus not designed to provide any independent information. A color gradient from turquoise to magenta was chosen, which maintains a reasonable contrast to the white background (also when printing in black and white) and which avoids a wide color range that might distract the reader. (We acknowledge, however, that further consideration of the choice of this supplementary color map⁵⁸ is part of future work.) The semantic similarity/dissimilarity of words is thus double-encoded: by position on the x-axis and by the color of the corresponding bar in the bar chart.
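The bar chart and its color encoding can be sketched as follows, assuming each word already has an x value from the one-dimensional projection and a prominence value. Bar heights are proportional to prominence, and the color is computed from x alone, so it adds no independent information. The endpoint colors (CSS “turquoise” and “magenta”) and the helper names are assumptions for illustration; the exact colors used in the figures are not stated here.

```python
TURQUOISE = (0x40, 0xE0, 0xD0)  # CSS "turquoise"; assumed endpoint color
MAGENTA = (0xFF, 0x00, 0xFF)    # CSS "magenta"; assumed endpoint color


def gradient_color(x, x_min, x_max, start=TURQUOISE, end=MAGENTA):
    """Linearly interpolate an RGB hex color from the word's x value only,
    so the color carries no information beyond horizontal position."""
    t = 0.0 if x_max == x_min else (x - x_min) / (x_max - x_min)
    rgb = [round(a + t * (b - a)) for a, b in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def bar_specs(words, max_bar_height=1.0):
    """words: list of (word, x, prominence) tuples. Returns one bar per word,
    with a height proportional to prominence and a color determined by x."""
    xs = [x for _, x, _ in words]
    top = max(p for _, _, p in words)
    x_min, x_max = min(xs), max(xs)
    return [
        {"word": w, "x": x,
         "height": max_bar_height * p / top,
         "color": gradient_color(x, x_min, x_max)}
        for w, x, p in words
    ]
```

The resulting heights and colors could, for example, be passed to Matplotlib's `Axes.bar(x, height, color=...)`, with the connecting lines drawn in the same per-word color.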

Additional configuration possibilities and code availability

A default configuration is provided for all user-defined parameters except the choice of corpus and word2vec model. There are, however, a number of additional possibilities for configuring the Word Rain technique, for instance, the maximum font size and how far words are moved along the vertical axis when they collide. These two parameters, together with the number of words to show, regulate the airiness/clutter of the visualization. (If uninteresting words receive a high prominence value and are thus displayed with a large font, adding them to the stop word list will reduce the clutter.)

Another parameter is the option to extract and visualize n-grams⁵⁹ instead of, or in addition to, individual words. For n-grams, the word2vec vector corresponding to the last word in the n-gram is used. This often works well for languages in which the specifying word comes before the more generic word in a compound (e.g. adjective before noun), as is normally the case in Germanic languages such as English, but it might have to be adjusted for other languages.
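The n-gram handling can be sketched in a few lines of Python. This is a hypothetical illustration, assuming whitespace tokenization, lowercasing, and n-grams joined with underscores (as in “per_decade”); `vectors` stands in for any mapping from words to word2vec vectors. The `head_index` parameter is our own addition, showing where the choice of word might be adjusted for languages in which the generic word comes first.

```python
def ngrams(text, n):
    """Split on white space (languages written without spaces need to be
    pre-segmented first) and join each run of n tokens with underscores."""
    tokens = text.lower().split()
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_vector(ngram, vectors, head_index=-1):
    """Return the vector of one word of the n-gram: by default the last word,
    which is usually the generic head word in English. Returns None if that
    word is missing from the model's vocabulary."""
    words = ngram.split("_")
    return vectors.get(words[head_index])
```

For example, “greenhouse_gases” would be positioned using the vector for “gases,” and an n-gram whose last word is out of vocabulary would have to be skipped.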

Code for generating the word rains is available with an open license on GitHub (https://github.com/CDHUppsala/word-rain). The current implementation relies on Matplotlib⁶⁰ for rendering.

Apart from the code on GitHub and the text collection(s), a word2vec model for the language of the texts is needed for generating the word rains. This model could be trained either on the corpora for which word rains are generated, if these corpora are large, or on another corpus. (For the word rains generated here, we used a Swedish word2vec model found at http://vectors.nlpl.eu/repository/ and an English model that is now found at https://huggingface.co/fse/word2vec-google-news-300.)

The word rains are produced as fully searchable PDF files. It is thereby also possible to explore a word rain by searching for a particular term of interest and then, for example, exploring the semantic neighborhood of that term. The generation functionality aims to be language independent, although the quality of the required word2vec models will be lower for languages with only a small amount of digital text available. The division of texts into tokens and n-grams relies on word boundaries being indicated by white space. Written languages for which this is not the case, for example, Japanese and Chinese, therefore require a pre-processing step in which white space is inserted between tokens.⁶¹ We have so far tested the functionality on texts written in English, Swedish, and Yiddish (using the Hebrew alphabet).

IPCC report example

The example in Figure 3 shows a word rain for the IPCC report “Climate Change 2021: The Physical Science Basis, Summary for Policymakers”⁶² (we extracted plain text from the PDF using the Poppler package, https://poppler.freedesktop.org). We configured the visualization to include the 300 most prominent words or bigrams, where prominence was based on TF-IDF. For calculating the background document frequency statistics, we used paragraphs from a small subset of “the British National Corpus.”⁶³ Each paragraph in the background corpus was counted as a document for the inverse document frequency statistics. We used the English stop word list from the NLTK Python programming package⁶⁴ as well as a list of eight manually compiled stop words (the entire list: “confidence,” “spm,” “10,” “11,” “12,” “13,” “figure,” “likely”).
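The prominence computation in this example can be approximated as TF-IDF with the background paragraphs as documents. The sketch below is a minimal version under stated assumptions: the smoothed IDF variant log(N / (1 + df)) + 1, pre-tokenized input, and a plain set of stop words (which could, for instance, be NLTK's English list extended with the manual additions above). The function name is hypothetical, and this is not the published implementation.

```python
import math
from collections import Counter


def tfidf_top(target_tokens, background_docs, stop_words, k=300):
    """Score the words of the target text by term frequency times inverse
    document frequency, where each background paragraph counts as one
    document. Uses the smoothed idf log(N / (1 + df)) + 1 (an assumption)."""
    n_docs = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(doc))  # count each word at most once per paragraph
    tf = Counter(t for t in target_tokens if t not in stop_words)
    scores = {w: c * (math.log(n_docs / (1 + df[w])) + 1) for w, c in tf.items()}
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

For the sub-genre comparison described below, where no background corpus was used, the same function could be called with the tokenized reports themselves as `background_docs`.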

Example applications for Word Rain

We applied the Word Rain visualizations to three separate tasks: (i) to visualize differences between sub-genres, (ii) to visualize differences between documents over time, and (iii) to visualize the coverage of a lexical resource for a document type.

Differences between sub-genres

Many different kinds of reports, and other types of texts, on the topic of climate change are produced. We generated word rains for three different climate change reports: Swedish translations of two IPCC technical reports from 2019, Special Report on Climate Change and Land⁶⁵ and Special Report on the Ocean and Cryosphere in a Changing Climate,⁶⁶ as well as a report on thought structures that hinder climate change mitigation.⁶⁷ That is, two technical reports describing the effects of climate change on nature, and one report from a very different sub-field, the field of rhetoric/behavioral science.

The 300 most prominent words, based on TF-IDF, were included in the word rains. Since the aim was to compare the content of the three reports, only word occurrence in these three documents was used for calculating the IDF value. That is, in contrast to the previous example on English text, no background corpus was used for calculating document frequency statistics.

The same t-SNE projection was used for all three visualizations, that is, a t-SNE projection that included the vectors corresponding to all words that were to be included in the three word rains. Thereby, the horizontal word positioning of the three visualizations shares the same semantic interpretation, and can thus be compared.
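The key point of the shared projection is that it is fitted once, on the union of all words to be displayed, and then reused for every word rain, so a word always receives the same x-coordinate. The sketch below illustrates this with NumPy. To keep it short and dependency-light, it uses the first principal component (PCA) as a stand-in for t-SNE; with scikit-learn, `sklearn.manifold.TSNE(n_components=1)` could be substituted. The function name and the normalization to [0, 1] are assumptions.

```python
import numpy as np


def shared_projection(word_lists, vectors):
    """Fit ONE 1-D projection on the union of the words of all word rains,
    so that every word gets the same x in every visualization. PCA (first
    principal component) stands in for t-SNE in this sketch."""
    vocab = sorted({w for words in word_lists for w in words if w in vectors})
    X = np.array([vectors[w] for w in vocab], dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ vt[0]  # projection onto the first principal axis
    span = coords.max() - coords.min()
    coords = (coords - coords.min()) / span if span > 0 else np.zeros_like(coords)
    return dict(zip(vocab, coords.tolist()))
```

Each word rain then looks up its x-coordinates in the same dictionary instead of computing its own projection.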

Figure 4 compares the resulting word rains for the three reports. The visualizations show that the “word rain profiles” for the two technical IPCC reports are very similar (Figure 4(a) and (b)). Both have their most prominent words positioned at the right end of the graph (at positions 5 and 6), while moderately prominent words are positioned in two clusters in the middle of the word rain (positions 2 and 4). It is particularly interesting that the two profiles are very similar even though their most prominent words differ. The Word Rain technique thus seems able to illustrate semantics on a more abstract level than the word level.

The report on human thought structures has a very different profile (see Figure 4(c)). The areas important for the other two reports are almost empty for this report. Conversely, most words from report C are positioned around positions 1 and 3, areas that are almost empty for the two technical reports. We by no means claim that these examples show that the word rain profile can be used as an objective measure of the similarity or difference between texts. On the contrary, the examples given here were actively selected because two of the texts belong to a narrow sub-genre of climate change reports and the third belongs to a completely different sub-genre. We do, however, claim that a word cloud in which the words are given a semantically meaningful position can provide useful information on the semantic content of the text. It is particularly valuable that such a word cloud might provide semantic information on a more abstract level, that is, information that lies beyond the individual words. For the standard word cloud, which does not use any semantic information for word positioning, this opportunity is lost.

Differences over time

For evaluating the Word Rain visualization on the task of showing differences in the content of a corpus over time, we used a previously compiled corpus of editorials. The corpus consists of editorials from Nature and Science that contain climate change related keywords and that were published between the years 1969 and 2016.

The corpus is divided into six eras centered around the publication of the IPCC assessment reports. It is manually annotated for eight different frames used by the editors when discussing climate change, for example, whether solving climate change is framed as a technological challenge, an economic challenge, or a scientific challenge.⁶⁸

It has also previously been shown that many of the general trends found by the manual annotations were possible to detect by applying automatic methods, in the form of topic modeling, to the corpus.⁶⁹

We here used only the editorials from Nature, and applied the same era divisions as in the two previous studies. For each era, we created a word rain, using the same t-SNE projection to make them comparable. We tried different word rain configurations, for example, regarding the number of words to show, cut-offs for maximum and minimum word occurrences, and whether to use TF-IDF or raw frequency. We here include results from the configuration we found most useful, which was to use raw term frequencies together with an extensive stop word list produced in the previously mentioned topic modeling-based study. The top 300 words were included. Figure 5 shows the word rain for the entire period and Figure 6 shows visualizations for two of the six eras (1988–1992 and 2006–2010). For the era-specific visualizations, words that occurred in all six eras are excluded. The visualization is thereby more focused on words that stood out for one or several eras.
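The era-specific word selection can be sketched as follows, assuming tokenized editorials grouped by era. Words are ranked by raw frequency, stop words are removed, and any word that occurs in every era is excluded before the top k are taken. The function name and the alphabetical tie-breaking are hypothetical choices.

```python
from collections import Counter


def era_word_lists(era_tokens, stop_words, k=300):
    """era_tokens: dict mapping era -> list of tokens. Ranks words by raw
    frequency within each era after removing stop words and every word
    that occurs in all eras."""
    counts = {era: Counter(t for t in toks if t not in stop_words)
              for era, toks in era_tokens.items()}
    in_all = set.intersection(*(set(c) for c in counts.values()))
    result = {}
    for era, c in counts.items():
        ranked = sorted((w for w in c if w not in in_all), key=lambda w: (-c[w], w))
        result[era] = ranked[:k]
    return result
```

The all-eras word rain would instead be built from the combined counts without the `in_all` exclusion.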

Figure 6.

Word rains for the eras 1988–1992 and 2006–2010 in the corpus of climate change editorials from Nature. While Figure 5 shows the most frequent words for the entire corpus, the most frequent words for each era are shown here, but with words occurring in all eras excluded. New words, that is, words that have not occurred in a previous era in the time series visualized, are presented in bold and marked with an asterisk (*). Line 1 indicates words related to types of energy, which only occur in the 2006–2010 era, and 2 indicates words signifying person and place names in both eras.

To give an additional indication of novelty in a new era, we implemented the functionality of presenting new words in bold and marking them with an asterisk (*). That is, if a word has not been present in the visualization series for a previous time era, this word is marked with an asterisk. For instance, there is something that is starting to be a disappointment in the era 1988–1992, while discussions about “polar” and “arctic” are becoming important in the 2006–2010 era. The asterisks, however, are only meant to give new words extra focus, and not as a replacement for comparing the graphs themselves. Since a specific word always appears on the same x-coordinate within a time series, it is always possible to look for the word or word cluster manually in graphs from previous eras.
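The novelty marking can be expressed as a single pass over the eras in chronological order, keeping track of every word shown in an earlier visualization. This is a minimal sketch with a hypothetical function name; returning a text/bold pair per label is our assumption (with Matplotlib, the bold flag could map to `fontweight="bold"`), and we assume that nothing is marked in the first era, which has no predecessor.

```python
def mark_new_words(era_lists):
    """era_lists: list of (era, displayed_words) in chronological order.
    A word is marked as new (asterisk and bold) if it was not displayed in
    any earlier era's word rain. The first era is left unmarked."""
    seen = set()
    labelled = []
    for i, (era, words) in enumerate(era_lists):
        labels = []
        for w in words:
            is_new = i > 0 and w not in seen
            labels.append({"text": w + "*" if is_new else w, "bold": is_new})
        seen.update(words)
        labelled.append((era, labels))
    return labelled
```

Because the check is against all earlier eras rather than only the immediately preceding one, a word that disappears and later returns is not marked again.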

From the all-eras word rain (see Figure 5), it can be seen that a few words are very dominant, mainly words related to science, but also the words “energy”/“power” and “emissions.” Although these are relevant conclusions, the same information could have been provided by a standard word cloud. When comparing the two eras, the benefit of the word rain becomes more evident. It can be seen that both have a spike in the right-most region (Figure 6, position 2), with words signifying person and place names. When applying the “details on demand” workflow and zooming in, it is evident that the names and places vary. In the first era, Margaret Thatcher is frequently mentioned, as is the Amazon, and the place of the important climate change summit is Rio. In the second era, Barack Obama is instead the most frequently mentioned person, and the place of the climate summit (or treaty) changes to Kyoto. Such semantics-based comparisons would have been difficult to make with a classic word cloud.

It can also be seen that the era 1988–1992 has a very evident gap (Figure 6, position 1) just to the left of the right-hand name spike. This gap is filled in the 2006–2010 era, mainly with words related to types of energy, for example, “renewable energy,” “wind energy,” “biofuels,” and “energy efficiency,” but also with the word(s) “republican(s).” Other possible topics specific to the second era, albeit less evident, are “polar/arctic/ocean” and “forest/deforestation/ecosystems.” There are differences between the two eras that might also have been found in a classic word cloud, for example, that “stewardship” becomes important in the 1988–1992 era, while “leader(s)/leadership” is used in the later era. However, the horizontal alignment makes it much easier to find the same word in different eras. Another example is that “population growth” is important in the first era, but not in the second.

The methods applied here for comparing the different eras could be seen as complementary to the above-mentioned topic modeling-based methods that have previously been applied to this corpus of climate change-related editorials. Some observations could be made using both methods, for example, the greater interest in renewable energies in the later era, as well as in polar/arctic research and in forest/biodiversity. Other observations could only be made with the Word Rain visualization technique, and not with the topic modeling-based approach, such as the comparison of which names were most common in the different eras. However, for the specific task of replicating the high-level trends in how climate change is framed, which had been manually annotated by Hulme et al.,⁶⁸ the topic modeling-based methods were more useful.

Coverage of a specialized glossary

To investigate the usefulness of the Word Rain technique for visualizing the coverage of a specialized vocabulary, we again used the IPCC report “Climate Change 2021: The Physical Science Basis, Summary for Policymakers.”⁶² This time, we used it for investigating the coverage of the GEMET (GEneral Multilingual Environmental Thesaurus) glossary (https://www.eionet.europa.eu/gemet/en/exports/rdf/latest) of the domain-specific words of the report, and for suggesting new words that might be relevant to add to the glossary.

To capture the words most typical of the domain of the report, and less typical of general language, we made some changes to the configuration used for the word rains described above. As in the first IPCC example, we used paragraphs from a small subset of “the British National Corpus,”⁶³ and let each paragraph represent a background document. Using this corpus, we applied a cut-off whereby a word could occur in at most 1% of the documents (= paragraphs) of the background corpus in order to be included in the word rain. The background corpus consists of around 50,000 paragraphs, which meant that words occurring in more than around 500 paragraphs of this corpus of general English were excluded from the word rain. In contrast, the minimum number of occurrences in the IPCC report was decreased to two. This decision was based on the fact that even infrequent, but highly specialized, words might be relevant to add to a glossary. We also added numbers and years to the list of stop words, as these are typically not included in a glossary.
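The two thresholds described above, a maximum document frequency of 1% in the background corpus and a minimum of two occurrences in the report, can be sketched as a filter. For 50,000 background paragraphs, the first threshold corresponds to the roughly 500 paragraphs mentioned above. The regular expression used to drop numbers and years, and the function name, are assumptions for illustration.

```python
import re
from collections import Counter


def glossary_candidates(report_tokens, background_docs, max_df_ratio=0.01, min_count=2):
    """Keep words that occur at least `min_count` times in the report but in
    at most `max_df_ratio` of the background paragraphs. Numbers and years
    (matched by a simple regex, an assumption) are treated as stop words."""
    n_docs = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(doc))
    counts = Counter(t for t in report_tokens if not re.fullmatch(r"\d+([.,]\d+)?", t))
    return {w: c for w, c in counts.items()
            if c >= min_count and df[w] <= max_df_ratio * n_docs}
```

The surviving words would then be ranked and positioned exactly as in the other word rains.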

One graph was produced for unigrams and one for bigrams (see Figures 7 and 8, respectively). For the bigrams in Figure 8, we only included a small list of stop words, as many frequent words, otherwise classified as stop words, might be part of multi-word expressions. The visualization for unigrams in Figure 7 was configured to include a maximum of 3000 words and the one for bigrams to include 500 words.

Figure 7.

Unigram word rain visualizing the coverage for the GEMET glossary for the IPCC report “Climate Change 2021: The Physical Science Basis, Summary for Policymakers.”⁶² Words and expressions already included in the glossary are underlined and their corresponding vertical bar is displayed in a gray color.

Figure 8.

Bigram word rain visualizing the coverage for the GEMET glossary for the IPCC report “Climate Change 2021: The Physical Science Basis, Summary for Policymakers.”⁶² Words and expressions already included in the glossary are underlined and their corresponding vertical bar is displayed in a gray color.

The glossary coverage for the IPCC report was visualized by underlining the words and expressions included in the glossary and by displaying their corresponding vertical bar in a gray color.
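A minimal sketch of the coverage marking, assuming that a displayed term counts as covered when it matches a glossary entry case-insensitively, with the underscores of bigrams read as spaces. The function name, the matching rule, and the returned style dictionary are hypothetical, and the glossary in the test is a toy list, not GEMET. The function also returns the plain coverage proportion, the kind of summary figure the word rain is meant to go beyond.

```python
def coverage_styles(bar_colors, glossary):
    """bar_colors: dict mapping each displayed term to its gradient bar color.
    Covered terms are underlined and get a gray bar; uncovered terms keep
    their gradient color. Also returns the share of covered terms."""
    entries = {g.lower() for g in glossary}
    styles = {}
    for term, color in bar_colors.items():
        covered = term.replace("_", " ").lower() in entries
        styles[term] = {"underline": covered, "bar_color": "gray" if covered else color}
    share = sum(s["underline"] for s in styles.values()) / len(styles) if styles else 0.0
    return styles, share
```

Matplotlib text objects have no underline property, so the underline would in practice have to be drawn as a separate short line beneath the label.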

To claim that Word Rain is a suitable visualization technique for providing an overview of glossary coverage, the results must be more useful than those of straightforward alternatives, such as simple lists of covered/not covered words or a standard word cloud.

The advantage of the Word Rain technique for this task is that it very quickly indicates what kind of vocabulary is covered/not covered by the glossary, which gives a deeper understanding than a coverage proportion alone or a colored classic word cloud. When analyzing the unigram visualization, it can be seen that the words not covered in the far right part of the graph are typically names and places, which should therefore not be included. It can also be seen that the left part of the word rain (around 75% of the chart) is dominated by general words that would not typically be included in a glossary. It is therefore reasonable that GEMET covers only a few words in this area.

In contrast, there is a clear spike of interesting words positioned close to “climate”/“co₂” and “ocean,” of which relatively many are covered by GEMET. The “overview first/details on demand” workflow can be used to zoom into such areas when it is interesting to study in more detail what is covered or not. The area of IPCC-report-relevant words could be used for generating suggestions for words to add to the GEMET glossary. As this area is likely to contain many relevant candidates, generating suggestions for new words from it is faster than, for example, scanning through long TF-IDF-based word lists without any semantic sorting. Examples of possible candidate words found when zooming into the area are shown in the first section of Table 1. For the bigram visualization, fewer GEMET glossary matches were found. Still, a list of new candidate words for inclusion in GEMET could be compiled from the bigrams spatially close to the few that were found in GEMET (second section of Table 1). Domain knowledge is required to determine which of these candidate words would be suitable to add to GEMET; we here present only an unfiltered list of word candidates.

Table 1.

Unfiltered candidates for words to add to the GEMET glossary, unigrams followed by bigrams.

Glossary suggestions

Unigrams: Biogeochemical, Combustion, Cryosphere, Dioxide, ENSO (El Niño–Southern Oscillation), Forcings, Halocarbons, Interglacial, Irradiance, Meridional, Monsoon, N₂O (nitrous oxide), Nitrous, NOx, Organic, Permafrost, Precipitation, Radiative, Salinity, Snowfall, Subtropical, TCRE (Transient climate response to cumulative carbon emissions), Tropical, Vapor, Volcanic

Bigrams: Anthropogenic forcing, Black carbon, Climate system, Climatological mean, CO₂ concentration, Cumulative emissions, Extreme weather, Interannual variability, Land carbon, (Ocean) heat content, Northern hemisphere, Radiative forcing (RF), Sea ice, Solar irradiance, Southern oscillation, Surface temperature, Sustained warming, Temperature anomaly, Temperature extremes, Transient climate (response), Tropical cyclone, Volcanic aerosols, Volcanic eruptions

Domain expert feedback

Evaluation/validation of various types of contributions in information visualization and visual analytics is an open challenge, discussed, among others, by Lam et al.⁷⁰ and Isenberg et al.⁷¹ Since climate change is the main topic for selecting the source text data to be processed, represented, and analyzed with the Word Rain technique in this study (including the case studies with IPCC reports and Nature editorials discussed above), we chose the domain expert review⁷²,⁷³ as the main evaluation approach. The motivation for this choice is severalfold: (i) researchers who may actually use the Word Rain technique for representing and analyzing text data as part of their own research are an important target audience for us; (ii) such researchers most likely have at least passing knowledge of/familiarity with classic word/tag clouds, including the potential use of the technique in their field or in their own contributions; (iii) the insights reported by the domain experts can complement some of our case studies and examples; and (iv) domain expert reviews are a viable option when focusing on qualitative evaluation results (as we do not claim to provide quantitative evidence of the technique’s superiority with respect to, for instance, effectiveness or efficiency for particular tasks).

Participants and protocol

While the precise definition of a “domain expert” varies in the literature,⁷²,⁷⁴,⁷⁵ our intention was to involve academic researchers who focus on the broad problem of climate change in their work with some level of involvement of textual data (rather than exclusively climate modeling, for instance). Furthermore, as a pilot study, we conducted the first session with one of the co-authors of this manuscript (participant 0), whose background is in digital humanities, including traditional and computer-assisted text analyses of specialized domain corpora. The rest of the participants were external to this study, although it should be disclosed that they are part of the professional network of the authors of this manuscript.

The individual sessions were conducted over Zoom and lasted one hour each, with two authors of this manuscript present in addition to the invited participant. The protocol followed the format of a semi-structured interview, with introductory questions focusing on the participants’ background and their self-reported knowledge of information visualization techniques (namely, word clouds) and of computational text analysis methods (including dimensionality reduction and TF-IDF).

The three external participants have different scientific backgrounds, namely geography, linguistics, and meteorology. The first invited participant (P1) is a PhD student, P2 is a researcher, and P3 is a full professor. All participants currently focus on climate change, with slightly different foci. P1 and P2 reported passing knowledge of classic word clouds and no knowledge of dimensionality reduction/t-SNE or TF-IDF, while P3 reported prior experience from multiple projects involving visualization methods, several of which involved word clouds, as well as passing knowledge of computer-assisted text analysis (but not of dimensionality reduction/t-SNE or TF-IDF in particular).

As part of the introduction, a brief overview of classic word clouds and of the visual encoding of a word rain was given, based on the contents of Figure 2, including a brief description of the underlying computational methods (such as text preprocessing and dimensionality reduction). Afterward, the experts were shown the single word rain for the IPCC report in Figure 3, asked to describe its contents while thinking aloud, and asked to provide further feedback as part of the discussion. They were then presented with the diachronic representations for the 1988–1992 and 2006–2010 Nature editorials (see Figure 6) and, afterward, with these two eras compared to the complete collection (see Figure 5).

At the end of the session, the experts were asked for further remarks and suggestions, and finally asked to fill in the standard System Usability Scale (SUS) questionnaire,⁷⁶ as generated by a web-based tool.⁷⁷ The rationale for including this questionnaire was to complement the semi-structured feedback with additional commensurable results; it should be noted that SUS focuses on users’ subjective perceptions of usability rather than on task-based effectiveness or efficiency measurements.⁷⁶,⁷⁸

Feedback summary and suggestions for improvement

When exploring the single IPCC report word rain (Figure 3), the experts were able to point to words and areas that they expected in the text. Their attention was typically caught first by words displayed with a large font, and thereafter by semantically related words. When reasoning about the IPCC report word rain, the domain experts made connections between semantically similar words and their position in the graph, that is, reflections they could not have made with a classic word cloud. It is, however, not immediately evident that these kinds of reflections would benefit an underlying task of gaining insights into the text. When comparing two texts, on the other hand, it was more evident that the semantic positioning facilitated comparison: for example, some experts commented on the subject of renewables being present in only one of the graphs, and it was evident to them that person and place names were mentioned in both eras, but that the names differed.

The domain expert interviews also resulted in a number of suggestions for improvement. The most important insight was that it was not self-evident that two graphs could use the same t-SNE projection, that is, the same horizontal semantic projection. There is therefore a need to indicate more clearly that the x-axes of two or more graphs are connected. A possible solution might be to provide the graphs with the same type of vertical through-lines as used here for explanatory purposes (e.g. the lines indicating positions 1–6 in Figure 4), that is, to include such lines in the graph generated by the algorithm itself. Such lines could either be positioned at even x-axis intervals or (as for the figures here) be used to indicate highly populated, or otherwise interesting, semantic areas.

On a related note, while the bars (located above the baseline) did not cause any confusion (several participants suggested y-axis ticks, or limiting the number of bars displayed, as user-configurable options), the lines below the baseline that link the bars to the labels seemed more confusing. Several participants sought clarification and confirmation about these marks, and the participants responded positively to our suggestion of changing the style of these links as a future improvement (e.g. using a dashed line pattern or a curve rather than a solid line, or emphasizing the endpoints).

One misunderstanding following from the layout of Figure 3 was the assumption that the most semantically central words were positioned in the middle of the graphs and the less semantically central ones at the edges. This interpretation, that the word rain depicts a scale between two semantic extremes, might have been reinforced by the specific color scale used. At the same time, there was some indication that the colors could help in interpreting and comparing the graphs, since at least one of the experts used the colors to refer to semantic areas when comparing two graphs. Colors should therefore probably be kept for indicating semantics and for connecting several graphs with the same t-SNE projection, but the exact use of color could be developed further.

A side effect of our choice that words only rain down when there is a collision is that the algorithm fills in spaces above prominent words with less prominent ones (see e.g. the upper left corner of Figure 8, above “high confidence”). This has the effect that words are not strictly ordered by prominence along the y-axis. This caused confusion initially for some experts, and we will therefore provide a configuration option where, at each specific position on the x-axis, less prominent words are always positioned below more prominent ones.
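The proposed configuration option can be sketched as a layout variant in which a word is placed below every more prominent word whose horizontal extent it shares, rather than only below words it would otherwise collide with. This is a hypothetical sketch of one way to implement the option, assuming words are given as (word, x, width, height) tuples in descending order of prominence; the function name and the `gap` parameter are our own, and this is not the implemented feature.

```python
def strict_rain_layout(words, gap=0.0):
    """words: (word, x, width, height) tuples, most prominent first.
    Returns a dict mapping each word to the y of its top edge (larger y is
    lower). A word is placed just below the bottom of every earlier word
    whose horizontal extent it shares, so it can never end up above a
    more prominent word."""
    placed = []  # (x, width, y_bottom) of already placed words
    ys = {}
    for word, x, width, height in words:
        y = 0.0
        for px, pw, pbottom in placed:
            if x < px + pw and px < x + width:  # horizontal extents overlap
                y = max(y, pbottom + gap)
        ys[word] = y
        placed.append((x, width, y + height))
    return ys
```

With the same toy words as for the collision-only behavior at point c in Figure 3 (“surface,” “century,” “per_decade”), “per_decade” now ends up below “century” instead of staying at the top, which trades compactness for a strict ordering by prominence.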

One of the domain experts expressed that the number of occurrences of a word would provide valuable information for understanding the text content. We have therefore started implementing a configuration option that generates a word rain displaying word occurrences in parentheses next to each word. This is a concrete and easily interpretable figure (in contrast to TF-IDF) that complements the visual word rain information. Even though the additional characters increase the clutter, this can be a reasonable compromise for applications where this information is very valuable.

The corpora used for the expert review sessions were very small. Word2vec models, pre-trained on large general-language corpora, were therefore used for retrieving semantic vectors, instead of using models trained exclusively on the corpora visualized. The interviews showed that the basis for the semantic word positioning was another source of misunderstanding, as it is easily assumed that the positioning was based on the corpora investigated and not on general-language semantics. When possible, that is, when the corpora used are large enough, it might therefore be preferable to use word2vec models trained on the actual corpora that are visualized, perhaps even at the cost of a poorer semantic representation.

It should be noted that the sessions were conducted over Zoom with the figures displayed within slides, which might have affected perception due to video stream compression and limited screen space (rather than being able to zoom in on the PDF or look more closely at a printed poster). However, when asked, the participants remarkably reported being able to make out labels with rather small font sizes on their laptop or desktop monitors: for instance, one participant mentioned the labels “northern_hemisphere” and “increased” (bottom left of Figure 3) as still comfortable to read, while another mentioned the label “2081” (top right of Figure 3) as being close to the threshold (especially given the dot inside the “0” character in that font).

Regarding the SUS scores, the mean is 70.62 ± 9.41 (median 73.75), interpretable as grade “C” or “OK.”⁷⁷ Due to the small sample size, and since quantitative analysis is not our focus, we do not discuss detailed results. However, regarding the more extreme answers, it is interesting to note that one of the participants replied “Strongly agree” to the statement “I think that I would need the support of a technical person to be able to use the Word Rain visualization technique,” while the other three replied “Disagree.” It is unclear whether this reflects how “use” was interpreted.
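For reference, the standard SUS scoring rule can be written out as follows: each of the ten items is answered on a 1–5 scale, odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5, giving a score between 0 and 100. Item 4 is the “support of a technical person” statement, so a “Strongly agree” (5) on it contributes 0 points. The `summarize` helper assumes that “±” denotes the sample standard deviation, which is not stated above; both function names are hypothetical.

```python
import statistics


def sus_score(responses):
    """Standard SUS scoring for one participant: ten answers on a 1-5 scale.
    Odd-numbered items contribute r - 1, even-numbered items contribute
    5 - r, and the total is scaled by 2.5 to the range 0-100."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten answers on a 1-5 scale")
    # enumerate starts at 0, so index 0 is item 1 (an odd-numbered item).
    total = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(responses))
    return total * 2.5


def summarize(scores):
    """Mean, median, and sample standard deviation of per-participant scores."""
    return statistics.mean(scores), statistics.median(scores), statistics.stdev(scores)
```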

When discussing final thoughts and remarks at the end of the session, the domain experts all expressed interest in the Word Rain visualization and had ideas for how the technique could be applied to different types of textual data in their domain of interest.

Discussion

By applying the Word Rain visualization to the three selected tasks, we were able to draw conclusions about high-level semantic differences between genres and about corpus changes over time, and to suggest additions to a specialized glossary. The first task could not have been carried out with a classic word cloud. The other two tasks might have been possible, but they would have been much more difficult without the semantic positioning of words.

Our study also showed that domain experts were able to understand and use the Word Rain visualizations, and had ideas for how the technique would be useful to apply on different types of textual data within their expert domains. Our general impression of Word Rain is, therefore, that it is a useful visualization technique.

There were, however, also aspects of Word Rain that were difficult to understand without explanations and that led to misinterpretation or to a feeling that the word rains did not provide enough information. There is, therefore, still room for improving the visualization technique, as described above.

When designing such improvements, however, it is important to keep in mind that the aim of the Word Rain technique is to provide an improved visualization that still fills the role of the classic word cloud, that is, a visualization that represents the content of a text in an easily understandable manner while providing all its visualization features in the form of a static digital or printed image. Adding interactive features beyond the possibility of zooming (which corresponds to a larger print in the non-digital world) would therefore not be in line with this aim. To address the limitations described above, we have therefore suggested improvements that could be implemented for a static visualization, treating Word Rain as a technique rather than a complete interactive tool. If the technique is eventually applied in a progressively broader context, possible next steps regarding user interaction would be to allow users to generate these static images interactively, to let them choose interactively which pairs of images to compare, or to use word rains alongside or instead of word clouds as part of InfoVis/VA tools with multiple coordinated views.⁷⁹ Such tools could support basic interactions such as pan & zoom or filtering,⁵⁷ but also the adjustment of configuration or encoding,⁸⁰ which could allow users to select a color map suitable for their task, e.g. to further highlight differences along the x-axis and thereby better support comparison.⁸¹ Such applications, however, go well beyond the scope of this initial study, and potentially beyond the typical use case scenarios for researchers in digital humanities and climate communication.

The text pre-processing (which might include training a word2vec model when a pre-trained model is not used) and the t-SNE projection can be computed once for the overall data set explored and then cached. The computational cost that remains when rendering an individual word rain (essentially, step 6 in Algorithm 1) is therefore conceptually quadratic in the number of word labels, as each new label is checked for collisions with all previously laid-out labels. This part of the implementation could be optimized further, but it should be noted that it is currently used to produce static figures, as mentioned above, rather than directly within interactive tools.
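A minimal Python sketch can make the quadratic collision check concrete. It assumes that each label is an axis-aligned bounding box whose x-position comes from the 1-D projection, and that labels are placed one at a time in order of decreasing prominence, each being shifted downward in fixed steps until it no longer overlaps any previously placed label. The class and function names, the step size and this particular shifting strategy are assumptions for illustration; this is not the authors' Algorithm 1.

```python
from dataclasses import dataclass, replace


@dataclass
class Box:
    """Axis-aligned bounding box of a word label; y grows downward."""
    x: float       # left edge, from the 1-D semantic projection
    y: float       # top edge
    width: float
    height: float


def overlaps(a: Box, b: Box) -> bool:
    """True if the boxes intersect; boxes that only touch do not overlap."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def place_labels(boxes: list[Box], step: float = 1.0,
                 max_steps: int = 10_000) -> list[Box]:
    """Greedily place labels, assumed sorted by decreasing prominence.

    Each label keeps its x-position and is shifted down by `step` until it
    overlaps none of the labels placed before it. Comparing against every
    previously placed label is what makes this naive layout quadratic in
    the number of labels (times the number of shifts tried).
    """
    placed: list[Box] = []
    for box in boxes:
        candidate = replace(box)  # copy, so the input list is not mutated
        for _ in range(max_steps):
            if not any(overlaps(candidate, other) for other in placed):
                break
            candidate.y += step
        else:
            raise RuntimeError("no free position found within max_steps")
        placed.append(candidate)
    return placed


# Hypothetical labels: the second overlaps the first horizontally.
labels = [Box(0, 0, 40, 10), Box(20, 0, 30, 8), Box(100, 0, 30, 8)]
result = place_labels(labels)
for b in result:
    print(b)
```

Here the second label is pushed down to y = 10, just below the first, while the third keeps y = 0 because it is horizontally clear of both. A spatial index over occupied regions (for example, a uniform grid) is one common way to reduce the number of pairwise checks.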

Regarding the scalability⁸² of the visual representation, Word Rain provides an overview of the most prominent words and, if the user wishes, can also support more in-depth exploration of less prominent ones (especially in relation to the more prominent words within the same semantic clusters). While overlaps between labels are avoided, a large number of labels could still lead to a certain amount of visual clutter due to the overall use of ink (both for labels and for other marks such as lines); hence the ability to set a threshold so that only the top $n$ most prominent words are included. The use of color as well as typographic attributes²⁵ could also support specific scenarios.
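Selecting the top $n$ words can be sketched as follows, assuming a plain TF-IDF variant as the prominence measure: raw term frequency multiplied by log(N/df), with document frequencies counted over the target text and a set of background documents. The exact weighting used for the figures in this article may differ, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(target: list[str], background: list[list[str]]) -> dict[str, float]:
    """Score each word of the target text as tf * log(N / df).

    N counts the target text plus the background documents, and df counts
    how many of those documents contain the word, so df >= 1 for every
    word of the target text.
    """
    docs = [target] + background
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    tf = Counter(target)
    return {w: count * math.log(len(docs) / df[w]) for w, count in tf.items()}


def top_n(scores: dict[str, float], n: int) -> list[tuple[str, float]]:
    """Keep the n highest-scoring words (ties broken alphabetically)."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical toy data, already tokenized and lower-cased.
target = "ice sheet ice melt sea level rise the the".split()
background = ["the sea is calm".split(), "the report is long".split()]
scores = tfidf(target, background)
selected = top_n(scores, 3)
print([word for word, _ in selected])  # ['ice', 'level', 'melt']
```

A word occurring in every document, such as “the” here, receives a score of zero and falls to the bottom of the ranking; the threshold $n$ then bounds how many labels, and hence how much ink, the figure contains.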

Limitations of the evaluation/validation provided for our study can be attributed to its scope (case studies and domain expert reviews) and scale (three external experts interviewed). While our motivation for these choices is described in the respective sections, we do not claim that these efforts cover all relevant aspects of our novel visualization technique. Further empirical studies therefore remain an important part of future work, especially regarding the generalizability of Word Rain to further text exploration tasks and corpus types.

Conclusion

In this article, we have introduced the Word Rain technique, with the aim of addressing two problems associated with the traditional word cloud: (i) that the word positioning lacks semantic meaning; and (ii) that using font size as the only indication of word prominence might give the incorrect impression that longer words are always more important than shorter ones.

We have addressed these two problems by basing the horizontal word positioning on distributional semantics-based word similarity, and by adding two prominence indicators that can be used alongside font size: the vertical word position and a bar chart whose bars indicate word prominence.
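How the encodings divide the work can be illustrated with a small Python sketch that maps each word's 1-D semantic coordinate to an x-position and its prominence score to both a font size and a bar height. It assumes that the 1-D coordinates have already been computed (for example, by a t-SNE projection of word2vec vectors, as in the case studies), that prominence scores are non-negative, and that linear scaling is used. The names and value ranges are hypothetical, and the vertical position, which depends on the layout, is not computed here.

```python
from dataclasses import dataclass


@dataclass
class EncodedWord:
    word: str
    x: float           # horizontal position: 1-D semantic coordinate
    font_size: float   # prominence channel 1
    bar_height: float  # prominence channel 2, independent of word length


def rescale(value: float, lo: float, hi: float,
            new_lo: float, new_hi: float) -> float:
    """Linearly map value from [lo, hi] to [new_lo, new_hi]."""
    if hi == lo:  # degenerate range: place everything in the middle
        return (new_lo + new_hi) / 2
    return new_lo + (value - lo) * (new_hi - new_lo) / (hi - lo)


def encode(words: list[str], coords: list[float], prominence: list[float],
           width: float = 1000.0, min_font: float = 4.0,
           max_font: float = 40.0, max_bar: float = 50.0) -> list[EncodedWord]:
    """Map 1-D coordinates to x, and non-negative prominence to size and bar."""
    lo_x, hi_x = min(coords), max(coords)
    top = max(prominence)
    encoded = []
    for w, c, p in zip(words, coords, prominence):
        encoded.append(EncodedWord(
            word=w,
            x=rescale(c, lo_x, hi_x, 0.0, width),
            font_size=rescale(p, 0.0, top, min_font, max_font),
            bar_height=max_bar * p / top if top > 0 else 0.0,
        ))
    return encoded


# Hypothetical 1-D coordinates and prominence scores.
words = ["glacier", "ice", "economy"]
coords = [-3.0, -2.5, 4.0]
prominence = [0.5, 1.0, 0.25]
encoded_words = encode(words, coords, prominence)
for e in encoded_words:
    print(e)
```

Because the bar height depends only on the prominence score, a long but less prominent word cannot appear more important in the bar chart than a short, more prominent one, which is the concern behind problem (ii).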

The word positioning used by the Word Rain technique gives it a greater capacity than a traditional word cloud to convey text content through the graph itself. Most importantly, differences and similarities between texts beyond the level of individual words can be revealed by differences and similarities in the Word Rain visualizations. We exemplified this by (i) contrasting two technical IPCC reports on concrete effects of climate change on nature with a report on human thought structures related to climate change, (ii) exploring temporal differences in a corpus of editorials on the topic of climate change, and (iii) investigating the coverage of a specialized glossary with respect to an IPCC report.

In addition, the meaningful word positioning makes it easier for a user or reader to actively explore interesting areas of the graph by zooming in on regions with potentially interesting content. It is therefore useful to also include words whose font size is initially too small to be legible, as the user can zoom in on these words and explore them further. Since the word positioning is based on semantics, these less prominent, initially illegible words can be located through their more prominent neighbors, which are displayed in a larger font. In a standard word cloud, without semantically meaningful word positioning, such targeted exploration is not possible.

The results of our case studies and of the interviews with researchers working with climate communication and climate adaptation texts are promising. This work also outlines directions for future work on improving Word Rain, as well as considerations that may be valuable for further research in text visualization.

Supplemental Material

Supplemental material for From word clouds to Word Rain: Revisiting the classic word cloud to visualize climate change texts by Maria Skeppstedt, Magnus Ahltorp, Kostiantyn Kucher and Matts Lindström in Information Visualization:

•sj-pdf-1-ivi-10.1177_14738716241236188

•sj-pdf-2-ivi-10.1177_14738716241236188

•sj-pdf-3-ivi-10.1177_14738716241236188

•sj-pdf-4-ivi-10.1177_14738716241236188

•sj-pdf-5-ivi-10.1177_14738716241236188

Footnotes

Acknowledgements

The authors would like to express their gratitude to the domain experts for participating in the evaluation sessions. Additionally, the authors would like to thank the anonymous reviewers and the associate editor for their valuable feedback, as well as the Applied CompLing Discourse Research Lab at the University of Potsdam for the opportunity to produce word rains using their editorial corpus. This work has also benefited from the Dagstuhl Seminar 22191 “Visual Text Analytics,” in which the first and third authors participated.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article. Word Rain has been developed with funding from three research infrastructures:

•Huminfra: National infrastructure for Research in the Humanities and Social Sciences (Swedish Research Council, 2021-00176)

•InfraVis: the Swedish National Research Infrastructure for Data Visualization (Swedish Research Council, 2021-00181)

•Nationella Språkbanken: The National Language Bank of Sweden (Swedish Research Council, 2017-00626)

ORCID iDs

Maria Skeppstedt

Magnus Ahltorp

Kostiantyn Kucher

Supplemental material

Supplemental material for this article is available online.

References

1. Viégas and Wattenberg. Tag clouds and the case for vernacular visualization. Interactions 2008; 15(4): 49–52.

2. Barth, Kobourov and Pupyrev. Experimental comparison of semantic word clouds. In: Gudmundsson and Katajainen (eds) Experimental algorithms. Cham, Switzerland: Springer International Publishing, 2014, pp.247–258.

3. Kucher and Kerren. Text visualization techniques: Taxonomy, visual survey, and community insights. In: Proceedings of the IEEE Pacific visualization symposium (PacificVis ’15), Hangzhou, 14–17 April 2015, pp.117–121. New York: IEEE. DOI: 10.1109/PACIFICVIS.2015.7156366.

4. Cao and Cui. Introduction to text visualization, Atlantis briefs in Artificial Intelligence. Vol. 1. Paris, France: Atlantis Press, 2016.

5. Alharbi and Laramee. SoS TextVis: an extended survey of surveys on text visualization. Computers 2019; 8(1): 17.

6. Milgram and Jodelet. Psychological maps of Paris. In: Proshansky, Ittelson and Rivlin (eds) Environmental psychology: People and their physical settings. 2nd ed. New York, NY, USA: Holt, Rinehart, and Winston, 1976, pp.104–124.

7. Deleuze and Guattari. Tausend Plateaus: Kapitalismus und Schizophrenie. Berlin, Germany: Merve-Verlag, 1992.

8. Hearst and Rosner. Tag clouds: Data analysis tool or social signaller. In: Proceedings of the Hawaii international conference on system sciences (HICSS ’08), Waikoloa, HI, 7–10 January 2008, pp.160–160. New York: IEEE. DOI: 10.1109/HICSS.2008.422.

9. Viégas, Wattenberg and Feinberg. Participatory visualization with Wordle. IEEE Trans Vis Comput Graph 2009; 15(6): 1137–1144.

10. Rivadeneira, Gruen, Muller, et al. Getting our head in the clouds: Toward evaluation studies of tagclouds. In: Proceedings of the SIGCHI conference on human factors in computing systems (CHI ’07), pp.995–998. New York: Association for Computing Machinery. DOI: 10.1145/1240624.1240775.

11. Felix, Franconeri and Bertini. Taking word clouds apart: an empirical investigation of the design space for keyword summaries. IEEE Trans Vis Comput Graph 2018; 24(1): 657–666.

12. Alexander, Chang, Shimabukuro, et al. Perceptual biases in font size as a data encoding. IEEE Trans Vis Comput Graph 2018; 24(8): 2397–2410.

13. Hearst, Pedersen, Patil, et al. An evaluation of semantically grouped word cloud designs. IEEE Trans Vis Comput Graph 2020; 26(9): 2748–2761.

14. Hassan-Montero and Herrero-Solana. Improving tag-clouds as visual information retrieval interfaces. In: Proceedings of the 2006 international conference on multidisciplinary information sciences and technologies (InSciT2006). Merida, Spain: University of Extremadura.

15. Provan, Wei, et al. Semantic-preserving word clouds by seam carving. Comput Graph Forum 2011; 30(3): 741–750.

16. Espadoto, Martins, Kerren, et al. Toward a quantitative survey of dimension reduction techniques. IEEE Trans Vis Comput Graph 2021; 27(3): 2153–2173.

17. Schubert, Spitz, Weiler, et al. Semantic word clouds with background corpus normalization and t-distributed stochastic neighbor embedding. ArXiv 2017; abs/1708.03569.

18. Gambette and Véronis. Visualising a text with a tree cloud. In: Locarek-Junge H and Weihs C (eds.) Classification as a tool for research. Berlin; Heidelberg: Springer, 2010, pp.561–570.

19. Burch, Lohmann, Pompe, et al. Prefix tag clouds. In: Proceedings of the international conference on information visualisation (IV ’13), London, 16–18 July 2013, pp.45–50. New York: IEEE. DOI: 10.1109/IV.2013.5.

20. Seifert, Kump, Kienreich, et al. On the beauty and usability of tag clouds. In: Proceedings of the international conference information visualisation (IV ’08), London, 9–11 July 2008, pp.17–25. New York: IEEE. DOI: 10.1109/IV.2008.89.

21. Chi, Lin, Chen, et al. Morphable word clouds for time-varying text data visualization. IEEE Trans Vis Comput Graph 2015; 21(12): 1415–1426.

22. Feng, Gao and Karras. Towards semantically aware word cloud shape generation. In: Adjunct proceedings of the 35th annual ACM symposium on user interface software and technology (UIST ’22). New York, NY, USA: Association for Computing Machinery. DOI: 10.1145/3526114.3558724.

23. Koh, Lee, Kim, et al. ManiWordle: providing flexible control over Wordle. IEEE Trans Vis Comput Graph 2010; 16(6): 1190–1197.

24. Wang, Chu, Bao, et al. EdWordle: consistency-preserving word cloud editing. IEEE Trans Vis Comput Graph 2018; 24(1): 647–656.

25. Brath and Banissi. Using typography to expand the design space of data visualization. She Ji J Des Econ Innov 2016; 2(1): 59–87.

26. Goffin, Willett, Fekete, et al. Exploring the placement and design of word-scale visualizations. IEEE Trans Vis Comput Graph 2014; 20(12): 2291–2300.

27. Beck and Weiskopf. Word-sized graphics for scientific texts. IEEE Trans Vis Comput Graph 2017; 23(6): 1576–1587.

28. Baumer EPS, Snyder and Gay. Interpretive impacts of text visualization: mitigating political framing effects. ACM Trans Comput Hum Interact 2018; 25(4): 1–26.

29. Kim, Elmqvist, et al. Word Bridge: Using composite tag clouds in node-link diagrams for visualizing content and relations in text corpora. In: Proceedings of the Hawaii international conference on system sciences (HICSS ’11), Kauai, HI, 4–7 January 2011. New York: IEEE. DOI: 10.1109/HICSS.2011.499.

30. Paulovich, Toledo FMB, Telles, et al. Semantic wordification of document collections. Comput Graph Forum 2012; 31(3 pt 3): 1145–1153.

31. Paul, Chang, Endert, et al. TexTonic: Interactive visualization for exploration and discovery of very large text collections. Inf Vis 2019; 18(3): 339–356.

32. Diakopoulos, Elgesem, Salway, et al. Compare Clouds: Visualizing text corpora to compare media frames. In: Proceedings of the 2015 IUI Workshop on visual text analytics. Oshawa, Canada: Ontario Tech University.

33. Collins, Viégas and Wattenberg. Parallel tag clouds to explore and analyze faceted text corpora. In: Proceedings of the 2009 IEEE symposium on visual analytics science and technology (VAST ’09), Atlantic City, NJ, 12–13 October 2009, pp.91–98. New York: IEEE. DOI: 10.1109/VAST.2009.5333443.

34. Lohmann, Burch, Schmauder, et al. Visual analysis of microblog content using time-varying co-occurrence highlighting in tag clouds. In: Proceedings of the international working conference on advanced visual interfaces (AVI ’12), pp.753–756. New York: Association for Computing Machinery. DOI: 10.1145/2254556.2254701.

35. Lee, Riche, Karlson, et al. SparkClouds: Visualizing trends in tag clouds. IEEE Trans Vis Comput Graph 2010; 16(6): 1182–1189.

36. Wang, Dent and North. Fisheye word cloud for temporal sentiment exploration. In: CHI ’13 extended abstracts on human factors in computing systems (CHI EA ’13), pp.1767–1772. New York: Association for Computing Machinery. DOI: 10.1145/2468356.2468673.

37. Liu and Shen. Supporting multifaceted viewing of word clouds with focus+context display. Inf Vis 2015; 14(2): 168–180.

38. Heimerl, Lohmann, Lange, et al. Word Cloud Explorer: text analytics based on word clouds. In: Proceedings of the Hawaii international conference on system sciences (HICSS ’14), Waikoloa, HI, 6–9 January 2014, pp.1833–1842. New York: IEEE. DOI: 10.1109/HICSS.2014.231.

39. Burch, Lohmann, Beck, et al. RadCloud: Visualizing multiple texts with merged word clouds. In: Proceedings of the international conference on information visualization (IV ’14), Paris, 16–18 July 2014, pp.108–113. New York: IEEE. DOI: 10.1109/IV.2014.72.

40. John, Marbach, Lohmann, et al. MultiCloud: interactive word cloud visualization for the analysis of multiple texts. In: Proceedings of the graphics interface conference (GI ’18), pp.34–41. Canadian Human-Computer Communications Society. DOI: 10.20380/GI2018.06.

41. Knittel, Koch and Ertl. PyramidTags: context-, time- and word order-aware tag maps to explore large document collections. IEEE Trans Vis Comput Graph 2021; 27(12): 4455–4468.

42. Shahaf, Guestrin and Horvitz. Trains of thought: Generating information maps. In: Proceedings of the 21st international conference on world wide web (WWW ’12), pp.899–908. New York: Association for Computing Machinery. DOI: 10.1145/2187836.2187957.

43. Cao, Sun, Lin, et al. FacetAtlas: multifaceted visualization for rich text corpora. IEEE Trans Vis Comput Graph 2010; 16(6): 1172–1181.

44. Jatowt, Tahmasebi, Borin, et al. Computational approaches to lexical semantic change: visualization systems and novel applications. In: Tahmasebi, Borin and Jatowt (eds) Computational approaches to semantic change. Berlin, Germany: Language Science Press, 2021, pp.311–339.

45. Baumer EPS, Jasim, Sarvghad, et al. Of course it’s political! A critical inquiry into underemphasized dimensions in civic text visualization. Comput Graph Forum 2022; 41(3): 1–14.

46. Russell. Simple is good: observations of visualization use amongst the Big Data digerati. In: Proceedings of the international working conference on advanced visual interfaces (AVI ’16), pp.7–12. New York: ACM. DOI: 10.1145/2909132.2933287.

47. Börner, Bueckle and Ginda. Data visualization literacy: definitions, conceptual frameworks, exercises, and assessments. Proc Natl Acad Sci 2019; 116(6): 1857–1864.

48. Chen, Lin and Yuan. Social media visual analytics. Comput Graph Forum 2017; 36(3): 563–587.

49. Jänicke, Franzini, Cheema, et al. Visual text analysis in digital humanities. Comput Graph Forum 2017; 36(6): 226–250.

50. Sinclair and Rockwell. Teaching computer-assisted text analysis: approaches to learning new methodologies. In: Hirsch (ed.) Digital humanities pedagogy: Practices, principles, and politics. Open Book Publishers, 2012. https://doi.org/10.11647/OBP.0024.11

51. Sinclair and Rockwell. Text analysis and visualization: making meaning count. In: Schreibman, Siemens and Unsworth (eds) A new companion to digital humanities. Hoboken, New Jersey, USA: John Wiley & Sons, 2015, pp.274–290.

52. Mikolov. Distributed representations of sentences and documents. In: Proceedings of the international conference on machine learning (ICML ’14), pp.1188–1196. PMLR. https://proceedings.mlr.press/

53. van der Maaten and Hinton. Visualizing data using t-SNE. J Mach Learn Res 2008; 9: 2579–2605.

54. Spärck Jones. IDF term weighting and IR research lessons. J Doc 2004; 60(5): 521–523.

55. Cuba Gyllensten. Quantifying meaning. PhD Thesis, KTH Royal Institute of Technology, 2023.

56. Bengio, Ducharme, Vincent, et al. A neural probabilistic language model. J Mach Learn Res 2003; 3: 1137–1155.

57. Shneiderman. The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the IEEE Symposium on visual languages (VL ’96), Boulder, CO, 3–6 September 1996, pp.336–343. New York: IEEE. DOI: 10.1109/VL.1996.545307.

58. Liu and Heer. Somewhere over the rainbow: an empirical assessment of quantitative colormaps. In: Proceedings of the 2018 CHI conference on human factors in computing systems (CHI ’18), pp.1–12. New York: Association for Computing Machinery. DOI: 10.1145/3173574.3174172.

59. Manning and Schütze. Foundations of statistical natural language processing. Cambridge, MA, USA: MIT Press, 1999.

60. Hunter. Matplotlib: A 2D graphics environment. Comput Sci Eng 2007; 9(3): 90–95.

61. Skeppstedt, Ahltorp, Kucher, et al. Topic modelling applied to a second language: a language adaptation and tool evaluation study. In: Selected papers from the CLARIN annual conference 2019, volume 172:17. Linköping Electronic Conference Proceedings, pp.145–156. Linköping, Sweden: Linköping University Electronic Press.

62. Masson-Delmotte, Zhai, Pirani, et al. IPCC, 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Technical Summary. Cambridge: Cambridge University Press, 2021.

63. BNC Consortium (ed.). British national corpus. Baby ed. Literary and Linguistic Data Service, 2007. http://hdl.handle.net/20.500.14106/2553

64. Bird. NLTK: The natural language toolkit. In: Proceedings of the ACL workshop on effective tools and methodologies for teaching natural language processing and computational linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics.

65. Shukla, Skea, Buendia, et al. IPCC, 2019: summary for policymakers. In: Climate change and land: an IPCC special report on climate change, desertification, land degradation, sustainable land management, food security, and greenhouse gas fluxes in terrestrial ecosystems (Unofficial translation into Swedish by the Swedish Meteorological and Hydrological Institute). Technical Summary. Cambridge: Cambridge University Press, 2019.

66. Pörtner, Roberts, Masson-Delmotte, et al. IPCC, 2019: summary for policymakers. In: IPCC special report on the ocean and cryosphere in a changing climate (unofficial translation into Swedish by the Swedish Meteorological and Hydrological Institute). Technical Summary. Cambridge: Cambridge University Press, 2019.

67. Wolrath Söderberg. Tankestrukturer som hindrar omställning - och hur vi kan överkomma dem (Thought structures that hinder climate change mitigation and how to overcome them). Stockholm: Miljömålsberedningen, 2021.

68. Hulme, Obermeister, Randalls, et al. Framing the challenge of climate change in Nature and Science editorials. Nat Clim Change 2018; 8: 515–521.

69. Stede, Bracke, Borec, et al. Framing climate change in Nature and Science editorials: applications of supervised and unsupervised text categorization. J Comput Soc Sci 2023; 6: 485–513.

70. Lam, Bertini, Isenberg, et al. Empirical studies in information visualization: seven scenarios. IEEE Trans Vis Comput Graph 2012; 18(9): 1520–1536.

71. Isenberg, Chen, et al. A systematic review on the practice of evaluating visualization. IEEE Trans Vis Comput Graph 2013; 19(12): 2818–2827.

72. Wong, Madhavan and Elmqvist. Towards characterizing domain experts as a user group. In: Proceedings of the IEEE workshop on evaluation and beyond – methodological approaches to visualization (BELIV ’18), Berlin, 21 October 2018. New York: IEEE. DOI: 10.1109/BELIV.2018.8634026.

73. Matković, Wischgoll and Laidlaw. Empirical evaluations with domain experts. In: Chen M, Hauser H, Rheingans P and Scheuermann G (eds.) Foundations of data visualization. Cham, Switzerland: Springer International Publishing, 2020, pp.181–194.

74. Crispen and Hoffman. How many experts? IEEE Intell Syst 2016; 31(6): 56–62.

75. Ribes. How I learned what a domain was. Proc ACM Hum Comput Interact 2019; 3(CSCW): 1–12.

76. Brooke. SUS: A “quick and dirty” usability scale. In: Jordan PW, Thomas B, McClelland IL, Weerdmeester B (eds.) Usability evaluation in industry. Boca Raton, FL, USA: CRC Press, 1996, pp.189–194.

77. Blattgerste, Behrends and Pfeiffer. A web-based analysis toolkit for the system usability scale. In: Proceedings of the 15th international conference on pervasive technologies related to assistive environments (PETRA ’22), pp.237–246. New York: Association for Computing Machinery. DOI: 10.1145/3529190.3529216.

78. Brooke. SUS: a retrospective. J Usability Stud 2013; 8(2): 29–40.

79. Roberts. State of the art: Coordinated & multiple views in exploratory visualization. In: Proceedings of the fifth international conference on coordinated and multiple views in exploratory visualization (CMV 2007), Zurich, 2 July 2007, pp.61–71. New York: IEEE. DOI: 10.1109/CMV.2007.20.

80. Kang, Stasko, et al. Toward a deeper understanding of the role of interaction in information visualization. IEEE Trans Vis Comput Graph 2007; 13(6): 1224–1231.

81. Gleicher, Albers, Walker, et al. Visual comparison for information visualization. Inf Vis 2011; 10(4): 289–309.

82. Richer, Pister, Abdelaal, et al. Scalability in visualization. IEEE Trans Vis Comput Graph 2022. DOI: 10.1109/TVCG.2022.3231230
