iClassifier: A digital research tool for corpus-based classifier networks in complex writing systems

Abstract

This article presents the method applied by the iClassifier (©Goldwasser/Harel/Nikolaev) digital research tool for the study of the linguistic phenomenon of classifiers. The tool was created in 2019 with the objective of curating corpus-based and data-driven documentation of classifier systems. The record of classifiers comprises millions of tokens’ worth of “big data” analysis. When classifiers are tagged in various corpora, a topography of categories emerges, visualized as complex, multilayered networks. This article offers an overview of how classifier-based networks are created and how network analysis methods can be applied to analyze knowledge organization. We present the data structure and annotation scheme of the iClassifier research tool, demonstrating how one can plot classifier networks and generate reports of lemma and classifier repertoires in each corpus. The iClassifier tool provides quantitative reports, including classifier frequency, variation and co-occurrence statistics. Each data subset, such as a certain part of speech, timespan, geographical location or textual genre, can be queried and visualized. The tool is meant to allow browsing between a macro-overview of all categories in a corpus and a micro-analysis of the individual categories and lemmas that make up a corpus. Each classifier is seen as a category head, and the categories are drawn in their multilayered and multidimensional relationships. The strength of this tool lies in documenting the phenomenon in large corpora of texts and expanding our knowledge about the rules and functions of classifier systems, leading us to a more refined mind-mapping of ancient cultures. To date, very little systematic analysis has been done on this ancient record of emic information.

Keywords

Classifier studies, digital humanities, lexical semantics, network analysis

1. Introduction

1.1 Classifier lists and classifier categories

The repertoire of graphemic classifiers has been known mostly from lists of graphemes called “determinatives” in ancient scripts appearing in grammar books. An example is the case of the ancient Egyptian script and its “generic determinatives” list compiled by Gardiner (1957: 31–33), which is still used as a standard reference for Classifier Meanings (CMs).¹ Such lists assign each classifier a general overarching meaning label that does not always fully represent the semantic profile of the sign in its use as a classifier. Questions about the range of lemmas each classifier is attested with, the combinations in which it appears, and how its usage evolved over time (diachrony) have remained unanswered.

Recent years have seen efforts to reevaluate the semantic scope of classifier categories in Egyptology. The Thot Sign List, a digital sign list of hieroglyphs, arranges the Egyptian characters by their functions, including the function “classifier.”² Furthermore, attempts have been made to refine the list of categories, either by arranging the vocabulary by classifier categories and listing the variety of lemmas occurring with each classifier (e.g., Winand and Stella, 2013, for the MK and 18th Dynasty), or by mapping all classifiers occurring with a lemma (e.g., Werning, 2011: 323–325). Recently, Goldwasser and Soler (2024) created a classifier list based on a study conducted with iClassifier on the different versions of The Story of Sinuhe. The iClassifier³ research tool enables quantitative research on classifiers and creates classifier lists that are data-driven and corpus-based.⁴

The purpose of this article is to propose a dynamic and detailed record of classifier categories. Upon tagging all classifiers in a corpus, we can access their semantic range and analyze the classification patterns of host lemmas, such as tracking the stability of classification patterns, both synchronically and diachronically, and the combinational aspects of multiple classifications. With iClassifier, we also expose the incompatibility of categories in a single classifier combination, that is, incompatible categories that do not co-occur. In this study, we apply the method to three ancient complex scripts: ancient Egyptian, ancient Chinese and Sumerian. A classifier list was published recently for Sumerian (the “Selz consolidated list,” see Selz, 2021; Selz et al., 2017: 289–299). The term semantic classifier has been recently used to refer to graphemic classifiers in the ancient Chinese script (Goldwasser and Handel, 2024; Handel, 2023) and a new corpus-based list of classifiers for ancient Chinese in Guodian texts was created by Xu (2024).

1.2 From classifier lists to classifier networks

This article presents a typological approach to documenting and analyzing classifier systems with the iClassifier research tool. We create classifier lists presented not only in a list format but also as networks based on different datasets. We collect data in Egyptian and Chinese bottom-up, adding individual texts and marking all classifiers within them. In Sumerian, the current dataset is curated top-down by digitizing all orthographies with classifiers marked as such by the compilers of the ePSD2 Sumerian dictionary (see Selz, 2021; Selz and Zhang, 2024).⁵ Each project’s results are automatically generated as a classifier list based on tagging the examples in the iClassifier input system (Section 3). We conduct classifier analysis by tagging classifiers and marking each instance as semantic, derivational/grammatical or phonetic classification (described in Section 3.2). This method allows one to plot all classifiers or only a certain type of classifier—for example, access only semantic or phonetic classifiers. With the digital tool, users can study the reports of individual classifiers and lemmas appearing in the dataset. Several examples are provided in the following sections.

2. Classifier networks: Visualizing innate semantic topographies of categorization in ancient vocabularies

The classifier repertoire of each dataset can be visualized as a network, exposing a macro-outlook of the classified vocabulary. The semantic scope of each category is visible and connections between categories surface. A first attempt to analyze classifiers in an original corpus was recently performed by Goldwasser and Soler (2024) on the various manuscripts from different periods of an Egyptian canonical literary text, The Story of Sinuhe. Xu analyzed an original corpus of texts from the Warring States period (see Xu, 2024). Another analysis of an exemplary Egyptian text is presented in this article.⁶

2.1 iClassifier, the new “network view” of classifier systems in ancient Egyptian

The network presented in Figure 1(a) reveals at a glance all classifier categories and their host lemmas as they are manifested in an actual text. The network depicts the classifier repertoire of an ancient Egyptian short tale written in cursive hieratic script on papyrus and named in modern scholarship The Tale of The Doomed Prince (Doomed Prince).⁷ This text consists of 1640 words (=tokens) and dates to the New Kingdom, probably the first part of the 19th Dynasty (c. 1300–1200 bce). In our classifier-based networks, one can identify two types of nodes structuring the network: classifier nodes represented by the original hieroglyphic character, and round lemma nodes (e.g., Figure 1(b)). In network analysis terms, the lines connecting lemmas and their classifiers and the lines between co-occurring classifiers are called edges. Blue edges connect each classifier to all the lemmas (vocabulary items) it classifies. These are the data points we map: how words are associated with classifier categories. A lemma node can be connected to one or more classifier nodes, marking its attested classification patterns. Instances without classification are also recorded in the project’s reports.

Figure 1.

(a) A classifier-based network visualization (right) and a classifier frequency key (left) in The Tale of The Doomed Prince (p.Harris 500). Data imported from TLA. ©iClassifier, Haleli Harel. (b) A detailed view of the classifier-based network of The Tale of The Doomed Prince. ©iClassifier, Haleli Harel. (c) A classifier-based network of all verbal lexemes in The Tale of The Doomed Prince and a classifier frequency key. The text consists of 1640 tokens, of which 256 examples are of verbs and 144/256 are written with classifiers (56%). ©iClassifier, Haleli Harel.

Red edges are links between classifiers that co-occur with a particular lemma. In the selected text (Figure 1(a)), we can identify that the most connected and attested semantic categories are [default/abstract],⁸ [motion], [man], [senses & emotions], [action], [divine] and [sun/time]. Edge width indicates how many tokens (=unique examples) exist for a certain connection. In the network’s periphery, some categories appear as disconnected islands. They occur solely with one or several lexical items that do not portray classifier variation or co-occurrence—for example, the [eye] category on the top left of Figure 1(a). This category occurs with six lemmas in the corpus, yet appears as a single classifier, and has no patterns of combinations with other categories, as can be seen in the visualization. While taking a closer look at the Egyptian Doomed Prince's network structure (Figure 1(a)), we observe an interconnected network with many red edges, representing multiple classifications. Each classifier is listed in the frequency key of Figure 1(a) by the number of its host lemmas (light blue) and the number of tokens (examples) it has (dark blue).
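The two edge types described above can be derived from a list of annotated tokens. The following is a minimal sketch in Python, not the iClassifier implementation: the token records, their counts and the function name `build_edges` are assumptions made for illustration, and only the category labels are taken from the text.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotated tokens: (lemma, classifiers written with that token).
# Category labels follow the article; the token counts are invented.
tokens = [
    ("šrj", ("man", "small")),
    ("šrj", ("man", "small")),
    ("šrj.t", ("woman", "small")),
    ("r-bnr", ("motion", "road")),
    ("hrw", ("sun/time",)),
    ("ḏd", ()),  # unclassified token: recorded, but contributes no edge
]

def build_edges(tokens):
    """Return (lemma_edges, cooc_edges) as Counters weighted by token count."""
    lemma_edges = Counter()  # "blue" edges: (lemma, classifier)
    cooc_edges = Counter()   # "red" edges: unordered classifier pairs
    for lemma, classifiers in tokens:
        unique = sorted(set(classifiers))
        for clf in unique:
            lemma_edges[(lemma, clf)] += 1
        for pair in combinations(unique, 2):
            cooc_edges[pair] += 1
    return lemma_edges, cooc_edges

lemma_edges, cooc_edges = build_edges(tokens)

# "Island" categories: classifiers that never co-occur with another classifier.
connected = {clf for pair in cooc_edges for clf in pair}
islands = {clf for _, clf in lemma_edges} - connected
```

The counter values correspond to edge width (number of tokens per connection), and `islands` collects categories that would be drawn as disconnected nodes in the periphery of the network.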

Zooming into the image (Figure 1(b)), we can closely examine such instances of multiple classification. The hero of the story is an unnamed young prince identified as šrj “lad.”⁹ This lemma connects the generic [man] category with the [small] category; the two classifiers are linked by a red edge.¹⁰

One can identify the [woman] classifier on the upper right side of Figure 1(b), another well-attested category in this text. The hero falls in love with the daughter of a foreign ruler, šrj.t “young woman” in the text.¹¹ The high number of attestations of this lemma is indicated by the thick width of the blue edges linked to this noun. The noun portrays double classification and is linked by a red edge to two classifiers, [woman] and [small].¹² One can learn from the network that the lemma has many occurrences and a stable double classification. To the left, the [motion] category depicted by the “moving legs” glyph occurs with various verbs of motion and adverbs such as r-bnr “outside,” portraying double classification with the [road] classifier. Compare a similar classification in Chinese (Xu, 2024). The link between the [motion] category and the [man] category is established by co-occurrence with the deverbal noun znn “chariot soldier,” written with two classifiers after the hieroglyphs representing the phonological information, znn.¹³ Here, the classifier [man] functions not only as a generic taxonomic classifier but also as a nominalizer.¹⁴

Tagging all classifiers in the text, we learn that most nouns are classified (199/277, 72%), more than half of the verbs occur with classifiers (144/256, 56%), but adjectives are rarely classified (8/82, 10%).¹⁵ About half of the adverbs appear with classifiers (13/24, 54%). Grammatical elements are rarely classified: only 2% of the prepositions (2/101) and 2% of the particles (4/185) take classifiers.¹⁶
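The percentages above follow directly from the classified and total token counts per part of speech. A short Python sketch reproduces them; the counts are those given in the text, while the dictionary layout and the function name `classification_rate` are illustrative assumptions.

```python
# (classified tokens, total tokens) per part of speech, as reported for the
# Doomed Prince. The data layout is an assumption made for this sketch.
counts = {
    "noun": (199, 277),
    "verb": (144, 256),
    "adjective": (8, 82),
    "adverb": (13, 24),
    "preposition": (2, 101),
    "particle": (4, 185),
}

def classification_rate(classified, total):
    """Share of tokens written with at least one classifier, in whole percent."""
    return round(100 * classified / total)

rates = {pos: classification_rate(c, t) for pos, (c, t) in counts.items()}

for pos, rate in rates.items():
    classified, total = counts[pos]
    print(f"{pos:12s} {classified:>3d}/{total:<3d} {rate:>3d}%")
```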

It is possible to query the network further using various variables that the scholar annotated in their text analysis—for example, by a selected part of speech (POS). One can draw a network of all POS types or query a specific POS, such as a network of verbs. In the network in Figure 1(c), all classified verb forms in the text are plotted and arranged around their classifiers.¹⁷

Figure 1(c) shows the concentration of classified verbal lexemes¹⁸ around five categories (or “mental hubs,” see Goldwasser, in press).¹⁹ In the network’s key, [motion] is the most common verb classifier, as can be seen from the first line of the token count (colored dark blue). The other prominent “hubs” in the network of verbs are the [default/abstract] (17 lexemes), [action]²⁰ (10 lexemes) and [senses & emotions]²¹ (8 lexemes). In this case study, 9% of the verbs are not fully preserved. Of all verbal tokens (=examples) written in this specific text, only 56% have classifiers, and 35% are not classified. The most common non-classified verbs are the lemmas ḏd “to say” (25 attestations) and jri̯ “to do” (10 attestations).²² Such common verbs are regularly written without classifiers in cursive and monumental hieroglyphic scripts.²³

2.2 Comparing classifier-based networks, Chinese and Sumerian

Following the guidelines described above for ancient Egyptian, a pilot project of the ancient Chinese iClassifier was created. The selected corpus consists of ancient Chinese philosophical texts written on bamboo strips from the Warring States period (∼400–200 bce; for the presentation of the corpus, see Xu, 2024). The result of classifier tagging in this corpus is visible in Figure 2.

Figure 2.

A classifier-based network of a corpus of Ancient Chinese philosophical texts (Guodian texts). Left: Key of the number of lemmas and tokens that occur with each classifier; for example, the classifier 心 [heart] occurs with 54 lemmas and 409 tokens. ©iClassifier, Yanru Xu. Data courtesy of The Intelligent Retrieval Network Database of Chinese Characters (East China Normal University (ECNU), Shanghai).

The network in Figure 2 is based on a corpus of Guodian bamboo manuscripts containing 12,092 tokens of 405 lemmas. A total of 2251 tokens are classified by 99 semantic classifiers (Xu, 2024). In the classifier-based network of this text corpus in ancient Chinese, the central categories are [heart] and [movement] (Xu, 2024). The [heart] category in Chinese is somewhat analogous in scope to the category [senses & emotions] in Egyptian, which shows a similar centrality effect (see above and Goldwasser and Soler, 2024). Another pair of comparable categories can be identified in the Chinese classifier labeled as [movement], or literally [moving legs] + [road], and the Egyptian category [motion] in Figures 1(a)–(c). Even though the Egyptian and ancient Chinese scripts are two complex writing systems, far removed in place and time, these two similar categories show the same centrality effect in their respective classifier systems. Nevertheless, one should keep in mind that the Egyptian Doomed Prince (Figures 1(a)–(c)) was written around a millennium before the Chinese philosophical compositions of the Guodian corpus (Figure 2).

A third classifier system digitized using the iClassifier tool is the cuneiform system used to transcribe the Sumerian language. The Sumerian network (Figure 3(a)) illustrates the 46 semantic classifiers attested in this ancient script based on the digitization of script forms in the Sumerian ePSD2 dictionary.²⁴ The Sumerian system differs from the Chinese and Egyptian systems as it is strictly a noun classifier system, while Egyptian and Chinese scripts display the classification of nouns, verbs and other parts of speech. In the current pilot datasets, the Chinese (Figure 2) and Egyptian (Figure 1(a)) networks depict the lexical inventory of specific texts, while the Sumerian network paints a macro image of all classifiers and all classified words in the Sumerian lemma list (for an explanation of the project, see Selz and Zhang, 2024). The Sumerian network contains 46 classifiers and 2859 lemmas occurring in 9497 script variations. The largest categories in this network are [divinity], [wood/tree], [place], [bird] and [stone].²⁵

Figure 3.

(a) A classifier-based network of Sumerian, according to the uppercase marking of classifiers in the ePSD2 dictionary after Selz, 2021; Selz and Zhang, 2024. The frequency key illustrates the number of lemmas per classifier. This network image includes 2859 lemmas and 9497 written forms, including proper names (June 2023). This list includes only one instance of each script variation (see n. 5). ©iClassifier, Bo Zhang and Gebhard Selz. (b) The classifiers of the Sumerian vocabulary, sorted by the number of lemmas per classifier. ©iClassifier (after Selz and Zhang, 2024, Figure 1).

2.3 How do classifier-based networks distribute?

The analysis of various classifier systems led to the identification of certain distribution patterns shared between the corpora (Harel, in press). For example, arranging all categories in Figure 3(a) according to the number of lemmas they occur with reveals a “long-tailed” distribution (Figure 3(b)).²⁶

Figure 4 shows that the network of classifiers in the Guodian texts corpus compiled by Xu (2024) also follows a long-tailed distribution, including a large number of classifiers that occur with a single host.
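A long-tailed distribution of this kind can be inspected by ranking classifiers by their number of host lemmas and counting those with a single host. The Python sketch below is illustrative only: apart from the 54 lemmas reported for [heart] in the caption of Figure 2, every count and most labels are invented placeholders, not data from the corpora.

```python
# Number of host lemmas per classifier. Only the value for "heart" (54) comes
# from the text; all other entries are invented to illustrate the shape.
lemma_counts = {
    "heart": 54,
    "movement": 30,
    "speech": 12,
    "water": 3,
    "tree": 1,
    "jade": 1,
    "fish": 1,
}

# Rank by descending lemma count, breaking ties alphabetically.
ranked = sorted(lemma_counts.items(), key=lambda kv: (-kv[1], kv[0]))

# The "tail": classifiers attested with exactly one host lemma.
single_host = [clf for clf, n in ranked if n == 1]

# Share of all classifier-lemma links held by the two largest categories.
total_links = sum(lemma_counts.values())
head_share = sum(n for _, n in ranked[:2]) / total_links
```

In a long-tailed repertoire, `head_share` is high while `single_host` remains long relative to the number of classifiers.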

Figure 4.

The classifiers of an ancient Chinese corpus of philosophical texts, sorted by the number of lemmas per classifier. ©iClassifier (after Xu, 2024, Figure 1).

Similar results emerged in a pilot study of the Middle Kingdom and New Kingdom hieratic texts of The Tale of Sinuhe (Goldwasser and Soler, 2024). A more balanced distribution emerged in the Doomed Prince classifier repertoire discussed above (Figure 5 representing Figure 1(a)). Here, the classifier list includes 51 classifiers. As compared with other earlier Egyptian texts, the Doomed Prince contains fewer classifiers since it is a short tale and only survives in one manuscript. It also has repetitive vocabulary and thus a low lexical density.²⁷ One identifies more medium-sized classifier categories in this single text, and a lighter “tail,” as the number of categories in this specific text is smaller than in other Egyptian texts (e.g., 93 classifiers in The Story of Sinuhe corpus). Although the categories and their distribution are different, the core group of central categories reoccurs in the range of texts studied so far in Egyptian, led by the categories [default/abstract], [motion], [man], [senses & emotions] and [action].

Figure 5.

The classifiers of The Tale of The Doomed Prince (p.Harris 500). Data imported from TLA. ©iClassifier, Haleli Harel.

Pioneering research on the diachrony of classification in Egyptian texts pointed to a reduced number of classifiers in New Kingdom hieratic.²⁸ On a general note, texts in the same language may differ to a certain extent in their semantic classifier-based networks due to variables such as genre, script type, language phase, or type of support and materiality. Still, the structure of the Sumerian (Figure 3(b)), Chinese (Figure 4) and Egyptian (Figure 5) corpus-based classifier networks shows that the networks share a distribution of a few main “super” categories, some medium-sized categories, and many marginal and smaller categories. As we expand our data pool, we will be able to describe these tendencies across data samples more accurately and to trace both the distribution of categories and the evolution of individual classifier categories.

2.4 “Community detection” of classifier networks

Next, we inquire how the classifier repertoire in a dataset forms a complex semantic space. The data can be explored with algorithmic methods such as community detection²⁹ to visualize its inner hierarchies. In Figures 6(a) and (b), the same network we previously presented in Figures 1(a)–(c) is visualized by sorting the data according to a community detection algorithm (using the software Gephi). The nodes are grouped into communities by their proximity of co-occurrence.³⁰ “A community is a densely connected subset of nodes that is only sparsely linked to the remaining network” (Gulbahce and Lehmann, 2008: 3). Applying community detection algorithmic methods, 51 semantic classifier categories are arranged into defined clusters. In many cases, these clusters are constructed of at least three classifiers. This clustering allows us to assess which categories are interrelated based on classifier combination or variance. The network visualization in Figures 6(a) and (b) is merely an experimental investigation into the data. This method is being calibrated while we work on larger data from various corpora, aiming to use it to expose noteworthy connections and semantic structures emerging from the collected data. As shown in Figure 6(a), a community detection algorithm sorted the network presented in Figure 1 into clusters by color.
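The quoted definition of a community (densely connected inside, sparsely linked outside) is often quantified with a modularity score, which many community detection routines try to maximize. The text does not state which algorithm or settings were used in Gephi, so the following Python sketch is an assumption: it only scores a given partition of an invented toy graph and does not reproduce the clustering of Figure 6.

```python
def modularity(edges, community_of):
    """Modularity Q of an undirected, unweighted graph under a given partition."""
    m = len(edges)
    internal = {}    # community -> number of edges lying entirely inside it
    degree_sum = {}  # community -> summed degree of its member nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Invented toy graph: two triangles joined by a single bridge edge.
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a1", "b1"),
]
two_clusters = {node: node[0] for edge in edges for node in edge}  # "a" or "b"
one_cluster = {node: "all" for edge in edges for node in edge}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
```

Splitting the toy graph into its two triangles scores higher than keeping all nodes in one group, which is the sense in which a partition "fits" the co-occurrence structure.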

Figure 6.

(a) A network of classifier categories of The Tale of The Doomed Prince. Arranged by a community detection algorithm, including “island categories.” ©iClassifier, Haleli Harel. (b) A detailed view of a network of classifier categories of The Tale of The Doomed Prince. Arranged by a community detection algorithm. ©iClassifier, Haleli Harel.

In the network represented in Figure 6(a), we see central and well-attested categories, and around them, some disconnected “islands” colored gray, visualizing smaller categories that do not have co-occurrence or interchangeability with other categories. Due to our limited space, we discuss here only a few examples. Some of the central and well-connected clusters can be seen in the magnified detail in Figure 6(b).

A first central cluster groups actions of motion, most of which are instances of verbs of motion, regularly written with the classifiers [motion] and [activities of the leg]. This cluster also includes the [road] classifier (Figure 6(b)), as exemplified in Section 2.1.

A smaller cluster colored in turquoise on the left (Figure 6(b)) is connected to the [motion] cluster. The connection to the green [motion] cluster is through a lemma ktkt “to quiver,” classified with both × [break] and [motion] classifiers.³¹ This cluster, [action], features a central category surrounded by several co-occurring categories, for example the [upside-down boat] on the right side of the cluster. In the Doomed Prince, all lemmas classified with [action] are associated with another category and have multiple classifications, for example the combination of [break] and [action] classifiers.

In order to document the classification combinations and patterns for each classifier, we generate detailed reports for each lemma and each classifier. An example is the report for this specific classifier, the taxonomic category [action] in the Doomed Prince (Figure 7(a)).

Figure 7.

(a) The [action] (D40) classifier in The Tale of The Doomed Prince. ©iClassifier, Haleli Harel. (b) The [sun] classifier in The Tale of The Doomed Prince. ©iClassifier, Haleli Harel. (c) The 日 [sun/time] classifier in the ancient Chinese corpus of Guodian texts. ©iClassifier, Yanru Xu. © Data courtesy of The Intelligent Retrieval Network Database of Chinese Characters (East China Normal University (ECNU), Shanghai).

According to the results, in this text, the classifier [action] was attested with five different words and occurred in combinations with additional categories, conveying specific types of actions. All combinations are listed in Figure 7(a). The combinations in this data subset are with [embracing-arms] in qnj “to embrace,” and with [knife] in the verbal lexemes ẖdb “to kill,” smꜣ “to slay” and fdq “to slice.” The classifier of a turned-over boat [upside-down boat] appears as a repeater classifier in pnꜥ “turned upside down.” This combination occurs in a verb describing how the prince’s wife intoxicates a snake to protect the hero from its deadly bite. It reflects the use of a metaphor in the language, pnꜥ “to be like an overturned boat.” The snake is upside down “as an overturned boat, i.e., out of function.”³²

Another central cluster connects the categories [man], [male/phallus], [woman], [small/negative], “bad bird” and [weary, weak]. One can identify that some lemmas are connected to two clusters but are placed in one cluster based on statistical analysis. Such is the word ẖrd “child,” which appears in the blue cluster on the right side of Figure 6 in 10 examples. This lemma also has a connection to the orange cluster, based on only two examples written in a plural form ẖrd.w “children”; the plural form diverges in classification and is written with the [people] classifiers: the [woman] and [man] categories. Still, the lemma ẖrd is part of the blue cluster as its association with it is stronger. In the future, we will examine a larger corpus using this method to determine which clusters emerge and to assess the stability of classification patterns between different texts.

2.5 iClassifier, classifier axis

Apart from examining the macro image of an interconnected classifier network, iClassifier enables the user to examine closely each classifier (classifier axis) or each lemma (lemma axis). Via the classifier axis, we map the classifiers’ combination patterns. We will now demonstrate how to browse classifier reports, going through all parts of the online report form (Figure 7(b)). On the top left of the classifier report in Figure 7(b), several built-in queries are available. They enable the user to query all signs tagged as classifiers or to produce a report of a certain type of classifier; for example, one can exclude the phonetic and grammatical classifiers and view only purely lexical/semantic categories by choosing “subset by level” and “lexical.” The user can further select the texts, POS and script type presented in the classifier report. As shown in the image, only one text, “pHarris 500- Doomed Prince,” was selected and highlighted in black on the top left of Figure 7(b). Below it are additional queries based on annotations matched with the Thot metadata thesauri (presented in Section 3.4).

Two types of category graphs are displayed in the classifier report. The first, a circle with green nodes, appears on the left of Figure 7(b), titled “Lemma co-occurrence graph- Edge width by no. of examples.” The host lemmas occurring with a classifier surround it as round nodes, and each lemma is written with its transliteration and translation. The width of the gray edges represents how many times each lemma was attested in the text with this classifier. For example, among all host lemmas, the thicker edge of the word hrw “day” shows that this word occurs more often than other lemmas: it occurs 10 times, while other lemmas occur only twice or once. The surrounding lemma nodes represent the semantic scope of the [sun] classifier in this text.³³ In each corpus we study, a classifier may have a varying semantic scope. For example, we have eight items with the [sun] classifier in the Doomed Prince, whereas the same classifier has a broader semantic scope of 25 lemmas in the various copies of The Story of Sinuhe (Goldwasser and Soler, 2024), a larger corpus of 1068 lemmas and 7733 tokens. Beneath the circle, on the bottom left of Figure 7(b), a table “Lemma by no. of examples” lists the number of instances each of the host lemmas has with the presented classifier.

Another visualization of the classifier category appears in the center column of Figure 7(b), titled “Lemma co-occurrence graph- Edge width by lemma centrality rank.” The relative pinkish hue of the nodes in the lemma centrality graph represents the percentage of examples of each lemma co-occurring with the [sun] classifier in the corpus. In cases where the lemma was regularly written with this classifier, the node’s color is dark pink (e.g., wnw.t “hour”). When the lemma is also classified into other classifier categories or has some unclassified tokens, its hue is lighter.³⁴ An example of a lemma with a lighter pink hue is the word hrw “day,” as only 10 of the 16 examples of the word get the classifier [sun]. The word hrw “day” is written without a classifier in a few instances. In these examples, the sun hieroglyph functions as a logogram. The color key for this graph appears below the table, and the exact percentages of classifier-lemma co-occurrence are written below in the “Lemma centrality rank statistics with N5 classifier.” This list shows how many of the lemmas’ tokens appear with this classifier in numbers and percentages. In these “classifier axis” graphs, we aim to explore which words are “central members” in a category and which are “fringe” members (Goldwasser, 2002: 27–32), as their categorization overlaps and interchanges between a few categories.
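The percentage behind the node hue can be computed per lemma as the share of its tokens written with the classifier in question. The Python sketch below assumes a simple token list; the figures for hrw (10 of 16 tokens) come from the text, while the token count for wnw.t and the function name `centrality_rank` are assumptions made for illustration.

```python
from collections import Counter

# hrw: 10 of 16 tokens take the [sun/time] classifier (from the text).
# wnw.t: said to be regularly written with it; the count of 2 is assumed.
tokens = (
    [("hrw", ("sun/time",))] * 10
    + [("hrw", ())] * 6
    + [("wnw.t", ("sun/time",))] * 2
)

def centrality_rank(tokens, classifier):
    """Map each host lemma to the share of its tokens written with `classifier`."""
    totals, with_clf = Counter(), Counter()
    for lemma, classifiers in tokens:
        totals[lemma] += 1
        if classifier in classifiers:
            with_clf[lemma] += 1
    return {lemma: with_clf[lemma] / totals[lemma] for lemma in with_clf}

shares = centrality_rank(tokens, "sun/time")

# Central members first, fringe members last.
ranked = sorted(shares.items(), key=lambda kv: (-kv[1], kv[0]))
```

Under these assumptions, wnw.t would be drawn in the darkest hue (share 1.0) and hrw in a lighter one (share 0.625).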

On the bottom part of the central column of Figure 7(b), a list titled “Tokens for this classifier” includes all examples with a full transcription of their source spelling, their tagged classifiers, their token number in iClassifier and their text name and line.

The right column of Figure 7(b) begins with a network graph titled “Classifier co-occurrence graph” displaying all other classifiers with which the discussed classifier co-occurs in this text. Here, one can learn about instances of multiple classification, that is, a host that takes several classifiers. Each classifier co-occurring with the central classifier appears as a lilac node.³⁵ Again, the width of the edges marks the number of examples in which the two classifiers co-occur. For example, in one lemma šw “sunlight,” the [sun] classifier co-occurs with the [divine] classifier.

The table “Classifier co-occurrence statistics” lists how many times the classifier was attested with another classifier. Following it is another table, “Classifier combinations with this classifier,” which breaks down the combinations in which the classifier occurs and shows the number of attestations for each pattern. Additional built-in queries include the “POS co-occurrence stats,” which sort the attestations of a classifier according to POS, for example how many nouns (N), verbs (VB) or other parts of speech it occurs with. The table “Order statistics” sorts the classifier by its position in a sequence of multiple classification. This aims to track whether classifiers usually occur in some semantically motivated order (e.g., schematic-taxonomic), as has been previously suggested in case studies (Goldwasser and Grinevald, 2012: 33–37). Lastly, the “Script statistics” sort the examples by script type, for example how many tokens are in hieratic versus hieroglyphs in corpora that include multiple script types. In our case study in Figure 7(b), all the examples are in the hieratic script.
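Combination and order statistics of this kind can be tallied from tokens whose classifiers are stored in written order. The Python sketch below is a hedged illustration: the lemmas and classifier pairings follow the [action] report described earlier, but the order of classifiers within each token, the one-token-per-lemma counts and the function name `combination_stats` are assumptions, not data from the report.

```python
from collections import Counter

# Classifiers are listed in an ASSUMED written order, one token per lemma.
tokens = [
    ("ẖdb", ("knife", "action")),
    ("smꜣ", ("knife", "action")),
    ("fdq", ("knife", "action")),
    ("qnj", ("embracing-arms", "action")),
    ("pnꜥ", ("upside-down boat", "action")),
]

def combination_stats(tokens, classifier):
    """Count full classifier sequences and the position of `classifier` in them."""
    combos = Counter()     # full sequence -> number of tokens
    positions = Counter()  # (1-based position, sequence length) -> number of tokens
    for _, classifiers in tokens:
        if classifier not in classifiers:
            continue
        combos[tuple(classifiers)] += 1
        positions[(classifiers.index(classifier) + 1, len(classifiers))] += 1
    return combos, positions

combos, positions = combination_stats(tokens, "action")
```

Keeping the sequence as an ordered tuple, rather than a set, is what makes the position tally possible; a set would suffice for co-occurrence counts alone.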

We visualize and produce classifier reports uniformly across all scripts. Therefore, one can produce a report on a comparable category [sun/time] in the ancient Chinese corpus. Figure 7(c) (after Xu, 2024) illustrates the semantic range of the classifier [sun/time] in the ancient Chinese Guodian corpus.

When we compare the category [sun/time] in the ancient Chinese corpus of Guodian texts (Figure 7(c)) to the category [sun/time] in the Egyptian Doomed Prince (Figure 7(b)), we see a somewhat different semantic range. However, one should keep in mind that the Egyptian text is one single short text that has limited vocabulary. In fact, outside of this specific corpus, the category [sun/time] in Egyptian shows high semantic overlap with the [sun/time] category in ancient Chinese.³⁶ In both scripts, a sign for the meaning “sun” underwent a semantic extension of the category from its first iconic use for the classification of [sun] into the taxonomic concept of [time] (for Egyptian, see Goldwasser, 2002: 13–14). Still, the category must be studied in both scripts from its emergence through its attestation span to meticulously describe where it surfaced and how its semantic scope changed over time. By collecting more data, we will be able to map which concepts for various aspects of the sun are shared between these far-removed scripts, for example the word /*taw/ “to enlight,” and its parallel in Egyptian sḥḏ “to make bright; to illuminate,” among other instances of semantic overlap.

2.6 iClassifier, lemma axis

Another information axis in the iClassifier digital reports consists of detailed reports for each lemma in a dataset. The lemma report summarizes all information on the attestation of the lemma in the corpus, listing its examples and showcasing classification patterns, including attestations without classifiers. Figures 8(a) and (b) exemplify what reports of selected lemmas included in the category [sun] in ancient Chinese (Figure 8(a)) and Egyptian (Figure 8(b)) look like.

Figure 8.

(a) The lemma 時 shí “time” in the Guodian texts. ©iClassifier, Yanru Xu. (b) The lemma hrw “day” in the Doomed Prince. ©iClassifier, Haleli Harel.

The lemma in Figure 8(a) is the ancient Chinese character 時 ( 寺) meaning “time; hour; season; period; era; age; opportunity,” regularly written with a semantic classifier (here in pre-position) labeled as [sun/time] and the phonetic element 寺.³⁷ Similarly, in Figure 8(b), the Egyptian lemma hrw “day” occurs with the [sun/time] classifier in 10 tokens, written in post-position. From the report we can learn how many of a lemma’s examples are unclassified, list all its attestations with their orthography, or query attestations by text or script type, among other variables. One unified annotation and visualization scheme is used to display reports for all classifier systems in all scripts documented by the digital tool. While these examples showcase lemmas that have only one categorization pattern, some lemmas have multiple classifiers or alternating classification patterns, which are displayed in their reports.

2.7 iClassifier, text axis (in implementation)

We are currently developing additional research tools for iClassifier, such as analysis along the text axis.

With the text report, the user will be able to summarize the classification patterns of each text. The following information on a given text or group of texts will be available:

Text network overview: How many lemmas and tokens are in the text, number of classified tokens/lemmas, number of unclassified tokens/lemmas. The classification rate considers the philological documentation of text damage, counting how many reconstructed and unpreserved tokens exist.

Classification rate (per POS): Number of classified tokens versus unclassified tokens per each POS.

Classifier combinations in the corpus: List of combinations by the number of occurrences of each classification pattern and tools for semantic analysis of a category by ranking its range of context-sensitive meanings in the text.
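To make the planned per-POS classification rate concrete, here is a minimal Python sketch. The record layout and the `damaged` flag are hypothetical, and excluding damaged (reconstructed or unpreserved) tokens from the rate is an assumption of this example; the text above only states that text damage is taken into account.

```python
from collections import defaultdict

# Hypothetical token records; all values are invented for illustration.
# "damaged" marks reconstructed or unpreserved tokens.
tokens = [
    {"pos": "N", "classifiers": ["[sun/time]"], "damaged": False},
    {"pos": "N", "classifiers": [], "damaged": False},
    {"pos": "VB", "classifiers": ["[motion]"], "damaged": False},
    {"pos": "VB", "classifiers": ["[motion]"], "damaged": True},
    {"pos": "PREP", "classifiers": [], "damaged": False},
]


def classification_rate_by_pos(tokens):
    """Return {POS: share of preserved tokens that carry a classifier}."""
    counts = defaultdict(lambda: {"classified": 0, "unclassified": 0})
    for tok in tokens:
        if tok["damaged"]:
            continue  # assumption: damaged tokens do not enter the rate
        key = "classified" if tok["classifiers"] else "unclassified"
        counts[tok["pos"]][key] += 1
    return {
        pos: c["classified"] / (c["classified"] + c["unclassified"])
        for pos, c in counts.items()
    }


rates = classification_rate_by_pos(tokens)
```

A different policy, such as reporting damaged tokens in a separate column, would only require changing the branch marked as an assumption.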

3. Behind the screens: The iClassifier input system

Traditional transcriptions of ancient Egyptian have mostly omitted classifiers, which have been treated as “seen but not transcribed.” While phonological information is regularly transcribed, classifiers of all kinds remain mostly undocumented.³⁸ One of the tasks of the presented project is to create a systematic transcription that tags classifiers while encoding each word in the text. Users of iClassifier fill out or edit three primary input forms (see below). The first is the token (=example) form, where classifiers are annotated (Sections 3.1 and 3.2). Each token is then linked to the text in which it is attested (Section 3.4) and to a specific lemma (=word; Section 3.3), the dictionary form of which it is an example.

3.1 iClassifier, token input form

The core of the input system of iClassifier is the token form, where classifiers are annotated and analyzed. All reports presented above are based on these annotations and analyses. Each classifier is tagged in the system with a tilde sign (∼) before and after it. This is the minimal analysis required to activate the digital tool. All further analyses are optional, and users can decide to what extent they want to apply them. The tool then plots all maps (general network, classifier reports, lemma reports) from the classifiers marked in a dataset. A user can adapt the annotation scheme to their project and, for example, decide a priori to tag only semantic classifiers, resulting in a network map of semantic classifiers alone.

In Figure 9(a), the input form for ancient Egyptian is presented with an example of a token classified by the sign N5 [sun]. In this example, the Egyptian lemma ꜣbd “month” is first transcribed in the ancient Egyptian encoding MdC (Manuel de Codage) as N11:N14-D46:N5. The classifier [sun] (N5) is then identified by tagging it with tilde signs before and after, N11:N14-D46-∼N5∼ (with the classifier in final position).
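The tilde convention is simple enough to process mechanically. The Python sketch below shows one way to pull the tagged classifiers out of an encoded token; the regular expression and the function names `extract_classifiers` and `strip_tags` are assumptions for illustration and do not describe how iClassifier itself parses its input.

```python
import re

# A sign code wrapped in tildes. Both the ASCII tilde "~" and the tilde
# operator "∼" (U+223C) are accepted, since either may appear in practice.
CLASSIFIER = re.compile(r"[~∼]([^~∼]+)[~∼]")


def extract_classifiers(encoded):
    """Return the tagged classifier codes in order of appearance."""
    return CLASSIFIER.findall(encoded)


def strip_tags(encoded):
    """Return the encoding with the tilde tags removed."""
    return CLASSIFIER.sub(r"\1", encoded)


print(extract_classifiers("N11:N14-D46-~N5~"))  # ['N5']
print(strip_tags("N11:N14-D46-~N5~"))           # N11:N14-D46-N5
```

An encoding without any tagged classifier yields an empty list, which matches the idea that unclassified attestations are recorded as well.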

Figure 9.

(a) The iClassifier input system, TOKEN input form, ancient Egyptian. An example of a token page in ancient Egyptian of the lemma ꜣbd “month.” ©iClassifier, Haleli Harel. (b) The iClassifier input system, TOKEN input form, ancient Chinese. An example of a token page in ancient Chinese of the lemma 時 “season.” ©iClassifier, Yanru Xu.

Similarly, in Figure 9(b), the Chinese example (=token) 時 ( 寺) “seasons” is fully analyzed. First, the character 時 is split into its elements, and the semantic classifier [sun] (in this example in pre-position) is marked with tilde signs, ∼ ∼寺 (∼[sun]∼ seasons). For all scripts whose original arrangement differs from that of the modern font, an image can be added to the token, as can be seen in the image of this specific token in ancient Chinese on the left side of Figure 9(b).

3.2 Classifier analysis in the token input form

The iClassifier user may additionally perform a classifier analysis after marking classifiers wherever they exist (see Figure 9(a), bottom right). The user may tag information types of each classifier, which can be semantic (encyclopedic), semantic (pragmatic), grammatical, metatextual or phonetic. It is possible to create networks of semantic classifiers alone, or of cases considered pragmatic (parallel to the distinction between “lexeme” versus “referent” classification in Lincke and Kammerzell, 2012: 88–99).

Similarly, networks of grammatical classifiers or phonetic classifiers can be drawn.³⁹ Classifiers are tagged as grammatical in cases where the classifier marks mass or gender (Goldwasser and Grinevald, 2012: 28–29; Werning, 2011: 102–103). A typical case of a grammatical classifier is that of the [plural] classifiers, for example in the Egyptian word jtr.w “seasons.” The plural ending is spelled out in the final w of the word and is repeated in the classification as [plural]. Another common instance of grammatical classification is the gender marker of the first person (1sg.c). Even though the ancient Egyptian language did not have gender variation in first-person pronouns, the script shows a whole array of gender and status classifiers: [man], [woman], [nobleman], [royal], [divine] (Goldwasser, 1995; Goldwasser and Grinevald, 2012: 26–28; Lincke and Kammerzell, 2012: 56–58). The orthography of the first-person marker varies according to the host’s gender and status, marking its referential-contextual actor and the category to which that actor belongs, for example [divine] written as a classifier of the first-person (1sg) pronoun =i. Next, users of iClassifier can optionally annotate the semantic relations between a classifier and its host lemma (following Goldwasser, 1995, 2002, 2005). Table 1 exemplifies some semantic relations between classifiers and their hosts. The classification of the first person can be further analyzed as taxonomic, as it assigns the protagonist to one of the broad semantic categories listed above.

Table 1.
Semantic relations between classifiers and their hosts.

Classifier–host relation	Definition	Example
Taxonomic	A classifier in taxonomic relation is a chosen prototype of a superordinate category that represents the category as a whole.⁴⁰ Its hosts are members of the superordinate category and stand in an “example of” relation to it.⁴¹	[motion] in the word swtwt “to walk about, to travel,” as “walking” is “an example of” an act of [motion]. [house/habitat] in the word jh.w “stable,” which is an “example of” the [house/habitat] superordinate category, a “type of” building or house.
Taxonomic-repeater	A repeater is a hieroglyph repeating the same signified presented in the lemma phonetically. It repeats the phonological information recorded by the phonograms with a semantic classifier, hence the name “repeater.”⁴² The relations are still taxonomic, as a crocodile is an example of the category [crocodile]. This category includes other examples of crocodiles such as the crocodile or the god sbk, and a voracious spirit in the form of a crocodile (Gardiner, 1957: 475; DZA 21.977.480).	[crocodile]⁴³ in the word mzḥ “crocodile” (4,4; 7,6; 7,9): the classifier repeats the semantic information presented by the preceding hieroglyphs, which function as phonograms. It represents the same information pictorially.
Taxonomic-metaphoric	A classifier can be linked to its host by metaphorical relations (Goldwasser, 2005). In this case, the mute classifier represents a prototype of another, ad hoc category. The host word temporarily becomes a member of this category (see Goldwasser, 1995: 83–84 for “ad hoc” categories).	[swollen things] in the word špt “to be angry”:⁴⁴ the hieroglyph of the puffer fish stands as a prominent exemplar of the ad hoc category ANGRY SWOLLEN-THINGS. The angry person in a crowd of men is compared to this kind of fish in a crowd of fish: “He swells of anger as a puffer fish” (detailed discussion in Goldwasser, 2005: 106–107).
Schematic	Various types of schematic (metonymic) knowledge relations may exist between a word and its classifier, such as a component/integral object (part-whole) or a stuff/object (“made of”) relationship; see Goldwasser, 2002: 33–35.	[house/habitat] in the word sšd “window” (5,4; 5,6): a window is a “part of” (component of) a [house]. Thus, various words for elements of the house and even furniture stand in schematic (metonymic) relation to the category head [house]. Like the [house/habitat] category, the [wood] category features both taxonomic and schematic members, for example items “made of” [wood] such as ʿgrt “wagon” or hdm.w “footstool” (two lexical borrowings from Semitic; see Harel, 2023: 87–88).
Unclear	A classifier is marked as “unclear” in cases where further discussion is needed to determine its classifier-host semantic relations. This allows the user to easily return to such tokens or to extract them from the project’s report.	
Semantic role relations	In the case of verbs and deverbals, a list of semantic role relations is included in the iClassifier classifier analysis form, making it possible to determine which argument of the verbal action is represented by the classifier (Kammerzell, 2015: 1400–1401, following a list and discussion in Lincke, 2011).	The list of semantic role relations covers various aspects of an action: EXPERIENCER (AGENT), PATIENT, INSTRUMENT, SOURCE, GOAL, LOCATION, MOVER, ZERO, CAUSEE or ABSENTEE. For example, in the lexemes ẖdb “to kill” or fdq “to slice,” the classifier [knife] portrays the INSTRUMENT.

3.3 iClassifier, lemma input form

Attestations of classifiers are tagged in concrete examples (tokens). In the iClassifier input system, each token (a specific example of a lemma) is matched with the dictionary entry of a lemma. We upload a dictionary for each language to lemmatize the dataset within the common lexicographic tradition of each field.⁴⁵ If there is no digital dictionary or lemma list, the user can cite a published printed dictionary and link between the dictionary entry (i.e., lemma) and its examples (tokens). In cases where the user wants to highlight the semantic range of a lemma, meanings from several dictionaries can be added.

The Egyptian lemma form (Figure 10(a)) allows linking between stages of the Egyptian language (Demotic, Coptic) and linking lemmas to roots based on the TLA root list.⁴⁶ In addition, all lemmas can be assigned comparative concept-semantic field annotations based on a typological list published by the Concepticon project (see List et al., 2021).

Figure 10.

(a) The iClassifier input system, LEMMA input form, Egyptian. An example of the fully annotated lemma ꜣbd “month” in ancient Egyptian lemma list (TLA). The section of the input page on the right is optional. ©iClassifier, Haleli Harel. (b) The iClassifier input system, LEMMA input form, Chinese. An example of the lemma 時 “n. time; hour; season; period; era; age; opportunity” in ancient Chinese. ©iClassifier, Yanru Xu.

In the Chinese portal, the lemma input form is identical (Figure 10(b)) and each lemma is a Chinese character. In specific tokens analyzed in ancient Chinese corpora, an English context-based meaning of the respective character is provided (see Xu, 2024).

3.4 Text input, metadata annotations

Adding source images: A primary focus of iClassifier is to link examples to their contexts. Users can add source images of the text in the case of ancient, complex writing systems. It is possible to add orthographic comments and accompany them with a source image using the user-friendly iClassifier cropping tool.

Metadata: In order to contextualize examples fully, all traceable context features are mapped. Each example is linked to a TEXT and OBJECT—the text in which it occurred, and the object on which this text was written. For ancient Egyptian, we follow the TLA metadata structure and the Thot Data Model (TDM)⁴⁷ implemented into the iClassifier text/witness form (Figure 11(a)). Our users annotate each textual source, its script type, genre and date, choosing pre-set values created by the Thot Thesauri and the TLA text and object hierarchy, utilizing the Thot metadata API.⁴⁸

Figure 11.

(a) The iClassifier input system, TEXT input form. An example of metadata annotation for The Tale of The Doomed Prince in ancient Egyptian, imported from the TLA. ©iClassifier, Haleli Harel. (b) The iClassifier input system, TEXT input form. An example of metadata annotation for the text《郭店楚墓竹簡》《窮達以時》in ancient Chinese. ©iClassifier, Yanru Xu.

For ancient Chinese and Sumerian, metadata annotations were curated specifically and adjusted to common terminologies in each research field. For example, in Figure 11(b), we can see how a certain stage in the Chinese script is chosen for the text. In this example, the selected script is “Chu script,” and the text’s genre is “philosophical literary.” Object type is also noted (in this case, “Bamboo manuscript”). An original location and period or date are added, if available. Tracking such metadata variables allows the user to query later for results by place, time, script type or support.

4. Data structure and technical apparatus

The reports of projects created in iClassifier are published publicly under an attribution, share-alike license (CC BY-SA 4.0). The backend (data storage and web server) of the current version of the database, iClassifier BETA (released May 2020), is implemented as an array of SQLite databases, with a separate database file for each project, served by a Python (Flask + Gunicorn) and Golang web stack. Its user interface is based on Mithril.js.
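To give a feel for how a per-project SQLite file can serve such reports, the following Python sketch builds a tiny in-memory database and counts attestations of one classifier per lemma within one script type. The table and column names are hypothetical and the rows are toy data; the actual schema is the one shown in Figure 12, which this sketch does not attempt to reproduce.

```python
import sqlite3

# In-memory database for illustration; a project would use its own file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE texts  (id INTEGER PRIMARY KEY, title TEXT, script TEXT);
CREATE TABLE lemmas (id INTEGER PRIMARY KEY, transliteration TEXT, meaning TEXT);
CREATE TABLE tokens (id INTEGER PRIMARY KEY,
                     lemma_id INTEGER REFERENCES lemmas(id),
                     text_id  INTEGER REFERENCES texts(id));
CREATE TABLE token_classifiers (token_id INTEGER REFERENCES tokens(id),
                                classifier TEXT, position INTEGER);
""")

# Toy rows, invented for this example (hypothetical schema and counts).
conn.execute("INSERT INTO texts VALUES (1, 'Example text', 'hieratic')")
conn.executemany("INSERT INTO lemmas VALUES (?, ?, ?)",
                 [(1, "hrw", "day"), (2, "ꜣbd", "month")])
conn.executemany("INSERT INTO tokens VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 1), (3, 2, 1)])
conn.executemany("INSERT INTO token_classifiers VALUES (?, ?, ?)",
                 [(1, "N5", 1), (2, "N5", 1), (3, "N5", 1)])

# Attestations of one classifier per lemma, restricted to one script type.
rows = conn.execute("""
    SELECT l.transliteration, COUNT(*) AS n
    FROM token_classifiers AS tc
    JOIN tokens AS t ON t.id = tc.token_id
    JOIN lemmas AS l ON l.id = t.lemma_id
    JOIN texts  AS x ON x.id = t.text_id
    WHERE tc.classifier = ? AND x.script = ?
    GROUP BY l.id, l.transliteration
    ORDER BY n DESC, l.transliteration
""", ("N5", "hieratic")).fetchall()
```

Keeping one database file per project, as described above, means a query like this never needs a project filter.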

The iClassifier network maps are drawn using the JavaScript library Vis.js. The textual data of each project can be easily exported as an .xlsx file in which each database table is represented by a sheet, following the general database structure in Figure 12. The data of individual projects will be published under a CC BY-SA 4.0 license together with the academic publication of each dataset.
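A network library such as Vis.js consumes a list of nodes and a list of weighted edges. The Python sketch below shows how lemma-classifier attestations could be turned into such a structure and serialized as JSON. The function `build_network`, the node id prefixes and the toy attestation counts are assumptions for illustration, not the tool's actual export format; the keys `id`, `label`, `group`, `from`, `to` and `value` follow the node and edge options that Vis.js networks accept.

```python
import json
from collections import Counter

# One (lemma, classifier) pair per classified token; counts are toy data.
pairs = [
    ("hrw", "[sun/time]"),
    ("hrw", "[sun/time]"),
    ("ꜣbd", "[sun/time]"),
    ("swtwt", "[motion]"),
]


def build_network(pairs):
    """Build a bipartite lemma-classifier graph as node and edge lists."""
    weights = Counter(pairs)  # (lemma, classifier) -> number of tokens
    lemmas = sorted({lemma for lemma, _ in pairs})
    classifiers = sorted({clf for _, clf in pairs})
    nodes = [{"id": f"lemma:{l}", "label": l, "group": "lemma"}
             for l in lemmas]
    nodes += [{"id": f"clf:{c}", "label": c, "group": "classifier"}
              for c in classifiers]
    edges = [{"from": f"lemma:{l}", "to": f"clf:{c}", "value": n}
             for (l, c), n in sorted(weights.items())]
    return {"nodes": nodes, "edges": edges}


network = build_network(pairs)
payload = json.dumps(network, ensure_ascii=False)
```

Using the attestation count as the edge `value` lets the drawing library scale edge width by frequency, so frequent pairings stand out in the map.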

Figure 12.

The iClassifier database structure. Drawn by Dmitry Nikolaev for iClassifier.

5. Summary

In this article, we presented a new method to visualize and study the complexity of innate categorization systems in ancient scripts. We exemplified the guidelines of the iClassifier report system and surveyed the annotation scheme of its input forms. The presented research tool is a collaborative digital environment created to apply quantitative measures to classifier systems. It also enables philological and qualitative evaluation of each example through comments on orthography, grammar or lexical semantics. Our project aims to record corpora containing classifiers, create data and test universal versus culture-specific patterns in overt categorization systems such as classifiers. We are taking the first steps towards this goal by creating comparative measures to track the semantic contents of each system, calibrate our measuring tools and compare semantic profiles. As corpora in each field are curated, our next task is to apply similarity measures and show which meanings and categories are shared between the various systems and what each system's general structure and hierarchy look like. Preliminary results, such as the long-tailed distributions (Figures 3(b)–5), suggest that various systems of graphemic classifiers share a particular overarching structure. Still, such suggestions should be scrutinized on the basis of accurate sample studies. By using the presented method, one can discover previously unseen topographies of the inner hierarchies of the lexicon.

Corpus credits

Ancient Egyptian

The digitized text is accredited to Lutz Popko, with contributions by Altägyptisches Wörterbuch, “Joppe; Prinzenmärchen” (Object ID CVX3L24WLFAKNEOXEVC6QGPBJI) https://thesaurus-linguae-aegyptiae.de/object/CVX3L24WLFAKNEOXEVC6QGPBJI, in: Thesaurus Linguae Aegyptiae, Corpus issue 17, Web app version 2.01, 12/15/2022, T. S. Richter and D. A. Werning (eds), by order of the Berlin-Brandenburgische Akademie der Wissenschaften and H.-W. Fischer-Elfert and P. Dils by order of the Sächsische Akademie der Wissenschaften zu Leipzig (accessed: 19 June 2023). Classifier marking by Haleli Harel. Imported into iClassifier by Dmitry Nikolaev.

Ancient Chinese

The Guodian inscriptions of Laozi, courtesy of The Intelligent Retrieval Network Database of Chinese Characters (East China Normal University (ECNU), Shanghai). Classifier marking by Yanru Xu. Imported into iClassifier by Dmitry Nikolaev.

Sumerian

All tokens and lemmas are referenced after the ePSD2 lemma list. Classifier marking by Bo Zhang. Imported into iClassifier by Dmitry Nikolaev. http://oracc.museum.upenn.edu/epsd2.

Authors’ contributions

Haleli Harel: writing, database design and realization, methodology, classifier analysis in The Tale of The Doomed Prince, classifier distribution and long-tail calculations. Orly Goldwasser: discussions in classifier theory, methodology, reviewing. Dmitry Nikolaev: IT for iClassifier, community detection algorithm, data transformation and upload. Other contributors (tables and graphs): Yanru Xu, Chinese; Bo Zhang, Sumerian.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by ISF (grant number 735/17) and ISF (grant number 2408/22). The ArchaeoMind Lab https://archaeomind.huji.ac.il/ (PI Orly Goldwasser), Institute of Archaeology, The Hebrew University of Jerusalem.

ORCID iDs

Haleli Harel https://orcid.org/0000-0002-4012-015X

Orly Goldwasser https://orcid.org/0000-0002-2152-6429

Dmitry Nikolaev https://orcid.org/0000-0002-3034-9794

References

Aikhenvald (2021) One of a kind: On the utility of specific classifiers. Cognitive Semantics 7(2): 232–257.

Barabasi (2005) The origin of bursts and heavy tails in human dynamics. Nature 435(7039): 207–211.

Chantrain (2014) The use of classifiers in the New Kingdom. A global reorganization of the classifiers system? Lingua Aegyptia 22: 39–59.

Chantrain (2021) Classification strategies from the end of the Ramesside Period until the Late Period: A living system. Zeitschrift für Ägyptische Sprache und Altertumskunde 148(1): 50–64.

Chen (2015) The prototypical determinatives in Egyptian and Chinese writing. Scripta 8: 101–126.

David (2000) De l’infériorité à la perturbation: l’oiseau du “mal” et la catégorisation en Egypte ancienne. Wiesbaden: Harrassowitz.

Di Biase-Dyson (2013) Foreigners and Egyptians in the Late Egyptian Stories: Linguistic, Literary and Historical Perspectives. Leiden: Brill.

Di Biase-Dyson (2017) Metaphor in the teaching of Menna: Between rhetorical innovation and tradition. In: Gillen T (ed.) (Re)productive Traditions in Ancient Egypt: Proceedings of the conference held at the University of Liège, 6th–8th February 2013. Liège: Presses Universitaires de Liège, pp. 163–179.

Faulkner (1962) A Concise Dictionary of Middle Egyptian. Oxford: Griffith Institute, Ashmolean Museum.

DZA = Digitized Slip Archive. Available at: http://aaew.bbaw.de/tla/servlet/DzaBrowser (accessed 12 May 2023).

Gardiner (1932) Late-Egyptian Stories. Brussels: Fondation Égyptologique Reine Élisabeth.

Gardiner (1957) Egyptian Grammar. 3rd rev. ed. Oxford: Griffith Institute.

Goldwasser (1995) From Icon to Metaphor: Studies in the Semiotics of the Hieroglyphs. Fribourg/Göttingen: Fribourg University Press/Vandenhoeck & Ruprecht.

Goldwasser (2002) Prophets, Lovers and Giraffes: Wor(l)d Classification in Ancient Egypt. Wiesbaden: Harrassowitz.

Goldwasser (2005) Where is metaphor? Conceptual metaphor and alternative classification in the Hieroglyphic script. Metaphor and Symbol 20(2): 95–113.

Goldwasser (2009) A comparison between classifier language and classifier script: The case of Ancient Egyptian. In: Goldenberg G (ed.) A Festschrift for Hans Jakob Polotsky. Jerusalem: Magnes Press, pp. 16–39.

Goldwasser (2022) Des déterminatifs aux classificateurs: la catégorisation dans l’écriture des anciens Égyptiens. In: Polis S (ed.) Guide des écritures de l’Égypte ancienne. Le Caire: Institut français d’archéologie orientale, pp. 192–199.

Goldwasser (2023) Is there an animal in the Ancient Near East? In: Zsolnay I (ed.) Seen, Not Heard: Composition, Iconicity, and the Classifier Systems of Logosyllabic Scripts. Chicago: Institute for the Study of Ancient Cultures of the University of Chicago, pp. 121–158.

Goldwasser (in press) Classifiers as priming devices, or “classifiers tell us what we already know”. In: Chantrain G (ed.) Language, Semantics and Cognition. New Haven: Yale Egyptological Studies 14, pp. 75–108.

Goldwasser and Grinevald (2012) What are determinatives good for? In: Grossman, Polis and Winand (eds) Lexical Semantics in Ancient Egyptian. Hamburg: Widmaier, pp. 17–53.

Goldwasser and Handel (2024) Introduction: Graphemic classifiers in complex script systems. Journal of Chinese Writing Systems 8(1): 2–13.

Goldwasser and Soler (2024) Semantic classifiers (determinatives) and categorization in the ancient Egyptian writing system: Rules, list of classifiers, and studies by iClassifier on The Story of Sinuhe. Journal of Chinese Writing Systems 8(1): 34–58.

Grinevald (2015) Classifiers, linguistics of. In: Wright JD (ed.) International Encyclopedia of the Social & Behavioral Sciences. Amsterdam: Elsevier, pp. 1973–1978.

Grotenhuis (Forthcoming) Digitizing Seth: Digital studies of Sethian hieroglyphs in the Coffin Texts. In: Proceedings of the 13th International Congress of Egyptologists, Leiden.

Gulbahce and Lehmann (2008) The art of community detection. BioEssays 30(10): 934–938.

Halliday (1985) Spoken and Written Language. Oxford: Oxford University Press.

Handel (2023) The cognitive role of semantic classifiers in Modern Chinese writing as reflected in neogram creation. In: Zsolnay I (ed.) Seen, Not Heard: Composition, Iconicity, and the Classifier Systems of Logosyllabic Scripts. Chicago: Institute for the Study of Ancient Cultures of the University of Chicago, pp. 159–162.

Harel (2022) Zooming in and out on Hoch’s Semitic Word List. Journal of the Society for the Study of Egyptian Antiquities 48: 57–77.

Harel (2023) A network of lexical borrowings in Egyptian texts of the New Kingdom: Organizing knowledge according to the classifier system. Doctoral Dissertation, The Hebrew University of Jerusalem, Israel.

Harel (in press) Comparing networks of semantic categories: Digitizing graphemic classifiers in ancient complex scripts using the iClassifier research platform. In: Chantrain G (ed.) Language, Semantics and Cognition. New Haven, CT: Yale Egyptological Studies 14, pp. 109–132.

Harel, Goldwasser and Nikolaev (2023) Mapping the ancient Egyptian mind: Introducing iClassifier, a new platform for systematic analysis of classifiers in Egyptian and beyond. In: Roberson JA, Vinson S and Lucarelli R (eds) Ancient Egypt and New Technology. Leiden: Brill, pp. 130–158.

Kammerzell (2015) Egyptian verb classifiers. In: Kousoulis P and Lazaridis N (eds) Proceedings of the Xth International Congress of Egyptologists, Rhodos 2008. Leuven: Orientalia Lovaniensia Analecta, pp. 1395–1416.

Lincke (2011) Die Prinzipien der Klassifizierung im Altägyptischen. Wiesbaden: Harrassowitz.

Lincke and Kammerzell (2012) Egyptian classifiers at the interface of lexical semantics and pragmatics. In: Grossman, Polis and Winand (eds) Lexical Semantics in Ancient Egyptian. Hamburg: Widmaier, pp. 55–112.

List, Rzymski, Greenhill et al. (eds) (2021) CLLD Concepticon 2.5.0 [Data Set]. Available at: https://doi.org/10.5281/zenodo.4911605 (accessed 15 November 2023).

Polis and Razanajao (2016) Ancient Egyptian texts in context: Towards a conceptual data model (The Thot Data Model – TDM). Bulletin of the Institute of Classical Studies 59: 24–41.

Polis and Rosmorduc (2015) The hieroglyphic sign functions: Suggestions for a revised taxonomy. In: Amstutz, Dorn, Müller M, et al. (eds) Fuzzy Boundaries: Festschrift für Antonio Loprieno, vol. 1. Hamburg: Widmaier, pp. 149–174.

Satzinger and Stefanovic (2021) Egyptian Root Lexicon. Hamburg: Widmaier.

Schneider (2008) Fremdwörter in der ägyptischen Militärsprache des Neuen Reiches und ein Bravourstück des Elitesoldaten (Papyrus Anastasi I 23, 2–7). Journal of the Society for the Study of Egyptian Antiquities 35: 181–205.

Selz (2021) Appositive semantic classification in Sumerian cuneiform and the implementation of iClassifier. Ash-Sharq: Bulletin of The Ancient Near East 6: 142–171.

Selz, Grinevald and Goldwasser (2017) The question of Sumerian determinatives: Inventory, classifier analysis, and comparison to Egyptian classifiers. In: Werning D (ed.) Proceedings of the Conference “Crossroads: Whence And Whither?” Lingua Aegyptia 25. Hamburg: Widmaier, pp. 281–344.

Selz and Zhang (2024) Classification in Sumerian cuneiform and the implementation of iClassifier. Journal of Chinese Writing Systems 8(1): 59–78.

Stern and Pommerening (Forthcoming) e(bers)Classifier: digital analysis of papyrus Ebers with a case study of the classifier D26. In: Proceedings of the 13th International Congress of Egyptologists, Leiden.

Soler (Forthcoming) Classifiers in Ancient Egyptian Scripts: A Corpus-based Analysis of Literary Texts. Doctoral Dissertation, The University of Barcelona, Barcelona, Spain.

TSL = Thot Sign List, Université de Liège and Berlin-Brandenburgische Akademie der Wissenschaften. PI Stephane Polis. Available at: https://thotsignlist.org/ (accessed 12 May 2023).

TLA = Thesaurus Linguae Aegyptiae, Corpus Issue 17, Web App Version 2.01, 12/15/2022, Tonio Sebastian Richter and Daniel A. Werning (eds), by order of the Berlin-Brandenburgische Akademie der Wissenschaften, and Hans-Werner Fischer-Elfert and Peter Dils, by order of the Sächsische Akademie der Wissenschaften zu Leipzig. Available at: https://thesaurus-linguae-aegyptiae.de/search (accessed 12 May 2023).

Werning (2011) Das Höhlenbuch: Textkritische Edition und Textgrammatik. 2 vols. Wiesbaden: Harrassowitz.

Werning (2015) Einführung in die hieroglyphisch-ägyptische Schrift und Sprache: Propädeutikum mit Zeichen- und Vokabellektionen, Übungen und Übungshinweisen. Berlin: eDoc-Server der Humboldt-Universität zu Berlin. Available at: http://edoc.hu-berlin.de (accessed 12 May 2023).

Winand and Stella (2013) Lexique du Moyen Egyptien. Liège: Université de Liège.

Xu (2024) Semantic classifiers in Guodian Bamboo Manuscripts: Reconstructing categories in the Ancient Chinese mind. Journal of Chinese Writing Systems 8(1): 14–33.

Xu (Forthcoming) Comparative Research of Semantic Classifiers in Ancient Chinese Scripts and Ancient Egyptian Scripts. Doctoral Dissertation, The Hebrew University of Jerusalem, Jerusalem, Israel.