Sage Journals: Discover world-class research

Abstract

In our work, we systematize and analyze implicit ontological commitments in the responses generated by large language models (LLMs), focusing on ChatGPT 3.5 as a case study. We investigate how LLMs exhibit implicit ontological categorizations reflected in the texts they generate, despite having no explicit ontology. The article proposes an approach to understanding the ontological commitments of LLMs by defining ontology as a theory that provides a systematic account of the ontological commitments of some text. We investigate the ontological assumptions of ChatGPT and present a systematized account, that is, GPT’s top-level ontology. This includes a taxonomy, which is available as an OWL file, as well as a discussion about ontological assumptions (e.g., about its mereology or presentism). We show that in some aspects GPT’s top-level ontology is quite similar to existing top-level ontologies. However, significant challenges arise from the flexible nature of LLM-generated texts, including ontological overload, ambiguity, and inconsistency.

Keywords

ChatGPT top-level ontology large language model LLM

1. Introduction

Large language models (LLMs) internalize knowledge about the world, which is reflected in the texts they generate. For example, questions like What is the largest city in Western Europe? ChatGPT¹ can answer reliably and correctly. While this knowledge is not represented explicitly in the form of an ontology or some other knowledge representation structure, the responses of LLMs involve implicit categorizations of entities into categories. For example, the question What is the difference between a monkey and a hammer? may elicit a response that contrasts two categories, namely a living organism and an inanimate object (see Figure 1). Importantly, this distinction is introduced without any priming by the user, like questions about ontological categories. Thus, while LLMs do not contain an explicit ontology, their responses contain implicit ontological commitments, which reflect the implicit ontological assumptions of the text corpora that the LLMs are trained on.

Figure 1.

ChatGTP Uses Categories Like “Living Organism” and “Inanimate Object.”

In this article, we will analyze these ontological commitments and present a top-level ontology that represents ontological distinctions that are made by ChatGPT 3.5 (Section 4). As we will discuss in Section 3 in more detail, because of the inherent differences between the technologies, the step from LLM to ontology is methodologically quite problematic. In Section 5 we focus on the differences between ChatGPT and top-level ontologies from the literature (e.g., BFO, DOLCE, UFO). While terms in an ontology are associated with fixed compositional, model-theoretic semantics, LLMs are trained to produce tokens stochastically depending on context. Thus, while ontologies resolve ambiguities, LLMs reproduce them. The mercurial nature of the terms used by LLMs is a significant obstacle to the investigation and the use of LLM’s top-level ontologies.

Nevertheless, we believe they are interesting to study for two reasons: since the ontological distinctions made by LLMs are a distillation of the ontological distinctions made by the authors of the millions of texts that the LLMs are trained on, this top-level ontology may be considered as an approximation of the common sense ontology that underpins everyday discourse. To study this common-sense ontology is of interest in itself. Further, there are already various efforts to use LLMs in the ontology development process (Ciatto et al., 2024), and these tools will likely become a staple in ontology engineering. Understanding the ontological assumptions of LLMs will make it easier to integrate the output they produce within the context of top-level ontologies like BFO, DOLCE, or UFO.

2. Related Work

We aim to develop the underlying top-level ontology of ChatGPT that covers the most important and basic categories. There are already numerous manually created top-level ontologies of high quality. Partridge et al. (2020) present and compare 37 different TLOs in their overview. Some well-known TLOs are BFO, DOLCE, GFO, UFO, and GMO. These top-level ontologies differ methodologically and philosophically. BFO by Arp et al. (2015) and Smith et al. (2015), for example, embraces ontological realism, while DOLCE by Gangemi et al. (2002) is more focused on conceptual and linguistic aspects of ontology engineering, and GMO is only linguistically motivated (Bateman et al., 1995). Other approaches, such as UFO, attempt to synthesize the ideas of the TLOs (Guizzardi et al., 2015). UFO emerged from a synthesis of DOLCE and GFO, a multicategorial approach by Herre (2010). TLOs can ultimately form a basis for specific domain ontologies, which is why SUMO, for example, is very broadly diversified to offer as many points of contact as possible (Niles & Pease, 2001). While the different ontologies represent different approaches, these TLOs are characterized by the fact that they were developed manually, are of high quality, and are philosophically sound.

Our work differs from the existing work on TLOs since our goal is not to manually create a new TLO, but to investigate the top-level ontology of ChatGPT. ChatGPT is based on the GPT-3.5 model, a kind of LLM or, more precisely, a generative pre-trained transformer (OpenAI, 2024). Other LLMs are, for example, Microsoft Copilot (Spataro, 2023), Google Bard (Pichai, 2023), and Claude (Anthropic, 2023).

LLMs are already used for ontology development, and there are many different high-quality and well-founded research approaches, for example, using it for combining terms for representing domain entities (Lopes et al., 2023), enriching ontologies with a fine-tuned GPT-3 model as a tool (Mateiu & Groza, 2023) or exploring the facilitation of (semi-)automatic construction of knowledge graphs, through open-source LLMs (Kommineni et al., 2024). But there are many other projects and approaches to support ontology development using LLMs (Babaei Giglou et al., 2023; Caufield et al., 2023; Chen et al., 2023; Hertling & Paulheim, 2023; Langer et al., 2024; Pan et al., 2023; Zhao et al., 2024). The existing research is about using LLMs to support the development of domain ontologies. In contrast, we focus on the top-level ontology of ChatGPT.

3. Methodology

The responses of LLMs may contain ontological commitments. For example, the response of ChatGPT in Figure 1 uses ontological categories of living organisms, inanimate objects, characteristics, abilities, and (social) behavior. Further, while it is not stated explicitly, the phrasing seems to indicate that living organisms and inanimate objects are disjoint, and ChatGPT asserts a “is used for” relationship between a kind of inanimate object and its function. In Section 4 we will systematize these assumptions and present the result as “GPT’s top-level ontology.” However, this goal raises several important methodological concerns. Most importantly, is there such an ontology in any meaningful sense of the word?

To address this question, we need to be aware that the term “ontology” is used ambiguously in the literature: An ontology $_{D}$ is a kind of document (e.g., version 228 of the ChEBI ontology $_{D}$ that was released in January 2024 and can be downloaded on Bioportal). An ontology $_{C}$ is a conceptualization of a certain domain by a person or a group of people (e.g., the ontology $_{C}$ of human anatomy according to Galen). (The distinction between these two senses of “ontology” is made, for example, in Guarino (1998).) Further, an ontology $_{R}$ consists of the categories of entities and their relationships to a certain part of reality (e.g., the ontology $_{R}$ of chemistry) (Smith, 2004).²

Strictly speaking, an LLM uses an ontology in neither of these senses. LLMs have no access to reality beyond the documents that are in their training corpora and, thus, have no access to ontology $_{R}$ . Since these corpora may include ontologies $_{D}$ among the millions of other documents, ontologies $_{D}$ may be used to train LLMs. However, these ontologies $_{D}$ are not treated any differently to other documents in these corpora, for example, Magna Carta. After training is complete, the LLM does neither contain a copy of the Magna Carta nor any ontology $_{D}$ .

Further, LLMs are like stochastic parrots, which produce without any comprehension texts that are “not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind” (Bender et al., 2021). A parrot may be trained to say “There are possible worlds!,” but we would not consider it a modal realist, since it lacks both the communicative intent as well as the understanding of the concepts involved. The same is true for LLMs, and, thus, it would be a category mistake to attribute a conceptualization or an ontology $_{C}$ to an LLM.

One could argue that while the parrot lacks understanding of its utterances, they reflect the ontology $_{C}$ of its trainer. Analogously, one might argue, the LLM is trained by processing millions of documents and, thus, learns an ontology $_{C}$ that coagulates the ontological views of the millions of authors of these documents. However, this is unfortunately not the case. LLMs generate text one token at a time based on a function that maps sequences of tokens to probability distributions of tokens. Consequently, the sentences generated by a given LLM are dependent on their prompts, and even minor changes to the prompt may lead to the generation of contradictory sentences. Hence, the texts generated by an LLM are often not even logically consistent, let alone express some coherent ontological view (Neuhaus, 2023). Thus, LLMs do not contain an implicit ontology $_{C}$ within their billions of parameters, which we can induce the LLM to reveal by providing the appropriate prompts.

In the remainder of this article, we use “ontology” in a fourth sense, namely as a theory of ontological commitments. More precisely:

Definition 1
Let T be a text (i.e., a sequence of sentences) and K a category of entities. T is ontologically committed to Ks if and only if T logically entails that Ks exist. An ontology of T is a set of sentences about the relationships between the categories that T is ontologically committed to, which is—as far as possible—logically consistent with T.

Definition 1 is based on a variant of an entailment account of ontological commitment (Bricker, 2016). An ontology is a theory (i.e., set of sentences) that provides a systematic account of the ontological commitments of some text $T$ .³ Since a text is defined as a sequence of sentences, “text” does not imply the written form, but includes spoken statements by (possibly different) domain experts. An ontology usually includes a taxonomic hierarchy and disjointness relationships between the categories. However, it may contain more complex assertions about the relationships between the categories, for example, a statement of the form: all instances of $K$ that meet some condition $C$ stand in relation $R$ to some instance of $K^{'}$ . Thus, the vocabulary of the ontology is not restricted to unary predicates that denote the ontological commitments of $T$ but may include additional predicates, for example, to express relationships between instances of categories. $T$ may reflect different points of view and, thus, may be logically inconsistent. For this reason, we do not assume that an ontology of $T$ is logically consistent with $T$ , but it should minimize the number of sentences in $T$ that it is logically inconsistent with.

Definition 1 has the benefit of being applicable to the texts generated by LLMs without having to assume that LLMs can form concepts, although LLMs produce inconsistent texts. The goal of this article is to provide a systematic account of important ontological commitments in the text generated by ChatGPT.

This task is complicated by the fact that the user may influence the ontological commitments in generated texts by the prompt that is used. For example, if one asks ChatGPT to explain the difference between particularized properties and tropes, it will generate a text that contains ontological commitments to particularized properties and tropes. Analogously, by providing the appropriate prompts, one can induce ChatGPT to provide a Platonian or an Aristotelian account of the nature of change. Since LLMs are trained on text corpora that contain philosophical texts, a user may prompt ChatGPT to generate texts reflecting a vast array of philosophical perspectives, which contain equally diverse ontological commitments. However, for the purpose of our article, we are not interested in the ontological commitments of texts that reflect philosophical positions in the literature. Since our goal is to study ChatGPT’s ontological commitments and assumptions, which may influence its usefulness as a tool for ontology engineering, we use prompts that are designed to reveal ontological commitments without priming ChatGPT to respond based on the philosophical literature. Another difficulty is that, in contrast to a human, ChatGPT has no introspection about its ontological commitments. For example, a prompt like “What are the most general ontological categories that you use to organize your knowledge?” will prompt it to generate an answer about ontological categories from the literature, but according to our observation, these are not necessarily the ones it actually uses.⁴

Therefore, to study ChatGPT’s ontological commitments, we use a more indirect approach by asking questions like the one in Figure 1, which elicit responses that contain ontological categories (e.g., animate object and inanimate object). These categories we use in follow-up questions about similarities of entities in these new categories (e.g., “What is the difference between the monkey and monkey’s behavior?”) and other possible instances of these categories (e.g., “Are shadows inanimate objects?”). We also use more theoretical questions about the relations of the categories (e.g., “Are there objects that are both animate and inanimate?”). However, these sometimes lead to responses by ChatGPT that are incoherent with follow-up questions. For example, in the same session, when asked whether there are entities that are both physical and abstract, ChatGPT claimed that national flags exhibit both physical and abstract characteristics. However, when asked whether a national flag is a physical object or an abstract object, it responds that it is physical and not abstract. This example illustrates the points made above: the texts that are generated by ChatGPT are neither logically consistent nor informed by introspection. For the same reason, asking ChatGPT for definitions is not as helpful as one might hope. For example, it defines “physical entity” as “entities that have a tangible, material existence” and “abstract entity” as “entities that lack a tangible, material existence.” Thus, according to these definitions, the categories should provide a partition of entities. However, as just mentioned, it sometimes claims that some entities are both abstract and physical. Further, it also claims (at least sometimes⁵ ) that shadows lack tangible, material existence but are not abstract entities.

These examples illustrate that to understand the ontology of ChatGPT, asking it for definitions of the categories it uses frequently leads to misleading responses. For this reason, we study ChatGPT’s ontology primarily by prompting it to use categories for the classification of examples or distinguishing categories.

A particular challenge for studying the ontology of LLMs is that only minor changes to prompts may lead to different results.⁶ To ensure that the ontology we present in the next section reflects the typical ontological commitments of texts produced by ChatGPT, we studied the results of asking similar questions in different variations. We only included categories that were used consistently in different contexts. However, since ChatGPT generates text based on a stochastic process, it may produce texts that are inconsistent with the ontology we present in the next section. This is a challenge for the verifiability of our claim that the ontology presented in Section 4 reflects the ontology of ChatGPT. Thus, to ensure at least transparency, we published the transcripts of our interactions with ChatGPT, on which our claims are based on.⁷
4. GPT’s Top-Level Ontology

4.1. Hierarchy

In this part, we present the ontological categories, which we have isolated in the conversations with ChatGPT. As already discussed in Section 3, ChatGPT does not use fixed definitions of its terms, and it sometimes provides what seems to be conflicting or outright contradictory responses. However, there is a stable core of ontological commitments that are consistently made by ChatGPT (with few exceptions).

This is the source that we used to construct the ontology that we present in this section. Figure 2 shows the subsumption hierarchy of the top-level categories and some additional classes, which illustrate the meaning of the higher-level categories. The ontology is available at https://w3id.org/gptto/v1.0.0/>. We will discuss the limits of this approach in Section 5.

Figure 2.

Hierarchy of ChatGPT’s Top-Level-Ontology.

The most general category ChatGPT uses is entity. For example, in response to questions like What do a pen and a bird have in common?, they are compared as entities with different characteristics. Furthermore, it differentiates between abstract and concrete entities. The terms for this can vary in different conversations, but the subdivision into the concepts of the categories remains roughly the same. Concrete entities, which are also called physical entities, are characterized by the fact that they physically exist, and have perceivable, observable, or measurable characteristics or behaviors. These entities can be part of complex systems or networks and interact with one another and with their environment through processes.

In contrast, abstract entities are those that do not exist in a physical form and have no physical attributes, therefore, they cannot be perceived directly through the senses. These include concepts, values, processes, and characteristics. Abstract entities are disjoint from concrete entities. In addition to abstract entities and concrete entities, there are also features. These are entities that are used to describe or identify instances in more detail. Features can be abstract or concrete. They can be subdivided further. For example, geographical features are features that contribute to the overall structure of an environment. For example, a desert can be categorized as a geographical feature of Earth.

The category concrete entity is divided into objects and occurrences. An object is a concrete entity that exists physically and has physical characteristics, that can be perceived. Objects can be natural entities and artificial objects. There is a deeper distinction between artificial objects, which are made by human beings and do not naturally occur in the environment, but are created for a specific purpose or function. These include, for example, tools. Natural objects, on the other hand, are concrete entities of the natural world that exist without direct human intervention or manufacturing. Instead, they arise through natural processes. The natural objects are divided into living entities (e.g., birds) and non-living entities (e.g., stones).

Occurrences are concrete entities that happen in the physical world and can be perceived, observed, or measured. An example would be the flying of a bird past your window or a kick flip, which can be classified as an action. Actions occur as sets of movements of living entities.

Another subclass of occurrence is phenomenon.⁸ Therefore, phenomena are occurrences that imply some kind of pattern or broader significance, which then can be scientifically studied. The behavioral pattern of birds flying south during winter can be classified even more precisely, namely as behavior. Behaviors are phenomena of living entities.

While concrete entities primarily comprise objects and occurrences, the class of abstract entities is subdivided into concepts, values, processes, and characteristics. Concepts are abstract entities, that exist as ideas or mental representations and do not have a physical presence. These include, for example, time or the concept of a bird. Concepts can influence human life and society and can represent concrete entities, but they do not themselves exist as physical objects.

Values are abstract entities that are used to express certain attributes or concepts. They can be based on qualitative or quantitative aspects. For example, a quality like beauty can be expressed as an esthetic value or moral values that express a certain understanding of moral. Numerical values are connected with properties, since they represent the magnitude of quantities.

Processes as a subclass of abstract entities encompass sequences of actions, changes, or transformations that occur over time, for example, photosynthesis or the ride on a skateboard. Processes are manifested through individual actions. Below in Section 5 we will discuss the relationship between processes and actions further, especially concerning the fact that processes are abstract entities and actions are concrete.

Characteristics are abstract entities that help to identify, classify, or describe classes of entities or individuals in terms of typical behavior. What distinguishes characteristics from features is the fact that features describe instances in more detail, whereas characteristics describe aspects of classes or typical behavior of individuals. “Your apple is round” is, therefore, an example of a feature of your apple, that is, an instance. “Apples are round,” on the other hand, would be a characteristic of apples in general, that is, the class of apples. The roundness can therefore be both a feature and a characteristic. Jack’s dark humor, for example, does not concern a class, but Jack as an individual. Nevertheless, dark humor is a characteristic here because it describes the phenomenon of typical behavior that occurs in Jack.

Attributes are characteristics that can be observed, measured, or represented. They seem to contribute to observing more overarching matters. For example, someone’s dark humor is a characteristic that describes a part of their personality. Furthermore, the observation of humor can also be representative of behavior or communication style, in which case it is considered an attribute.

Properties are objective, measurable attributes or characteristics of an object or entity that can be quantified, observed, or described using empirical methods. Properties can often be represented in the form of data. In this respect, humor would be difficult to consider as a property, but the speed of an eagle would be something that can be measured and observed and can be collected as data. They can be divided into qualities (e.g., color or beauty) and quantitative attributes (e.g., the specific saturation of a color, speed, or size). Qualities are subjective, perceptual, or experiential attributes of an entity that are subjectively perceived or experienced by individuals, while quantitative attributes comprise numerical or quantitative measurements of the properties of an object. Both can be represented by corresponding values, whereby values for quantities are created by measurements, and values for qualities are more abstract and conceptual. The classification of qualities and quantities in the hierarchy is not easy because it always depends on the context and whether they are used to identify an entity, are observed, or can be recorded as data.

Relationships are characteristics that refer to the relationships or connections that an object has with other objects or entities.

Functional attributes include functions, roles, propensities/dispositions, or purposes that an object fulfills within a system or context. Examples include the function of a tool (e.g., cutting, measuring, writing), and the role of a component in a machine. Propensities are functional attributes that refer to the tendency of a concrete entity to behave or act in a specific way.

Apart from the classes, we have also tried to recognize relations by isolating frequently occurring terms. This was much more difficult because even if terms occur frequently, it is not always possible to identify a clear relation, or it is difficult to determine which entities are part of the relation. One relation that we did notice, however, is manifest. It is often used for the relationship between abstract and concrete entities. For example, the action (concrete entity) manifests the process (abstract entity).

4.2. Additional Ontological Assumptions

In A survey of Top-Level Ontologies - To inform the ontological choices for a Foundation Data Model, the authors have summarized the results of a broad survey on ontologies conducted by the IMF technical team. It discusses the important ontological choices that are relevant for top-level ontologies (Partridge et al., 2020). Not all of the choices listed apply to the ChatGPT-based ontology, but many do. Thus, we discuss ontological assumptions contained in ChatGPT’s ontology based on the criteria in Partridge et al. (2020).

What structural relations are used and what are their properties? (Partridge et al., 2020) mention three structural relationships: type-instance, subtype-supertype, and part-whole. ChatGPT uses all of these relations. As Figure 1 illustrates, ChatGPT uses subtype relationships in its answers (e.g., a hammer is a kind of tool). The subtype-supertype relationship is distinguished by ChatGPT from type-instance relations.⁹ ChatGPT also uses part-whole relations, for example, processes are composed of individual actions. However, part-whole relations occur less frequently in ChatGPT’s responses than in the other two relationships. Other ontological interesting relations used by ChatGPT are “consist of” and “involve,” and “manifestation of.” The latter relation holds between concrete entities and abstract entities.

Since ChatGPT does not use a formal ontology that includes axioms, and its use of terminology is fluid, it is difficult to identify the formal properties of these relationships as discussed in Partridge et al. (2020). One criterion that ChatGPT meets is that the subsumption hierarchy is upward bounded in the sense that “entity” is the top category that subsumes all other categories.

Does it use General Extensional Mereology (GEM)? GEM is a strong mereological theory. One of its features is that there is a (unique) sum of any two arbitrary entities. According to ChatGPT, it is not the case that for any given two objects, there exists a sum. For example, if a duck sits on a table, no object is the sum of both. Thus, concerning objects ChatGPT does not embrace GEM. However, when asked directly, it suggests that there may be a conceptual construct “duck/table.” In contrast to ducks and tables, this construct is an abstract entity in ChatGPT’s ontology, thus, it is no tangible object in the physical world. This example illustrates that for ChatGPT concrete entities may be part of abstract entities and that different mereologies may be applied to different categories of entities.

If the location of two objects overlap, do they share parts? Partridge et al. (2020) define interpenetration as follows: two objects interpenetrate when they do not share parts, but their locations overlap. For example, if we assume that the statue and the clay it is made of are two different entities and that the statue consists of the clay, but has no clay as part, then this would be a case interpenetration. A contrasting view would be that the statue and the lump of clay share parts, and, thus, there is no interpenetration. In a conversation with ChatGPT, a flower bouquet made out of a LEGO set is not seen as one object, but as two objects whose parts are interconnected. If a part of the LEGO set falls off, only the set is incomplete, the bouquet is still intact, even if the arrangement has changed somewhat due to the missing LEGO brick. Thus, ChatGPT’s ontology seems not to embrace interpenetration.¹⁰

Does the ontology embrace endurantism or perdurantism? One important ontological distinction is between endurantism and perdurantism. According to endurantists, material objects are three-dimensional entities that persist through time and are wholly present at any time they exist. According to perdurantists, material objects are four-dimensional entities that occupy space-time. Hence, at any given moment, only a temporal part of the material object is present. While endurantists typically claim that there is a fundamental ontological difference between a material object and processes (e.g., John and John’s life), perdurantists typically deny that these are two different ontological categories. ChatGPT embraces endurantism because it distinguishes systematically and reliably between objects and occurrences as disjoint categories. There are some boundary cases, for example, ChatGPT provided conflicting answers to the question of whether glaciers are objects or occurrents.¹¹ But even in these cases, ChatGPT embraced only one option in a given session and did not claim that glaciers (and by extension other material entities) are both objects and occurrences at the same time. Further, ChatGPT also distinguishes between a person and the person’s life as two separate entities, where the former is an individual who experiences the latter.¹²

Is materialism adopted? TLOs may also be divided by the question of whether the world consists only of material things that exist in space and time or whether there are also abstract particulars, that is, a distinction between materialism and non-materialism (Partridge et al., 2020). ChatGPT describes the world not only based on material entities but also immaterial entities. Therefore, ChatGPT embraces non-materialism.

Are possibilia or possible worlds adopted? ChatGPT does not talk about possibilities in other possible worlds in its answers, but everything is based on information from the actual world. When ChatGPT is asked about possible worlds, it treats them as hypothetical entities and addresses their existence in philosophical discourse. Therefore, there are no possibilia, possible worlds, or possible situations in ChatGPT’s ontology. However, ChatGPT is committed to propensities and functions, which provide a kind of teleological vocabulary.

Is the ontology presentist or eternalist? Natural language is often tensed. For example, in “Yesterday, I was in Paris, today I am in Berlin, tomorrow I am in Warsaw,” we use different variants of “to be” and the words “yesterday,” “today,” and “tomorrow” to distinguish past, present, and future. Presentism assumes that these distinctions have ontological significance and that, strictly speaking, only present entities exist. In contrast, eternalists claim that past, present, and future entities exist.¹³

If ChatGPT is asked about dinosaurs, it answers that they existed in the past. Hence, ChatGPT uses tensed language to distinguish between past and present. However, since ChatGPT is trained to generate well-written English sentences, it follows linguistic conventions. Thus, it is quite difficult to determine whether the tensed language is just a stylistic choice or an ontological commitment to presentism. For example, if asked whether (the late) Kirk Douglas is the father of Michael Douglas, ChatGPT answers affirmatively. If asked, whether this entails that Kirk Douglas exists, it provides conflicting responses: first, it claims that this fact entails that Kirk Douglas exists, then it claims that he “no longer exists in the physical sense, but his legacy lives on through his work in film and the memories of those who knew him.”¹⁴ It seems to us that ChatGPT’s notion of “existence” is quite mercurial, and it switches back and forth between a presentist and an eternalist point of view.

Expressivity: Are indexicals “here” and “now” and are relations of arity larger than two supported? Since ChatGPT can communicate in English, it supports indexicals and sentences with predicate phrases that express relations between more than two entities (e.g., “Berlin is between Paris and Warsaw.”). However, the ontology we created (based on ChatGPT’s output) is written in OWL DL and, thus, supports neither.

5. Discussion

In the previous chapter, we presented ChatGPT’s top-level ontology, that is, a hierarchy of categories (represented in an OWL file) and ontological assumptions. However, as pointed out in Section 3, ChatGPT does not use an ontology (in the sense of a file), what we presented is an attempt by the authors to provide a systematic account of the ontological commitments that we identified in texts that were generated by ChatGPT. In this section, we discuss the similarities and differences to “normal” TLOs.

In some aspects ChatGPT’s top-level ontology is surprisingly traditional: it contains a hierarchy of categories, which are instantiated by instances. The distinction between living and non-living entities has been established since Aristotle’s Categories (Barnes, 1984). Many of the other categories are similar to the ones used in established ontologies. For example, similarly to BFO, the ontology contains objects that may participate in occurrences, that have properties (qualities in BFO), may play roles, and may have propensities (dispositions in BFO) or functions. ChatGPT embraces dualism since it is ontologically committed to physical entities and mental entities, like mental constructs, mental actions, or mental processes. By our comparison, we do not wish to suggest that ChatGPT’s top-level ontology is similar to BFO. Quite contrary, there are major differences with respect to the categories, their definitions, and their organization in the subsumption hierarchy. For example, ChatGPT’s primary distinction is between abstract and concrete entities, while BFO does not include a category for abstract entities. We only want to point out that somebody familiar with established ontologies is able to recognize familiar ontological distinctions by ChatGPT.

The major difference between ontologies and ChatGPT is that ontologies are usually carefully constructed in a way that they resolve ambiguities by distinguishing different concepts and providing clear definitions. This is particularly important in the case of polysemes, where humans rely on context to disambiguate (if necessary). For example, an ontology of cattle will introduce two different terms for “cow,” one in the sense of bovine and the other for female bovine. Similarly, ontologists will distinguish between hammer-as-object, hammer-as-process, and hammer-as-function. In contrast, LLMs generate texts based on a given prompt and a function that maps tokens to probability distributions of tokens. Hence, LLMs do not disambiguate words but rather learn to use them appropriately in a given context. This mercurial use of language is a great benefit for the task of generating natural language texts, but it is an obstacle for using LLMs for the task of creating ontologies.

Firstly, it leads to a kind of “ontological overload” For example, while an ontology typically would distinguish between a coin and the value it represents, ChatGPT treats the same entity as both concrete and abstract. In this example, ChatGPT states that fact explicitly,¹⁵ however, often it is more vague and claims that some entities have both physical and abstract characteristics. For example, the roundness of an apple is both a concrete feature of the apple and, at the same time, a generic characteristic (an abstract entity). This phenomenon is not limited to the distinction between concrete and abstract entities, the entities in ChatGPT’s ontology may be instances of different ontological categories that one expects to be disjoint. Depending on the context, they are treated to be one or the other.

Secondly, it leads to inconsistent responses. For example, in two separate conversations, ChatGPT may answer the same question in exactly the opposite way. For example, a shadow is classified as a concrete entity in Figure 3, because—according to ChatGPT—it is an observable and tangible aspect of the physical world. However, in Figure 4 exactly the same question is answered oppositely, because—according to ChatGPT—shadows, while observable, are not objects. One possible explanation for the different answers is that in Figure 3 “entity” is used by ChatGPT for the most generic category in the ontology, but in Figure 4 it seems to be used as a synonym for “object.” However, explanations like these are a kind of semantic pareidolia: we cannot help to read texts by generative AI as if they had a communicative intent. However, the different answers to the same question are a purely stochastic phenomenon.

Figure 3.

Classification of a Shadow as a Concrete Entity.

Figure 4.

Classification of a Shadow Distinct From a Concrete Entity.

By choosing the settings of GPT 3.5 appropriately, one can ensure that the same prompts will always produce the same output. However, this does not address the concern. Because while it ensures that the same prompt leads to the same output, even minute changes that do not change the meaning of the prompt may lead to inconsistent results. Logically inconsistent responses are not rare and are not limited to the distinction between concrete and abstract entities. For example, “litter” is sometimes classified by ChatGPT as an artifact¹⁶ and sometimes not¹⁷. Interestingly, the inconsistent responses sometimes seem to be inherited along the taxonomies. For example, in ChatGPT’s TLO, shadows are a kind of phenomenon. Phenomena are also sometimes classified as concrete and sometimes as abstract.

While in most cases the ChatGPT’s responses are consistent and do not change significantly if the prompts are rephrased, these examples illustrate that there is a significant number of cases where this is not the case. Thus, if one attempts to use LLMs for ontology development, it is important to cross-validate the classifications of the LLMs with the help of a variety of prompts to ensure that the results are robust. Further, there is likely a significant number of cases where the LLM will provide inconsistent results, and, thus, one needs to account for that possibility in the design of the workflow.

6. Conclusion

There is a significant research interest in the potential use of LLMs in ontology engineering. The development of such tools would be easier if the texts that are generated by LLMs share a top-level ontology, that is, they are committed to the existence of the same kind of entities and share the same ontological assumptions. Thus, the first question is whether there is such a top-level ontology, and, if there is, what it looks like and whether it is suitable for the purpose of ontology engineers. In this article, we investigated these questions as a case study for ChatGPT 3.5.

As we discussed in Section 3, ChatGPT does not use a top-level ontology (TLO) in the sense of an OWL file that may be downloaded. However, there is a stable core of ontological commitments and assumptions, which are repeatable across many interactions with the system. In this sense there ChatGPT uses a TLO. In Section 4 we presented the taxonomy of that ontology and some important ontological assumptions. (The taxonomy is also available as an OWL file.)

ChatGPT’s ontological hierarchy of categories contains some distinctions, which are familiar from the ontological literature. However, it differs significantly from popular existing top-level ontologies. This may be an obstacle to the use of LLM-based tools within ontology engineering projects, which reuse existing TLO (e.g., DOLCE or BFO). For example, any effort to use the GPT 3.5 model to extend a BFO-based ontology will run into the issue that GPT 3.5 uses a different ontological categorization than BFO. (The differences are so significant that no simple ontology alignment is possible.) Thus, without some effort to ensure compatibility with BFO (e.g., by choosing the prompts accordingly), the responses of the LLM will be incompatible with the existing BFO-based taxonomy of the ontology, and, therefore, the result of the ontology extension is going to be incoherent and, possibly, even logically inconsistent.

A different, and more challenging issue is the fact that LLMs do not learn to disambiguate terms, but rather to use terms appropriately in a given context. As we discussed, this results in a kind of “ontological overload,” where entities a classified as members of different, supposedly disjoint categories. For ChatGPT the same entity may be abstract and concrete, depending on the context. Further, minute changes to the prompt or even the same prompt may lead to contradictory results. Thus, any attempt to use LLMs for ontology development needs to involve some strategy to manage “ontological overload” and inconsistent responses, otherwise, the resulting ontology will be of poor quality. Any good ontology provides clear and unambiguous definitions, which enable the consistent usage of its terms. On their own, LLMs deliver neither.

Footnotes

ORCID iD

Fabian Neuhaus

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Notes

References

Anthropic (2023). http://www.anthropic.com/index/introducing-claude .

Arp

Smith

Spear

A. D.

(2015). Building Ontologies With Basic Formal Ontology. Cambridge, MA: MIT Press.

Babaei Giglou

D’Souza

Auer

(2023). Llms4ol: Large language models for ontology learning. In T. R. Payne, V. Presutti, G. Qi, M. Poveda-Villalón, G. Stoilos, L. Hollink, Z. Kaoudi, G. Cheng, & J. Li (Eds.), The Semantic Web—ISWC 2023 (pp. 408–427). Cham. Springer Nature Switzerland.

Barnes

(1984). The Complete Works of Aristotle, Volume One: The Revised Oxford Translation, Volume 1. Princeton, NJ: Princeton University Press.

Bateman

J. A.

Henschel

Rinaldi

(1995). The generalized upper model 2.0.

Bender

E. M.

Gebru

McMillan-Major

Shmitchell

(2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency (pp. 610–623).

Bricker

(2016). Ontological commitment. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy Metaphysics Research Lab, Stanford University, Winter 2016 edition.

Caufield

J. H.

Hegde

Emonet

Harris

N. L.

Joachimiak

M. P.

Matentzoglu

Kim

Moxon

S. A. T.

Reese

J. T.

Haendel

M. A.

Robinson

P. N.

Mungall

C. J.

(2023). Structured prompt interrogation and recursive extraction of semantics (spires): A method for populating knowledge bases using zero-shot learning.

Chen

Keloth

V. K.

Peng

Raja

Zhang

(2023). Large language models in biomedical natural language processing: Benchmarks, baselines, and recommendations. arXiv preprint arXiv:2305.16326.

10.

Ciatto

Agiollo

Magnini

Omicini

(2024). Large language models as oracles for instantiating ontologies with domain-specific knowledge. arXiv preprint arXiv:2404.04108.

11.

Gangemi

Guarino

Masolo

Oltramari

Schneider

(2002). Sweetening ontologies with dolce. In International conference on knowledge engineering and knowledge management (pp. 166–181). Springer.

12.

Guarino

(1998). Formal ontology and information systems. In N. Guarino (Ed.), Formal ontology in information systems: Proceedings of the first international conference (FOIS’98), June 6–8, Trento, Italy Vol. 46 (pp. 3–15). IOS Press.

13.

Guizzardi

Wagner

Almeida

J. P. A.

Guizzardi

R. S.

(2015). Towards ontological foundations for conceptual modeling: The unified foundational ontology (UFO) story. Applied Ontology, 10(3-4), 259–271.

14.

Herre

(2010). General formal ontology (GFO): A foundational ontology for conceptual modelling. In R. Poli, M. Healy, & A. Kameas (Eds.), Theory and Applications of Ontology: Computer Applications (pp. 297–345). Springer.

15.

Hertling

Paulheim

(2023). Olala: Ontology matching with large language models. In Proceedings of the 12th Knowledge Capture Conference 2023 K-CAP ’23, (pp. 131–139). New York, NY, USA. Association for Computing Machinery.

16.

Kommineni

V. K.

König-Ries

Samuel

(2024). From human experts to machines: An IIM supported approach to ontology and knowledge graph construction. arXiv preprint arXiv:2403.08345.

17.

Langer

Neuhaus

N’urnberger

(2024). Cear: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature. To be published.

18.

Lopes

Carbonera

Schmidt

Garcia

Rodrigues

Abel

(2023). Using terms and informal definitions to classify domain entities into top-level ontology concepts: An approach based on language models. Knowledge-Based Systems, 265, 110385.

19.

Mateiu

Groza

(2023). Ontology engineering with large language models. arXiv preprint arXiv:2307.16699.

20.

Neuhaus

(2023). Ontologies in the era of large language models—A perspective. Applied Ontology, 18(4), 399–407.

21.

Niles

Pease

(2001). Towards a standard upper ontology. In Proceedings of the international conference on formal ontology in information systems-volume 2001 (pp. 2–9).

22.

OpenAI (2024). https://platform.openai.com/docs/introduction .

23.

Pan

Luo

Wang

Chen

Wang

(2023). Unifying large language models and knowledge graphs: A roadmap. arXiv preprint arXiv:2306.08302.

24.

Partridge

Mitchell

Cook

Sullivan

West

(2020). A survey of top-level ontologies—To inform the ontological choices for a foundation data model. Technical report, Cambridge University Research Outputs.

25.

Pichai

(2023). https://blog.google/technology/ai/bard-google-ai-search-updates/ .

26.

Smith

(2004). Beyond concepts: Ontology as reality representation. In Proceedings of the third international conference on formal ontology in information systems (FOIS 2004) (pp. 73–84). IOS Press.

27.

Smith

Almeida

Bona

Brochhausen

Ceusters

Courtot

Dipert

Golfain

Grenon

Hastings

Hogan

Jacuzzo

Johansson

Mungall

Natale

Neuhaus

Overton

Petosa

Rovetto

Ruttenberg

Ressler

Rudniki

Seppälä

Schulz

Zheng

(2015). Basic formal ontology 2.0—Specification and user’s guide.

28.

Spataro

(2023). https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/ .

29.

Zhao

Vetter

Aryan

(2024). Using large language models for ontoclean-based ontology refinement.

The Mercurial Top-Level Ontology of Large Language Models

Abstract

Keywords

1. Introduction

3. Methodology

4.1. Hierarchy

5. Discussion

Footnotes

ORCID iD

Funding

Declaration of Conflicting Interests

Notes

References