Abstract

This paper explains conceptual modeling within the framework of Frame-Based Terminology (Faber, 2012; 2015; 2022), as applied to EcoLexicon (ecolexicon.ugr.es), a specialized knowledge base on the environment (León-Araúz, Reimerink & Faber, 2019; Faber & León-Araúz, 2021). It describes how a frame-based terminological resource is currently being restructured and reengineered as an initial step towards its formalization and subsequent transformation into an ontology. It also explains how the information in EcoLexicon can be integrated in environmental ontologies such as ENVO (Buttigieg, Morrison, Smith, Mungall & Lewis, 2013; Buttigieg, Pafilis, Lewis, Schildhauer, Walls & Mungall, 2016), particularly at the bottom tiers of the Ontology Learning Layer Cake (Cimiano, 2006; Cimiano, Maedche, Staab & Volker, 2009). The assumption is that frames, as a conceptual modeling tool, and information extracted from corpora can be used to represent the conceptual structure of a specialized domain.

Keywords

Terminology, ontology, frame-based terminology, conceptual modeling, corpus analysis

1. Introduction

There is a clear need for explicit models of semantic information (e.g., terminologies) to facilitate information exchange. One approach is through ontologies, regarded as shared models of a specialized domain that encode a view common to a set of users. The close link between terminologies and ontologies has been widely acknowledged (Gillam, Tariq & Ahmad, 2005; L’Homme & Bernier-Colborne, 2012; Roche, 2012; Montiel-Ponsoda, 2022). Terminologies and ontologies are similar because both entail the conceptualization of a specialized subject field. The difference between them lies in the fact that terminologies are generally developed for knowledge acquisition by human users, whereas ontologies are built for knowledge sharing between human and artificial agents.

The standard definition of ontology is ‘a formal, explicit specification of a shared conceptualization’ (Studer, Benjamins & Fensel, 1998, based on Gruber, 1995). This specification takes the form of the definitions of representational vocabulary (classes, relations, etc.), which provide meanings for the vocabulary as well as the constraints on its use. As such, an ontology is a formal representation of domain knowledge in which concepts are hierarchically organized. It is more formal than a terminology since it possesses a set of axioms and logic-based rules to structure, organize, and verify the consistency of the hierarchy through reasoning and inference.

In contrast, a terminology is the result of terminology work, which is defined as the “systematic collection, description, processing and presentation of concepts and their designations” (ISO 1087: 2019, 3.5.1) in a certain subject field. This collection of terms can take various forms. It can be a flat alphabetical list or a set of term records with data fields. However, it can also be a more conceptually oriented resource (i.e., terminological knowledge bases) in which the concepts are structured, based on their meaning, and organized in a set of labelled domains and subdomains. Despite its lack of ontological formality, a terminology can be the first step towards creating a shared conceptualization of a specialized domain.

Terminologies contribute to ontology building because they are often used as a starting point for formalization. In fact, most specialized fields possess terminology for naming, classifying, and standardizing the concepts in them (Sowa, 2000). According to Prieto-Díaz (2003), a taxonomy of terms highlights the key concepts in a field, which can then be defined and related to create an ontology. This may involve the reuse and reengineering of non-ontological resources whose semantics have not yet been formalized in an ontology (Suárez-Figueroa, 2010).

The reuse of resources can also be based on the extraction of information from a corpus of domain-specific texts and terminographic resources as well as expert validation rather than elicitation. Conceptual representations can thus be extracted from natural language texts by identifying lexico-syntactic structures (e.g., knowledge-rich contexts and knowledge patterns) that can be directly mapped or translated into ontology structures (Cimiano & Wenderoth, 2005; 2007; Montiel-Ponsoda & Aguado de Cea, 2014; Montiel-Ponsoda, 2022).

The reengineering of terminological resources may also involve transforming the resource schema into an ontology schema and converting the content into ontology instances. To achieve this, the implicit semantics of the original resource should be made explicit in the target ontology, which generally means that hyperonymy-hyponymy or meronymy relations are translated into subclass-of and part-of relations, respectively (Montiel-Ponsoda, 2022). However, this is easier to accomplish if the terminological resource is more knowledge-oriented and has a more conceptual design to facilitate its reuse and potential conversion into an ontology. This involves a process of conceptual modeling, which has always been one of the major objectives in terminology management (Budin, 1994; Meyer & Mackintosh, 1996; Meyer, Mackintosh, Barrière & Morgan, 1997; Faber, León-Araúz, Prieto-Velasco & Reimerink, 2007; Faber & L’Homme, 2022, inter alia).
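The translation of relations described above can be illustrated with a minimal sketch. The relation labels (`hyponym_of`, `meronym_of`, `related_to`) and the function name are assumptions introduced for illustration only; they do not reproduce the schema of any actual resource.

```python
# Hypothetical labels for relations stored in a terminological resource,
# mapped to the ontology relations named in the text.
RELATION_MAP = {
    "hyponym_of": "subclass_of",  # hyperonymy-hyponymy -> subclass-of
    "meronym_of": "part_of",      # meronymy -> part-of
}


def to_ontology_triples(term_relations):
    """Translate (source, relation, target) records into ontology triples.

    Records whose relation has no known mapping are returned separately
    so that they can be reviewed manually instead of being dropped.
    """
    mapped, unmapped = [], []
    for source, relation, target in term_relations:
        if relation in RELATION_MAP:
            mapped.append((source, RELATION_MAP[relation], target))
        else:
            unmapped.append((source, relation, target))
    return mapped, unmapped


records = [
    ("earthquake", "hyponym_of", "movement"),
    ("epicenter", "meronym_of", "earthquake"),
    ("earthquake", "related_to", "fault plane"),  # implicit semantics, no mapping
]
mapped, unmapped = to_ontology_triples(records)
```

Keeping the unmapped records visible reflects the point made above: relations whose semantics are still implicit in the source resource need to be made explicit before they can be carried over.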

1.1. Conceptual modeling in terminology

Conceptual modeling is used both in Terminology and Ontology building as well as in many other areas related to knowledge representation and data modeling. Although each discipline approaches it with varying degrees of formalization, it can be broadly defined as the systematic process of abstracting a model from the real-world entities that it seeks to represent. In multilingual terminology management, concept systems are a common modeling device because they allow the aggregation of multilingual equivalents of terms and monolingual terms that are synonyms into a common concept object (Shreve, 1995). This is an acknowledgement of the concept-oriented approach of knowledge resources. Concept systems in Terminology can also be useful to accomplish the following: (1) model concepts and their relations within a subject field; (2) clarify the relations between concepts; (3) lay the foundations of a uniform and standardized terminology; (4) facilitate the design of definitions; and (5) accommodate all relevant concepts in a terminographic resource (based on the ISO standard 704: 2009).

Given the importance of concept systems in specialized language, it is thus surprising that there are relatively few terminology resources that make conceptual structures explicit (Roche, Costa & Carvalho, 2019). An effective knowledge base should not only describe concepts individually but should also specify how each concept is related to others so that users can make inferences regarding characteristics and affordances. This is not only a question of representing single concept and term entries, but also of integrating them into larger knowledge structures or frames to map the relations that they hold with others and thus capture their different conceptual dimensions.

Even though terminological resources store a great deal of conceptual information, quite often, their design does not allow conceptual relations to be easily perceived and extracted. Before a terminological database can begin to approximate an ontology, the knowledge that it contains must be made explicit. For that reason, a resource can be more easily used or even ‘reengineered’ by ontology builders when it has a conceptual design. In that case, it can provide data at the basic levels of an ontology that are directly relevant to ontology building.

1.2. Ontology building and terminology

According to Cimiano (2006) and Cimiano, Maedche, Staab and Volker (2009), ontology building follows a model known as the Ontology Learning Layer Cake (OLC). Even though the OLC has been harshly criticized (Browarnik & Maimon, 2015), it is still widely used, since it provides a coherent and structured way to understand and organize the different steps involved in the process of building ontologies. Moreover, it is seamlessly aligned with Terminology, as all of its layers correspond with the different steps involved in terminology management.

As shown in Fig. 1, from the bottom to the top layer, the tiers of the layer cake involve the extraction or specification of a certain type of information for each ontology component: (i) terms; (ii) synonyms in the same language and equivalents in different languages; (iii) concepts; (iv) concept hierarchy; (v) concept relations; (vi) axioms and rules.

Fig. 1.

Cimiano’s (2006) ontology learning layer cake.

Ontology development is largely based on the definitions of concepts and the relations between concepts (Wróblewska, Podsiadły-Marczykowska, Bembenik, Protaziuk, & Rybiński, 2012). Terms (and their synonyms) are the linguistic designation of concepts. Concepts differ from terms in that they are ontological entities and thus abstractions of human thought. However, the structure of definitions as well as the different explicit relational structures found in corpus texts can provide valuable information that indicates what terms mean and how concepts relate to each other. This is useful because an important part of ontology building is establishing a concept hierarchy based on vertical relations (i.e., is-a relation or part-of relation) and horizontal ones (e.g., causes, result-of, located-at, takes-place-at, etc.). Much of this information is reflected in language.

The reengineering of a well-modeled terminological knowledge base can thus be a valuable shortcut that greatly facilitates ontology building since terms and their synonyms are already ascribed to a conceptually organized structure. The following sections discuss Frame-based Terminology, and how EcoLexicon, its practical application, is gradually being reengineered with a view to transforming it into an ontology.

The rest of this paper is structured as follows. Section 2 discusses the basic premises of Frame-based Terminology and its relation to Frame Semantics. Section 3 describes EcoLexicon, the practical application of Frame-based Terminology, and explains the process of conceptual modeling currently being applied to bring EcoLexicon closer to an ontology such as ENVO. Section 4 presents the conclusions and outlines future challenges.

2. Frame-based terminology

Frame-Based Terminology (FBT) is a cognitive approach to Terminology that focuses on the contextualized representation of specialized concepts, and which can contribute to ontology building by supplying the information needed for the Ontology Learning Layer Cake. As its name implies, Frame-based Terminology uses premises of Frame Semantics (Fillmore, 1976; 1982; 2006), and situated cognition (Barsalou, 2003; 2008; 2009) to structure specialized domains and create non-language-specific representations (Faber, 2022). The assumption is that language reflects thought, and that the non-language-specific frames for specialized concepts can be extracted from texts.

A frame is an organized package of knowledge that humans retrieve from long-term memory to make sense of the world. In fact, framing experience involves applying stored knowledge derived from similar contexts and situations with a view to understanding complex events and how to deal with them. It is evident that concepts do not exist in a vacuum and are more meaningful when they are related to each other and integrated into progressively more complex knowledge configurations.

Frames are thus crucial to both general and specialized knowledge though the emphasis here is on the terms that designate specialized knowledge concepts. In Environmental Science and in science in general, there is a great need for the consistent descriptions of entities, processes, and features. As highlighted by Löbner (2015, p. 37), frames and functional concepts play a central role in scientific thinking since sciences deal with classes of objects, such as physical objects, living organisms, chemical substances, etc. Frames also underlie the conception of scientific classifications (i.e., taxonomies), as well as of types of processes such as chemical reactions.

The assumption that frames are present in texts is the foundational premise of Frame Semantics (FS), based on Fillmore’s (1976; 1982; 2006) Case Grammar. One of its premises is that all concepts are part of a larger structure (semantic frame) and are related in such a way that the activation of one word evokes the entire frame. FS explains how meanings are structured and associated with words in a semantic structure and how these provide access to our conceptual system, which is the inventory of structured knowledge that we use to navigate the world (Evans & Green, 2006). Frame Semantics is also used to refer to a wide variety of approaches to the systematic description of natural language meanings since it relates linguistic utterances to world knowledge, such as event types and their participants.

The practical application of FS is the FrameNet database for general language (Ruppenhofer, Ellsworth, Petruck, Johnson & Scheffczyk, 2010; Baker, Fillmore & Cronin, 2003), in which a frame is regarded as a conceptual structure that describes a situation, object, or event along with its participants. The goal is to encode scenarios and show how they can be described linguistically with the lexical units (LUs) that evoke them and the grammatical structures that provide details about the participants.

In Terminology and specialized language, Frame Semantics and FrameNet have had a significant impact. For example, BioFrameNet (Dolbey, Ellsworth & Scheffczyk, 2006) extended the FrameNet lexical database to the domain of molecular biology and examined the syntactic and semantic combinatorial possibilities of the lexical items used in this field to better understand the grammatical properties of specialized language. Schmidt (2009) applied Frame Semantics to multilingual terms in soccer. In the field of Medicine, Verdaguer (2020) used Frame Semantics to analyze the Health Science Corpus (SciE-Lex lexical database) and highlight the common syntactic and semantic features of biomedical terms, motivate their combinatorial patterns, and establish frame-based semantic networks.

Representations, such as those in FrameNet, are also the basis of DiCoEnviro (L’Homme, 2018), an online environmental resource with terms in various languages (e.g., English, French, Spanish, Portuguese, etc.). DiCoEnviro describes terms as lexical units rather than as labels for concepts. Relations between terms are manually encoded by terminologists using lexical functions, LFs (Mel’čuk, Clas & Polguère 1995; Polguère, 2012). DiCoEnviro entries provide information about the linguistic properties of each environmental term along with its argument structure and contextual annotations. These entries are then connected to a knowledge resource called Framed DiCoEnviro (L’Homme, Robichaud & Prévil 2018; L’Homme, Robichaud & Subirats-Rüggeberg, 2020), where they are linked to the frames evoked. DiCoEnviro frames either come directly from FrameNet (for English) or are intuitively created.

EcoLexicon (ecolexicon.ugr.es) is also a terminological knowledge base about environmental science (Faber et al., 2016; León-Araúz et al., 2019; Faber & León-Araúz, 2021). It differs from the previously mentioned resources because of its conceptual design, which is derived from information semi-automatically extracted from specialized texts as well as from the structure of terminological definitions (Faber & León-Araúz, 2021). It thus can provide the information for various tiers of the Ontology Learning Layer Cake (see Fig. 1). As the practical application of Frame-based Terminology, its conceptual modeling and structure are discussed in Section 3.

3. EcoLexicon

The EcoLexicon knowledge base (ecolexicon.ugr.es) was created in 2003 by the LexiCon research group as part of the PuertoTerm research project. It was (and is) a cooperative endeavor between the LexiCon Research group and the Andalusian Inter-University Institute for Earth System Research (IISTA-CEAMA). The resource was originally based on a core list of 794 environmental terms in Spanish and English, collated by the scientists and engineers in the IISTA-CEAMA.

As part of the project, definitions were elaborated for each term, which reflected the level of generality or specificity of the concept as well as its relations with other concepts within the same knowledge domain. Definitions were constructed so that all concepts in the same category followed the same pattern.

In parallel, a corpus of English and Spanish environmental texts was also compiled. Subsequent corpus analysis enriched the initial inventory of terms and detected other related concepts. Thanks to a series of funded research projects (e.g., MarcoCosta, PuertoTerm, ReCord, ConTent, TOTEM, etc.) concepts were gradually organized in semantic categories and structured in concept systems. The original list of terms was enriched by the addition of more terms as well as by its transformation into a conceptual network.

Over the last 20 years, EcoLexicon has grown exponentially and now contains 4654 environmental concepts and 24968 terms in eight languages (English, Spanish, German, French, Dutch, Modern Greek, Russian, and Arabic). Thanks to ThinkMap technology for data visualization, the general view of EcoLexicon is the gateway to a rich inventory of information for each concept, namely its definition, relations to other concepts, graphical representations, and correspondences in other languages (Faber et al., 2016; Faber & León-Araúz, 2021). Figure 2 shows the main view of the EcoLexicon entry for earthquake.

Fig. 2.

EcoLexicon main view: entry for earthquake.

When users click on any other concepts in the semantic network, the concept system rearranges itself (i.e., some concepts disappear and some others emerge, based on the new search concept). By right-clicking on a concept in the network, the user can access a contextual menu. This menu can be used to perform any of the following actions: (1) centering the concept; (2) fixing a node by dragging it to a certain position; (3) visualizing details of the concept (definition, associated terms, resources, etc.) by selection on the sidebar; (4) generating a URL for direct access to the concept selected; (5) searching Google Images, Google, and Wolfram Alpha; (6) removing a concept and its related concepts from the network; (7) expanding a node to include other hierarchical levels. Any of these actions enhances concept representation by providing a large quantity of conceptual information, depending on the specific needs of the user. Users can also establish the depth of the concept system, namely, its maximum hierarchical level.

However, the information in this view was not generated automatically or all at once. Instead, it is the result of a long process of conceptual modeling that has gradually evolved over the years.

3.1. Conceptual modeling in EcoLexicon

As previously mentioned, the emphasis on concept systems in Terminology stems from the premise that specialized knowledge acquisition is enhanced when concepts are organized so that the relations between them are made explicit (Budin, 1994; Meyer, Eck & Skuce, 1997). This facilitates the activation of associative information in semantic memory, thus promoting context availability. The basic premise is that new knowledge is more meaningful when it is related to previous knowledge. Consequently, for concepts to become a part of one’s knowledge and be retained in long-term semantic memory, they must be embedded within a knowledge structure (Faber, 2011; 2012; León-Araúz & Faber, 2012).

Initially, EcoLexicon was structured in the form of the Environmental Event. This frame had the advantage of being process-based. The categories were the roles that the entities had within the general event (see Fig. 3).

Fig. 3.

Environmental event.

However, this more informal frame-like structure of EcoLexicon, which represents environmental actions and processes, needed to be complemented with a more formal top-down organization of semantic classes in order to determine degrees of specificity and conceptual similarity (Hahn & Chater, 1997). In 2017, certain premises of ontology building were thus adopted. Accordingly, the concepts in EcoLexicon were classified in 152 semantic categories distributed in five categorization levels (Gil-Berrozpe & Faber, 2017; Gil-Berrozpe, León-Araúz & Faber, 2017; 2018; 2019). The inventory of semantic classes was extracted from concept definitions and corpus information.

The most general level of the hierarchy is composed of the three starter ontological categories (Mahesh & Nirenburg, 1995; Moreno-Ortiz & Pérez-Hernández, 2000) (Table 1).

Table 1

Starter ontological categories

A: attribute – properties of entities and processes

E: entity – physical and mental objects

P: process – events extending over time and involving different participants

The specification of these categories is ongoing as more concepts are added to EcoLexicon. Concepts with a multidimensional nature are classified in as many categorization hierarchies as necessary. Attributes were divided into 16 categories, entities into 93, and processes into 43.

3.1.1. Modeling earthquake

The inclusion of these conceptual categories in EcoLexicon allows users to browse concepts based on their conceptual category, but it also allows terminologists to conceive definitional templates and lays the groundwork for establishing ontological classes in concept hierarchies. Users can query concepts as to their category by clicking on Conceptual Categories at the bottom left of the screen.

For example, earthquake is categorized within the starter category of process, whose subclasses include action, activity, addition, change, cycle, elimination, emission, formation, loss, method, movement, phase, and phenomenon. These subclasses also have more specific divisions. As one of the subclasses of process, movement has the structure shown in Table 2.

Table 2
Movement: categories

▼ P-11: Movement

○ P-11.1: Earth/ground movement

○ P-11.2: Energy movement

○ P-11.3: Fluid movement

■ P-11.3.1: Water movement

○ P-11.4: Transport

○ P-11.5: Wave movement

○ P-11.6: Wind movement
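The category codes in Table 2 can be read as paths in the hierarchy. The following sketch assumes that the parent of a category is obtained by dropping the last dotted segment of its code; this is an inference from the codes shown in Table 2, not a documented feature of EcoLexicon, and the function names are hypothetical.

```python
# Category codes and labels as listed in Table 2.
CATEGORIES = {
    "P-11": "Movement",
    "P-11.1": "Earth/ground movement",
    "P-11.2": "Energy movement",
    "P-11.3": "Fluid movement",
    "P-11.3.1": "Water movement",
    "P-11.4": "Transport",
    "P-11.5": "Wave movement",
    "P-11.6": "Wind movement",
}


def parent_code(code):
    """Return the parent category code, or None for a code such as 'P-11'.

    Assumption: dropping the last dotted segment yields the parent.
    """
    prefix, _, path = code.partition("-")
    parts = path.split(".")
    if len(parts) == 1:
        return None  # sits directly under the starter category (here, P)
    return f"{prefix}-{'.'.join(parts[:-1])}"


def lineage(code):
    """Return the category labels from the most general to the given code."""
    chain = []
    while code is not None:
        chain.append(CATEGORIES[code])
        code = parent_code(code)
    return list(reversed(chain))


water_lineage = lineage("P-11.3.1")
```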

Earthquake is thus categorized as a type of movement, more specifically, as a type of Earth/ground Movement. As reflected in Fig. 4, users can query EcoLexicon to see not only where earthquake is situated in the Category Hierarchy, but also to view the various types of earthquake (e.g., megathrust earthquake, thrust fault earthquake, supershear earthquake, etc.) that are also category members though at more specific levels.

Fig. 4.

Conceptual categories in EcoLexicon: Earth/ground movement.

As reflected in the conceptual network, the concepts in EcoLexicon are linked by an inventory of conceptual relations, which thus far consist of the following:

Vertical relations: type-of, part-of, made-of, phase-of, delimited-by, located-at

Horizontal relations: attribute-of, result-of, affects, causes, takes-place-at, has-function, measured, represents, studies.

The corresponding inverse relation is reflected in the directionality of the arrows that link each concept to another. For example, a selection of the semantic relations codified for earthquake are listed in Table 3.

Table 3

Selection of conceptual relations for earthquake

Concept 1	Semantic relation	Concept 2
earthquake	type_of	movement
epicenter	part_of	earthquake
hypocenter	part_of	earthquake
earthquake	located_at	earth’s crust
earthquake	takes_place_at	fault plane
earthquake	causes	seismic wave
stress	causes	earthquake
earthquake	affects	construction
seiche	result_of	earthquake
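The rows of Table 3 can be treated as directed triples. The sketch below is only an illustration of how the directionality of each relation separates what earthquake points to from what points to earthquake; the function name and data layout are assumptions, not the internal representation used in EcoLexicon.

```python
from collections import defaultdict

# Rows of Table 3 as (concept 1, semantic relation, concept 2).
TRIPLES = [
    ("earthquake", "type_of", "movement"),
    ("epicenter", "part_of", "earthquake"),
    ("hypocenter", "part_of", "earthquake"),
    ("earthquake", "located_at", "earth's crust"),
    ("earthquake", "takes_place_at", "fault plane"),
    ("earthquake", "causes", "seismic wave"),
    ("stress", "causes", "earthquake"),
    ("earthquake", "affects", "construction"),
    ("seiche", "result_of", "earthquake"),
]


def relations_for(concept, triples=TRIPLES):
    """Group the triples that involve `concept` by direction and relation."""
    outgoing, incoming = defaultdict(list), defaultdict(list)
    for source, relation, target in triples:
        if source == concept:
            outgoing[relation].append(target)
        if target == concept:
            incoming[relation].append(source)
    return dict(outgoing), dict(incoming)


outgoing, incoming = relations_for("earthquake")
```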

Within the context of earthquake, these relations are as follows:

type_of: this generic-specific relation reflects hierarchical inheritance in concept systems. All entities and events can be categorized as instances of a particular class and hierarchical chains can be built accordingly. For example, strike-slip earthquake type_of interplate earthquake type_of tectonic earthquake type_of earthquake type_of movement type_of process.

part_of: this relation also reflects the hierarchical structure of the domain. In the case of physical entities, this relation directly refers to parts of each concept, whether concrete or abstract (epicenter part_of earthquake).

located_at: this relation is relevant when the location of a physical entity is an essential characteristic for its description. For example, an earthquake is located_at the earth’s crust.

takes_place_at: this relation describes processes which have spatial and temporal dimensions. For example, earthquake takes_place_at fault plane.

causes: this relation only links entities and events, for example, stress causes earthquake. Even though this relation initially seems to be the inverse of result_of, there is a difference stemming from the active role played by certain entities. Causes only describes the beginning of a process, whereas result_of may link events or entities that are the consequence of another event.

affects: this relation, along with causes and result_of is a crucial relation in dynamic systems since environmental concepts have a high combinatorial potential. Affects relates a wide variety of concepts to their ever-changing environments. It links processes or entities that cause a change in any other entity or event without producing a final result (e.g., earthquake affects construction).

result_of: this relation is relevant to either events or entities that are derived from other processes or events. For example, seiche is the result_of earthquake.
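The hierarchical chain given for type_of can be followed programmatically. The sketch below makes the simplifying assumption that each concept has a single type_of parent, which does not hold for multidimensional concepts; the function names are hypothetical.

```python
# The type_of chain from the example above, stored as child -> parent.
TYPE_OF = {
    "strike-slip earthquake": "interplate earthquake",
    "interplate earthquake": "tectonic earthquake",
    "tectonic earthquake": "earthquake",
    "earthquake": "movement",
    "movement": "process",
}


def type_of_chain(concept, parents=TYPE_OF):
    """Follow type_of links upwards and return the chain, starting at `concept`."""
    chain = [concept]
    seen = {concept}
    while chain[-1] in parents:
        parent = parents[chain[-1]]
        if parent in seen:
            raise ValueError(f"cycle detected at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


def is_a(concept, ancestor, parents=TYPE_OF):
    """True if `ancestor` is reachable from `concept` through type_of links."""
    return ancestor in type_of_chain(concept, parents)[1:]


chain = type_of_chain("strike-slip earthquake")
```

The `is_a` check is a small instance of hierarchical inheritance: a strike-slip earthquake is a process because every link in the chain is a type_of relation.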

To represent more specific types of earthquake, the following artificial ‘umbrella’ concepts are also used: (i) earthquake_time_sequence; (ii) earthquake_depth; and (iii) earthquake_origin. Umbrella concepts are introduced at intermediate levels of a hierarchy to further specify the sense of the type_of relation and narrow the link that connects parent concepts to child concepts (Gil-Berrozpe & Faber 2017). This is one of the ways in which multidimensionality is represented in EcoLexicon (León-Araúz & Faber, 2013). Umbrella concepts thus serve the purpose of differentiating sibling concepts that emerge as the result of different classification criteria, also known as conceptual dimensions. For instance, although foreshock earthquake, tectonic earthquake and non-tectonic earthquake are all types of earthquake, not all of them share the same degree of “siblingness”, since the first is the result of a time dimension while the latter two are linked to a causal perspective. This distinction is useful for reasoning purposes, as the siblings resulting from the same dimension are mutually exclusive (e.g. a tectonic earthquake can be at the same time a foreshock earthquake, but not a non-tectonic earthquake).

As shown in Fig. 5, more specific types of earthquake are categorized, based on the following conceptual dimensions: (i) their place in the time sequence of the earthquake event (foreshock, main shock, aftershock); (ii) depth of the earthquake (shallow-focus earthquake, intermediate-focus earthquake, deep-focus earthquake); (iii) origin of the earthquake (tectonic earthquake, non-tectonic earthquake).
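The role of umbrella concepts in reasoning can be sketched as a compatibility check. The dictionary below groups the subtypes listed above under their conceptual dimension; the data structure and function names are assumptions made for illustration, under the premise that siblings within the same dimension are mutually exclusive.

```python
# Subtypes of earthquake grouped by umbrella concept (conceptual dimension).
DIMENSIONS = {
    "earthquake_time_sequence": {"foreshock", "main shock", "aftershock"},
    "earthquake_depth": {
        "shallow-focus earthquake",
        "intermediate-focus earthquake",
        "deep-focus earthquake",
    },
    "earthquake_origin": {"tectonic earthquake", "non-tectonic earthquake"},
}


def dimension_of(concept):
    """Return the umbrella concept that groups `concept`."""
    for dimension, members in DIMENSIONS.items():
        if concept in members:
            return dimension
    raise KeyError(f"{concept!r} is not assigned to any dimension")


def compatible(a, b):
    """Two different subtypes can describe the same earthquake only if they
    belong to different dimensions."""
    return a == b or dimension_of(a) != dimension_of(b)


same_event_possible = compatible("tectonic earthquake", "foreshock")
```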

Specific types of tectonic earthquake (an origin-based earthquake) are interplate earthquake, intraplate earthquake, and thrust fault earthquake. Earthquake propagation speed is another umbrella concept to encompass types of tectonic earthquake based on how quickly the seismic waves radiate.

Within the category of earthquake_origin and in direct contrast to tectonic earthquake, there is non-tectonic earthquake, which encompasses all earthquakes that are not directly related to tectonic plate movement. Types of non-tectonic earthquake include volcanic earthquake and induced earthquake, which are caused by landslide, collapse, or explosion. Figure 5 shows the earthquake hierarchy in EcoLexicon.

Fig. 5.

Earthquake hierarchy in EcoLexicon.

The various types of earthquake were extracted from the EcoLexicon corpus as well as from other specialized resources such as termbases, dictionaries, and glossaries. In this type of conceptual modeling, definitions, combined with corpus analysis, are of paramount importance.

3.1.2. Defining earthquake

Definitions are crafted in conjunction with concept systems in Terminology since concept systems lay the groundwork for an internally coherent system and avoid inconsistencies. At the micro-semantic level, a definition is the linguistic description of the characteristics of a concept. According to Antia (2000, pp. 113–115), a definition fixes a concept, describes it, and also links it to others. As one of the most important components of any high-quality terminological resource, definitions are thus a privileged medium for knowledge representation as they are a direct natural language explanation of a concept. In this sense, definitions and their format provide the frame for the other types of information.

Definitions also have a central role in the use of ontologies (Seppälä 2015). Ontological definitions are singular noun phrases which are content words that form part of a domain-specific vocabulary used by a group of experts to communicate about entities to which the terms refer (Seppälä, Ruttenberg & Smith, 2017, p. 75). In ontologies, according to Seppälä, Ruttenberg, Schreiber and Smith (2016), a good definition delimits the intended meaning of an ontology term by describing the instances of the type to which the term refers. It states that the Xs are of the type Y and are distinguished from other instances of this type by some collection Z of one or more characteristic marks. As is well-known, this is also an Aristotelian definition, which is the typical format of most terminological definitions, composed of a generic or superordinate term and differentiating characteristics (Eck and Meyer, 1995, pp. 83–87; Sager, 1990, p. 42).
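The Aristotelian format (a generic term plus differentiating characteristics) can be expressed as a simple template. The class below is a hypothetical sketch, not an EcoLexicon or ontology-editor data type; the example content is taken from the definition of shallow focus earthquake given later in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenusDifferentia:
    """A definition of the form: the Xs are Ys distinguished by Z."""

    term: str            # X, the concept being defined
    genus: str           # Y, the generic or superordinate term
    differentiae: tuple  # Z, one or more differentiating characteristics

    def render(self):
        """Compose the definition text from the genus and the differentiae."""
        return f"{self.genus} that {' and '.join(self.differentiae)}."


shallow = GenusDifferentia(
    term="shallow focus earthquake",
    genus="earthquake",
    differentiae=(
        "occurs at depths of less than 70 kilometers beneath the Earth's surface",
    ),
)
definition_text = shallow.render()
```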

Definitions in EcoLexicon are also based on templates that make category membership explicit. These definitions reflect a concept’s relations with other concepts and specify essential characteristics (León-Araúz & Faber, 2012, pp. 153–154). The majority of EcoLexicon definitions conform to the guidelines for definitions specified in Seppälä et al. (2017), and those that do not are currently under revision. Evidently, it is necessary for definitions to have a uniform structure that directly refers to the underlying conceptual structure of the domain, as represented in the concept systems. For example, the EcoLexicon definition of earthquake is the following:

earthquake: geological phenomenon involving a sudden oscillatory movement at the surface of the Earth, generally caused by the release of stress along a fault plane, which produces seismic waves that radiate from the point of initial rupture. These waves shake the ground, affect constructions, and can produce damage. Other causes of earthquakes include volcanic activity, atomic explosion, landslide, or collapse.

The conceptual relations (in brackets) and the structures pointing to them (in bold) are the following:

geological phenomenon involving a sudden oscillatory movement [type_of]

at the surface of the Earth [located_at],

generally caused by the release of stress [caused_by]

along a fault plane [takes_place_at],

which produces seismic waves that radiate from the point of initial rupture [causes].

These waves shake the ground, affect constructions [affects],

and can produce damage [causes].

Other causes of earthquakes include volcanic activity, atomic explosion, landslide, or collapse [caused_by].
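The correspondence between definition segments and conceptual relations can be approximated with cue phrases. The sketch below is a rough illustration only: the regular expressions are hypothetical cues chosen to fit this one definition, not the knowledge patterns actually used for EcoLexicon, and the type_of relation (carried by the genus at the start of the definition) is deliberately left out.

```python
import re

DEFINITION = (
    "geological phenomenon involving a sudden oscillatory movement at the "
    "surface of the Earth, generally caused by the release of stress along "
    "a fault plane, which produces seismic waves that radiate from the point "
    "of initial rupture. These waves shake the ground, affect constructions, "
    "and can produce damage. Other causes of earthquakes include volcanic "
    "activity, atomic explosion, landslide, or collapse."
)

# Hypothetical cue phrases paired with the relation they point to.
CUES = [
    (r"\bcaused by\b", "caused_by"),
    (r"\bcauses of \w+ include\b", "caused_by"),
    (r"\bat the surface of\b", "located_at"),
    (r"\balong a\b", "takes_place_at"),
    (r"\bproduces?\b", "causes"),
    (r"\baffects?\b", "affects"),
]


def tag_relations(text, cues=CUES):
    """Return (cue, relation) pairs in the order the cues occur in the text."""
    hits = []
    for pattern, relation in cues:
        for match in re.finditer(pattern, text):
            hits.append((match.start(), match.group(0), relation))
    return [(cue, relation) for _, cue, relation in sorted(hits)]


tagged = tag_relations(DEFINITION)
```

A real pipeline would need far more robust patterns and expert validation; the point here is only that a uniformly structured definition exposes its relations in a predictable order.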

As previously mentioned, earthquake is a multidimensional concept, which means that it has various (often overlapping) classes, depending on the feature focused on. This is evident in earthquake terminology and a challenge if the objective is to integrate this conceptual richness into a single hierarchy. As reflected in geological terminology and corpora, experts conceptualize earthquakes in relation to their depth of occurrence (earthquake_depth), time of occurrence (earthquake_time_sequence), and origin of movement (earthquake_origin). Each of these classifications focuses on a different conceptual relation.

For example, earthquake subtypes based on origin of movement, namely, where the stress is released, focus on the conceptual relation takes_place_at. This is the case of tectonic earthquake (i.e., along fault or plate boundaries or in tectonic plates), and its subtypes: interplate earthquake, intraplate earthquake, and thrust fault earthquake (see Fig. 6).

Fig. 6.

Tectonic earthquake hierarchy in EcoLexicon based on origin of movement.

The hierarchy in Fig. 6 is reflected in the structure of the definitions of the concepts, since each subordinate concept is defined in terms of the more generic one immediately preceding it, as shown in Fig. 7.

Fig. 7.

Tectonic earthquake hierarchy in EcoLexicon.

The same thing occurs in the hierarchy of earthquake_depth, whose concepts are the following: (i) shallow-focus earthquake (depths of less than 70 kilometers beneath the Earth’s surface); (ii) intermediate focus earthquake (depths of 70–300 kilometers beneath the Earth’s surface); (iii) deep-focus earthquake (depths exceeding 300 kilometers beneath the Earth’s surface). Figure 8 shows the EcoLexicon hierarchy for this dimension.

Fig. 8.

EcoLexicon hierarchy for earthquake_depth.

The hierarchy in Fig. 8 was extracted from the following interrelated set of definitions:

Depth of movement.

shallow focus earthquake

earthquake that occurs at depths of less than 70 kilometers beneath the Earth’s surface.

intermediate focus earthquake

earthquake that occurs at depths ranging from 70 to 300 kilometers beneath the Earth’s surface. Less frequent than shallow focus earthquakes but more common than deep focus earthquakes, intermediate focus earthquakes typically occur in subduction zones, where one tectonic plate is forced beneath another.

deep focus earthquake

earthquake that occurs at depths of more than 300 kilometers and is primarily associated with subduction zones, where the subducting plate sinks deep into the Earth’s mantle.
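Since the three definitions above differ only in their depth range, the earthquake_depth dimension can be expressed as a small classification function. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, and the treatment of the boundary values (exactly 70 and exactly 300 kilometers are assigned to the intermediate class) is one reading of "less than 70", "ranging from 70 to 300", and "more than 300".

```python
def classify_by_depth(depth_km: float) -> str:
    """Return the earthquake_depth subtype for a focal depth in kilometers.

    Boundary handling is an assumption: 70 km and 300 km are both treated
    as intermediate, following the wording of the three definitions.
    """
    if depth_km < 0:
        raise ValueError("depth must be a non-negative number of kilometers")
    if depth_km < 70:
        return "shallow focus earthquake"
    if depth_km <= 300:
        return "intermediate focus earthquake"
    return "deep focus earthquake"


for depth in (10, 70, 300, 450):
    print(depth, "km ->", classify_by_depth(depth))
```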

As shown in Fig. 9, there are also terms that focus on earthquakes, depending on their time of occurrence in the earthquake event: (i) foreshock (earthquake that occurs before a stronger one); (ii) mainshock (earthquake sometimes preceded by one or more foreshocks and always followed by many aftershocks); (iii) aftershock (earthquake that follows a stronger one).

Fig. 9.

EcoLexicon hierarchy for earthquake_time_sequence.

The hierarchy in Fig. 9 was extracted from the following set of interrelated definitions:

Time of movement.

foreshock earthquake

less powerful earthquake that occurs before a stronger one and that usually originates in the same place as the main earthquake that it precedes.

mainshock earthquake

strongest earthquake in a sequence, sometimes preceded by one or more foreshocks, and almost always followed by many aftershocks.

aftershock earthquake

less powerful earthquake that follows a stronger one and that usually originates in the same place as the main earthquake that it follows.

Since the terms for earthquake do not all appear in any single knowledge resource, it was crucial for us to have a large, well-balanced corpus of texts related to the specialized knowledge domain (León-Araúz, San Martín & Reimerink, 2018). The EcoLexicon (private) corpus currently has a total of 104,964,907 words in English. Of this total, 12,085,810 were words from texts belonging to the domain of Geology, which would presumably focus on, discuss, or describe types of earthquake and earthquake activity. The data obtained from corpus queries were used to structure categories, create concept frames, and characterize general processes and actions. When frames are specified as an action or process with participants, this provides a predicative frame linking conceptual categories.

3.1.3. Corpus analysis and information extraction

In FBT, semantic relations are extracted from corpus texts in different languages through the use of knowledge patterns (KPs), which are the lexico-syntactic structures that encode semantic relations in natural language. Pattern-based approaches assume that there are recurrent and predictable linguistic cues that indicate specific types of information, and that these cues can be used to locate and extract text excerpts and the knowledge they convey (Meyer, Mackintosh, Barrière & Morgan, 1997, p. 257; Marshman, 2022, p. 292). Text excerpts that qualify as knowledge-rich contexts (KRCs) indicate at least one item of domain knowledge that could be useful for conceptual analysis (Meyer, 2001, p. 281), which means that they contain at least one KP making the relation between two concepts explicit.

Table 4 shows a brief selection of the KPs used to link concepts in EcoLexicon.

Table 4
Examples of semantic relations and knowledge patterns

Semantic relation	Knowledge pattern
type_of	such as, rang* from, includ*, is a, type of, and other
part_of	includ*, consist of, formed by/of
made_of	consist* of, built of/from, constructed of, formed by/of/from
located_at	form* in/at/on, found in/at/on, located in/at/on
result_of	leading to, derived from, formed when/by/from
has_function	designed for/to, built to/for, purpose is to, used to/for
takes_place_at	tak* place in/at/on
affects	affect*
causes	cause, caused by, trigger, produce*

Fig. 10.

Concordances for tectonic earthquake.

These KPs are used to query the corpus and thus detect terms that are related to each other. In the case of earthquakes, an important type of information is what causes them, since this can be used to divide earthquakes into different classes. As shown in Fig. 10, the concordances provide valuable clues to propositions that can be included in the semantic network and/or definition of tectonic earthquake for the cause relation.
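To make the querying step more concrete, the following Python sketch turns a subset of the KPs in Table 4 into regular expressions and reports which relations are signalled in a sentence. It is only an approximation of the idea and not the procedure used in EcoLexicon, where queries are run on a tagged corpus. The function names and the two test sentences are invented for illustration; the asterisk is interpreted as "any word ending" and the slash as an alternative.

```python
import re

# Subset of Table 4. "*" stands for any word ending, "/" separates alternatives.
KNOWLEDGE_PATTERNS = {
    "type_of": ["such as", "rang* from", "is a", "type of", "and other"],
    "located_at": ["form* in/at/on", "found in/at/on", "located in/at/on"],
    "takes_place_at": ["tak* place in/at/on"],
    "affects": ["affect*"],
    "causes": ["cause", "caused by", "trigger", "produce*"],
}


def compile_kp(kp):
    """Translate one knowledge pattern into a compiled regular expression."""
    parts = []
    for token in kp.split():
        alternatives = [
            re.escape(alt).replace(r"\*", r"\w*") for alt in token.split("/")
        ]
        parts.append("(?:" + "|".join(alternatives) + ")")
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


COMPILED = {
    relation: [compile_kp(kp) for kp in kps]
    for relation, kps in KNOWLEDGE_PATTERNS.items()
}


def find_relations(sentence):
    """Return the sorted relations whose KPs occur in the sentence."""
    return sorted(
        relation
        for relation, patterns in COMPILED.items()
        if any(p.search(sentence) for p in patterns)
    )


# Invented example sentences, not taken from the EcoLexicon corpus.
print(find_relations("Tectonic earthquakes are caused by the release of stress along a fault."))
print(find_relations("Earthquakes take place at plate boundaries and affect buildings."))
```

A match only signals a candidate KRC. The two concepts linked by the pattern still have to be identified and verified by the terminologist.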

Based on the information in these concordances regarding the cause relation, it is possible to extract the propositions shown in Table 5.

Although this is an effective way of extracting semantically related pairs of concepts from corpora, it can be extremely time-consuming. To speed this process up and make it semi-automatic, we developed the EcoLexicon Semantic Sketch Grammar (ESSG) (León-Araúz & San Martín, 2018; León-Araúz, San Martín & Faber, 2016) in order to retrieve high-density knowledge rich contexts (León-Araúz & Reimerink, 2019). Based on the combination of KPs and part-of-speech tags in the form of regular expressions, the ESSG was created to extract concept pairs related by the following conceptual relations: generic-specific, part-whole, location, cause, and function. We were thus able to apply it to our corpus with the corpus querying tool Sketch Engine (Kilgarriff, Rychlý, Smrz & Tugwell, 2004; Kilgarriff, Baisa, Bušta, Jakubíček, Kovář, Michelfeit, Rychlý & Suchomel, 2014).

The ESSG is made up of more than 200 sketch rules. These rules, together with the statistical features of the word sketch functionality, allow us to extract concept pairs while controlling the process, which still relies on manual verification. For example, Table 6 shows one of the rules used to retrieve hyponymic structures such as the following:

Stony-iron meteorites are classified into pallasites and mesosiderites.

Modern reefs are classified into several geomorphic types: atoll, barrier, fringing, and patch.

Table 5

Propositions for the cause relation

Concordance no.	Concept 1	Concept relation	Concept 2
1	rock movement	causes	tectonic earthquake
2	tectonic earthquake	causes	tsunami
4	energy release	causes	tectonic earthquake
5	tectonic earthquake	causes	landslide
6	plate shift	causes	tectonic earthquake
8	plate friction	causes	tectonic earthquake
9	tectonic earthquake	causes	liquefaction
10	tectonic earthquake	causes	damage, tsunami
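The propositions in Table 5 can be read as directed concept-relation-concept triples. As a minimal sketch (the names PROPOSITIONS and causes_and_effects are hypothetical, and row 10 is split into one triple per effect), the Python code below separates the causes of tectonic earthquake from its effects.

```python
# Rows of Table 5 as (concordance no., concept 1, relation, concept 2).
# Row 10 lists two effects and is therefore stored as two triples.
PROPOSITIONS = [
    (1, "rock movement", "causes", "tectonic earthquake"),
    (2, "tectonic earthquake", "causes", "tsunami"),
    (4, "energy release", "causes", "tectonic earthquake"),
    (5, "tectonic earthquake", "causes", "landslide"),
    (6, "plate shift", "causes", "tectonic earthquake"),
    (8, "plate friction", "causes", "tectonic earthquake"),
    (9, "tectonic earthquake", "causes", "liquefaction"),
    (10, "tectonic earthquake", "causes", "damage"),
    (10, "tectonic earthquake", "causes", "tsunami"),
]


def causes_and_effects(concept, propositions):
    """Return (causes, effects) of a concept as two sorted lists."""
    causes = sorted(
        {c1 for _, c1, rel, c2 in propositions if rel == "causes" and c2 == concept}
    )
    effects = sorted(
        {c2 for _, c1, rel, c2 in propositions if rel == "causes" and c1 == concept}
    )
    return causes, effects


causes, effects = causes_and_effects("tectonic earthquake", PROPOSITIONS)
print("causes:", causes)
print("effects:", effects)
```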

Table 6

Example of a rule used to retrieve hyponymic structures

1:"N.*" [word=",|\("]? [tag="IN/that|WDT"]? "MD"* [lemma="be|,|\("] "RB.*"* [word="classified|categori.ed"] ([word="by"] [tag!="V.*"]+)? [word="in|into"] [tag!="V.*"]* [lemma="type|kind|example|group|class|sort|category|family|species|subtype|subfamily|subgroup|subclass|subcategory|subspecies"]? [tag!="V.*"]* 2:[tag="N.*" & lemma!="type|kind|example|group|class|sort|category|family|species|subtype|subfamily|subgroup|subclass|subcategory|subspecies"]
1:"N.*"	The hypernym is a noun.
[word=",|\("]?	An optional comma or bracket.
[tag="IN/that|WDT"]?	Optionally “that” or “which”.
"MD"*	Any modal verb from zero to infinite times.
[lemma="be|,|\("]	Lemma “be” or a comma or a bracket.
"RB.*"*	Any adverb from zero to infinite times.
[word="classified|categori.ed"]	Classified, categorized, or categorised.
([word="by"] [tag!="V.*"]+)?	Optionally, “by” followed by anything from one to infinite times that does not contain a verb.
[word="in|into"]	In or into.
[tag!="V.*"]*	Anything from zero to infinite times that does not contain a verb.
[lemma="type|kind|example|group|class|sort|category|family|species|subtype|subfamily|subgroup|subclass|subcategory|subspecies"]?	Optionally any of the lemmas “type”, “kind”, “example”, “group”, “class”, “sort”, “family”, etc.
[tag!="V.*"]*	Anything from zero to infinite times that does not contain a verb.
2:[tag="N.*" & lemma!="type|kind|example|group|class|sort|category|family|species|subtype|subfamily|subgroup|subclass|subcategory|subspecies"]	The hyponym is any noun other than “type”, “kind”, “example”, “group”, “class”, “sort”, “family”, etc.

As can be observed in the examples and the items of the rule, recursivity and optional elements allow the detection of varying forms of the same KP. When all rules in the grammar are applied, different concept pairs are extracted in the form of a word sketch, from which different KRCs with different KPs (is a common, such as, or other, and other, including, typically) can be accessed (Fig. 11).
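The logic of the rule in Table 6 can be approximated outside Sketch Engine. The Python sketch below is a deliberately simplified re-implementation and not the ESSG itself: it omits the optional comma, relative pronoun, modal verb, and by-phrase, and it works on tokens given as (word, tag, lemma) triples. The part-of-speech tags of the two example sentences were assigned by hand for this illustration, and the function name is hypothetical.

```python
import re

# Generic "label" nouns that the rule excludes as hyponyms.
GENERIC_LEMMAS = {
    "type", "kind", "example", "group", "class", "sort", "category", "family",
    "species", "subtype", "subfamily", "subgroup", "subclass", "subcategory",
    "subspecies",
}


def extract_hyponym_pairs(tokens):
    """Return (hypernym lemma, hyponym lemma) pairs from (word, tag, lemma) tokens."""
    pairs = []
    for i, (_, tag, lemma) in enumerate(tokens):
        if not tag.startswith("N"):  # 1: the hypernym is a noun
            continue
        j = i + 1
        if j >= len(tokens) or tokens[j][2] != "be":  # lemma "be"
            continue
        j += 1
        while j < len(tokens) and tokens[j][1].startswith("RB"):  # adverbs
            j += 1
        if j >= len(tokens) or not re.fullmatch(r"classified|categori.ed", tokens[j][0]):
            continue
        j += 1
        if j >= len(tokens) or tokens[j][0] not in ("in", "into"):
            continue
        j += 1
        # 2: every noun reached without crossing a verb is a candidate hyponym
        while j < len(tokens) and not tokens[j][1].startswith("V"):
            _, next_tag, next_lemma = tokens[j]
            if next_tag.startswith("N") and next_lemma not in GENERIC_LEMMAS:
                pairs.append((lemma, next_lemma))
            j += 1
    return pairs


# Hand-tagged versions of the two example sentences (tags are assumptions).
METEORITES = [
    ("Stony-iron", "JJ", "stony-iron"), ("meteorites", "NNS", "meteorite"),
    ("are", "VBP", "be"), ("classified", "VBN", "classify"), ("into", "IN", "into"),
    ("pallasites", "NNS", "pallasite"), ("and", "CC", "and"),
    ("mesosiderites", "NNS", "mesosiderite"),
]
REEFS = [
    ("Modern", "JJ", "modern"), ("reefs", "NNS", "reef"), ("are", "VBP", "be"),
    ("classified", "VBN", "classify"), ("into", "IN", "into"),
    ("several", "JJ", "several"), ("geomorphic", "JJ", "geomorphic"),
    ("types", "NNS", "type"), (":", ":", ":"), ("atoll", "NN", "atoll"),
    (",", ",", ","), ("barrier", "NN", "barrier"), (",", ",", ","),
    ("fringing", "JJ", "fringing"), (",", ",", ","), ("and", "CC", "and"),
    ("patch", "NN", "patch"),
]

print(extract_hyponym_pairs(METEORITES))
print(extract_hyponym_pairs(REEFS))
```

In the second sentence, the generic noun "types" is skipped, which mirrors the function of the negative lemma list in the second labelled position of the rule.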

Fig. 11.

Word sketch of mineral and KRCs.

It goes without saying that the information retrieved with the ESSG still requires manual verification to eliminate false positives, such as the one shown in the word sketch of mineral (e.g., mineral appears as the generic of mineral because of multi-word terms). When the ESSG was evaluated (León-Araúz & San Martín, 2018), the causes of false positives included POS-tagger mistakes, polysemous keywords, polysemous KPs, concepts expressed as long periphrastic clauses, and general language words instead of specialized terms. Although some of these causes are currently being addressed in a new version of the ESSG, others are inherent limitations of rule-based knowledge extraction. Nevertheless, this is still a useful tool for terminology work since it accelerates the process. In any case, all concept pairs must be verified, not only because of false positives, but also because natural language does not codify neat taxonomies. Terms are very often found related to others at different degrees of granularity. For instance, dolphins, ruminants and mammals do not refer to sibling concepts despite the fact that they are children of animal in the corpus (Fig. 12). It is thus the terminologist who needs to situate concepts at the right level in the hierarchy.

Fig. 12.

Hyponymic concordances of animal.
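The granularity problem illustrated with animal can be partly detected automatically when the extracted pairs themselves show that one candidate child is subordinate to another. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, and the pairs linking mammal to dolphin and ruminant are invented for the example, since the text only states that all three terms appear as children of animal in the corpus.

```python
def direct_children(parent, hyponym_pairs):
    """Return hyponyms of parent that are not reachable via another hyponym."""
    children = {}
    for hypernym, hyponym in hyponym_pairs:
        children.setdefault(hypernym, set()).add(hyponym)

    def descendants(node, seen=None):
        # Depth-first collection of everything below node (cycle-safe).
        seen = set() if seen is None else seen
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                descendants(child, seen)
        return seen

    candidates = children.get(parent, set())
    indirect = set()
    for candidate in candidates:
        indirect |= descendants(candidate)
    return sorted(candidates - indirect)


PAIRS = [
    ("animal", "dolphin"),    # from the corpus, as described in the text
    ("animal", "ruminant"),   # from the corpus, as described in the text
    ("animal", "mammal"),     # from the corpus, as described in the text
    ("mammal", "dolphin"),    # assumed additional pair for illustration
    ("mammal", "ruminant"),   # assumed additional pair for illustration
]

print(direct_children("animal", PAIRS))
```

This only flags candidates. When the corpus does not contain the intermediate pairs, the terminologist still has to situate each concept at the right level.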

3.1.4. Re-engineering EcoLexicon

As shown in this paper, EcoLexicon is gradually evolving and is now organized in a hierarchy of conceptual categories. Although it has not as yet become an ontology, it has significantly gained in formality. Its structural changes, based on a class hierarchy instead of a formal event, signify that it includes valuable data, which could be used in formal ontologies and contribute to the approach taken by the Ontology Learning Layer Cake (OLC), particularly at the first five levels (terms, synonyms in one or more languages, concepts, concept hierarchies and conceptual relations). Since all concepts in EcoLexicon have now been organized in a domain-based hierarchy, this facilitates the matching and comparison of common information across different environmental terminology resources.

For instance, some of this information in EcoLexicon could be integrated into the Environmental Ontology (ENVO) (https://www.ebi.ac.uk/ols/ontologies/envo), which represents knowledge about environments, environmental processes, ecosystems, habitats, and related entities. Although ENVO began as a relatively simple controlled vocabulary to support the metadata checklists of the Genomic Standards Consortium (GSC), it evolved to become a full-fledged ontology within the OBO Foundry & Library (Buttigieg et al., 2013; Buttigieg et al., 2016).

Figure 13 shows the ENVO entry for earthquake.

Fig. 13.

Entry for earthquake in ENVO.

The entry in Fig. 13 indicates that earthquake is regarded as a type of environmental hazard and is a subclass of material transport process. Similarly, in EcoLexicon, earthquake is also a process, and more specifically, a type of movement (like transport). The definitions in EcoLexicon and ENVO are very similar since both refer to the earthquake process, what triggers the process (release of energy), and what the process causes (plate movement, seismic waves). Nevertheless, the genus in the ENVO definition is simply process instead of environmental hazard, which raises some issues regarding (1) the principles underlying definition construction in the resource, and (2) the reasons underlying the distinction between class information and superordinate concepts.

Furthermore, the ENVO conceptualization of earthquake can be a source of confusion for various reasons. Firstly, the only type of earthquake in the ENVO hierarchy is tectonic earthquake. No mention is made of non-tectonic earthquake, a separate conceptual category with various subclasses. As reflected in EcoLexicon, the hierarchy of non-tectonic earthquake is relatively flat since its subclasses refer to the event that induced them (volcanic earthquake, collapse earthquake, landslide-induced earthquake, explosion earthquake, etc.).

Secondly, the ENVO entry only includes eight subclasses of tectonic earthquake (aftershock, blind thrust earthquake, foreshock, interplate earthquake, intraplate earthquake, megathrust earthquake, multiplet tectonic earthquake), none of which is divided into subclasses.

Thirdly, even though foreshocks, mainshocks, and aftershocks are typically associated with tectonic earthquakes, they are also associated with other types of seismic event, such as volcanic eruptions or human-induced earthquakes. Whereas it is true that tectonic earthquakes are the most common cause of foreshocks, mainshocks, and aftershocks, these phenomena can occur in various seismic contexts.

Fourthly, in the context of earthquakes, “multiplet tectonic earthquake” is not a widely recognized term. A more frequent term is “multiplet earthquake”, which refers to a series of earthquakes that occur for the same general reason and share characteristics such as location, focal mechanism, and temporal pattern. However, these earthquakes are not necessarily tectonic.

Finally, blind thrust earthquakes and megathrust earthquakes would best be described as subtypes of thrust fault earthquake, another subclass of tectonic earthquake that would have to be added to ENVO. These are some examples of data found in EcoLexicon, which could be incorporated in ENVO to tweak its hierarchy, and thus increase its accuracy.
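A comparison of this kind can be reduced to set operations once the subclasses of each resource are listed. The Python sketch below is only an illustration: the two sets contain the subclasses of tectonic earthquake as they are named in this paper, not data retrieved from ENVO or EcoLexicon, and the function name is hypothetical.

```python
# Subclasses of tectonic earthquake as named in this paper (not queried live).
ENVO_TECTONIC = {
    "aftershock", "blind thrust earthquake", "foreshock",
    "interplate earthquake", "intraplate earthquake",
    "megathrust earthquake", "multiplet tectonic earthquake",
}
ECOLEXICON_TECTONIC = {
    "interplate earthquake", "intraplate earthquake", "thrust fault earthquake",
}


def compare_subclasses(first, second):
    """Return the shared and resource-specific subclasses of two hierarchies."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


REPORT = compare_subclasses(ENVO_TECTONIC, ECOLEXICON_TECTONIC)
for key, concepts in REPORT.items():
    print(key, "->", concepts)
```

The concepts found in only one resource are the candidates that a terminologist or ontology builder would then review, as in the case of thrust fault earthquake.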

4. Conclusion

This paper has compared terminologies and ontologies and discussed how they can learn from each other. Of the many different types of terminologies, the resources that can best contribute to ontology building are conceptually structured. Unfortunately, such resources are not easy to find. Although most terminologies are treasure troves of data regarding concepts and relations, their design does not always make this information explicit. Even terminology knowledge bases, such as EcoLexicon, need to be significantly reengineered in order to approximate the formal structure of an ontology.

This process is currently taking place in EcoLexicon. In an effort to provide the knowledge base with a more formal structure, its concepts have been assigned to classes within a conceptual hierarchy. This hierarchy currently co-exists with the Environmental Event. These conceptual classes as well as relations to other concepts were extracted from definitions as well as from a large corpus of specialized environmental texts.

Definitional templates were thus created for categories as a way of providing definitions with a more formal structure. Information was also extracted from the EcoLexicon (private) corpus, which currently has a total of 104,964,907 words in English. Since this corpus is subdivided into various knowledge domains, we queried the Geology subcorpus, which has a total of 12,085,810 words. The data obtained from these queries were used to structure categories, create concept frames, and characterize general processes and actions.

The concept of earthquake was taken as a case study. A comparison was made of the entries in EcoLexicon and the ENVO ontology, with a focus on their definitions and conceptual structure. There were various points of agreement between the definitions in the two resources. However, there were significant divergences in regard to the classes and subclasses of earthquake. These results highlight the need for future cooperation between experts, terminologists, and ontology builders.

Footnotes

Acknowledgements

This research was carried out in the framework of the projects PID2020-118369GB-I00, funded by the Spanish Ministry of Science and Innovation, and A-HUM-600-UGR20, funded by the European Regional Development Fund (ERDF).

References

Antia, B.E. (2000). Terminology and Language Planning: An Alternative Framework of Practice and Discourse. Amsterdam/Philadelphia: John Benjamins.

Baker, C., Fillmore, C.J. & Cronin, B. (2003). The structure of the FrameNet database. International Journal of Lexicography, 16(3), 281–296. doi:10.1093/ijl/16.3.281.

Barsalou, L.W. (2003). Situated simulation in the human conceptual system. Language and Cognitive Processes, 18(5–6), 513–562. doi:10.1080/01690960344000026.

Barsalou, L.W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645. doi:10.1146/annurev.psych.59.103006.093639.

Barsalou, L.W. (2009). Simulation, situated conceptualization, and prediction. Philosophical Transactions of the Royal Society B, 1281–1289. doi:10.1098/rstb.2008.0319.

Browarnik, A. & Maimon, O. (2015). Ontology learning from text departing the ontology layer cake. International Journal of Signs and Semiotic Systems, 4(2), 1–14. doi:10.4018/IJSSS.2015070101.

Budin, G. (1994). New challenges in specialized translation and technical communication: An interdisciplinary outlook. In Snell-Hornby, Pöchhacker and Kaindl (Eds.), Translation Studies: An Interdiscipline (pp. 247–254). Amsterdam/Philadelphia: John Benjamins.

Buttigieg, P.L., Morrison, N., Smith, B., Mungall, C.J. & Lewis, S.E. (2013). The environment ontology: Contextualising biological and biomedical entities. Journal of Biomedical Semantics, 4(43).

Buttigieg, P.L., Pafilis, E., Lewis, S.E., Schildhauer, M.P., Walls, R.L. & Mungall, C.J. (2016). The environment ontology in 2016: Bridging domains with increased scope, semantic density, and interoperation. Journal of Biomedical Semantics, 7(57).

Cimiano, P. (2006). Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Heidelberg: Springer.

Cimiano, P., Maedche, A., Staab, S. & Völker, J. (2009). Ontology learning. In Staab and Studer (Eds.), Handbook on Ontologies (pp. 245–267). Berlin: Springer. doi:10.1007/978-3-540-92673-3_11.

Cimiano, P. & Wenderoth, J. (2005). Learning qualia structures from the web. In Baldwin, Korhonen and Villavicencio (Eds.), Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition (pp. 28–37). Ann Arbor, Michigan: Association for Computational Linguistics. doi:10.3115/1631850.1631854.

Cimiano, P. & Wenderoth, J. (2007). Automatic acquisition of ranked qualia structures from the web. In Zaenen and van den Bosch (Eds.), Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL07) (pp. 888–895). Prague, Czech Republic: Association for Computational Linguistics.

Dolbey, A., Ellsworth, M. & Scheffczyk, J. (2006). BioFrameNet: A domain-specific FrameNet extension with links to biomedical ontologies. In Bodenreider (Ed.), Proceedings of KR-MED (pp. 87–94). Baltimore: AMIA.

Eck, K. & Meyer, I. (1995). Bringing Aristotle into the 20th century. Computer-aided definition construction in a terminological knowledge base. In S.E. Wright and R.A. Strehlow (Eds.), Standardizing and Harmonizing Terminology: Theory and Practice (pp. 83–100). West Conshohocken, PA: ASTM International. doi:10.1520/STP13748S.

Evans, V. & Green, M. (2006). Cognitive Linguistics: An Introduction. Edinburgh: Edinburgh University Press.

Faber, P. (2011). The dynamics of specialized knowledge representation: Simulational reconstruction or the perception–action interface. Terminology, 17(1), 9–29.

Faber, P. (Ed.) (2012). A Cognitive Linguistics View of Terminology and Specialized Language. Berlin: Mouton de Gruyter.

Faber, P. (2015). Frames as a framework for terminology. In Kockaert and Steurs (Eds.), Handbook of Terminology (pp. 14–33). Amsterdam/Philadelphia: John Benjamins. doi:10.1075/hot.1.02fra1.

Faber, P. (2022). Frame-based terminology. In Faber and M.-C. L’Homme (Eds.), Theoretical Perspectives on Terminology: Explaining Terms, Concepts and Specialized Knowledge (pp. 353–376). Amsterdam/Philadelphia: John Benjamins. doi:10.1075/tlrp.23.16fab.

Faber, P., León Araúz, P., Prieto Velasco, J.A. & Reimerink, A. (2007). Linking images and words: The description of specialized concepts. International Journal of Lexicography, 20(1), 39–65. doi:10.1093/ijl/ecl038.

Faber, P. & León-Araúz, P. (2021). Designing terminology resources for environmental translation. In Meng Ji and Laviosa (Eds.), The Oxford Handbook of Translation and Social Practices (pp. 587–616). New York: Oxford University Press. doi:10.1093/oxfordhb/9780190067205.013.7.

Faber, P., León-Araúz, P. & Reimerink, A. (2016). EcoLexicon: New features and challenges. In Kernerman, Kosem, Krek and Trap-Jensen (Eds.), GLOBALEX 2016: Lexicographic Resources for Human Language Technology in Conjunction with the 10th Edition of the Language Resources and Evaluation Conference (pp. 73–80). Globalex.

Faber, P. & L’Homme, M.-C. (Eds.) (2022). Theoretical Perspectives on Terminology: Explaining Terms, Concepts and Specialized Knowledge. Amsterdam/Philadelphia: John Benjamins.

Fillmore, C.J. (1976). Frame semantics and the nature of language. Annals of the New York Academy of Sciences, 280(1), 20–32. doi:10.1111/j.1749-6632.1976.tb25467.x.

Fillmore, C.J. (1982). Frame semantics. In Linguistic Society of Korea (Ed.), Linguistics in the Morning Calm (pp. 111–138). Seoul: Hanshin.

Fillmore, C.J. (2006). Language, Form and Meaning. Chicago: University of Chicago Press.

Gil-Berrozpe, J.C. & Faber, P. (2017). The role of terminological knowledge bases in specialized translation: The use of umbrella concepts. In M.A. Candel-Mora and Vargas-Sierra (Eds.), Temas actuales en terminología y estudios sobre el léxico (pp. 1–25). Granada: Comares.

Gil-Berrozpe, J.C., León-Araúz, P. & Faber, P. (2017). Specifying hyponymy subtypes and knowledge patterns: A corpus-based study. In Kosem, Kallas, Tiberius, Krek, Jakubíček and Baisa (Eds.), Electronic Lexicography in the 21st Century. Proceedings of eLex 2017 Conference (pp. 63–92). Brno: Lexical Computing CZ s.r.o.

Gil-Berrozpe, J.C., León-Araúz, P. & Faber, P. (2018). Subtypes of hyponymy in the environmental domain: Entities and processes. In Roche (Ed.), Proceedings of the 10th International Conference on Terminology & Ontology: Theories and Applications (TOTh 2016) (pp. 39–54). Chambéry: Éditions de l’Université Savoie Mont Blanc.

Gil-Berrozpe, J.C., León-Araúz, P. & Faber, P. (2019). Ontological knowledge enhancement in EcoLexicon. In Kosem, Zingano-Kuhn, Correia, J.P. Ferreira, Jansen, Pereira, Kallas, Jakubíček, Krek and Tiberius (Eds.), Proceedings of the eLex 2019 Conference: Electronic Lexicography in the 21st Century (pp. 177–197). Brno: Lexical Computing CZ, s.r.o.

Gillam, L., Tariq, M. & Ahmad, K. (2005). Terminology and the construction of ontology. Terminology, 11, 55–81.

Gruber, T.R. (1995). Toward principles of the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies, 43(5–6), 907–928. doi:10.1006/ijhc.1995.1081.

Hahn, U. & Chater, N. (1997). Concepts and similarity. In Lamberts and Shanks (Eds.), Knowledge, Concepts, and Categories (pp. 43–92). Cambridge (MA)/London: MIT Press. doi:10.7551/mitpress/4071.003.0006.

ISO (2009). ISO Standard 704:2009. Terminology Work: Principles and Methods. Geneva, Switzerland: International Organization for Standardization.

ISO (2019). ISO Standard 1087:2019. Terminology Work and Terminology Science. Geneva, Switzerland: International Organization for Standardization.

Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P. & Suchomel, V. (2014). The sketch engine: Ten years on. Lexicography, 1, 7–36. doi:10.1007/s40607-014-0009-9.

Kilgarriff, A., Rychlý, P., Smrz, P. & Tugwell, D. (2004). The sketch engine. In Williams and Vessier (Eds.), Proceedings of the 11th EURALEX International Congress (pp. 105–116). Lorient: Université de Bretagne-Sud.

León Araúz, P. & Faber, P. (2012). Causality in the specialized domain of the environment. In V.B. Mititelu, Popescu and Pekar (Eds.), Proceedings of the Workshop, Semantic Relations-II. Enhancing Resources and Applications (LREC’12) (pp. 10–17). Istanbul: ELRA.

León Araúz, P. & Faber, P. (2013). Environmental ontology localization and translation relations. In Page, Fleischer, Göbel and Wohlgemuth (Eds.), Proceedings of the 27th International Conference on Environmental Informatics for Environmental Protection, Sustainable Development and Risk Management, EnviroInfo 2013 (pp. 582–591). Herzogenrath: Shaker Verlag.

León Araúz, P., Reimerink, A. & Faber, P. (2019). EcoLexicon and by-products: Integrating and reusing terminological resources. In Alcina, Costa and Roche (Eds.), Terminology (Vol. 25, pp. 222–258). doi:10.1075/term.00037.leo.

León-Araúz, P. & Reimerink, A. (2019). High-density knowledge rich contexts. Argentinian Journal of Applied Linguistics, 7(1), 109–130.

León-Araúz, P. & San Martín, A. (2018). The EcoLexicon semantic sketch grammar: From knowledge patterns to word sketches. In Kernerman and Krek (Eds.), Proceedings of the LREC 2018 Workshop, Globalex 2018 – Lexicography & WordNets (pp. 94–99). Miyazaki: Globalex.

León-Araúz, P., San Martín, A. & Faber, P. (2016). Pattern-based word sketches for the extraction of semantic relations. In Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016) (pp. 73–82). Osaka, Japan: COLING 2016.

León-Araúz, P., San Martín, A. & Reimerink, A. (2018). The EcoLexicon English corpus as an open corpus in sketch engine. In Čibej, Gorjanc, Kosem and Krek (Eds.), Proceedings of the 18th EURALEX International Congress (pp. 893–901). Ljubljana: Euralex.

L’Homme, M.C. (2018). Maintaining the balance between knowledge and the lexicon in terminology: A methodology based on frame semantics. Lexicography, 4(1), 3–21. doi:10.1007/s40607-018-0034-1.

L’Homme, M.C. & Bernier-Colborne, G. (2012). Terms as labels for concepts, terms as lexical units: A comparative analysis in ontologies and specialized dictionaries. Applied Ontology, 7(4), 387–400. doi:10.3233/AO-2012-0116.

L’Homme, M.C., Robichaud, B. & Prévil, N. (2018). Browsing the terminological structure of a specialized domain: A method based on lexical functions and their classification. In Calzolari, Choukri, Cieri, Declerck, Goggi, Hasida, Isahara, Maegaard, Mariani, Mazo, Moreno, Odijk, Piperidis and Tokunaga (Eds.), Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018) (pp. 3079–3086). Miyazaki, Japan: ELRA.

L’Homme, M.C., Robichaud, B. & Subirats-Rüggeberg, C. (2020). Building multilingual specialized resources based on FrameNet: Application to the field of the environment. In T.T. Torrent, C.F. Baker, Czulo, Ohara and M.R.L. Petruck (Eds.), Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet (pp. 85–92). Marseilles: ELRA.

Löbner, S. (2015). Functional concepts and frames. In Gamerschlag, Gerland, Osswald and Petersen (Eds.), Meaning, Frames, and Conceptual Representation (pp. 15–42). Düsseldorf: Düsseldorf University Press.

Mahesh, K. & Nirenburg, S. (1995). A situated ontology for practical NLP. In Proceedings of IJCAI’95 Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal, August 19–21.

Marshman, E. (2022). Knowledge patterns in corpora. In Faber and M.-C. L’Homme (Eds.), Theoretical Perspectives on Terminology: Explaining Terms, Concepts and Specialized Knowledge (pp. 291–310). Amsterdam/Philadelphia: John Benjamins. doi:10.1075/tlrp.23.13mar.

Mel’čuk, I., Clas, A. & Polguère, A. (1995). Introduction à la lexicologie explicative et combinatoire. Brussels, Belgium: Duculot.

Meyer, I. (2001). Extracting knowledge-rich contexts for terminography: A conceptual and methodological framework. In Bourigault, Jacquemin and M.C. L’Homme (Eds.), Recent Advances in Computational Terminology (pp. 279–302). Amsterdam/Philadelphia: John Benjamins. doi:10.1075/nlp.2.15mey.

Meyer, I., Eck, K. & Skuce, D. (1997). Systematic concept analysis within a knowledge-based approach to terminology. In Wright and Budin (Eds.), Handbook of Terminology Management. Volume 1: Basic Aspects of Terminology Management (pp. 98–118). Amsterdam/Philadelphia: John Benjamins. doi:10.1075/z.htm1.14mey.

Meyer, I. & Mackintosh, K. (1996). The corpus from a terminographer’s viewpoint. International Journal of Corpus Linguistics, 1(2), 257–285. doi:10.1075/ijcl.1.2.05mey.

Meyer, I., Mackintosh, K., Barrière, C. & Morgan, T. (1997). Conceptual sampling for terminographical corpus analysis. In Sandrini (Ed.), Proceedings of Terminology and Knowledge Engineering (TKE ’99) (pp. 256–267). Würzburg: Ergon-Verlag.

Montiel-Ponsoda, E. (2022). Terminology and ontologies. In Faber and M.-C. L’Homme (Eds.), Theoretical Perspectives on Terminology: Explaining Terms, Concepts and Specialized Knowledge (pp. 149–174). Amsterdam/Philadelphia: John Benjamins. doi:10.1075/tlrp.23.07mon.

Montiel-Ponsoda, E. & Aguado-de-Cea, G. (2014). Applying the lexical constructional model to ontology building. In Nolan and Periñán (Eds.), Language Processing and Grammars: The Role of Functionally Oriented Computational Models (pp. 313–338). Amsterdam/Philadelphia: John Benjamins. doi:10.1075/slcs.150.13mon.

60.

Moreno-Ortiz, A. & Pérez-Hernández, C. (2000). Reusing the mikrokosmos ontology for concept-based multilingual terminology databases. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC 2000), Athens, Greece, June 2000 (pp. 1061–1067).

61.

Polguère, A. (2012). Propriétés sémantiques et combinatoires des quasi-prédicats sémantiques. Scolia, 26, 131–152. doi:10.3406/scoli.2012.1141.

62.

Prieto-Díaz, R. (2003). A faceted approach to building ontologies. In Smari and A.M. Memon (Eds.), Proceedings of the 2003 IEEE International Conference on Information Reuse and Integration: IRI-2003 (pp. 458–465). Las Vegas, NV: IEEE.

63.

Roche, C. (2012). Ontoterminology: How to unify terminology and ontology into a single paradigm. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12) (pp. 2626–2630). Istanbul, Turkey: ELRA.

64.

Roche, C., Costa, R. & Carvalho, S. (2019). Knowledge-based terminological e-dictionaries: The EndoTerm and al-Andalus pottery projects. Terminology, 25(2), 259–290.

65.

Ruppenhofer, J., Ellsworth, M., Petruck, M., Johnson, C. & Scheffczyk, J. (2010). FrameNet II: Extended theory and practice. Technical Report, Berkeley, CA: International Computer Science Institute. http://framenet.icsi.berkeley.edu/.

66.

Sager, J.C. (1990). A Practical Course in Terminology Processing. Amsterdam/Philadelphia: John Benjamins.

67.

Schmidt, T. (2009). The Kicktionary – a multilingual lexical resource of football language. In Boas (Ed.), Multilingual FrameNets in Computational Lexicography, Methods and Applications (pp. 101–134). Berlin/New York: Mouton de Gruyter. doi:10.1515/9783110212976.1.101.

68.

Seppälä, S. (2015). An ontological framework for modeling the contents of definitions. Terminology, 21(1), 23–50.

69.

Seppälä, S., Ruttenberg, A., Schreiber, Y. & Smith, B. (2016). Definitions in ontologies. Cahiers de lexicologie: La définition, 2(109), 173–206.

70.

Seppälä, S., Ruttenberg, A. & Smith, B. (2017). Guidelines for writing definitions in ontologies. Ciência da Informação, 46(1), 73–88.

71.

Shreve, G.M. (1995). SGML representation of concept systems: Identifying and retrieving term-concept systems: Identifying and retrieving term-concept structures in textual context. In S.E. Wright and Strehlow (Eds.), Standardizing and Harmonizing Terminology. Theory and Practice (pp. 157–168). Philadelphia: American Society of Testing and Materials. doi:10.1520/STP13753S.

72.

Sowa, J.F. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove, CA: Brooks Cole.

73.

Studer, R., Benjamins, V.R. & Fensel, D. (1998). Knowledge engineering: Principles and methods. Data & Knowledge Engineering, 25(1–2), 161–197. doi:10.1016/S0169-023X(97)00056-6.

74.

Suárez-Figueroa, M.C. (2010). NeOn methodology for building ontology networks: Specification, scheduling and reuse. Doctoral Thesis defended at the Universidad Politécnica de Madrid. doi:10.20868/UPM.thesis.3879.

75.

Verdaguer, I. (2020). Semantic frames and semantic networks in the health science corpus. Estudios de lingüística del español. Anejo, 1, 117–155.

76.

Wróblewska, A., Podsiadły-Marczykowska, T., Bembenik, R., Protaziuk, G. & Rybiński, H. (2012). Methods and tools for ontology building, learning and integration – application in the SYNAT project. In Bembenik, Skonieczny, Rybiński and Niezgodka (Eds.), Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence (Vol. 390, pp. 121–151). Berlin/Heidelberg: Springer. doi:10.1007/978-3-642-24809-2_9.

From specialized knowledge frames to linguistically based ontologies


Table 2. Movement: categories

▼ P-11: Movement
	○ P-11.1: Earth/ground movement
	○ P-11.2: Energy movement
	○ P-11.3: Fluid movement
		■ P-11.3.1: Water movement
	○ P-11.4: Transport
	○ P-11.5: Wave movement
	○ P-11.6: Wind movement