Abstract

Ontology matching approaches commonly leverage similarity metrics to establish mappings between entities in the ontologies participating in the process. However, the lack of standardized entity names across these ontologies can cause such metrics to overlook correct mappings. Generally, existing approaches that focus on standardizing entity names neglect the ongoing matching process, leading to inaccurate results, and fail to address the syntactic standardization of entity names. To address these issues, we introduce a novel process that standardizes entity names both lexically and syntactically through a customized lexical analyzer tailored to the ontologies participating in the process. We evaluate the efficacy of this process using the ontology matching systems Alin and AML, along with the Anatomy and Conference tracks of the OAEI, demonstrating an improvement in matching results.

Keywords

ontology matching, interactive ontology matching, lexical analyzer, syntactic analyzer, WordNet, FMA

1 Introduction

An ontology is a formal and explicit artifact that represents a shared conceptualization of a particular domain, structurally consisting of a collection of interconnected entities, including concepts (also referred to as classes), attributes of concepts, and relationships between concepts. One advantage of using ontologies is that they enhance communication not only among humans but also among application systems, thereby fostering interoperability. However, with the recent and rapid advancements in semantic technologies and the web, multiple ontologies for the same domain have emerged, each using different entities to refer to the same real-world objects. This proliferation of ontologies can lead to communication issues among individuals or application systems that utilize these different ontologies.

Ontology matching involves finding the optimal set of correspondences (mappings) between entities in different ontologies (Euzenat & Shvaiko, 2013). A mapping asserts a semantic relationship between ontology entities, such as disjunction, subsumption, or equivalence. An ontology matching system is responsible for identifying the mappings to be included in the alignment from among all possible options.

One approach to ontology matching is the interactive method, which involves domain experts and can yield better results compared to fully automatic approaches (Paulheim et al., 2013). In some interactive approaches, domain experts provide feedback on selected mappings, determining which should be accepted or rejected. This feedback leverages their expertise to refine the alignment, with a crucial step being the selection of mappings that will receive expert input.

Many ontology matching tools use similarity metrics to select mappings, which are either directly included in the final alignment or presented to experts in interactive systems. Many of these tools incorporate a preprocessing step to standardize entity names before calculating similarity, as the lack of standardization can result in lower similarity values and cause the tool to miss some correct mappings. This preprocessing involves adjustments such as converting characters to lowercase, splitting compound words, removing underscores, etc. (see Section 2).

However, these tools generally do not consider the ongoing matching process during standardization, which can lead to misleading similarity values. For instance, a tool might use a preprocessing technique that removes all symbols except letters and numbers from entity names, replacing them with spaces. As a result, the entity name “head/neck” would be transformed into “head neck.” When compared to “Head and Neck” in another ontology, this would yield a similarity value lower than the maximum if the metric assigns the highest value only to identical strings. If prior studies had revealed that the slash denotes “and,” the preprocessing could be adjusted to convert “head/neck” to “head and neck.” With all letters converted to lowercase, the comparison would then be between “head and neck” and “head and neck,” achieving the maximum similarity value and increasing the likelihood that the tool would select this mapping. Conversely, without this adjustment, the tool’s probability of discarding the mapping increases.
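The effect can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the slash rule stands in for knowledge gained by studying the two ontologies, and exact string equality stands in for a metric that assigns its maximum value only to identical strings.

```python
import re

def generic_normalize(name: str) -> str:
    """Ontology-agnostic preprocessing: lowercase, then turn every run of
    non-alphanumeric characters into a single space."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", name.lower())
    return cleaned.strip()

def matching_aware_normalize(name: str) -> str:
    """Hypothetical variant that assumes the slash denotes 'and'
    in the ontologies being matched."""
    with_and = name.replace("/", " and ")
    return generic_normalize(with_and)

target = generic_normalize("Head and Neck")
print(generic_normalize("head/neck") == target)         # False
print(matching_aware_normalize("head/neck") == target)  # True
```

The second function differs from the first only by one ontology-specific rule, which is the kind of knowledge that generic preprocessing cannot encode.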

An aspect often neglected by these tools is the syntactic standardization of entity names, which enables the application of techniques that would otherwise be hindered. A syntactic analyzer relies on a specific grammar, a formalism developed to describe syntactically well-formed structures in a given language. We propose a grammar to describe well-formed entity names. This grammar incorporates characteristics that improve the use of similarity metrics.

Thus, we aim for two types of entity name standardization. The first focuses on standardizing the spelling of words and symbols in the entity names, which we will refer to as lexical standardization. The second focuses on ensuring that the entity names in both ontologies follow the same grammar, which we will refer to as syntactic standardization. Typically, development teams that produce these tools incorporate lexical standardization directly into the code. Our proposal aims to incorporate both lexical and syntactic standardization into lexical analyzers. We propose creating a lexical analyzer for each ontology, considering the ongoing matching process. Since these lexical analyzers are not embedded in the code of a tool, other ontology matching tools can reuse them to match the same ontologies. To evaluate the benefits of our proposal, we use the ontology matching systems Alin (Silva et al., 2017, 2018, 2020) and AML (Faria et al., 2023), together with the ontologies of the Conference and Anatomy tracks of the OAEI, and we develop a new similarity metric, the Alin metric, which leverages the characteristics of the proposed grammar. The results obtained were encouraging.

The hypothesis we aim to verify in this paper is that standardizing entity names both lexically and syntactically according to the proposed grammar, while considering the involved ontologies, improves the effectiveness of similarity metrics in the ontology matching process.

The contribution of this work is the proposal of a process for developing lexical analyzers to standardize entity names. This standardization will be performed lexically and syntactically, according to the proposed grammar, while considering the ongoing matching process.

The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 explains the process of creating lexical analyzers and discusses related concepts. Section 4 details the ontologies used to test our hypothesis and the tools employed, Alin and AML, including the testing procedure and results. Section 5 provides the research conclusions and suggests potential future work. Appendix A describes the process for developing lexical analyzers for mouse and human ontologies. Appendix B includes the first version of the lexical analyzers and instructions on how to use them within a Java program. Subsequent versions are available on OSF¹. Finally, Appendix C offers additional information about Alin.

2 Related Work

In this section, we review works related to our study, specifically ontology matching approaches that address the issue of entity name standardization within ontologies. Many approaches tackle this problem through a preprocessing step that modifies entity names to achieve standardization, thereby facilitating better comparison between entities.

Several approaches employ techniques for standardizing entity names, including:

Normalization by uppercase or lowercase (NUL). In Faria et al. (2023), Espinoza et al. (2006), Hu et al. (2006), Kuo and Wu (2013), Hassanzadeh et al. (2015), Cerón-Figueroa et al. (2017), Cerón-Figueroa et al. (2017), Hertling and Paulheim (2020), Hertling et al. (2020), Dhouib et al. (2021), He et al. (2021), and Zhang et al. (2013), the entity names are converted to either lowercase or uppercase before comparison. For example, the names “Head and Neck” and “head and neck” return a lower value when compared using a case-sensitive string-based metric, but when both are converted to lowercase, the comparison between “head and neck” and “head and neck” returns a higher value with the same metric.

Stemming (STE). Another technique used in various matching tools (Bouma, 2009; Byrne et al., 2009; Cheatham & Hitzler, 2013; Doan et al., 2002; He et al., 2021; Hu et al., 2006; Jiménez-Ruiz et al., 2020, 2018, 2012; Kammoun & Diallo, 2013; Khiat, 2016; Khiat et al., 2015; Khiat & Mackeprang, 2017; Khiat et al., 2016; Laadhar et al., 2017; Lambrix & Tan, 2006; Lv, 2022; Madhavan et al., 2001; Mohammadi et al., 2019; Ouali et al., 2019; Pesquita et al., 2010; Qassimi & Abdelwahed, 2019; Roussille & Teste, 2022; Schadd & Roos, 2011; Seddiqui & Aono, 2009; Tigrine et al., 2016; Togia et al., 2010; Van Berne & Malaisé, 2014) during the preprocessing phase is stemming, which reduces words to their base form. For example, the tool changes plural words to their singular form or conjugated verbs to their infinitive form.

Separation of the words (SOW). Not all ontologies use the same notation for separating words in entity names. Commonly adopted notations include camel case, underscores, or spaces. Some ontologies even apply different notations to different entity names. Multiple tools (Byrne et al., 2009; Cheatham & Hitzler, 2013; Chua & Kim, 2010; Fallatah et al., 2021; Faria et al., 2023; He et al., 2021; Hertling, 2012; Hertling & Paulheim, 2019, 2020; Paulheim & Hertling, 2013; Roussille et al., 2018; Roussille & Teste, 2022; Schwichtenberg & Engels, 2015; Van Berne & Malaisé, 2014) analyze the words to be compared and choose the appropriate method for separation, which allows for the standardization of notations. For instance, standardization can be applied so that all entity names have their words separated by underscores. In addition to words, an entity name may include punctuation marks such as commas, parentheses, and other symbols, all of which the tool must account for while also separating the words within the entity. This separation of words and symbols enables the tool to perform tasks such as removing punctuation characters, eliminating stop words, and applying expansion and stemming.
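As a minimal sketch of this technique, the following Python snippet splits names written in camel case, with underscores, or with spaces, and rewrites them with underscores. The pattern and the function names are assumptions made for illustration, not taken from any of the cited tools, and punctuation is simply treated as a separator here rather than kept as a symbol.

```python
import re

# Acronym runs ("PC"), capitalized or lowercase words, and digit runs.
# Underscores, spaces, and punctuation are skipped and so act as separators.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

def split_words(name: str) -> list:
    """Return the words of an entity name regardless of its notation."""
    return WORD.findall(name)

def to_underscore(name: str) -> str:
    """Standardize the notation: lowercase words joined by underscores."""
    return "_".join(word.lower() for word in split_words(name))

for name in ["MemberOfConference", "head and neck", "Program_committee"]:
    print(name, "->", to_underscore(name))
```

Once the words are isolated in this way, later steps such as stop-word removal or stemming can operate on the list returned by `split_words`.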

Conversion of Roman numbers to Arabic numbers (CRA). The method described in Chakraborty et al. (2020) converts Roman numerals to Arabic numerals.

Removal of stop words (RSW). Stop words are words such as articles, prepositions, pronouns, and conjunctions that are removed because they do not add much information to the text (e.g., stop words in English include “the,” “a,” “an,” “so,” “what,” “and,” and “or”). In Faria et al. (2023), Kuo and Wu (2013), Dhouib et al. (2021), Seddiqui and Aono (2009), Schadd and Roos (2011), Lv (2022), Kammoun and Diallo (2013), Tigrine et al. (2016), Cheatham and Hitzler (2013), Khiat et al. (2015), Khiat (2016), Khiat et al. (2016), Laadhar et al. (2017), Khiat and Mackeprang (2017), Jiménez-Ruiz et al. (2018), Mohammadi et al. (2019), Qassimi and Abdelwahed (2019), Ouali et al. (2019), Jiménez-Ruiz et al. (2020), Roussille and Teste (2022), Hertling (2012), Cruz et al. (2009), Paulheim and Hertling (2013), Hertling and Paulheim (2019), Karpathiotaki et al. (2014), Bethea et al. (2006), Kiu and Lee (2007), Hamdi et al. (2008), Huber et al. (2011), Massmann et al. (2011), Schadd and Roos (2012), Groß et al. (2012), Hertling and Paulheim (2012), Gulic and Vrdoljak (2013), Khan et al. (2013), Damak et al. (2015), Khiat and Benaissa (2015), Wang et al. (2010), Achichi et al. (2017), Quix et al. (2017), Zhao et al. (2018), Georgala et al. (2020), Real et al. (2020), Lv and Peng (2021), Fallatah et al. (2022), Gomes et al. (2022), Zheng et al. (2022), Laadhar et al. (2019), and Efeoglu (2022), stop words are removed from entity names before comparing them.

Removal of non-alphanumeric characters or their replacement with white spaces (RNC). Several tools (Diallo & Ba, 2012; Fallatah et al., 2021, 2022; Faria et al., 2023; Jean-Mary et al., 2009; Laadhar et al., 2017; Roussille & Teste, 2022) remove non-alphanumeric characters or replace them with white spaces.

Removal of punctuation characters (RPC). Tools such as Faria et al. (2023), Hu et al. (2006), Cerón-Figueroa et al. (2017), Hertling and Paulheim (2020), Zhang et al. (2013), Kammoun and Diallo (2013), Mohammadi et al. (2019), Real et al. (2020), Gosselin and Zouaq (2022), Zhao and Zhang (2016), Zhao and Zhang (2016), Cheng et al. (2017), Lyu et al. (2017), Giunchiglia et al. (2004), and Zhao et al. (2018) remove punctuation characters from the entity names.

Use of synonyms (USY). In addition to the entity’s own name, Faria et al. (2023), Hertling and Paulheim (2020), Pesquita et al. (2010), Madhavan et al. (2001), Massmann et al. (2011), Zhao and Zhang (2016), and Zhao et al. (2018) use synonyms to search for matches in another ontology. These synonyms can come either from the ontology itself, where the ontology may associate two or more entity names as synonyms, or from external resources, such as WordNet (Fellbaum, 1998), as used by Gosselin and Zouaq (2022), Djeddi and Khadir (2011), and Jean-Mary et al. (2009), or FMA, as used by Annane and Bellahsene (2020).

Expansion of abbreviations and acronyms (EAA). Tools such as Madhavan et al. (2001), Massmann et al. (2011), Real et al. (2020), and Gosselin and Zouaq (2022) expand abbreviations and acronyms.

Permutation of words (POW). Diallo and Ba (2012) and Meilicke and Stuckenschmidt (2015) use permutations of the words in the name. This technique compares the original name and its permutations with the names and their permutations from the other ontology. A permutation of an entity name is possible when the name includes a preposition. For instance, “ConferenceMember” serves as an alternative entity name to “MemberOfConference.”

In Table 1, we present the standardization techniques used by various approaches, providing the tool name if the approach is associated with one, or the title of the paper in quotes if it is not. It is important to note that only documented techniques are listed, so a tool may apply additional techniques that its documentation does not mention. We implemented all the standardization techniques discussed in this section in Alin, primarily through lexical analyzers, which we subsequently tested in AML.

Table 1.
Standardization Technique by Tool/Approach.

Tool/Approach NUL STE SOW CRA RSW RNC RPC USY EAA POW

“Discovering and Merging Keyword Senses using Ontology Matching’’ (Espinoza et al., 2006) X

Falcon-AO (Hu et al., 2006) X X X

ODGOMS (Kuo & Wu, 2013) X X

“Understanding a Large Corpus of Web Tables Through Matching with Knowledge Bases: an Empirical Study’’ (Hassanzadeh et al., 2015) X

“Instance-Based Ontology Matching for e-learning Material using an Associative Pattern Classifier’’ (Cerón-Figueroa et al., 2017) X

“Instance-Based Ontology Matching for Open and Distance Learning Materials’’ (Cerón-Figueroa et al., 2017) X X

ATBox (Hertling & Paulheim, 2020) X X X X

“Supervised Ontology and Instance Matching with MELT’’ (Hertling et al., 2020) X

“Measuring Clusters of Labels in an Embedding Space to Refine Relations in Ontology Alignment’’ (Dhouib et al., 2021) X X

“Biomedical Ontology Alignment with BERT’’ (He et al., 2021) X X X

IAMA (Zhang et al., 2013) X X

“Learning to Map between Ontologies on the Semantic Web’’ (Doan et al., 2002) X

SAMBO (Lambrix & Tan, 2006) X

“An Efficient and Scalable Algorithm for Segmented Alignment of Ontologies of Arbitrary Size’’ (Seddiqui & Aono, 2009) X X

AML (Faria et al., 2023) X X X X X X

“Scalable Matching of Industry Models: A Case Study’’ (Byrne et al., 2009) X X

“Cross-Lingual Dutch to English Alignment using EuroWordNet and Dutch Wikipedia’’ (Bouma, 2009) X

BLOOMS (Pesquita et al., 2010) X X

“Harnessing the Power of Folksonomies for Formal Ontology Matching On-the-Fly’’ (Togia et al., 2010) X

MaasMatch (Schadd & Roos, 2011) X X

“An Effective Approach for Large Ontology Matching using Multi-Objective Grasshopper Algorithm’’ (Lv, 2022) X X

ServOMap (Kammoun & Diallo, 2013) X X X

LYAM++ (Tigrine et al., 2016) X X

MapSSS (Cheatham & Hitzler, 2013) X X X

AnAGram (Van Berne & Malaisé, 2014) X X

“Large-scale Interactive Ontology Matching: Algorithms and Implementation’’ (Jiménez-Ruiz et al., 2012) X

STRIM (Khiat et al., 2015) X X

CroLOM (Khiat, 2016) X X

SimCat (Khiat et al., 2016) X X

POMap (Laadhar et al., 2017) X X X

I-Match (Khiat & Mackeprang, 2017) X X

“We Divide, You Conquer: From Large-Scale Ontology Alignment to Manageable Subtasks with a Lexical Index and Neural Embeddings’’ (Jiménez-Ruiz et al., 2018) X X

SANOM (Mohammadi et al., 2019) X X X

“The Role of Collaborative Tagging and Ontologies in Emerging Semantic of Web Resources’’ (Qassimi & Abdelwahed, 2019) X X

“Ontology Alignment using Stable Matching’’ (Ouali et al., 2019) X X

“Dividing the Ontology Alignment Task with Semantic Embeddings and Logic-Based Modules’’ (Jiménez-Ruiz et al., 2020) X X

Cupid (Madhavan et al., 2001) X X X

TOMATO (Roussille & Teste, 2022) X X X X

Eff2Match (Chua & Kim, 2010) X

Hertuda (Hertling, 2012) X X

WeSeE (Paulheim & Hertling, 2013) X X

“Efficient Selection of Mappings and Automatic Quality-Driven Combination of Matching Methods’’ (Cruz et al., 2009) X

RSDL (Schwichtenberg & Engels, 2015) X

Holontology (Roussille et al., 2018) X

DOME (Hertling & Paulheim, 2019) X X

“A Hybrid Approach for Large Knowledge Graphs Matching’’ (Fallatah et al., 2021) X X

OntoConnect (Chakraborty et al., 2020) X

“Enabling Semantic Search for EO Products: an Ontology Matching Approach’’ (Karpathiotaki et al., 2014) X

Onto-Mapology (Bethea et al., 2006) X

OntoDNA (Kiu & Lee, 2007) X

TaxoMap (Hamdi et al., 2008) X

CODI (Huber et al., 2011) X

COMA (Massmann et al., 2011) X X X

“Coupling of WordNet Entries for Ontology Mapping using Virtual Documents’’ (Schadd & Roos, 2012) X

GOMMA (Groß et al., 2012) X

WikiMatch (Hertling & Paulheim, 2012) X

CroMatcher (Gulic & Vrdoljak, 2013) X

SPHeRe (Khan et al., 2013) X

Exona (Damak et al., 2015) X

InsMT+ (Khiat & Benaissa, 2015) X

RiMOM (Wang et al., 2010) X

Legato (Achichi et al., 2017) X

“Ontology Matching for Patent Classification’’ (Quix et al., 2017) X

“Using Domain Lexicon and Grammar for Ontology Matching’’ (Real et al., 2020) X X X

“Applying Edge-Counting Semantic Similarities to Link Discovery: Scalability and Accuracy’’ (Georgala et al., 2020) X

“A Novel Periodic Learning Ontology Matching Model Based on Interactive Grasshopper Optimization Algorithm’’ (Lv & Peng, 2021) X

KGMatcher (Fallatah et al., 2022) X X

OMT (Gomes et al., 2022) X

“Knowledge-Informed Semantic Alignment and Rule Interpretation for Automated Compliance Checking’’ (Zheng et al., 2022) X

“The Impact of Imbalanced Training Data on Local Matching Learning of Ontologies’’ (Laadhar et al., 2019) X

GraphMatcher (Efeoglu, 2022) X

“Ontology Matching with Semantic Verification’’ (Jean-Mary et al., 2009) X X

“Effective Method for Large Scale Ontology Matching’’ (Diallo & Ba, 2012) X X

SEBMatcher (Gosselin & Zouaq, 2022) X X

“Identifying and Validating Ontology Mappings by Formal Concept Analysis’’ (Zhao & Zhang, 2016) X

FCA-Map (Zhao & Zhang, 2016) X X

“Learning Reference-Enriched Approach Towards Large Scale Active Ontology Alignment and Integration’’ (Cheng et al., 2017) X

njuLink (Lyu et al., 2017) X

S-Match (Giunchiglia et al., 2004) X

“New Paradigm for Alignment Extraction’’ (Meilicke & Stuckenschmidt, 2015) X

“Matching Biomedical Ontologies based on Formal Concept Analysis’’ (Zhao et al., 2018) X X X

Alin (Silva et al., 2020) X X X X X X X X X X

AML (Faria et al., 2023) with lexical analyzers X X X X X X X X X X

3 Lexical Analyzer Creation Process

3.1 Preliminaries

In this subsection, we will explain several concepts related to the lexical analyzer creation process.

Definition 3.1 Standardization Technique

It is a technique employed by ontology matching approaches to standardize entity names.

Examples of such techniques include those discussed in Section 2 such as Normalization by uppercase or lowercase, Stemming, and others.

To understand the lexical analyzer creation process, it is crucial to note that it will generate multiple versions of the lexical analyzer. Each new version will incorporate all the standardization techniques (Definition 3.1) from the previous version, along with possibly additional techniques.

3.1.1 Regular Expression and Lexical Analyzer

Definition 3.2 Lexeme, token and word

A lexeme is a sequence of characters in the source string that matches a token pattern. A token is a category into which the lexical analyzer classifies lexemes. We will refer to lexemes that are English words as “words.”

Definition 3.3 Regular expression

A regular expression (abbreviated as regex) is a sequence of characters used to specify a matching pattern in text.

A regex can be associated with a token, and any sequence of characters matching its pattern can be categorized under that token. For example, in the regular expression ‘\d{3}’, \d matches any digit (0-9) and {3} indicates that there must be exactly three digits. This regex can find sequences such as “123,” “456,” and “789” in a text. We could create a token called “3-digit” and associate the found sequences with this token.
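The three-digit example can be reproduced with Python's standard `re` module. The sample text and the variable names are made up for illustration.

```python
import re

THREE_DIGIT = re.compile(r"\d{3}")  # exactly three consecutive digits

text = "rooms 123, 456 and 789"
lexemes = THREE_DIGIT.findall(text)

# Associate every matched sequence with the "3-digit" token.
tokens = [("3-digit", lexeme) for lexeme in lexemes]
print(tokens)
# [('3-digit', '123'), ('3-digit', '456'), ('3-digit', '789')]
```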

Definition 3.4 Lexical analyzer

A lexical analyzer is a group of regular expressions, each associated with an action.

An action can be associated with a matched regex to return a variation of the name that triggered it. If a non-standard entity name appears, we can modify the lexical analyzer to recognize the name and return a standardized version based on a standardization technique. We used this feature to generate standardized names in the two ontologies involved in the matching process.

Definition 3.5 Lexical analyzer generator and lexer

A lexical analyzer generator takes a lexical analyzer as input and produces source code (a lexer) that reads a string, matches it against the regexes, and executes the associated actions.

Lexers typically serve as the initial front-end phase in compilers, matching keywords, comments, operators, and other elements, generating a stream of tokens for the syntactic analyzer. Another feature of lexers is their ability to list all strings that do not match a regex included in them. We will utilize this list to develop subsequent versions of the lexical analyzers.

To utilize lexical analyzers in the ontology matching systems, we generated lexers as Java classes using a lexical analyzer generator provided by JFlex and compiled them with the systems. Whenever we refer to executing the lexical analyzers in the remainder of this text, we refer to the execution of the compiled lexers.

Definition 3.6 Existing tokens in the lexical analyzer

Our lexical analyzers will categorize the lexemes into three tokens:

– Noun or modifier: lexemes composed solely of letters, with the first letter possibly being uppercase;
– Numeral: lexemes composed only of digits;
– Preposition: lexemes that are English prepositions, such as “of,” “for,” “at,” etc.

It is important to note that some entity names may contain substrings that do not conform to any of the three tokens mentioned above. For example, hyphenated words may not fit into any of these categories. We defined these tokens for the first version of the lexical analyzers and may add new lexemes as instances of a token in future versions. For example, in the future, the “noun or modifier” token might accept hyphenated lexemes. When an entity name contains a lexeme that does not match any token, the entity name appears in a listing generated by the lexical analyzer.

Take the following entity name as an example: 3rd ventricle choroid plexus. Although it contains four words, when the lexical analyzer, implementing the three tokens (Definition 3.6), processes this string, it does not output exactly these four words. Instead, it generates five strings: 3, rd, ventricle, choroid, and plexus.

According to our definition, a noun or modifier consists only of letters. Therefore, the lexical analyzer returns the last four strings as nouns or modifiers, while it returns the first string, 3, as a numeral.

The tokens passed to the syntactic analyzer will be (numeral, noun or modifier, noun or modifier, noun or modifier, noun or modifier). The five generated strings (3, rd, ventricle, choroid, and plexus) are lexemes. It is important to note that a lexeme may not always correspond directly to a single word, as in the case of “3rd,” which resulted in two distinct lexemes. Thus, the lexical analyzer categorizes lexemes, not necessarily words, into the three defined tokens.
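The behavior on “3rd ventricle choroid plexus” can be imitated with a small Python tokenizer. It is a sketch under stated assumptions, not the JFlex-generated lexer used in this work: the preposition set is an illustrative subset, and the function name and return shape are hypothetical.

```python
import re

PREPOSITIONS = {"of", "for", "at", "by", "in", "on", "to", "with"}  # illustrative subset

LEXEME = re.compile(
    r"(?P<word>[A-Z]?[a-z]+)"  # letters only, first letter possibly uppercase
    r"|(?P<numeral>\d+)"       # digits only
    r"|(?P<space>\s+)"         # separators, ignored
    r"|(?P<other>.)"           # any other character matches no token
)

def tokenize(name: str):
    """Return (classified lexemes, characters that matched no token)."""
    classified, unmatched = [], []
    for match in LEXEME.finditer(name):
        text = match.group()
        if match.lastgroup == "word":
            is_preposition = text.lower() in PREPOSITIONS
            token = "preposition" if is_preposition else "noun or modifier"
            classified.append((text, token))
        elif match.lastgroup == "numeral":
            classified.append((text, "numeral"))
        elif match.lastgroup == "other":
            unmatched.append(text)
    return classified, unmatched

lexemes, unmatched = tokenize("3rd ventricle choroid plexus")
print([text for text, _ in lexemes])  # ['3', 'rd', 'ventricle', 'choroid', 'plexus']
```

Calling `tokenize("hind-limb joint")` reports the hyphen in the second list, which plays the role of the listing of non-matching strings that guides later versions of the analyzer.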

One of the goals of the final version of the implemented lexical analyzers is to ensure that each lexeme corresponds to a single word.

3.1.2 Syntax and Syntactic Analyzer

Definition 3.7 Noun Phrase

A noun phrase (Spasić et al., 2018) is a phrase made up of a head noun that can be modified with words before it (pre-modifiers), such as adjectives and nouns, or after it (post-modifiers).

Examples of noun phrases with pre-modifiers:

– academic language (adjective + noun)
– email address (noun + noun)

Examples of noun phrases with post-modifiers:

– articles by leading academics (noun + prepositional phrase)

In noun phrases without a preposition, the head noun is the last word. When a preposition is present, the head noun is the word immediately before the preposition.

Definition 3.8 Concept and terms

A concept is a mental category that helps us organize objects, and within a domain a term is typically associated with a concept.

For example, in a classroom, the category of objects used for sitting is associated with the term “chair.” Terms refer to the same concept in a domain if they refer to the same objects. We can say that entity names are terms.

Definition 3.9 Term Variation

Two terms are variations of the same term in a domain when both refer to the same concept.

One way to obtain a variation of a term is by replacing an adjective with a modifier noun, a process known as derivation. For instance, “enzymatic activity” can be varied to “enzyme activity.” We find some of these variations in human and mouse ontologies. Another method of obtaining term variations is permutation, which occurs when a term contains prepositions. For example, “activity of enzyme” can be varied to “enzyme activity.”
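Permutation can be sketched as follows, assuming the name has already been split into words and contains exactly one preposition. The preposition set and the function name are hypothetical, and derivation is not shown because it depends on an external lexical resource.

```python
from typing import List, Optional

PREPOSITIONS = {"of", "for", "at", "by", "in", "on", "to", "with"}  # illustrative subset

def permute(words: List[str]) -> Optional[List[str]]:
    """Move the post-modifier in front of the head: 'X of Y' becomes 'Y X'.
    Returns None when there is no single preposition to pivot on."""
    positions = [i for i, w in enumerate(words) if w.lower() in PREPOSITIONS]
    if len(positions) != 1:
        return None
    pivot = positions[0]
    before, after = words[:pivot], words[pivot + 1:]
    if not before or not after:
        return None
    return after + before

print(permute(["activity", "of", "enzyme"]))  # ['enzyme', 'activity']
```

Names with several prepositions would need a policy for choosing the pivot, which this sketch deliberately leaves out.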

Definition 3.10 Syntax

Syntax addresses how words are arranged to form phrases, clauses, and sentences.

For example, consider that terms used as entity names in an ontology are noun phrases whose words can be classified into the three tokens outlined in Definition 3.6: noun or modifier, preposition, or numeral. Under this syntax, an entity name must begin with a noun or modifier; only a noun or modifier can follow a preposition; and only a preposition or a noun or modifier can follow a numeral. Consequently, a term with two consecutive numerals would not conform to this syntax. Similarly, a term that begins with a numeral or a preposition, or that includes symbols not classified as one of the three tokens, such as a comma, would also be non-compliant with this syntax.

In Definition 3.11, we provide a context-free grammar for the syntax described above.

Definition 3.11 Target Grammar

We refer to the following context-free grammar as the Target Grammar, where “S” is the initial symbol, “q” represents a noun or modifier, “n” represents a numeral, and “p” represents a preposition:

$$S \to q \mid q\,B \tag{1}$$

$$B \to p\,S \mid n \mid n\,C \mid q \mid q\,B \tag{2}$$

$$C \to p\,S \mid q \mid q\,B \tag{3}$$
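As a worked example, take the noun phrase “articles by leading academics” from Definition 3.7 and assume that its lexemes are tokenized as q p q q, with “by” treated as a preposition. One derivation of this token sequence in the Target Grammar is:

$$S \Rightarrow q\,B \Rightarrow q\,p\,S \Rightarrow q\,p\,q\,B \Rightarrow q\,p\,q\,q$$

The steps apply S → q B, then B → p S, then S → q B again, and finally B → q. By contrast, no derivation can begin with n or p, because both alternatives for S start with q.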

Definition 3.12 Syntactic analyzer

A syntactic analyzer is a program that processes the sequence of tokens generated by the lexical analyzer. Its main job is to verify that the token sequence adheres to the syntax defined by a given grammar.

We have developed a syntactic analyzer specifically for the Target Grammar. Our syntactic analyzer receives a sequence of tokens from the lexical analyzer and checks whether it conforms to the syntax defined in the Target Grammar. For example, if the lexical analyzer passes the sequence “n p q q” to the syntactic analyzer, this sequence does not conform to the syntax. According to the Target Grammar, the first token should be “q,” representing a noun or modifier.
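The conformance check can be sketched in Python. Every production of the Target Grammar has the form “terminal” or “terminal followed by one nonterminal”, so it is enough to track the set of nonterminals that may still be expanded after each token. This is an illustrative checker written for this explanation, not the syntactic analyzer used in the experiments, and the names `RULES` and `conforms` are hypothetical.

```python
# Each production is (terminal, next nonterminal); None means the
# derivation ends after that terminal.
RULES = {
    "S": [("q", None), ("q", "B")],
    "B": [("p", "S"), ("n", None), ("n", "C"), ("q", None), ("q", "B")],
    "C": [("p", "S"), ("q", None), ("q", "B")],
}

def conforms(tokens) -> bool:
    """Check whether a sequence of 'q', 'n', 'p' tokens is derivable from S."""
    states = {"S"}
    for token in tokens:
        states = {
            nxt
            for state in states
            if state is not None
            for terminal, nxt in RULES[state]
            if terminal == token
        }
        if not states:
            return False
    # Accept only if some derivation ended exactly at the last token.
    return None in states

print(conforms("n p q q".split()))  # False: the name must start with q
print(conforms("q p q q".split()))  # True
```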

We can associate an action with a grammar rule. We used this feature to identify the head noun and create variations of the entity name.

We will use the final version of each lexical analyzer to ensure that all entity names in the ontologies comply with the syntax defined by the Target Grammar. An entity name that conforms to this syntax offers the following advantages:

– We can identify the head noun of an entity name, which will always be a “noun or modifier” and is typically the last word of the entity name unless the last word is a “numeral.” If there are prepositions in the entity name, the head noun may not be the last word, but we can determine it using the “preposition” classification. Knowing the head noun facilitates the implementation of useful strategies for finding mappings (Spasić et al., 2018), and we leverage this knowledge in the Alin metric.
– We can generate variations of the entity name in two types, as defined in Definition 3.9:
  – One type of variation involves replacing adjectives in the entity name with nouns. To do this, we search for words classified as “noun or modifier” that are not the head noun and then query these words in WordNet. If WordNet classifies a word as an adjective, we retrieve all associated nouns and generate the variations.
  – Another type of variation is permutation, which requires identifying the words in the entity name that are prepositions.

We utilize variations to improve string-based metrics and the head noun in the Alin metric.
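The head-noun rule described above can be sketched as a short Python function. It assumes that the name already conforms to the Target Grammar and that lexemes arrive as (text, token) pairs; the function name is hypothetical, and “cervical vertebra 3” is a made-up name used only to show the trailing-numeral case.

```python
def head_noun(lexemes):
    """lexemes: list of (text, token) pairs for a syntactically correct name."""
    # Only the part before the first preposition can contain the head noun.
    prefix = []
    for text, token in lexemes:
        if token == "preposition":
            break
        prefix.append((text, token))
    # The head is the last noun or modifier of that part; numerals are skipped.
    for text, token in reversed(prefix):
        if token == "noun or modifier":
            return text
    return None

Q, N, P = "noun or modifier", "numeral", "preposition"
print(head_noun([("email", Q), ("address", Q)]))                                    # address
print(head_noun([("articles", Q), ("by", P), ("leading", Q), ("academics", Q)]))  # articles
print(head_noun([("cervical", Q), ("vertebra", Q), ("3", N)]))                     # vertebra
```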

The syntactic analyzer generates a list of all entities with syntactically incorrect names (Definition 3.14). We use this information to improve subsequent versions of the lexical analyzers by modifying syntactically incorrect names to ensure that they conform to the Target Grammar.

Definition 3.13 Entity with standard name

An entity has a standard name when the lexical analyzer recognizes all characters in its name as part of lexemes, each of which can be classified into one of the three tokens defined in Definition 3.6.

An entity with a non-standard name includes at least one character, such as punctuation or special characters, that cannot be included in a lexeme. These and other characters may be incorporated into lexemes in subsequent versions of the lexical analyzer.

Definition 3.14 Entity with syntactically correct name

It is an entity with a standard name (Definition 3.13) whose name adheres to the Target Grammar (Definition 3.11).

An entity with a non-syntactically correct name is one that either has a non-standard name or whose name does not adhere to the Target Grammar. For example, this occurs when the lexical analyzer classifies the first lexeme of an entity name as a numeral.

Definition 3.15 Rejected entity

It is an entity flagged as invalid for comparison by the lexical analyzer. When this occurs, the entity is no longer considered part of those eligible for inclusion in a mapping in the alignment.

An entity can be rejected by the lexical analyzer when it is an entity with a non-standard or a non-syntactically correct name. It must also meet one of the following conditions:

– The entity name has a synonym defined within the ontology, one that is syntactically correct. We can use it to search for a mapping instead of the original name;
– We cannot standardize its name, or we can, and it still shows no similarity to any name in the other ontology.

Another approach for handling entities with non-standard or non-syntactically correct names is to standardize their names.

Definition 3.16 Entity with a name modified to become standardized

An entity whose name is standardized using a standardization technique to unify its spelling.

When the lexical analyzer standardizes an entity name, it must unify the spelling of the modified words, considering both ontologies.

Suppose we want to implement a technique to remove non-alphanumeric characters. One of the non-alphanumeric characters in the entity names contained in the list of non-standardized entity names from the last version of the lexical analyzers is the hyphen. After analyzing the entity names with hyphens in both ontologies, we determined that standardizing the hyphenation of entity names is preferable to removing the hyphens. For example, if one ontology includes the names “hindlimb bone’’ and “hind limb connective tissue,’’ while the other has “hind-limb joint,’’ it would be beneficial to use the hyphen consistently. Therefore, we could modify the entity names to “hind-limb bone,’’ “hind-limb connective tissue,’’ and “hind-limb joint,’’ thereby unifying the spelling across both ontologies.

The lexical analyzer writer should incorporate “hind-limb’’ as a valid word for the “noun or modifier’’ token into the lexical analyzers so that, in subsequent executions, the analyzers recognize entities whose names contain this word as having a standard name (Definition 3.13). These entities should no longer appear on the list of entities with non-standard names of the lexical analyzers.
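
The hyphen example above can be sketched as a small rewriting step. This is an illustration only, not the generated lexical analyzer: the function names are hypothetical, and the rule (replace the spaced and the joined spelling with the hyphenated one) is an assumption derived from the “hind-limb’’ example.

```python
import re


def hyphen_variants(hyphenated):
    """Return the spaced and joined spellings of a hyphenated word."""
    left, right = hyphenated.split("-", 1)
    return [f"{left} {right}", f"{left}{right}"]


def unify_hyphenation(name, canonical_words):
    """Rewrite every spelling variant in `name` to its canonical hyphenated form."""
    for word in canonical_words:
        for variant in hyphen_variants(word):
            pattern = re.compile(rf"\b{re.escape(variant)}\b", re.IGNORECASE)
            name = pattern.sub(word, name)
    return name


# Names taken from the example above; the first two come from one ontology,
# the third from the other.
names = ["hindlimb bone", "hind limb connective tissue", "hind-limb joint"]
for name in names:
    print(unify_hyphenation(name, ["hind-limb"]))
# hind-limb bone
# hind-limb connective tissue
# hind-limb joint
```

Note that this sketch replaces a matched variant with the canonical spelling exactly as given, so any capitalization in the original variant is lost.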

The process of creating a lexical analyzer is as follows:

1.
Choose a standardization technique. In the first iteration of the lexical analyzer, the "Separation of the words" technique will be implemented.
2.
Select the entity names to be standardized by the selected technique. This step is only necessary after the first iteration. The selection of entity names can be based on the list of entities with non-standard names generated by the lexical analyzer. If the chosen technique is expected to affect entities with standard names, also run a dedicated program to identify such names. For example, we developed a program to list stop words.
3.
Create a new version of the lexical analyzer.
4.
Run the lexical and syntactic analyzers on the ontologies and list the entities with non-standard names. This list may assist in selecting the next standardization technique and in identifying which entity names should be standardized.
5.
If any standardization technique remains unimplemented, return to step 1.

An example of this process can be found in Appendix A, where we detail the construction of the lexical analyzers for the mouse and human ontologies.

3.2 Is It Worth Using a Lexical Analyzer Writer?

With the proposal to develop lexical analyzers for ontologies in ontology matching, a new role emerges: the lexical analyzer writer.

Among the tasks of the lexical analyzer writer are:

–
Developing various versions of the lexical analyzer;
–
Running each version of the lexical analyzer and generating a list of non-standard entity names for that version;
–
Occasionally, creating programs to assist in identifying and handling entity names from the ontologies to standardize them;
–
Evaluating how to implement a standardization technique in the next version of the lexical analyzer based on the list of non-standard entity names from the current versions for both ontologies and the outputs of the programs they developed.

As lexical analyzer writers, we developed several programs to help construct the lexical analyzer. These programs can:

–
Generate a list of words from the ontologies, illustrating how each ontology handles specific constructions in entity names, such as possessive case, the use of “and’’ and “or,’’ and others;
–
Automatically generate lines for the lexical analyzer, as we did in the sub-activity “standardize hyphenation’’ in subsection A.1.4.

Although adding another professional to the process may increase complexity, this inclusion is justified for the following reasons:

–
The work of the lexical analyzer writer saves effort for another professional, the system developer, because matching systems currently embed entity name standardization in their code. Lexical analyzers can take over this standardization from the system developer. The lexical analyzer writer can first build the analyzers without considering a specific ontology matching. Subsequently, for each pair of ontologies in an ontology matching, the original lexical analyzers can be adjusted to account for the specific ontologies involved;
–
Using the techniques described in this article, the lexical analyzer writer does not need to be familiar with the specific domain of the ontologies. The same person can therefore be employed across multiple matching tasks;
–
Once we identify the ontologies, time is needed to find domain experts, train them on the tool, and have them available to perform the alignment. During this period, the lexical analyzer writer can focus on developing the lexical analyzers.

In our experience, the complexity of a lexical analyzer increases with the size of the ontology and the number of non-standard entity names it contains. For the human ontology (see Subsections 4.1.1 and 4.1.2 for details on the ontologies), it took us four days to develop the lexical analyzer and supporting programs. In contrast, for the mouse ontology, which is smaller and has fewer non-standard entity names, we completed the work in one day, leveraging the programs previously developed for the human ontology. There are seven ontologies in the Conference track, each matched with all the others, resulting in 21 possible alignments. Although we initially planned to create one lexical analyzer for each ontology in each match, we instead developed one for each ontology that could handle all its matches. As a result, we developed seven lexical analyzers, which are significantly smaller than those required for the human and mouse ontologies. We completed these lexical analyzers in about one day;
–
We can reuse the lexical analyzers in any other matching system that performs the matching for which the lexical analyzer writer created them;
–
Most importantly, we improve the matching results by using lexical standardization—considering the ongoing matching process—and syntactic standardization, performed by the lexical analyzer writer, as discussed in Subsection 4.7.1.

3.3 Uses of ChatGPT in the Production of Lexical Analyzers

Acting as lexical analyzer writers, we utilized ChatGPT to standardize entity names containing abbreviations and acronyms. During the development of the lexical analyzer, we encountered situations where neither the ontologies nor external sources (WordNet and FMA) provided information on entity names with acronyms that could relate one entity to another from a different ontology. For example, with the name “hippocampus CA1’’ from the mouse ontology, we were unable to find the expansion of CA1 in either the ontologies or external sources. However, searching for this acronym in the human ontology, we found the term “CA1_Field_of_the_Cornu_Ammonis.’’ Since the available resources gave us no clues to determine whether these referred to the same entity, we consulted ChatGPT, which confirmed that they were synonyms. This allowed us to unify the names of the two entities, resulting in a successful mapping in the final alignment.

ChatGPT² is a software tool that generates human-like text based on the input it receives. It can utilize various large language models (LLMs), including GPT-3.5, GPT-4, and others. To standardize entity names containing abbreviations and acronyms, we used the GPT-4 model.

One of its key features is the ability to identify and suggest synonyms for terms, improving clarity and variety in writing. To determine synonyms, ChatGPT analyzes context and can recognize variations of a term, such as different grammatical forms, word inflections, abbreviations, domain-specific jargon, or regional preferences. With this feature, a reasonable question is the following: Does ChatGPT make the standardization of entity names unnecessary? We will attempt to answer this question in Subsection 4.6.

4 Experimental Evaluation

At the beginning of this section, we will discuss preliminary concepts, such as the OAEI tracks and the Alin (Silva et al., 2020) and AML (Faria et al., 2023) tools, which will serve as the foundation for our studies.

Next, we will present several experiments:

–
In the first experiment, we will demonstrate the impact of incorporating standardization techniques into each version of the lexical analyzer;
–
Then, using Alin, we will show how lexical and syntactic standardization affect the quality of the generated alignment;
–
In the next study, we will explore the influence of lexical standardization on ChatGPT’s responses regarding the search for mappings;
–
Finally, we will investigate whether it is possible to use lexical analyzers created for Alin in another tool, AML, and examine the impact of this usage on the quality of the generated alignment.

4.1 Datasets

The Ontology Alignment Evaluation Initiative (OAEI) is a coordinated international effort to assess the strengths and weaknesses of ontology matching systems. As part of this initiative, the OAEI Interactive Matching Track (Pour et al., 2024) provides two data sets—Conference and Anatomy—used in our evaluations.

4.1.1 Conference Track

The Conference track (also known as the Conference dataset) includes 16 ontologies from the conference organization domain, created by real companies to model their systems. This dataset consists of moderate-sized, real-world ontologies, making it well-suited for ontology matching tasks because of its diverse origins. Since different individuals developed these ontologies, the same concepts are often labeled differently, contributing to their heterogeneous nature.

Among the 16 ontologies, 21 reference alignments are present for pairs of 7 of these ontologies, listed in Table 2. We used these seven ontologies for our tests, focusing on mappings between entities of the same type (e.g., class-to-class). The total number of possible mappings across the 21 alignments is approximately 125,000, while the reference alignments contain 305 mappings. Over time, this track has become one of the most widely used for matching evaluation.

Table 2.
Number of Classes, Attributes, and Relationships in the Conference Ontologies.

Ontology	Classes	Attributes	Relationships
Ekaw	74	0	33
Sigkdd	49	11	17
Iasted	140	3	38
Cmt	36	10	49
Edas	104	20	30
ConfOf	38	23	13
Conference	60	18	46

The values are referenced from https://oaei.ontologymatching.org/2024/conference/index.html.
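
As a rough cross-check of the figures above, the sketch below enumerates the ontology pairs and counts same-type candidate mappings from the values in Table 2. The counting rule (classes with classes, attributes with attributes, relationships with relationships, summed over all pairs) is an assumption; the exact rule behind the approximate figure of 125,000 is not stated here, so only the order of magnitude should be compared.

```python
from itertools import combinations

# (classes, attributes, relationships) per ontology, copied from Table 2.
TABLE_2 = {
    "Ekaw": (74, 0, 33),
    "Sigkdd": (49, 11, 17),
    "Iasted": (140, 3, 38),
    "Cmt": (36, 10, 49),
    "Edas": (104, 20, 30),
    "ConfOf": (38, 23, 13),
    "Conference": (60, 18, 46),
}


def same_type_candidates(first, second):
    """Count candidate mappings between entities of the same type."""
    return sum(x * y for x, y in zip(TABLE_2[first], TABLE_2[second]))


pairs = list(combinations(TABLE_2, 2))
total = sum(same_type_candidates(a, b) for a, b in pairs)

print(len(pairs))  # 21
print(total)       # 127389
```

Under this assumed rule, the seven ontologies give 21 pairs and 127,389 same-type candidates, which is in line with the approximate total reported above.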

A characteristic of the ontologies in this track is the use of short names for classes and attributes, typically consisting of one or two words. Examples of these names include “Person,’’ “Reviewer,’’ “ProgramCommittee,’’ “Committee_member,’’ and “Passive_conference_participant.’’ To calculate the similarity metrics, we consider only the names of the classes and attributes without taking their descriptions into account.

4.1.2 Anatomy Track

The Anatomy track involves matching an ontology that describes adult mouse anatomy with an ontology from the NCI Thesaurus that details human anatomy. Although there are only two ontologies in the Anatomy track, their combined number of entities is significantly higher than the total number of entities in the seven ontologies with reference alignments in the Conference track. Specifically, the mouse ontology contains 2,738 classes, while the human ontology contains 3,298 classes. Neither ontology includes attributes. The mouse ontology has 3 relationships and the human ontology has 2. The total number of possible mappings between these ontologies is 9,066,176, with 1,516 mappings present in the reference alignment.

A characteristic of this track is the greater complexity of names compared to the Conference track. It is common for class names to consist of more than two words. Examples include: “Inferior Oblique Muscle,’’ “Epithelium of Human Prostate Gland,’’ “Antero-Lateral Ascending Tract,’’ “Distal Phalanx of Foot,’’ “visceral organ system,’’ “glomerular capillary basement membrane,’’ and “glomerular capsule.’’ In this track, we also calculate similarity metrics based solely on the entity names, without considering their descriptions.

Another characteristic of this track is the association of synonyms with entity names in the ontologies, which is more pronounced in the human ontology (6,104 synonyms) compared to the mouse ontology (345 synonyms).

4.2 Alin Overview

Alin is an interactive ontology matching tool that has participated in the OAEI interactive matching track since 2016. Since 2020, Alin has employed the Alin metric and lexical analyzers to standardize entity names. In OAEI 2024, Alin was evaluated on the interactive matching track using the Conference and Anatomy datasets, achieving the highest F-Measure among the participating tools in both datasets (Pour et al., 2024). Since its participation in OAEI 2023, we have developed programs to assist in building lexical analyzers to standardize entity names. These programs have improved the results of the matching process.

Alin can utilize a wide range of metrics, individually or in combination. Among these metrics is the Alin metric, which we developed to leverage the lexical and syntactic standardization of the entity names. Instead of directly comparing entity names, the Alin metric compares their underlying concepts or constituent concepts. Alin leverages this standardization, along with sources of background knowledge such as WordNet and FMA (see Appendix C.3), to associate entities with concepts. For a more detailed study of Alin, including the Alin metric and its methodology to associate entities with concepts, see Appendix C.

4.3 AML Overview

AgreementMakerLight³ (AML) is an automated and efficient ontology matching system. It has a flexible and extensible framework and is primarily based on the use of element-level matching techniques supported by background knowledge. AgreementMakerLight has been very successful in the OAEI competition, ranking first in F-measure in several tracks throughout the years including: Anatomy, Conference, Multifarm, Library, Interactive Matching Evaluation, and Large Biomedical Ontologies. AgreementMakerLight has been used by several institutions, including NASA, the Janssen Pharmaceutical Companies of Johnson & Johnson, and in the Global Agricultural Concept Scheme (GACS) from the Food and Agriculture Organization of the United Nations.

4.4 Analysis of the Evolution of the Lexical Analyzers in the Anatomy Track from Version to Version

We developed several versions of the lexical analyzer during its construction process, each potentially introducing new standardization techniques. In this subsection, we will examine the impact of incorporating these standardization techniques into the lexical analyzer on the quality of the generated alignment and the number of non-standard entity names.

To illustrate the evolution across versions, we analyzed the lexical analyzers of the Anatomy track ontologies by running Alin with the Alin metric and string-based metrics Jaccard⁴, Jaro-Winkler (Winkler, 1990) and n-gram (Kondrak, 2005).

We created Tables 3 and 4, which show the number of entities with non-standard names, entities with syntactically incorrect names, and rejected entities. Each version of the lexical analyzer is identified by a Roman numeral. Since the first version, the lexical analyzer for the human ontology has not included any entities with standard but syntactically incorrect names; thus, we omitted these data from Table 3. Additionally, we created Table 5 to show the improvement in alignment quality and the number of expert interactions. Only versions with modifications in the lexical analyzer compared to the previous version are displayed.

Table 3.
Entities With Non-Standard Name and Rejected Entities in Each Version of the Lexical Analyzer of the Human Ontology.

Version	Entities with non-standard name	Rejected entities
I	657	0
II	634	0
III	634	38
IV	548	38
VI	544	38
VIII	178	288
IX	122	344
X	94	344
XI	0	438

Table 4.

Entities With Non-Standard or Non-Syntactically Correct Name and Rejected Entities in Each Version of the Lexical Analyzer of the Mouse Ontology.

Version	Entities with non-standard name	Rejected entities	Entities with non-syntactically correct name
I	179	0	12
II	106	0	0
III	66	26	0
IV	30	26	0
V	13	43	0
VI	5	49	0
VII	0	49	0

We processed all terms related to classes, attributes, and their synonyms through the lexical analyzers, totaling 9,402 terms from the human ontology (3,298 classes, 0 attributes, and 6,104 synonyms) and 3,083 terms from the mouse ontology (2,738 classes, 0 attributes, and 345 synonyms).

In Table 4, the column “Entities with non-syntactically correct name’’ counts the entities with standard but non-syntactically correct names. Note that every entity with a non-standard name is also an entity with a non-syntactically correct name.

The tables indicate that in most new versions, the number of entities with non-standard or syntactically incorrect names decreased compared to the previous version, ultimately reaching zero. In Version VII of the mouse ontology, we achieved this reduction to zero by implementing the last standardization technique. In the final version, the lexical analyzer rejected 49 entities in the mouse ontology, representing 1.58% of all entity names, while for the human ontology, 438 entities were rejected, accounting for 4.66% of all entity names.

The results presented (Table 5) demonstrate that the inclusion of new standardization techniques in lexical analyzers improves the quality of the generated alignment.

Table 5.

Performance of Alin for Each Lexical Analyzer Version, With the Alin Metric and With String-Based Metrics.

Version	Number of interactions	Precision	F-Measure	Recall
I	404	0.9871	0.9193	0.8602
II	422	0.9859	0.9269	0.8747
III	405	0.986	0.934	0.8872
IV	404	0.9863	0.9414	0.9004
V	404	0.9863	0.9414	0.9004
VI	403	0.9863	0.9406	0.899
VII	403	0.9863	0.9425	0.9024
VIII	401	0.9863	0.9432	0.9037
IX	401	0.9863	0.9432	0.9037
X	405	0.9866	0.9525	0.9208
XI	405	0.9866	0.9525	0.9208

All versions of the lexical analyzers were uploaded to OSF⁵.

4.5 Comparison of Various Uses of Lexical Analyzers Using the Alin Tool

To evaluate the lexical analyzers, we followed the same protocol as the OAEI interactive matching track (Pour et al., 2024) in which Alin participated (Silva et al., 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024). The OAEI provides reference alignments to evaluate the performance of ontology matching tools. Alin utilizes these reference alignments to simulate expert responses.

In each interaction, up to three selected mappings can be submitted to the expert, as long as each selected mapping has one entity in common with another selected mapping in the interaction (Faria, 2016).

The quality of an alignment generated by a matching approach is typically measured using the F-Measure, the harmonic mean of Recall and Precision. In interactive ontology matching processes, an additional quality metric is the number of interactions with the expert (total requests).
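
The harmonic mean mentioned above can be written out as a short function. The sketch below is only an illustration of the formula; the function name is ours, and the example values are the Precision and Recall of the first Anatomy row in Table 7.

```python
def f_measure(precision, recall):
    """Harmonic mean of Precision and Recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# String Metrics row of Table 7 (Anatomy): Precision 0.997, Recall 0.616.
print(round(f_measure(0.997, 0.616), 3))  # 0.762
```

Because the harmonic mean is dominated by the smaller of the two values, a configuration with very high Precision but low Recall still obtains a modest F-Measure.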

To assess whether lexical standardization considering the other ontology and syntactic standardization considering the Target Grammar improve the use of similarity metrics, we ran Alin six times with different configurations:

In the first execution, we ran Alin using three string-based metrics: Jaccard, Jaro-Winkler and n-gram. In this execution, we did not use the lexical analyzers, and the similarity metrics were calculated without changing the entity names;

In the second execution, we employed lexical analyzers to implement standardization techniques without considering the other ontology. The techniques included “Separation of the words,” “Conversion of Roman numerals to Arabic numerals,” “Removal of stop words,” “Removal of non-alphanumeric characters or their replacement with white spaces,” and “Removal of punctuation characters.” From this point onward, entity name conversion to uppercase and stemming were applied, but rather than including them in the lexical analyzer, we implemented these processes directly in the code;

In the third and subsequent executions, we used the latest version of the lexical analyzers, which took the other ontology into account;

In the fourth execution, we ran Alin without the string-based metrics, relying solely on the Alin metric;

In the fifth execution, we included the three string-based metrics along with the Alin metric, applying the string-based metrics as in the third execution;

In the sixth execution, we used the same metrics as in the fifth execution but modified the application of the string-based metrics. Instead of applying the string metric solely to the entity name returned by the lexical analyzer, we incorporated variations and synonyms. For example, consider two entities: the first has two variations and two synonyms, resulting in five associated names, including the original. If we aim to calculate the string metric relative to a second entity with three associated names, this will result in fifteen comparisons. Alin assigns the highest value found to the metric calculation.
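
The comparison scheme of the sixth execution can be sketched as follows. This is not Alin's code: the word-level Jaccard function stands in for any of the string-based metrics, and the entity names and their variations and synonyms are hypothetical. Only the scheme itself (compare every associated name of one entity with every associated name of the other and keep the highest value) comes from the description above.

```python
from itertools import product


def jaccard(a, b):
    """Word-level Jaccard similarity (an assumed stand-in for a string metric)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_similarity(names_a, names_b, metric=jaccard):
    """Compare all name pairs and return the highest metric value.

    Both lists must be non-empty; each holds the name returned by the
    lexical analyzer plus its variations and synonyms.
    """
    return max(metric(a, b) for a, b in product(names_a, names_b))


# Hypothetical entity with five associated names (original, two variations, two synonyms).
first = ["hind-limb bone", "bone of hind-limb", "hind-limb bony",
         "leg bone", "lower limb bone"]
# Hypothetical entity with three associated names.
second = ["bone of leg", "leg bone", "crus bone"]

print(len(list(product(first, second))))  # 15
print(best_similarity(first, second))     # 1.0
```

The number of comparisons grows with the product of the two list sizes, which is one reason the sixth execution is more expensive than the fifth.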

In the second and third executions, we opted to include in the alignment, without requiring expert approval, all mappings between entities whose names became identical after preprocessing, i.e., after applying the lexical analyzers. In the last three executions, we no longer included mappings with identical entity names in the alignment. Instead, we included mappings where entities shared the same concept (Appendix C.3) or where the value of the Alin metric was 1. In all executions, Alin presents to the expert the mappings with the highest sum of the metrics used in the matching process.

We can see the results of these executions in Tables 6 and 7.

Table 6.
Comparison of Various Uses of Lexical Analyzers—Conference.

	Total Requests	Precision	F-Measure	Recall
String Metrics	51	0.767	0.564	0.456
Standardization without considering the other ontology	82	0.768	0.570	0.465
Standardization considering the other ontology	119	0.780	0.606	0.515
ALIN Metric	119	0.894	0.750	0.659
ALIN Metric with String Metrics with Standardization	208	0.908	0.798	0.725
ALIN Metric with String Metrics with Variations	221	0.908	0.799	0.725

Table 7.

Comparison of Various Uses of Lexical Analyzers—Anatomy.

	Total Requests	Precision	F-Measure	Recall
String Metrics	0	0.997	0.762	0.616
Standardization without considering the other ontology	266	0.997	0.844	0.732
Standardization considering the other ontology	291	0.997	0.858	0.753
ALIN Metric	229	0.986	0.922	0.865
ALIN Metric with String Metrics with Standardization	367	0.986	0.941	0.9
ALIN Metric with String Metrics with Variations	405	0.987	0.952	0.921

4.6 Does ChatGPT Render the Standardization of Entity Names Unnecessary?

One of the features of ChatGPT is its ability to recognize synonyms, including variations of a term, such as different grammatical forms, word inflections, or abbreviations. Therefore, a reasonable question is: Does ChatGPT make the standardization of entity names unnecessary?

To address this question, we conducted the following experiment: we performed ontology matching twice with Alin for each track, once using the original entity names and once using the entity names lexically standardized by the lexical analyzers. Alin typically accesses the reference alignment to simulate the response of the domain expert. We modified Alin so that, instead of accessing the reference alignment, it queries ChatGPT. To perform these queries, we used ChatGPT 3.5.

Initially, we considered comparing all entity names, but the number of possible combinations is too large—approximately 125,000 on the Conference track and around 9,000,000 on the Anatomy track. That would result in both matchings taking many days to complete and, in the case of the Anatomy track, even several weeks, as accessing ChatGPT through a program is slow. Additionally, ChatGPT, when invoked from within a program, limits the number of similar questions it can process. Once we exceed this threshold, it intermittently does not respond to the questions. Based on our experience, we reached this limit after fewer than 2,000 queries.

We therefore ran the experiment as follows: ChatGPT evaluated only the mappings selected by Alin, with the selection process following the same method as the sixth run in Subsection 4.5. Thus, for both executions of each track, ChatGPT analyzed the same set of mappings. We matched twice for each track: once with ChatGPT analyzing the original names to find mappings, and once with it analyzing the lexically modified names produced by the lexical analyzers.

We can see the results in Table 8, which shows that the standardization of entity names improves the quality of the ChatGPT responses.

Table 8.
Ontology Matching Using ChatGPT.

Track	Total Requests to ChatGPT	Precision	F-Measure	Recall
Conference—No Standardization	893	0.857	0.596	0.466
Conference—With Standardization	850	0.853	0.642	0.525
Anatomy—No Standardization	1421	0.991	0.772	0.633
Anatomy—With Standardization	1425	0.995	0.824	0.703

4.7 Use of Lexical Analyzers in AML

To evaluate whether we can utilize the lexical analyzers generated for the Anatomy track in another tool and whether they improve its results, we modified AML to perform matching with these lexical analyzers. To implement the modifications, we downloaded the AML source code from GitHub⁶.

We integrated the classes related to the two lexical analyzers into the AML project and modified the method that invokes the standardization of entity names. We modified this method to provide three execution options:

One option in which it does not utilize the standardization techniques already employed by the program;

Another option in which it employs its standardization techniques as usual;

A third option where, instead of using its standardization techniques, it utilizes the lexical analyzers.

AML uses a configuration file that allows us to select which matching techniques we wish to employ during its execution. We configured this file in two ways: one performing matching solely with string matching and the other utilizing all available matching techniques, which include using background knowledge bases such as Uberon and DOID, word matching, structural matching, and ontology repair. To perform string matching, AML uses the Jaccard similarity metric. We executed the program to automatically select the mappings for the final alignment in a non-interactive way.

We conducted five executions of AML, characterized by the following features:

One employs only string matching without using the standardization techniques already implemented by the program;

Another uses only string matching while employing the standardization techniques it typically utilizes;

A third uses only string matching but employs the lexical analyzers;

A fourth utilizes all matching techniques of AML while employing its standardization techniques;

A fifth utilizes all matching techniques of AML but employs the lexical analyzers.

We can see the results in Table 9.

Table 9.
Use of Lexical Analyzers in AML.

	Precision	F-Measure	Recall
Only string matching with no standardization	0.984	0.147	0.079
Only string matching with standardization	0.952	0.855	0.776
Only string matching with lexical analyzers	0.953	0.872	0.804
Using all matching techniques with standardization	0.934	0.928	0.922
Using all matching techniques with the lexical analyzers	0.945	0.932	0.919

4.7.1 Analysis of the Results

Our results demonstrate that considering the other ontology during lexical standardization yields better outcomes than disregarding it. Comparing the third and second executions in Tables 6 and 7, the F-Measure increased from 0.570 to 0.606 on the Conference track and from 0.844 to 0.858 on the Anatomy track.

Syntactic standardization has also proved beneficial. The use of the Alin metric depends on the entity names adhering to a syntax provided by the Target Grammar. Comparing the results of the fourth and fifth executions with those of the third execution reveals that the Alin metric yielded positive results, with the F-measure increasing from 0.858 to 0.922 and then to 0.941 on the Anatomy track, and from 0.606 to 0.750 and then to 0.798 on the Conference track.

Using synonyms and variations in the sixth run produced mixed outcomes. Although the Conference track showed almost no variation between the sixth and fifth runs, the Anatomy track exhibited a more significant change, with the F-Measure increasing from 0.941 to 0.952. We can attribute this difference to the fact that the Anatomy track ontologies include many synonyms (6,104 in the human ontology and 345 in the mouse ontology), whereas the Conference ontologies contain no synonyms. In addition, the Conference track contains entity names with few words, which results in fewer variations and also affects the results.

Table 8 shows that standardizing entity names leads to higher-quality responses from ChatGPT, which in turn improves the resulting alignment. We observed this on both the Conference track (an increase in F-Measure from 0.596 to 0.642) and the Anatomy track (an increase in F-Measure from 0.772 to 0.824).

The experiment carried out with AML demonstrates that the lexical analyzers can be reused, for the same ontology matching, in a tool other than the one for which we originally developed them. Table 9 indicates that there was a quality improvement compared to the standardization typically performed by the tool, both when running the program using only string matching (an increase in F-Measure from 0.855 to 0.872) and when running it with all available techniques (an increase from 0.928 to 0.932). However, it is worth noting that recall decreased slightly in the latter case (from 0.922 to 0.919).

5 Conclusion and Future Work

Many tools lexically standardize entity names before calculating the values of similarity metrics, often overlooking both the ongoing matching process and the syntactic standardization of entity names.

Lexical standardization focuses on standardizing the spelling of words and symbols in entity names. Syntactic standardization focuses on ensuring that the entity names in both ontologies follow the same grammar.
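The distinction between the two kinds of standardization can be illustrated with a minimal sketch. It is not the lexical analyzer described in this paper: the spelling table, the tokenization rules, and the single rewriting rule ("X of Y" becomes "Y X", so that the head noun comes last) are all hypothetical stand-ins chosen for illustration, and the actual Target Grammar may differ.

```python
import re

# Hypothetical spelling table; a real analyzer would be tailored to the
# two ontologies being matched.
SPELLING_VARIANTS = {"organisation": "organization", "programme": "program"}


def lexical_standardize(name: str) -> list[str]:
    """Split an entity name into lowercase tokens with unified spelling."""
    # Insert a space at camelCase boundaries, then treat "_" and "-" as spaces.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    spaced = re.sub(r"[_\-]+", " ", spaced)
    tokens = spaced.lower().split()
    return [SPELLING_VARIANTS.get(token, token) for token in tokens]


def syntactic_standardize(tokens: list[str]) -> list[str]:
    """Toy rule (an assumption): rewrite 'X of Y' as 'Y X' so the head noun comes last."""
    if "of" in tokens:
        index = tokens.index("of")
        head, complement = tokens[:index], tokens[index + 1:]
        if head and complement:
            return complement + head
    return tokens


def standardize(name: str) -> str:
    """Apply lexical standardization first, then syntactic standardization."""
    return " ".join(syntactic_standardize(lexical_standardize(name)))


print(standardize("Chair_of_ProgrammeCommittee"))
print(standardize("programCommitteeChair"))
```

Both calls yield the same standardized name, so a plain string comparison would now treat the two entity names as identical even though their original spellings and word orders differ.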

We evaluated the effectiveness of this standardization by measuring how the inclusion of the ongoing matching process and the proposed grammar improves the quality of the generated alignment. For this evaluation, we used the ontology matching systems Alin and AML in conjunction with the OAEI Anatomy and Conference tracks.

The hypothesis we verified in this paper is that standardizing entity names both lexically and syntactically according to the proposed grammar, while considering the involved ontologies, improves the effectiveness of similarity metrics in the ontology matching process.

To measure the improvement in alignment quality due to lexical standardization, while considering the other ontology, we compared the performance of Alin using the Anatomy and Conference tracks, as shown in rows 2 and 3 of Tables 6 and 7. The results indicate an increase in the F-Measure from 0.844 to 0.858 in the Anatomy track and from 0.570 to 0.606 in the Conference track.

To leverage the syntactic standardization of entity names, we developed a new metric, the Alin metric. This metric requires identifying the head noun, which is facilitated by the Target Grammar. Comparing rows 3, 4, and 5 of Tables 6 and 7, we observed an increase in the F-Measure from 0.858 to 0.922 and then to 0.941 in the Anatomy track, and from 0.606 to 0.750 and then to 0.798 in the Conference track.

The Conference track did not show gain using synonyms and variations in the sixth run, but the Anatomy track exhibited improvement, with the F-Measure increasing from 0.941 to 0.952. Variations of an entity name occur when, for example, we substitute a noun with its corresponding adjective or when we permute its modifiers.
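The two kinds of variation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the noun-to-adjective table and the split of an entity name into modifiers plus a head noun are hypothetical, and the way the paper actually generates variations may differ.

```python
from itertools import permutations, product

# Hypothetical noun-to-adjective table, used only for illustration.
NOUN_TO_ADJECTIVE = {"heart": "cardiac", "kidney": "renal"}


def variations(modifiers: list[str], head: str) -> set[str]:
    """Generate variations of an entity name of the form 'modifiers... head'."""
    # Each modifier may appear as itself or as its corresponding adjective.
    choices = [
        (m, NOUN_TO_ADJECTIVE[m]) if m in NOUN_TO_ADJECTIVE else (m,)
        for m in modifiers
    ]
    result = set()
    for combination in product(*choices):
        # Permute the modifiers while keeping the head noun in final position.
        for ordering in permutations(combination):
            result.add(" ".join(ordering + (head,)))
    return result


names = variations(["left", "heart"], "ventricle")
for name in sorted(names):
    print(name)
```

Note that the number of variations grows factorially with the number of modifiers, which is consistent with the observation that entity names with few words produce few variations.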

We also compared the response quality of ChatGPT with and without the standardization of entity names, finding that standardization improves ChatGPT’s response quality, with the F-Measure increasing from 0.596 to 0.642 in the Conference track and from 0.772 to 0.824 in the Anatomy track, as illustrated in Table 8.

The experiment carried out with AML demonstrates that the lexical analyzers can be reused in a tool other than the one for which we originally developed them, to perform the same ontology matching. Alignment quality improved both when the program was executed using only string matching (an increase in F-Measure from 0.855 to 0.872) and when it was executed with all available techniques (an increase from 0.928 to 0.932), as illustrated in Table 9.

An area for future exploration is the standardization of entity names not only within ontologies but also across external resources such as WordNet and FMA. Standardizing terms from WordNet and FMA could prove valuable for identifying new mappings, especially in cases where a synonym exists in these external resources but not within the involved ontologies.

Another area of interest is the automation of lexical analyzer construction. We have developed programs that provide information to the lexical analyzer writer. In some cases, these programs directly generate the lines to be included in the lexical analyzer, suggesting that we can further automate this process.

Footnotes

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Kate Revoredo is funded by the Berliner Chancengleichheitsprogramm (BCP) as part of the DiGiTal Graduate Program. Fernanda Baiao is partially funded by FAPERJ (grants 200.514/2023 and 211.308/2019) and CNPq (grants 312059/2022-1 and 422810/2021-5). The article processing charge was funded by the Open Access Publication Fund of Humboldt-Universität zu Berlin.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

Jomar Silva

Kate Revoredo

Fernanda Araujo Baião

Cabral Lima

Appendix A

Appendix B

Appendix C

References

Achichi

Bellahsene

Todorov

(2017). Legato results for OAEI 2017. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 12th International Workshop on Ontology Matching collocated with the 16th International Semantic Web Conference (ISWC 2017), CEUR Workshop Proceedings, Vol. 2032, CEUR-WS.org, Aachen, Germany (pp. 146–152). https://ceur-ws.org/Vol-2032/oaei17_paper6.pdf

Annane

Bellahsene

(2020). GBKOM: A generic framework for BK-based ontology matching. Journal of Web Semantics, 63, 100563. https://doi.org/10.1016/j.websem.2020.100563 . https://www.sciencedirect.com/science/article/pii/S1570826820300111

Bethea

W. L.

Fink

Beecher-Deighan

(2006). JHU/APL Onto-Mapology Results for OAEI 2006. In P. Shvaiko, J. Euzenat, N.F. Noy, H. Stuckenschmidt, V.R. Benjamins & M. Uschold (Eds.), Proceedings of the 1st International Workshop on Ontology Matching (OM-2006) collocated with the 5th International Semantic Web Conference (ISWC-2006), CEUR Workshop Proceedings, Vol. 225, CEUR-WS.org, Aachen, Germany. https://ceur-ws.org/Vol-225/paper13.pdf

Bouma

(2009). Cross-Lingual Dutch to English Alignment Using EuroWordNet and Dutch Wikipedia. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt & N. Noy (Eds.), Proceedings of the 4th International Workshop on Ontology Matching (OM-2009) collocated with the 8th International Semantic Web Conference (ISWC-2009), CEUR Workshop Proceedings, Vol. 551, CEUR-WS.org, Aachen, Germany (pp. 224–229). http://www.dit.unitn.it/∼p2p/OM-2009/oaei09_paper13.pdf

Byrne

Fokoue

Kalyanpur

Srinivas

Wang

(2009). Scalable Matching of Industry Models: A Case Study. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt & N. Noy (Eds.), Proceedings of the 4th International Workshop on Ontology Matching (OM-2009) collocated with the 8th International Semantic Web Conference (ISWC-2009), CEUR Workshop Proceedings, Vol. 551, CEUR-WS.org, Aachen, Germany (pp. 1–12). http://www.dit.unitn.it/∼p2p/OM-2009/om2009_Tpaper1.pdf

Cerón-Figueroa

López-Yáñez

Alhalabi

Camacho-Nieto

Villuendas-Rey

Aldape-Pérez

Yáñez-Márquez

(2017). Instance-based ontology matching for e-learning material using an associative pattern classifier. Computers in Human Behavior, 69, 218–225. https://doi.org/10.1016/j.chb.2016.12.039 . https://www.sciencedirect.com/science/article/pii/S0747563216308603

Cerón-Figueroa

López-Yáñez

Villuendas-Rey

Camacho-Nieto

Aldape-Pérez

Yáñez-Márquez

(2017). Instance-based ontology matching for open and distance learning materials. The International Review of Research in Open and Distributed Learning, 18(1). https://doi.org/10.19173/irrodl.v18i1.2681 . https://www.irrodl.org/index.php/irrodl/article/view/2681

Chakraborty

Yaman

Virgili

Konar

Bansal

S. K.

(2020). OntoConnect: Results for OAEI 2020. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 15th International Workshop on Ontology Matching collocated with the 19th International Semantic Web Conference (ISWC 2020), CEUR Workshop Proceedings, Vol. 2788, CEUR-WS.org, Aachen, Germany (pp. 204–210). https://disi.unitn.it/∼pavel/om2020/papers/oaei20_paper11.pdf

Chapman

(2007). Simmetrics: Open Source Similarity Measure Library, Accessed: 2025-02-12. http://sourceforge.net/projects/simmetrics/

10.

Cheatham

Hitzler

(2013). StringsAuto and MapSSS results for OAEI 2013. In P. Shvaiko, J. Euzenat, K. Srinivas, M. Mao & E. Jiménez-Ruiz (Eds.), Proceedings of the 8th International Workshop on Ontology Matching collocated with the 12th International Semantic Web Conference (ISWC 2013), CEUR Workshop Proceedings, Vol. 1111, CEUR-WS.org, Aachen, Germany (pp. 146–152). https://ceur-ws.org/Vol-1111/oaei13_paper7.pdf

11.

Cheng

Ursu

Oprea

T. I.

Schurer

(2017). Learning reference-enriched approach towards large scale active ontology alignment and integration. In 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE Computer Society, Los Alamitos, CA, USA (pp. 1658–1663). https://doi.org/10.1109/BIBM.2017.8217908. https://doi.ieeecomputersociety.org/10.1109/BIBM.2017.8217604

12.

Chua

W. W. K.

Kim

J.-J.

(2010). Eff2Match Results for OAEI 2010. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt, M. Mao & I. Cruz (Eds.), Proceedings of the 5th International Workshop on Ontology Matching (OM-2010) collocated with the 9th International Semantic Web Conference (ISWC-2010), CEUR Workshop Proceedings, Vol. 689, CEUR-WS.org, Aachen, Germany (pp. 150–157). http://www.dit.unitn.it/∼p2p/OM-2010/oaei10_paper5.pdf

13.

Cruz

I. F.

Antonelli

F. P.

Stroe

(2009). Efficient Selection of Mappings and Automatic Quality-Driven Combination of Matching Methods. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt & N. Noy (Eds.), Proceedings of the 4th International Workshop on Ontology Matching (OM-2009) collocated with the 8th International Semantic Web Conference (ISWC-2009), CEUR Workshop Proceedings, Vol. 551, CEUR-WS.org, Aachen, Germany (pp. 49–60). https://ceur-ws.org/Vol-551/om2009_Tpaper5.pdf

14.

Damak

Souid

Kachroudi

Zghal

(2015). EXONA results for OAEI 2015. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 10th International Workshop on Ontology Matching collocated with the 14th International Semantic Web Conference (ISWC 2015), CEUR Workshop Proceedings, Vol. 1545, CEUR-WS.org, Aachen, Germany (pp. 145–149). https://ceur-ws.org/Vol-1545/oaei15_paper5.pdf

15.

David

Euzenat

Scharffe

Trojahn dos Santos

(2011). The alignment API 4.0. Semantic Web, 2(1), 3–10. https://doi.org/10.3233/SW-2011-0028

16.

Dhouib

Faron-Zucker

Tettamanzi

(2021). Measuring clusters of labels in an embedding space to refine relations in ontology alignment. Journal on Data Semantics, 10, 399–408. https://doi.org/10.1007/s13740-021-00137-8

17.

Diallo

(2012). Effective method for large scale ontology matching. In A. Paschke, A. Burger, P. Romano, M.S. Marshall & A. Splendiani (Eds.), Proceedings of the 5th International Workshop on Semantic Web Applications and Tools for Life Sciences (SWAT4LS), CEUR Workshop Proceedings, Vol. 952, CEUR-WS.org, Aachen, Germany. http://ceur-ws.org/Vol-952/paper_27.pdf

18.

Djeddi

W. E.

Khadir

(2011). A Dynamic Multistrategy Ontology Alignment Framework Based on Semantic Relationship using WordNet. In A. Amine, O.A. Mohamed, B. Benatallah & Z. Elberrichi (Eds.), Proceedings of the Third International Conference on Computer Science and its Applications (CIIA’11), CEUR Workshop Proceedings, Vol. 825, CEUR-WS.org, Aachen, Germany. https://ceur-ws.org/Vol-825/paper_238.pdf

19.

Doan

Madhavan

Domingos

Halevy

(2002). Learning to Map between Ontologies on the Semantic Web. In Proceedings of the 11th International Conference on World Wide Web, WWW’02, Association for Computing Machinery, New York, NY, USA (pp. 662–673). ISBN 1581134495. https://doi.org/10.1145/511446.511532.

20.

Efeoglu

(2022). GraphMatcher: A graph representation learning approach for ontology matching. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 17th International Workshop on Ontology Matching (OM 2022) collocated with the 21th International Semantic Web Conference (ISWC 2022), CEUR Workshop Proceedings, Vol. 3324, CEUR-WS.org, Aachen, Germany (pp. 174–180). http://ceur-ws.org/Vol-3324/oaei22_paper7.pdf

21.

Espinoza

Trillo

Gracia

Mena

(2006). Discovering and merging keyword senses using ontology matching. In P. Shvaiko, J. Euzenat, N.F. Noy, H. Stuckenschmidt, V.R. Benjamins & M. Uschold (Eds.), Proceedings of the 1st International Workshop on Ontology Matching (OM-2006), held in conjunction with the 5th International Semantic Web Conference (ISWC-2006), CEUR Workshop Proceedings, Vol. 225, CEUR-WS.org, Aachen, Germany (pp. 211–215). https://dl.acm.org/doi/10.5555/2889633.2889655

22.

Euzenat

Shvaiko

(2013). Ontology matching—Second Edition. Springer. ISBN 978-3-642-38720-3. https://doi.org/10.1007/978-3-642-38721-0.

23.

Fallatah

Zhang

Hopfgartner

(2021). A hybrid approach for large knowledge graphs matching. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 16th International Workshop on Ontology Matching collocated with the 20th International Semantic Web Conference (ISWC 2021), CEUR Workshop Proceedings, Vol. 3063, CEUR-WS.org, Aachen, Germany (pp. 37–48). http://disi.unitn.it/∼pavel/om2021/papers/om2021_LTpaper4.pdf

24.

Fallatah

Zhang

Hopfgartner

(2022). KGMatcher+ results for OAEI 2022. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 17th International Workshop on Ontology Matching (OM 2022) collocated with the 21th International Semantic Web Conference (ISWC 2022), CEUR Workshop Proceedings, Vol. 3324, CEUR-WS.org, Aachen, Germany (pp. 181–187). https://ceurspt.wikidata.dbis.rwth-aachen.de/Vol-3324/oaei22_paper8.html

25.

Faria

(2016). Using the SEALS Client’s Oracle in Interactive Matching, Technical Report, Ontology Alignment Evaluation Initiative. https://github.com/DanFaria/OAEI_SealsClient/blob/master/OracleTutorial.pdf

26.

Faria

Santos

Balasubramani

B. S.

Silva

M. C.

Couto

F. M.

Pesquita

(2023). Agreementmakerlight, Semantic Web Journal Preprint. https://www.semantic-web-journal.net/content/agreementmakerlight-0

27.

Fellbaum

(1998). WordNet: An Electronic Lexical Database. MIT Press. ISBN 9780262061971.

28.

Georgala

Röder

Sherif

M. A.

Ngonga Ngomo

(2020). Applying edge-counting semantic similarities to Link Discovery: Scalability and Accuracy. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 15th International Workshop on Ontology Matching collocated with the 19th International Semantic Web Conference (ISWC 2020), CEUR Workshop Proceedings, Vol. 2788, CEUR-WS.org, Aachen, Germany (pp. 36–47). https://disi.unitn.it/∼pavel/om2020/papers/om2020_LTpaper4.pdf

29.

Giunchiglia

Shvaiko

Yatskevich

(2004). S-Match: An Algorithm and an Implementation of Semantic Matching. In C.J. Bussler, J. Davies, D. Fensel, & R. Studer (Eds.), The Semantic Web: Research and Applications, Lecture Notes in Computer Science (LNCS), Vol. 3053, Springer Berlin Heidelberg (pp. 61–75). ISBN 978-3-540-25956-5. https://link.springer.com/chapter/10.1007/978-3-540-25956-5_5

30.

Gomes

J. A. R.

Gançarski

A. L.

Henriques

P. R.

(2022). OMT, a Web-Based Tool for Ontology Matching. In J.A. Cordeiro, M.J.a. Pereira, N.F. Rodrigues & S.A. Pais (Eds.), 11th Symposium on Languages, Applications and Technologies (SLATE 2022), Open Access Series in Informatics (OASIcs), Vol. 104, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany (pp. 8:1–8:12). ISSN 2190-6807. ISBN 978-3-95977-245-7. https://doi.org/10.4230/OASIcs.SLATE.2022.8.

31.

Gosselin

Zouaq

(2022). SEBMatcher results for OAEI 2022. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 17th International Workshop on Ontology Matching (OM 2022) collocated with the 21th International Semantic Web Conference (ISWC 2022), CEUR Workshop Proceedings, Vol. 3324, CEUR-WS.org, Aachen, Germany (pp. 202–209). https://ceur-ws.org/Vol-3324/oaei22_paper12.pdf

32.

Groß

Hartung

Kirsten

Rahm

(2012). GOMMA Results for OAEI 2012. In P. Shvaiko, J. Euzenat, A. Kementsietsidis, M. Mao & N. Noy (Eds.), Proceedings of the 7th International Workshop on Ontology Matching (OM-2012) collocated with the 11th International Semantic Web Conference (ISWC-2012), CEUR Workshop Proceedings, Vol. 946, CEUR-WS.org, Aachen, Germany (pp. 133–140). http://www.dit.unitn.it/∼p2p/OM-2012/oaei12_paper3.pdf

33.

Gulic

Vrdoljak

(2013). CroMatcher—Results for OAEI 2013. In P. Shvaiko, J. Euzenat, K. Srinivas, M. Mao & E. Jiménez-Ruiz (Eds.), Proceedings of the 8th International Workshop on Ontology Matching collocated with the 12th International Semantic Web Conference (ISWC 2013), CEUR Workshop Proceedings, Vol. 1111, CEUR-WS.org, Aachen, Germany (pp. 117–122). http://www.dit.unitn.it/∼p2p/OM-2013/oaei13_paper3.pdf

34.

Hamdi

Zargayouna

Safar

Reynaud

(2008). TaxoMap in the OAEI 2008 Alignment Contest. In P. Shvaiko, J. Euzenat, F. Giunchiglia & H. Stuckenschmidt (Eds.), Proceedings of the 3rd International Workshop on Ontology Matching (OM-2008) collocated with the 7th International Semantic Web Conference (ISWC-2008), CEUR Workshop Proceedings, Vol. 431, CEUR-WS.org, Aachen, Germany (pp. 206–213). https://ceur-ws.org/Vol-431/oaei08_paper12.pdf

35.

Hassanzadeh

Ward

M. J.

Rodriguez-Muro

Srinivas

(2015). Understanding a large corpus of web tables through matching with knowledge bases: An empirical study. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 10th International Workshop on Ontology Matching collocated with the 14th International Semantic Web Conference (ISWC 2015), CEUR Workshop Proceedings, Vol. 1545, CEUR-WS.org, Aachen, Germany (pp. 25–34). http://www.dit.unitn.it/∼pavel/om2015/papers/om2015_TLpaper3.pdf

36.

Chen

Antonyrajah

Horrocks

(2021). Biomedical ontology alignment with BERT. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 16th International Workshop on Ontology Matching collocated with the 20th International Semantic Web Conference (ISWC 2021), CEUR Workshop Proceedings, Vol. 3063, CEUR-WS.org, Aachen, Germany (pp. 1–12). https://ceur-ws.org/Vol-3063/om2021_LTpaper1.pdf

37.

Hertling

(2012). Hertuda results for OEAI 2012. In P. Shvaiko, J. Euzenat, A. Kementsietsidis, M. Mao & N. Noy (Eds.), Proceedings of the 7th International Workshop on Ontology Matching (OM-2012) collocated with the 11th International Semantic Web Conference (ISWC-2012), CEUR Workshop Proceedings, Vol. 946, CEUR-WS.org, Aachen, Germany (pp. 141–144). http://www.dit.unitn.it/∼p2p/OM-2012/oaei12_paper4.pdf

38.

Hertling

Paulheim

(2012). WikiMatch: Using Wikipedia for Ontology Matching. In P. Shvaiko, J. Euzenat, A. Kementsietsidis, M. Mao & N. Noy (Eds.), Proceedings of the 7th International Workshop on Ontology Matching (OM-2012) collocated with the 11th International Semantic Web Conference (ISWC-2012), CEUR Workshop Proceedings, Vol. 946, CEUR-WS.org, Aachen, Germany (pp. 37–48). http://www.dit.unitn.it/∼p2p/OM-2012/oaei12_paper15.pdf

39.

Hertling

Paulheim

(2019). DOME Results for OAEI 2019. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 14th International Workshop on Ontology Matching collocated with the 18th International Semantic Web Conference (ISWC 2019), CEUR Workshop Proceedings, Vol. 2536, CEUR-WS.org, Aachen, Germany (pp. 123–130). http://www.dit.unitn.it/∼pavel/om2019/papers/oaei19_paper6.pdf

40.

Hertling

Paulheim

(2020). ATBox results for OAEI 2020. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 15th International Workshop on Ontology Matching collocated with the 19th International Semantic Web Conference (ISWC 2020), CEUR Workshop Proceedings, Vol. 2788, CEUR-WS.org, Aachen, Germany (pp. 168–175). https://ceur-ws.org/Vol-2788/oaei20_paper5.pdf

41.

Hertling

Portisch

Paulheim

(2020). Supervised Ontology and Instance Matching with MELT, CoRR abs/2009.11102. https://doi.org/10.48550/arXiv.2009.11102. https://arxiv.org/abs/2009.11102

42.

Cheng

Zheng

Zhong

(2006). The Results of Falcon-AO in the OAEI 2006 Campaign. In P. Shvaiko, J. Euzenat, N.F. Noy, H. Stuckenschmidt, V.R. Benjamins & M. Uschold (Eds.), Proceedings of the 1st International Workshop on Ontology Matching (OM-2006) collocated with the 5th International Semantic Web Conference (ISWC-2006), CEUR Workshop Proceedings, Vol. 225, CEUR-WS.org, Aachen, Germany (pp. 124–133). http://www.dit.unitn.it/∼p2p/OM-2006/11-Falcon-OAEI’06.pdf

43.

Huber

Sztyler

Noessner

Meilicke

(2011). CODI: Combinatorial optimization for data integration—Results for OAEI 2011. In P. Shvaiko, J. Euzenat, T. Heath, C. Quix, M. Mao & I. Cruz (Eds.), Proceedings of the 6th International Workshop on Ontology Matching (OM-2011) In conjunction with the International Semantic Web Conference (ISWC2011), CEUR Workshop Proceedings, Vol. 814, CEUR-WS.org, Aachen, Germany (pp. 134–141). https://ceur-ws.org/Vol-814/oaei11_paper4.pdf

44.

Jean-Mary

Y. R.

Shironoshita

E. P.

Kabuka

M. R.

(2009). Ontology matching with semantic verification. Journal of Web Semantics, 7(3), 235–251. https://doi.org/10.1016/j.websem.2009.04.001

45.

Jiménez-Ruiz

Agibetov

Chen

Samwald

Cross

(2020). Dividing the Ontology Alignment Task with Semantic Embeddings and Logic-Based Modules. In G.D. Giacomo, A. Catalá, B. Dilkina, M. Milano, S. Barro, A. Bugarın & J. Lang (Eds.), ECAI 2020—24th European Conference on Artificial Intelligence Including 10th Conference on Prestigious Applications of Artificial Intelligence (PAIS 2020), Frontiers in Artificial Intelligence and Applications, Vol. 325, IOS Press, Amsterdam, Netherlands (pp. 784–791). https://doi.org/10.3233/FAIA200167

46.

Jiménez-Ruiz

Agibetov

Samwald

Cross

V. V.

(2018). We divide, you conquer: From large-scale ontology alignment to manageable subtasks with a lexical index and neural embeddings. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, C. Trojahn, M. Achichi, & K. Todorov (Eds.), Proceedings of the 13th International Workshop on Ontology Matching collocated with the 17th International Semantic Web Conference (ISWC 2018), CEUR Workshop Proceedings, Vol. 2288, CEUR-WS.org, Aachen, Germany (pp. 13–24). https://disi.unitn.it/∼pavel/om2018/papers/om2018_LTpaper2.pdf

47.

Jiménez-Ruiz

Grau

B. C.

Zhou

Horrocks

(2012). Large-scale interactive ontology matching: Algorithms and implementation. Frontiers in Artificial Intelligence and Applications, 242, 444–449. https://doi.org/10.3233/978-1-61499-098-7-444

48.

Kammoun

Diallo

(2013). ServOMap Results for OAEI 2013. In P. Shvaiko, J. Euzenat, K. Srinivas, M. Mao & E. Jiménez-Ruiz (Eds.), Proceedings of the 8th International Workshop on Ontology Matching collocated with the 12th International Semantic Web Conference (ISWC 2013), CEUR Workshop Proceedings, Vol. 1111, CEUR-WS.org, Aachen, Germany (pp. 169–176). http://www.dit.unitn.it/∼p2p/OM-2013/oaei13_paper10.pdf

49.

Karpathiotaki

Dogani

Koubarakis

(2014). Enabling semantic search for EO products: an ontology matching approach. In P. Shvaiko, J. Euzenat, M. Mao, E. Jiménez-Ruiz, J. Li & A. Ngonga (Eds.), Proceedings of the 9th International Workshop on Ontology Matching collocated with the 13th International Semantic Web Conference (ISWC 2014), CEUR Workshop Proceedings, Vol. 1317, CEUR-WS.org, Aachen, Germany (pp. 186–187). https://ceur-ws.org/Vol-1317/om2014_poster9.pdf

50.

Khan

W. A.

Amin

M. B.

Khattak

A. M.

Hussain

Lee

(2013). System for Parallel Heterogeneity Resolution (SPHeRe) Results for OAEI 2013. In P. Shvaiko, J. Euzenat, K. Srinivas, M. Mao & E. Jiménez-Ruiz (Eds.), Proceedings of the 8th International Workshop on Ontology Matching collocated with the 12th International Semantic Web Conference (ISWC 2013), CEUR Workshop Proceedings, Vol. 1111, CEUR-WS.org, Aachen, Germany (pp. 184–189). http://www.dit.unitn.it/∼p2p/OM-2013/oaei13_paper12.pdf

51.

Khiat

(2016). CroLOM: cross-lingual ontology matching system results for OAEI 2016. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham, O. Hassanzadeh & R. Ichise (Eds.), Proceedings of the 11th International Workshop on Ontology Matching collocated with the 15th International Semantic Web Conference (ISWC 2016), CEUR Workshop Proceedings, Vol. 1766, CEUR-WS.org, Aachen, Germany (pp. 146–152). https://ceur-ws.org/Vol-1766/oaei16_paper3.pdf

52.

Khiat

Benaissa

(2015). InsMT+ results for OAEI 2015 instance matching. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 10th International Workshop on Ontology Matching collocated with the 14th International Semantic Web Conference (ISWC 2015), CEUR Workshop Proceedings, Vol. 1545, CEUR-WS.org, Aachen, Germany (pp. 158–161). https://ceur-ws.org/Vol-1545/oaei15_paper7.pdf

53.

Khiat

Benaissa

Belfedhal

(2015). STRIM Results for OAEI 2015 Instance Matching Evaluation. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 10th International Workshop on Ontology Matching collocated with the 14th International Semantic Web Conference (ISWC 2015), CEUR Workshop Proceedings, Vol. 1545, CEUR-WS.org, Aachen, Germany (pp. 208–215). http://www.dit.unitn.it/∼pavel/om2015/papers/oaei15_paper16.pdf

54.

Khiat

Mackeprang

(2017). I-Match and OntoIdea results for OAEI 2017. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 12th International Workshop on Ontology Matching collocated with the 16th International Semantic Web Conference (ISWC 2017), CEUR Workshop Proceedings, Vol. 2032, CEUR-WS.org, Aachen, Germany (pp. 135–137). http://www.dit.unitn.it/∼pavel/om2017/papers/oaei17_paper4.pdf

55.

Khiat

Ouhiba

E. A.

Belfedhal

M. A.

Zoua

C. E.

(2016). SimCat results for OAEI 2016. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham, O. Hassanzadeh & R. Ichise (Eds.), Proceedings of the 11th International Workshop on Ontology Matching collocated with the 15th International Semantic Web Conference (ISWC 2016), CEUR Workshop Proceedings, Vol. 1766, CEUR-WS.org, Aachen, Germany (pp. 217–221). https://ceur-ws.org/Vol-1766/oaei16_paper14.pdf

56.

Kiu

Lee

(2007). OntoDNA: Ontology alignment results for OAEI 2007. In P. Shvaiko, J. Euzenat, F. Giunchiglia & B. He (Eds.), Proceedings of the 2nd International Workshop on Ontology Matching (OM-2007) collocated with the 6th International Semantic Web Conference (ISWC-2007), CEUR Workshop Proceedings, Vol. 304, CEUR-WS.org, Aachen, Germany (pp. 196–205). http://www.dit.unitn.it/∼p2p/OM-2007/8-o-OntoDNA.pdf

57.

Kondrak

(2005). N-gram similarity and distance. In M. Consens & G. Navarro (Eds.), Proceedings of the 12th International Conference on String Processing and Information Retrieval, SPIRE’05, Springer-Verlag, Berlin, Heidelberg (pp. 115–126). ISBN 3540297405. https://doi.org/10.1007/11575832_13. https://doi.org/10.1007/11575832_13

58.

Kuo

(2013). ODGOMS—Results for OAEI 2013. In P. Shvaiko, J. Euzenat, K. Srinivas, M. Mao, & E. Jiménez-Ruiz (Eds.), Proceedings of the 8th International Workshop on Ontology Matching collocated with the 12th International Semantic Web Conference (ISWC 2013), CEUR Workshop Proceedings, Vol. 1111, CEUR-WS.org, Aachen, Germany (pp. 153–160). http://www.dit.unitn.it/∼p2p/OM-2013/oaei13_paper8.pdf

59.

Laadhar

Ghozzi

Megdiche

Ravat

Teste

Gargouri

(2017). POMap: An effective pairwise ontology matching system. In A. Fred, D. Aveiro, J.L.G. Dietz, K. Liu, J. Bernardino, A. Salgado & J. Filipe (Eds.), Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Communications in Computer and Information Science, Vol. 976, SciTePress, Setúbal, Portugal (pp. 161–168). ISBN 978-989-758-272-1. https://doi.org/10.5220/0006492201610168.

60.

Laadhar

Ghozzi

Megdiche

Ravat

Teste

Gargouri

(2019). The impact of imbalanced training data on local matching learning of ontologies. In W. Abramowicz, & R. Corchuelo (Eds.), Business Information Systems—Proceedings of the 22nd International Conference, BIS 2019, Lecture Notes in Business Information Processing (LNBIP), Vol. 353, Springer International Publishing, Cham, Switzerland (pp. 162–175). ISBN 978-3-030-20485-3. https://doi.org/10.1007/978-3-030-20485-3_13.

61.

Lambrix

Tan

(2006). SAMBO—A system for aligning and merging biomedical ontologies. Journal of Web Semantics, 4(3), 196–206. https://doi.org/10.1016/j.websem.2006.05.003 . http://www.sciencedirect.com/science/article/pii/S1570826806000151 Semantic Web for Life Sciences.

62.

Lastra-Díaz

J. J.

García-Serrano

Batet

Fernández

Chirigati

(2017). HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems, 66(C), 97–118. https://doi.org/10.1016/j.is.2017.02.002

63.

(2022). An Effective Approach for Large Ontology Matching Using Multi-Objective Grasshopper Algorithm. In Proceedings of the 8th International Conference on Computing and Artificial Intelligence, ICCAI’22, Association for Computing Machinery, New York, NY, USA (pp. 110–116). ISBN 9781450396110. https://doi.org/10.1145/3532213.3532230.

64.

Peng

(2021). A novel periodic learning ontology matching model based on interactive grasshopper optimization algorithm. Knowledge-Based Systems, 228, 107239. https://doi.org/10.1016/j.knosys.2021.107239 . https://www.sciencedirect.com/science/article/pii/S0950705121005013

65.

Lyu

Zhang

Sun

(2017). njuLink: Results for instance matching at OAEI 2017. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 12th International Workshop on Ontology Matching collocated with the 16th International Semantic Web Conference (ISWC 2017), CEUR Workshop Proceedings, Vol. 2032, CEUR-WS.org, Aachen, Germany (pp. 158–165). https://ceur-ws.org/Vol-2032/oaei17_paper8.pdf

66.

Madhavan

Bernstein

P. A.

Rahm

(2001). Generic schema matching with cupid. In P.M.G. Apers, P. Atzeni, S. Ceri, S. Paraboschi, K. Ramamohanarao, & R.T. Snodgrass (Eds.), Proceedings of the 27th International Conference on Very Large Data Bases, VLDB’01, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (pp. 49–58). ISBN 1558608044. https://dl.acm.org/doi/10.5555/645927.672191

67.

Manning

C. D.

Surdeanu

Bauer

Finkel

Bethard

S. J.

McClosky

(2014). The Stanford CoreNLP Natural Language Processing Toolkit. In K. Bontcheva & J. Zhu (Eds.), Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Stroudsburg, USA (pp. 55–60). https://doi.org/10.3115/v1/P14-5010. https://aclanthology.org/P14-5010.pdf

68.

Massmann

Raunich

Aumüller

Arnold

Rahm

(2011). Evolution of the COMA Match System. In P. Shvaiko, J. Euzenat, T. Heath, C. Quix, M. Mao & I. Cruz (Eds.), Proceedings of the 6th International Workshop on Ontology Matching (OM-2011) In conjunction with the International Semantic Web Conference (ISWC2011), CEUR Workshop Proceedings, Vol. 814, CEUR-WS.org, Aachen, Germany (pp. 49–60). https://ceur-ws.org/Vol-814/om2011_Tpaper5.pdf

69.

Meilicke

Stuckenschmidt

(2015). New paradigm for alignment extraction. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 10th International Workshop on Ontology Matching collocated with the 14th International Semantic Web Conference (ISWC 2015), CEUR Workshop Proceedings, Vol. 1545, CEUR-WS.org, Aachen, Germany (pp. 1–12). https://ceur-ws.org/Vol-1545/om2015_TLpaper1.pdf

70.

Mohammadi

Atashin

A. A.

Hofman

W. J.

Tan

Y.-H.

(2019). SANOM Results for OAEI 2019. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 14th International Workshop on Ontology Matching collocated with the 18th International Semantic Web Conference (ISWC 2019), CEUR Workshop Proceedings, Vol. 2536, CEUR-WS.org, Aachen, Germany (pp. 175–180). http://www.dit.unitn.it/∼pavel/om2019/papers/oaei19_paper14.pdf

71.

Ouali

Ghozzi

Taktak

Hadj Sassi

M. S.

(2019). Ontology alignment using stable matching. Procedia Computer Science, 159, 746–755. https://doi.org/10.1016/j.procs.2019.09.230 . https://www.sciencedirect.com/science/article/pii/S1877050919314164 Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 23rd International Conference KES2019.

72. Paulheim & Hertling (2013). WeSeE-Match Results for OAEI 2013. In P. Shvaiko, J. Euzenat, K. Srinivas, M. Mao & E. Jiménez-Ruiz (Eds.), Proceedings of the 8th International Workshop on Ontology Matching collocated with the 12th International Semantic Web Conference (ISWC 2013), CEUR Workshop Proceedings, Vol. 1111, CEUR-WS.org, Aachen, Germany (pp. 197–202). http://www.dit.unitn.it/~p2p/OM-2013/oaei13_paper14.pdf

73. Paulheim, Hertling, & Ritze (2013). Towards Evaluating Interactive Ontology Matching Tools. In P. Cimiano, O. Corcho, V. Presutti, L. Hollink & S. Rudolph (Eds.), The Semantic Web: Semantics and Big Data, Springer Berlin Heidelberg, Berlin, Heidelberg (pp. 31–45). ISBN 978-3-642-38288-8. https://doi.org/10.1007/978-3-642-38288-8_3.

74. Pesquita, Stroe, Cruz, I. F., & Couto, F. M. (2010). BLOOMS on Agreementmaker: Results for OAEI 2010. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt, M. Mao & I. Cruz (Eds.), Proceedings of the 5th International Workshop on Ontology Matching (OM-2010) collocated with the 9th International Semantic Web Conference (ISWC-2010), CEUR Workshop Proceedings, Vol. 689, CEUR-WS.org, Aachen, Germany (pp. 134–141). http://www.dit.unitn.it/~p2p/OM-2010/oaei10_paper3.pdf

75. Pour, M. A. N., Algergawy, Blomqvist, Buche, Chen, Cotovio, P. G., Coulet, Cufi, Dong, Faria, Ferraz, Hertling, Horrocks, Ibanescu, Jain, Jiménez-Ruiz, Karam, Kraus, Lambrix, Monnin, Paulheim, Pesquita, Sharma, Shvaiko, Silva, Sousa, Trojahn, Vataščinová, Yaman, Zamazal, & Zhou (2024). Results of the Ontology Alignment Evaluation Initiative 2024. In E. Jiménez-Ruiz, O.H.C. Trojahn, S. Hertling, H. Li, P. Shvaiko & J. Euzenat (Eds.), Proceedings of the 19th International Workshop on Ontology Matching collocated with the 23rd International Semantic Web Conference (ISWC 2024), CEUR Workshop Proceedings, Vol. 3897, CEUR-WS.org, Aachen, Germany (pp. 64–97). https://ceur-ws.org/Vol-3897/oaei2024_paper0.pdf

76. Qassimi & Abdelwahed, E. H. (2019). The role of collaborative tagging and ontologies in emerging semantic of web resources. Computing, 101(10), 1489–1511. https://doi.org/10.1007/s00607-019-00704-9

77. Quix, Geisler, Hai, & Alekh (2017). Ontology matching for patent classification. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 12th International Workshop on Ontology Matching collocated with the 16th International Semantic Web Conference (ISWC 2017), CEUR Workshop Proceedings, Vol. 2032, CEUR-WS.org, Aachen, Germany (pp. 37–48). https://ceur-ws.org/Vol-2032/om2017_Tpaper4.pdf

78. Real, F. J. Q., Bella, McNeill, & Bundy (2020). Using Domain Lexicon and Grammar for Ontology Matching. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 15th International Workshop on Ontology Matching collocated with the 19th International Semantic Web Conference (ISWC 2020), CEUR Workshop Proceedings, Vol. 2788, CEUR-WS.org, Aachen, Germany (pp. 1–12). http://disi.unitn.it/~pavel/om2020/papers/om2020_LTpaper1.pdf

79. Roussille, Bousarsar, I. M., Teste, & dos Santos, C. T. (2018). Holontology: Results of the 2018 OAEI evaluation campaign. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, C. Trojahn, M. Achichi & K. Todorov (Eds.), Proceedings of the 13th International Workshop on Ontology Matching collocated with the 17th International Semantic Web Conference (ISWC 2018), CEUR Workshop Proceedings, Vol. 2288, CEUR-WS.org, Aachen, Germany (pp. 167–172). http://www.dit.unitn.it/~pavel/om2018/papers/oaei18_paper8.pdf

80. Roussille & Teste (2022). TOMATO: Results of the 2022 OAEI evaluation campaign. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 17th International Workshop on Ontology Matching (OM 2022) collocated with the 21st International Semantic Web Conference (ISWC 2022), CEUR Workshop Proceedings, Vol. 3324, CEUR-WS.org, Aachen, Germany (pp. 210–215). https://ceur-ws.org/Vol-3324/oaei22_paper13.pdf

81. Schadd, F. C., & Roos (2011). MaasMatch results for OAEI 2011. In P. Shvaiko, J. Euzenat, T. Heath, C. Quix, M. Mao & I.F. Cruz (Eds.), Proceedings of the 6th International Workshop on Ontology Matching (OM-2011) In conjunction with the International Semantic Web Conference (ISWC2011), CEUR Workshop Proceedings, Vol. 814, CEUR-WS.org, Aachen, Germany, 2011. http://www.dit.unitn.it/~p2p/OM-2011/oaei11_paper9.pdf

82. Schadd & Roos (2012). Coupling of WordNet Entries for Ontology Mapping using Virtual Documents. In P. Shvaiko, J. Euzenat, A. Kementsietsidis, M. Mao & N. Noy (Eds.), Proceedings of the 7th International Workshop on Ontology Matching (OM-2012) collocated with the 11th International Semantic Web Conference (ISWC-2012), CEUR Workshop Proceedings, Vol. 946, CEUR-WS.org, Aachen, Germany (pp. 25–36). http://www.dit.unitn.it/~p2p/OM-2012/om2012_Tpaper3.pdf

83. Schwichtenberg & Engels (2015). RSDL workbench results for OAEI 2015. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 10th International Workshop on Ontology Matching collocated with the 14th International Semantic Web Conference (ISWC 2015), CEUR Workshop Proceedings, Vol. 1545, CEUR-WS.org, Aachen, Germany (pp. 192–199). http://www.dit.unitn.it/~pavel/om2015/papers/oaei15_paper14.pdf

84. Seddiqui, M. H., & Aono (2009). An efficient and scalable algorithm for segmented alignment of ontologies of arbitrary size. Journal of Web Semantics, 7(4), 344–356. https://doi.org/10.1016/j.websem.2009.09.001. https://www.sciencedirect.com/science/article/pii/S1570826809000432 Semantic Web challenge 2008.

85. Silva, Baião, & Revoredo (2016). ALIN Results for OAEI 2016. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham, O. Hassanzadeh & R. Ichise (Eds.), Proceedings of the 11th International Workshop on Ontology Matching collocated with the 15th International Semantic Web Conference (ISWC 2016), CEUR Workshop Proceedings, Vol. 1766, CEUR-WS.org, Aachen, Germany (pp. 130–137). http://www.dit.unitn.it/~pavel/om2016/papers/oaei16_paper1.pdf

86. Silva, Baião, & Revoredo (2017). ALIN Results for OAEI 2017. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 12th International Workshop on Ontology Matching collocated with the 16th International Semantic Web Conference (ISWC 2017), CEUR Workshop Proceedings, Vol. 2032, CEUR-WS.org, Aachen, Germany (pp. 114–121). http://www.dit.unitn.it/~pavel/om2017/papers/oaei17_paper1.pdf

87. Silva, Baião, & Revoredo (2018). ALIN Results for OAEI 2018. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, C. Trojahn, M. Achichi & K. Todorov (Eds.), Proceedings of the 13th International Workshop on Ontology Matching collocated with the 17th International Semantic Web Conference (ISWC 2018), CEUR Workshop Proceedings, Vol. 2288, CEUR-WS.org, Aachen, Germany (pp. 117–124). http://www.dit.unitn.it/~pavel/om2018/papers/oaei18_paper1.pdf

88. Silva, Delgado, Revoredo, & Baião (2019). ALIN Results for OAEI 2019. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 14th International Workshop on Ontology Matching collocated with the 18th International Semantic Web Conference (ISWC 2019), CEUR Workshop Proceedings, Vol. 2536, CEUR-WS.org, Aachen, Germany (pp. 94–100). http://www.dit.unitn.it/~pavel/om2019/papers/oaei19_paper2.pdf

89. Silva, Delgado, Revoredo, & Baião (2020). ALIN Results for OAEI 2020. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 15th International Workshop on Ontology Matching collocated with the 19th International Semantic Web Conference (ISWC 2020), CEUR Workshop Proceedings, Vol. 2788, CEUR-WS.org, Aachen, Germany (pp. 139–146). https://disi.unitn.it/~pavel/om2020/papers/oaei20_paper1.pdf

90. Silva, Revoredo, Baião, F. A., & Euzenat (2017). Semantic interactive ontology matching: Synergistic combination of techniques to improve the set of candidate correspondences. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham & O. Hassanzadeh (Eds.), Proceedings of the 12th International Workshop on Ontology Matching collocated with the 16th International Semantic Web Conference (ISWC 2017), CEUR Workshop Proceedings, Vol. 2032, CEUR-WS.org, Aachen, Germany (pp. 13–24). http://disi.unitn.it/~pavel/om2017/papers/om2017_Tpaper2.pdf

91. Silva, Revoredo, Baião, F. A., & Euzenat (2018). Interactive Ontology Matching: Using Expert Feedback to Select Attribute Mappings. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, C. Trojahn, M. Achichi & K. Todorov (Eds.), Proceedings of the 13th International Workshop on Ontology Matching collocated with the 17th International Semantic Web Conference (ISWC 2018), CEUR Workshop Proceedings, Vol. 2288, CEUR-WS.org, Aachen, Germany. http://disi.unitn.it/~pavel/om2018/papers/om2018_LTpaper3.pdf

92. Silva, Revoredo, Baião, & Euzenat (2020). Alin: Improving interactive ontology matching by interactively revising mapping suggestions. The Knowledge Engineering Review, 35, e1. https://doi.org/10.1017/S0269888919000249

93. Silva, Revoredo, Baião, & Lima (2021). ALIN Results for OAEI 2021. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 16th International Workshop on Ontology Matching collocated with the 20th International Semantic Web Conference (ISWC 2021), CEUR Workshop Proceedings, Vol. 3063, CEUR-WS.org, Aachen, Germany (pp. 109–116). http://disi.unitn.it/~pavel/om2021/papers/oaei21_paper1.pdf

94. Silva, Revoredo, Baião, & Lima (2022). ALIN Results for OAEI 2022. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 17th International Workshop on Ontology Matching (OM 2022) collocated with the 21st International Semantic Web Conference (ISWC 2022), CEUR Workshop Proceedings, Vol. 3324, CEUR-WS.org, Aachen, Germany (pp. 129–136). http://disi.unitn.it/~pavel/om2022/papers/oaei22_paper1.pdf

95. Silva, Revoredo, Baião, & Lima (2023). ALIN Results for OAEI 2023. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, O. Hassanzadeh & C. Trojahn (Eds.), Proceedings of the 18th International Workshop on Ontology Matching collocated with the 22nd International Semantic Web Conference (ISWC 2023), CEUR Workshop Proceedings, Vol. 3591, CEUR-WS.org, Aachen, Germany (pp. 140–145). http://disi.unitn.it/~pavel/om2023/papers/oaei23_paper1.pdf

96. Silva, Revoredo, Baião, & Lima (2024). ALIN Results for OAEI 2024. In E. Jiménez-Ruiz, O.H.C. Trojahn, S. Hertling, H. Li, P. Shvaiko & J. Euzenat (Eds.), Proceedings of the 19th International Workshop on Ontology Matching collocated with the 23rd International Semantic Web Conference (ISWC 2024), CEUR Workshop Proceedings, Vol. 3897, CEUR-WS.org, Aachen, Germany (pp. 118–123). https://ceur-ws.org/Vol-3897/oaei2024_paper4.pdf

97. Spasić, Corcoran, Gagarin, & Buerki (2018). Head to head: Semantic similarity of multi-word terms. IEEE Access, 6, 20545–20557. https://doi.org/10.1109/ACCESS.2018.2826224

98. Tigrine, A. N., Bellahsene, & Todorov (2016). LYAM++ results for OAEI 2016. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham, O. Hassanzadeh & R. Ichise (Eds.), Proceedings of the 11th International Workshop on Ontology Matching collocated with the 15th International Semantic Web Conference (ISWC 2016), CEUR Workshop Proceedings, Vol. 1766, CEUR-WS.org, Aachen, Germany (pp. 196–200). http://ceur-ws.org/Vol-1766/oaei16_paper11.pdf

99. Togia, McNeill, & Bundy (2010). Harnessing the Power of Folksonomies for Formal Ontology Matching On-the-Fly. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt, M. Mao & I. Cruz (Eds.), Proceedings of the 5th International Workshop on Ontology Matching (OM-2010) collocated with the 9th International Semantic Web Conference (ISWC-2010), CEUR Workshop Proceedings, Vol. 689, CEUR-WS.org, Aachen, Germany (pp. 226–227). http://www.dit.unitn.it/~p2p/OM-2010/om2010_poster4.pdf

100. Van Berne & Malaisé (2014). Evaluation of String Normalisation Modules for String-Based Biomedical Vocabularies Alignment with AnAGram. In M. Horridge, M. Rospocher, & J. van Ossenbruggen (Eds.), Proceedings of the ISWC 2014 Posters & Demonstrations Track a track within the 13th International Semantic Web Conference (ISWC 2014), CEUR Workshop Proceedings, Vol. 1272, CEUR-WS.org, Aachen, Germany (pp. 237–240). http://www.dit.unitn.it/~p2p/OM-2014/om2014_poster1.pdf

101. Wang, Zhang, Hou, Zhao, J.-Z., & Tang (2010). RiMOM results for OAEI 2010. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt, M. Mao & I. Cruz (Eds.), Proceedings of the 5th International Workshop on Ontology Matching (OM-2010) collocated with the 9th International Semantic Web Conference (ISWC-2010), CEUR Workshop Proceedings, Vol. 689, CEUR-WS.org, Aachen, Germany (pp. 195–202). http://www.dit.unitn.it/~p2p/OM-2010/oaei10_paper11.pdf

102. Winkler, W. E. (1990). String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage, Technical Report, RR93/09, U.S. Bureau of the Census. https://eric.ed.gov/?id=ED325505

103. Zhang, Wang, Liu, & Zhao (2013). IAMA Results for OAEI 2013. In P. Shvaiko, J. Euzenat, K. Srinivas, M. Mao & E. Jiménez-Ruiz (Eds.), Proceedings of the 8th International Workshop on Ontology Matching collocated with the 12th International Semantic Web Conference (ISWC 2013), CEUR Workshop Proceedings, Vol. 1111, CEUR-WS.org, Aachen, Germany (pp. 123–130). http://www.dit.unitn.it/~p2p/OM-2013/oaei13_paper4.pdf

104. Zhao & Zhang (2016). Identifying and validating ontology mappings by formal concept analysis. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham, O. Hassanzadeh & R. Ichise (Eds.), Proceedings of the 11th International Workshop on Ontology Matching collocated with the 15th International Semantic Web Conference (ISWC 2016), CEUR Workshop Proceedings, Vol. 1766, CEUR-WS.org, Aachen, Germany (pp. 61–72). https://disi.unitn.it/~pavel/om2016/papers/om2016_Tpaper6.pdf

105. Zhao & Zhang (2016). FCA-Map results for OAEI 2016. In P. Shvaiko, J. Euzenat, E. Jiménez-Ruiz, M. Cheatham, O. Hassanzadeh & R. Ichise (Eds.), Proceedings of the 11th International Workshop on Ontology Matching collocated with the 15th International Semantic Web Conference (ISWC 2016), CEUR Workshop Proceedings, Vol. 1766, CEUR-WS.org, Aachen, Germany (pp. 172–177). http://www.dit.unitn.it/~pavel/om2016/papers/oaei16_paper7.pdf

106. Zhao, Zhang, & Chen (2018). Matching biomedical ontologies based on formal concept analysis. Journal of Biomedical Semantics, 9, 11. https://doi.org/10.1186/s13326-018-0178-9

107. Zheng, Zhou, & Lin (2022). Knowledge-informed semantic alignment and rule interpretation for automated compliance checking. Automation in Construction, 142, 104524. https://doi.org/10.1016/j.autcon.2022.104524. https://www.sciencedirect.com/science/article/pii/S0926580522003971

Tool/Approach	NUL	STE	SOW	CRA	RSW	RNC	RPC	USY	EAA	POW
“Discovering and Merging Keyword Senses using Ontology Matching” (Espinoza et al., 2006)	X
Falcon-AO (Hu et al., 2006)	X	X					X
ODGOMS (Kuo & Wu, 2013)	X				X
“Understanding a Large Corpus of Web Tables Through Matching with Knowledge Bases: an Empirical Study” (Hassanzadeh et al., 2015)	X
“Instance-Based Ontology Matching for e-learning Material using an Associative Pattern Classifier” (Cerón-Figueroa et al., 2017)	X
“Instance-Based Ontology Matching for Open and Distance Learning Materials” (Cerón-Figueroa et al., 2017)	X						X
ATBox (Hertling & Paulheim, 2020)	X		X				X	X
“Supervised Ontology and Instance Matching with MELT” (Hertling et al., 2020)	X
“Measuring Clusters of Labels in an Embedding Space to Refine Relations in Ontology Alignment” (Dhouib et al., 2021)	X				X
“Biomedical Ontology Alignment with BERT” (He et al., 2021)	X	X	X
IAMA (Zhang et al., 2013)	X						X
“Learning to Map between Ontologies on the Semantic Web” (Doan et al., 2002)		X
SAMBO (Lambrix & Tan, 2006)		X
“An Efficient and Scalable Algorithm for Segmented Alignment of Ontologies of Arbitrary Size” (Seddiqui & Aono, 2009)		X			X
AML (Faria et al., 2023)	X		X		X	X	X	X
“Scalable Matching of Industry Models: A Case Study” (Byrne et al., 2009)		X	X
“Cross-Lingual Dutch to English Alignment using EuroWordNet and Dutch Wikipedia” (Bouma, 2009)		X
BLOOMS (Pesquita et al., 2010)		X						X
“Harnessing the Power of Folksonomies for Formal Ontology Matching On-the-Fly” (Togia et al., 2010)		X
MaasMatch (Schadd & Roos, 2011)		X			X
“An Effective Approach for Large Ontology Matching using Multi-Objective Grasshopper Algorithm” (Lv, 2022)		X			X
ServOMap (Kammoun & Diallo, 2013)		X			X		X
LYAM++ (Tigrine et al., 2016)		X			X
MapSSS (Cheatham & Hitzler, 2013)		X	X		X
AnAGram (Van Berne & Malaisé, 2014)		X	X
“Large-scale Interactive Ontology Matching: Algorithms and Implementation” (Jiménez-Ruiz et al., 2012)		X
STRIM (Khiat et al., 2015)		X			X
CroLOM (Khiat, 2016)		X			X
SimCat (Khiat et al., 2016)		X			X
POMap (Laadhar et al., 2017)		X			X	X
I-Match (Khiat & Mackeprang, 2017)		X			X
“We Divide, You Conquer: From Large-Scale Ontology Alignment to Manageable Subtasks with a Lexical Index and Neural Embeddings” (Jiménez-Ruiz et al., 2018)		X			X
SANOM (Mohammadi et al., 2019)		X			X		X
“The Role of Collaborative Tagging and Ontologies in Emerging Semantic of Web Resources” (Qassimi & Abdelwahed, 2019)		X			X
“Ontology Alignment using Stable Matching” (Ouali et al., 2019)		X			X
“Dividing the Ontology Alignment Task with Semantic Embeddings and Logic-Based Modules” (Jiménez-Ruiz et al., 2020)		X			X
Cupid (Madhavan et al., 2001)		X						X	X
TOMATO (Roussille & Teste, 2022)		X	X		X	X
Eff2Match (Chua & Kim, 2010)			X
Hertuda (Hertling, 2012)			X		X
WeSeE (Paulheim & Hertling, 2013)			X		X
“Efficient Selection of Mappings and Automatic Quality-Driven Combination of Matching Methods” (Cruz et al., 2009)					X
RSDL (Schwichtenberg & Engels, 2015)			X
Holontology (Roussille et al., 2018)			X
DOME (Hertling & Paulheim, 2019)			X		X
“A Hybrid Approach for Large Knowledge Graphs Matching” (Fallatah et al., 2021)			X			X
OntoConnect (Chakraborty et al., 2020)				X
“Enabling Semantic Search for EO Products: an Ontology Matching Approach” (Karpathiotaki et al., 2014)					X
Onto-Mapology (Bethea et al., 2006)					X
OntoDNA (Kiu & Lee, 2007)					X
TaxoMap (Hamdi et al., 2008)					X
CODI (Huber et al., 2011)					X
COMA (Massmann et al., 2011)					X			X	X
“Coupling of WordNet Entries for Ontology Mapping using Virtual Documents” (Schadd & Roos, 2012)					X
GOMMA (Groß et al., 2012)					X
WikiMatch (Hertling & Paulheim, 2012)					X
CroMatcher (Gulic & Vrdoljak, 2013)					X
SPHeRe (Khan et al., 2013)					X
Exona (Damak et al., 2015)					X
InsMT+ (Khiat & Benaissa, 2015)					X
RiMOM (Wang et al., 2010)					X
Legato (Achichi et al., 2017)					X
“Ontology Matching for Patent Classification” (Quix et al., 2017)					X
“Using Domain Lexicon and Grammar for Ontology Matching” (Real et al., 2020)					X		X		X
“Applying Edge-Counting Semantic Similarities to Link Discovery: Scalability and Accuracy” (Georgala et al., 2020)					X
“A Novel Periodic Learning Ontology Matching Model Based on Interactive Grasshopper Optimization Algorithm” (Lv & Peng, 2021)					X
KGMatcher (Fallatah et al., 2022)					X	X
OMT (Gomes et al., 2022)					X
“Knowledge-Informed Semantic Alignment and Rule Interpretation for Automated Compliance Checking” (Zheng et al., 2022)					X
“The Impact of Imbalanced Training Data on Local Matching Learning of Ontologies” (Laadhar et al., 2019)					X
GraphMatcher (Efeoglu, 2022)					X
“Ontology Matching with Semantic Verification” (Jean-Mary et al., 2009)						X		X
“Effective Method for Large Scale Ontology Matching” (Diallo & Ba, 2012)						X				X
SEBMatcher (Gosselin & Zouaq, 2022)							X		X
“Identifying and Validating Ontology Mappings by Formal Concept Analysis” (Zhao & Zhang, 2016)							X
FCA-Map (Zhao & Zhang, 2016)							X	X
“Learning Reference-Enriched Approach Towards Large Scale Active Ontology Alignment and Integration” (Cheng et al., 2017)							X
njuLink (Lyu et al., 2017)							X
S-Match (Giunchiglia et al., 2004)							X
“New Paradigm for Alignment Extraction” (Meilicke & Stuckenschmidt, 2015)										X
“Matching Biomedical Ontologies based on Formal Concept Analysis” (Zhao et al., 2018)					X		X	X
Alin (Silva et al., 2020)	X	X	X	X	X	X	X	X	X	X
AML (Faria et al., 2023) with lexical analyzers	X	X	X	X	X	X	X	X	X	X

Enhancing Ontology Matching: Lexically and Syntactically Standardizing Ontologies Through Customized Lexical Analyzers

Table 2. Number of Classes, Attributes, and Relationships in the Conference Ontologies.
Ontology	Classes	Attributes	Relationships
Ekaw	74	0	33
Sigkdd	49	11	17
Iasted	140	3	38
Cmt	36	10	49
Edas	104	20	30
ConfOf	38	23	13
Conference	60	18	46

Table 3. Entities With Non-Standard Name and Rejected Entities in Each Version of the Lexical Analyzer of the Human Ontology.
Version	Entities with non-standard name	Rejected entities
I	657	0
II	634	0
III	634	38
IV	548	38
VI	544	38
VIII	178	288
IX	122	344
X	94	344
XI	0	438