Sage Journals: Discover world-class research

Abstract

Word sense disambiguation (WSD) remains a critical challenge in natural language processing (NLP), requiring precise identification of context-dependent word meanings. This article proposes a novel global optimization framework for WSD by reformulating it as a combinatorial optimization problem. We introduce a discrete variant of the rat swarm optimizer (RSO), a metaheuristic inspired by rat foraging behavior, to efficiently select optimal sense combinations across sentences. Unlike local disambiguation methods, our approach leverages phrase-level embeddings and WordNet gloss definitions within a unified semantic fitness function, enabling robust global sense selection. Experiments on SensEval and SemEval benchmarks demonstrate state-of-the-art performance, with accuracies of 76.32% (SensEval2), 71.80% (SensEval3), 58.30% (SemEval2007), 77.17% (SemEval2013), and 74.40% (SemEval2015), surpassing previous methods. These results indicate that the model can benefit downstream NLP tasks such as machine translation, information retrieval, and document indexing. Future work will focus on hyperparameter optimization and multilingual adaptation to enhance scalability for low-resource languages.

Keywords

word sense disambiguation, discrete rat swarm optimizer, swarm-based intelligence, word embedding, computational linguistics

1. Introduction

Word sense disambiguation (WSD) is one of the open problems in computational linguistics. It attempts to determine the intended meaning of a word according to its context (Ransing & Gulati, 2022). WSD typically involves two major processes: identifying ambiguous words and labeling each of them with its correct meaning in context. For example, the word “bass” has multiple meanings, such as a type of fish or a low-frequency sound in music. In the sentence “The bass swims in the river”, the intended sense of “bass” is “a type of fish”. WSD algorithms analyze the surrounding words and the sentence structure to determine that the context refers to a fish rather than a musical sound.

WSD is crucial for real-world natural language processing (NLP) applications, where it affects both accuracy and user experience. Consider machine translation (Nguyen et al., 2018), where “bank” can refer to a financial institution or a riverbank. Incorrect sense selection leads to mistranslations, affecting the overall quality. Similarly, search engines rely on WSD to return contextually relevant results. An “apple stock” query should retrieve financial information, not results about the fruit (Koppula et al., 2022). Sentiment analysis also benefits, as words like “bright” can be positive (“bright future”) or negative (“bright light in my eyes”), depending on context (Hung & Chen, 2016). Dialogue systems and chatbots also require WSD to understand the intent of the user. The query “set the table” requires understanding that the “table” is furniture, not a data table (Kouris et al., 2021; Rahman & Borah, 2020). Effective WSD significantly improves these applications, motivating the development of precise, context-sensitive disambiguation methods.

WSD approaches can be categorized into two main types: knowledge-based methods that utilize external resources and machine learning-based methods, which include techniques such as artificial neural networks and swarm intelligence algorithms (Bevilacqua et al., 2021).

Despite significant advancements in WSD research, various existing methods, such as supervised, unsupervised, and knowledge-based, still encounter considerable limitations. Supervised WSD techniques are generally accurate but rely heavily on large annotated datasets, which are both resource-intensive and specific to individual languages, thus limiting their scalability. On the other hand, unsupervised methods address this dependency but often result in lower precision and face challenges with ambiguous clustering, particularly in complex linguistic contexts. Knowledge-based methods, which depend on lexical databases like WordNet, may struggle to adapt to domain-specific or under-resourced scenarios where such resources are incomplete (Ransing & Gulati, 2022).

Word embeddings, distributed representations of words as real-valued vectors, capture semantic and syntactic relationships from unannotated corpora, proving invaluable in various NLP applications like word similarity and machine translation (Pham & Le, 2018). Recent advances in WSD have also incorporated word embeddings, leveraging their ability to provide context-aware representations. Swarm optimization algorithms, which are inspired by natural social behaviors (Esmin et al., 2015; Monga et al., 2022), provide efficient solutions to complex optimization problems by balancing exploration and exploitation strategies (Gad, 2022; Yang & Karamanoglu, 2020). Although these algorithms have been successful in various domains (Hussien et al., 2020), their application in WSD, especially alongside modern NLP techniques, has not been thoroughly explored.

This paper introduces DRSO-WSD, a novel discrete variant of rat swarm optimization (RSO) (Dhiman et al., 2021) aimed at addressing the WSD challenge. Unlike traditional WSD methods that rely heavily on extensive labeled data or predefined lexical resources, our approach determines the semantic similarity between the context of an ambiguous word and its candidate senses using two embedding strategies: Doc2Vec (Lau & Baldwin, 2016), which captures the overall semantics of a document, and BERT (Devlin et al., 2018), which offers contextualized word-level representations. These similarity scores serve as fitness values in the DRSO algorithm, which iteratively searches for the optimal sense assignment by balancing exploration and exploitation.

We demonstrate the effectiveness of DRSO-WSD through comprehensive evaluations on the SensEval2 (Edmonds & Cotton, 2001), SensEval3 (Mihalcea et al., 2004), SemEval2007 (Navigli et al., 2007), SemEval2013 (Navigli et al., 2013), and SemEval2015 (Moro & Navigli, 2015) benchmarks, comparing our results against state-of-the-art methods.

The following research questions guide this study:

RQ1: Can integrating Doc2Vec and BERT embeddings with a swarm-based optimizer improve WSD accuracy over traditional methods?

RQ2: How effective is a discrete form of the rat swarm optimization algorithm in navigating the WSD search space?

RQ3: Does the proposed approach generalize well across standard WSD benchmarks?

We hypothesize that combining semantic similarity measures from modern embedding models with a discrete swarm optimization strategy will outperform existing WSD methods in accuracy and robustness across diverse benchmark datasets. By improving WSD, the DRSO-WSD method shows strong potential for real-world NLP applications. In machine translation, selecting the correct word sense based on context improves translation quality. In information retrieval, it allows more relevant documents to be retrieved, boosting precision and user satisfaction. The method also applies to question answering, sentiment analysis, and text summarization, highlighting its versatility for different linguistic contexts and multilingual applications.

The rest of the article is structured as follows: Section 2 addresses related work in WSD. Section 3 introduces our proposed hybrid method. Section 4 presents and analyzes the experimental results. Finally, Section 5 concludes the study and suggests directions for future research.

2. Related Works

This section reviews previous work on two main tasks: the WSD approach, which uses machine learning techniques or knowledge-based methods to tag corpora explicitly, and the word embedding approach, which uses linguistic resources, such as WordNet, to represent meanings. Many researchers have studied WSD using supervised, semi-supervised, and unsupervised approaches. Supervised methods use sense-annotated datasets to identify the appropriate meanings of words. Training and testing are the most critical phases in supervised approaches. The training phase requires annotated data to build classifiers based on machine learning algorithms. During the testing phase, the classifiers attempt to determine the appropriate senses based on context. Many supervised methods are available in the literature. Probabilistic methods use Bayes' theorem to predict the meanings of ambiguous words. Several authors working on different languages have adopted this model, such as Wang and Hirst (2014) and Walia et al. (2018). The decision list technique is a supervised method that classifies test instances using discriminative rules. A feature set is extracted from the training data, generating rules of the form (feature value, sense, score), and ordering these rules by descending score yields the decision list. Similarity-based methods disambiguate words by comparing the features of a new sample with those of the training examples and then selecting the most similar pattern. Supervised methods generally achieve better results than other approaches (Ransing & Gulati, 2022). Despite these strengths, supervised models also face limitations. They rely on extensive training data for each ambiguous word, which requires manual sense annotation. Unfortunately, manual sense annotation is impractical for large-scale projects because the task is difficult, expensive, and time-consuming.
As a result, supervised models often struggle with disambiguation, and the lack of high-quality training data hinders their performance (Bevilacqua et al., 2021). Unsupervised methods for WSD acquire knowledge from unannotated data by clustering words under the assumption that words used in the same sense occur in similar contexts (Bevilacqua et al., 2021). For this reason, they use contextual similarity metrics. The main task of unsupervised methods is to identify sense classes, which avoids the knowledge acquisition bottleneck caused by limited manually annotated linguistic resources (Ransing & Gulati, 2022). Recently, swarm intelligence techniques have been applied successfully to unsupervised WSD, as in the work of Farahani et al. (2020). Bakhouche et al. (2015) applied the ant colony algorithm to the WSD problem, relying on gloss overlap similarity to maximize the semantic relatedness between words in sentences. Vij and Jain (Vij et al., 2020) used a genetic algorithm to attain the same objective. Several works have used the semantic similarity techniques adopted by Zhang et al. (2008) in the maximization task of the genetic algorithm. To address this problem, Alsaeedan et al. (2017) presented a hybrid approach using ant colony optimization and genetic algorithms. The authors used an untagged corpus to identify semantic classes of ambiguous words. Bhatia et al. (2022) used a dynamic configuration window function to perform Hindi WSD with a genetic algorithm. The comparison results show that this approach outperforms several other WSD methods. Nodehi and Charkari (2022) proposed a hybrid metaheuristic algorithm that combines a metaheuristic method with a neural surrogate function, using a neural network as the objective function, to find the best meaning in a given context.
Rajini and Vasuki (2021) employed techniques such as the Bees algorithm, firefly optimization, and the cuckoo algorithm to label samples and numerous words within the corpus. These methods were evaluated using SemEval 2016 task 11. According to the experimental results, firefly optimization performs better than the other algorithms. Ajeena Beegom and Chinmayan (2020) proposed another approach that treats WSD as a combinatorial optimization problem and reported that it outperforms competing algorithms. Also using a combinatorial optimization approach, Abdelaali et al. (2022) disambiguated ambiguous words in specific contexts by implementing the second version of crow search optimization. They compared the proposed method to other methods on specific benchmark datasets, and the results demonstrate that this approach performs well. Swarm-based optimization algorithms attain high precision in WSD in many languages (Abualigah et al., 2021). Motivated by these results, we propose a discrete version of RSO, a metaheuristic, for the WSD problem because it effectively solves various optimization problems (Dhiman et al., 2021) and yields superior results. Sense embeddings, in turn, are an alternative to traditional word vector models, such as word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014), which represent monosemous words well but fail when it comes to ambiguous words. Sense embeddings represent each sense of a polysemous word with a separate vector. A sense inventory can be induced from unlabeled data (Iacobacci & Navigli, 2019) or linked to a particular inventory (Iacobacci et al., 2015). LSTMEmbed uses pre-trained embeddings to learn BabelNet-linked sense embeddings (Navigli & Ponzetto, 2012). Although it has been tested primarily on English, this approach could apply to other BabelNet languages.
GlossBERT (Huang et al., 2019) is a WSD system that improves significantly by leveraging gloss information. According to Wang et al. (2019), the sentence-pair classification technique is helpful in this model. The system takes context and gloss as inputs, concatenates each context-gloss pair using a special token, and classifies the pair. Batanović and Nikolić (2017) used word embeddings learned from a large unlabeled corpus as classification features. They examined how lemmatization and stemming affected the performance of sentiment analysis classifiers trained on the Serbian Movie Review Dataset. Batanović et al. (2018) presented a novel supervised bag-of-words model for semantic textual similarity (STS) tasks that integrates part-of-speech and term frequency weightings. The authors also introduced a new STS dataset for evaluation: the Serbian Semantic Textual Similarity News Corpus (STS.news.sr). This corpus comprises 1192 news-based sentence pairs annotated with fine-grained similarity scores, comparable to existing datasets for English and other major languages.

3. Proposed Approach Description

In this section, we present our approach in further detail. It includes two main parts. The first part explains how to use pre-trained word embedding models to represent target words’ contexts and sense definitions. The second part introduces the DRSO algorithm and demonstrates its application to WSD. Figure 1 illustrates an overview of the proposed approach.

Figure 1.

An overview of the proposed DRSO-WSD approach.

3.1. Theoretical Framework and Experimental Design

The experimental methodology is grounded in swarm intelligence and distributional semantics. It uses the RSO algorithm to navigate the complex search space of WSD, balancing exploration and exploitation to optimize sense assignments. The choice of RSO is theoretically motivated by its ability to search solution spaces efficiently, which aligns well with the goal of finding the most semantically coherent global sense assignment. This theoretical foundation informs the experimental design and guides how candidate senses are selected and evaluated. The fitness function is derived from distributional semantics: embeddings from Doc2Vec and BERT are used to compute the cosine similarity between the contexts of ambiguous words and their candidate senses. Doc2Vec captures document-level semantics, while BERT offers fine-grained contextualized representations, ensuring a nuanced understanding of the disambiguation context. The DRSO algorithm extends the RSO approach by incorporating discrete representation mechanisms and adaptive control, which enhance semantic coherence during sense assignment. DRSO was selected over alternative metaheuristics such as genetic algorithms (GA) and particle swarm optimization (PSO) because of its superior convergence behavior, robustness in high-dimensional spaces, and discrete solution modeling, which are particularly well suited to WSD tasks.

This methodology exemplifies a multidisciplinary approach, integrating principles from optimization theory, computational linguistics, and deep learning. This integration enables a more robust and accurate disambiguation process that takes advantage of semantic understanding and efficient search strategies. The DRSO-WSD method is evaluated on benchmark datasets, including SensEval2, SensEval3, SemEval2007, SemEval2013, and SemEval2015. Performance is assessed using standard evaluation metrics, including precision, recall, and F1-score.

3.2. Preprocessing

The proposed approach begins with a preprocessing phase that prepares the input corpus for disambiguation. The raw texts are first tokenized into individual words using the NLTK (Khemani & Adgaonkar, 2021) and TextBlob (Loria, 2018) libraries. Next, we remove punctuation, numerical tokens, and other non-alphabetic characters. All text is converted to lowercase to maintain consistency. Then stop-word removal is applied to eliminate commonly occurring but semantically weak terms (e.g., “the,” “and,” “is”) that do not contribute meaningfully to sense disambiguation. In the case of multilingual or domain-specific corpora, a customized stop-word list was used to improve effectiveness. No stemming or lemmatization was applied, as we rely on contextual embeddings (e.g., BERT) that capture morphological and syntactic nuances directly from the full word form. This preprocessing pipeline ensures that the input to the disambiguation stage is clean and semantically informative, allowing the DRSO algorithm to operate effectively.
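The preprocessing steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a regular expression from the Python standard library as a stand-in for the NLTK and TextBlob tokenizers, and the tiny `STOP_WORDS` set and the `preprocess` function name are hypothetical, not taken from the paper.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real pipeline would use
# a full list, possibly customized per language or domain.
STOP_WORDS = {"the", "and", "is", "a", "an", "in", "of", "to"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, tokenize, drop non-alphabetic tokens, and remove stop words.

    No stemming or lemmatization is applied, so full word forms are kept.
    """
    # Lowercasing first lets one pattern handle tokenization and the removal
    # of punctuation and digits: only runs of letters a-z survive.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stop_words]


tokens = preprocess("The bass swims in the river.")
# tokens == ["bass", "swims", "river"]
```

Note that this simple pattern splits contractions such as "don't" into two tokens; a library tokenizer may treat them differently.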

3.3. Distributed Representation of Context and Synset Gloss

In modern machine learning, a popular technique involves representing words or larger text units (sentences, paragraphs, or documents) as vectors. These vectors capture the text’s essential meaning and structure (semantics and syntax). Here, we use embedding models to represent both the candidate senses of a word and its surrounding context. Specifically, existing models such as Doc2Vec are applied to the definition (gloss) of each sense provided by a resource like WordNet. We briefly describe the embedding models used below.

3.3.1. Document to Vector (Doc2Vec)

In 2014, Le and Mikolov introduced Doc2Vec, a popular method for learning text embeddings using simple neural networks. It projects documents (sentences, paragraphs, or longer texts) into a latent vector space. Doc2Vec is an extension of Word2Vec and comes in two neural network variants: the distributed memory model of paragraph vectors (PV-DM) and the distributed bag-of-words model (DBOW). PV-DM predicts a target word from the paragraph vector together with the surrounding context words. DBOW, on the other hand, ignores context words in the input and trains the paragraph vector to predict words randomly sampled from the paragraph. DBOW may be less accurate, but it is useful in tasks where context is less relevant, such as language generation.

3.3.2. Bidirectional Encoder Representations From Transformers (BERT)

BERT is a language representation model that considers both the left and right contexts, achieving state-of-the-art results on NLP tasks without extensive architecture modifications. BERT offers strong context-based representations, pre-trained knowledge, fine-tuning capabilities, and competitive performance, making it a compelling choice for WSD. Although BERT remains an excellent tool for WSD, several alternatives offer specific advantages depending on the task and the available resources. Table 1 provides a comparative overview.

Table 1.
BERT vs. Alternatives for WSD.

Model	Strengths	Weaknesses	Considerations for WSD
BERT	Captures complex relationships between words and their surrounding context (Powerful contextual representation). Leverages existing knowledge from vast text data (Pre-trained on massive datasets).	Requires significant computational resources for training and inference.	Achieves excellent accuracy on various WSD benchmarks.
XLNet	Addresses limitations in BERT’s pre-training process. Captures bidirectional context effectively.	May require more complex training procedures.	Beneficial for tasks with long-range dependencies.

3.4. Overview of the Proposed Approach

The proposed method is based on swarm intelligence and semantic similarity. It primarily consists of three key components:

Rat Swarm Optimization (RSO): We start from the original formulation of the continuous RSO algorithm, which is inspired by the collective foraging behavior of rats.

Discrete RSO (DRSO): To address the combinatorial nature of the WSD task, we develop a discrete version of RSO that operates over finite sets of candidate word senses. Compared to traditional metaheuristics such as PSO or GA, DRSO offers a more balanced trade-off between exploration and exploitation in discrete search spaces. Although GA may suffer from premature convergence and PSO from weak local refinement in our WSD context, DRSO has shown superior robustness and convergence behavior. These advantages make it particularly suitable for linguistic optimization problems.

Fitness Function: The disambiguation process is driven by a semantic similarity-based fitness function, which evaluates the contextual suitability of candidate sense assignments using distributional semantic models such as Doc2Vec and BERT.

These components work together to formulate WSD as an optimization problem, where the goal is to select the most coherent combination of word senses for a given context. The detailed mechanisms of the algorithm are explained in the following subsections.

3.4.1. Original RSO

The RSO algorithm is a new metaheuristic approach that helps solve global optimization problems. This algorithm takes its inspiration from the social intelligence of rats, known for their aggressive hunting behavior. In particular, rats follow specific behaviors when hunting prey, including chasing and fighting. The RSO algorithm mathematically models this behavior. Each possible solution is considered a rat’s position, and the population’s positions are randomly chosen in the solution space. Rats update their positions during iterations based on chasing and fighting principles.

Chasing the prey:

The social behavior of rats generally allows them to hunt in groups. They exploit the best rat’s position obtained to update their positions. It can be expressed mathematically, as in Eq. (1).

\vec{P} = A \cdot \vec{P}_{i}(x) + C \cdot (\vec{P}_{r}(x) - \vec{P}_{i}(x))

(1)

where $\vec{P}_{i}(x)$ is the $i$th rat’s current position and $\vec{P}_{r}(x)$ is the best position found by the group. Parameters $A$ and $C$ control the algorithm’s exploration and exploitation phases. Eq. (2) determines $A$, where $x$ is the current iteration and ${Max}_{iteration}$ is the maximum number of iterations, whereas $C$ is a random number between 0 and 2.

A = R - x \cdot (\frac{R}{{Max}_{iteration}})

(2)

Fighting with prey:

The fighting behavior of the rats is mathematically modeled as in Eq. (3).

\vec{P}_{i}(x + 1) = | \vec{P}_{r}(x) - \vec{P} |

(3)

where $\vec{P}_{i}(x + 1)$ is the next position of the $i$th rat in the population, $\vec{P}_{r}(x)$ is the best rat’s position in the swarm, and $\vec{P}$ is calculated by Eq. (1). The steps of the original RSO are presented as follows:

Step 1: Initialize the rat population $P_{i}$, where $i = 1, 2, \dots, N$.

Step 2: Initialize the parameters of RSO: A, C, and R.

Step 3: Evaluate the fitness value of each rat’s position.

Step 4: Check the best position of the rats.

Step 5: Use Eq. (3) to update the rats’ positions.

Step 6: Check if any rat exceeds the search space boundaries and adjust.

Step 7: Update $P_{r}$ if the new rat’s fitness value is better than that of the current $P_{r}$.
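The steps above can be illustrated with a minimal sketch of continuous RSO. The function name `rso_minimize`, the default population size and iteration count, the clamping used for Step 6, and the choice to draw $R$ uniformly from [1, 5] are assumptions made for illustration; the sketch treats the task as minimization and is not the authors' implementation.

```python
import random


def rso_minimize(objective, dim, bounds, n_rats=20, max_iter=100, seed=0):
    """Continuous RSO sketch following Eqs. (1)-(3); minimizes `objective`."""
    rng = random.Random(seed)
    low, high = bounds
    # Step 1: random initial positions inside the search space.
    rats = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_rats)]
    # Step 2: R is drawn once (range [1, 5] is an assumption); A and C
    # are derived inside the loop.
    r_param = rng.uniform(1, 5)
    # Steps 3-4: evaluate every rat and remember the best position P_r.
    best = min(rats, key=objective)[:]
    best_fit = objective(best)
    for x in range(max_iter):
        a = r_param - x * (r_param / max_iter)  # Eq. (2)
        c = rng.uniform(0, 2)                   # random C in [0, 2]
        leader = best[:]                        # P_r(x), fixed for this iteration
        for i, pos in enumerate(rats):
            new_pos = []
            for p_i, p_r in zip(pos, leader):
                p = a * p_i + c * (p_r - p_i)   # Eq. (1): chasing the prey
                nxt = abs(p_r - p)              # Eq. (3): fighting with prey
                new_pos.append(min(max(nxt, low), high))  # Step 6: clamp
            rats[i] = new_pos                   # Step 5: position update
            fit = objective(new_pos)
            if fit < best_fit:                  # Step 7: update P_r
                best, best_fit = new_pos[:], fit
    return best, best_fit


def sphere(v):
    """Toy objective with its minimum at the origin."""
    return sum(value * value for value in v)


best_position, best_value = rso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

Because of the absolute value in Eq. (3), updated coordinates are non-negative before clamping, which is harmless for this toy objective.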

3.4.2. Discrete RSO

The RSO algorithm has not yet been applied to optimization problems with discrete variables. This article therefore proposes DRSO, a discrete optimization algorithm based on RSO, to solve WSD. Adapting the continuous RSO to the WSD problem requires two modifications to the original version: the rat’s position must be encoded as a discrete vector, and the original position update formulas must be modified to operate on discrete values.

Rat’s position encoding for WSD and initialization:

In our proposed DRSO for WSD, each rat’s position is represented by a discrete d-dimensional vector $R = (R_{1}, R_{2}, \dots, R_{d})$, where $R_{i}$ is the selected sense (a discrete value representing the sense code) for the $i$th ambiguous word to be disambiguated, and d is the number of ambiguous words in the text. Figure 3 shows a sample position vector (solution). The first element of the vector corresponds to the third meaning of $w_{1}$, the second element corresponds to the first meaning of $w_{2}$, the third element corresponds to the first meaning of $w_{3}$, and the fourth and final element corresponds to the third meaning of $w_{4}$.

Figure 2.

An overview of the document to vector (Doc2Vec).

Figure 3.

A sample of a rat’s position encoding.

To initialize the rat’s population for the disambiguation of a text containing $d$ ambiguous words, $N$ discrete d-dimensional vectors are randomly generated and distributed in the search space. The initialization process randomly assigns one sense (code of sense) for each word in each vector from their predefined set of senses. Each discrete vector (rat’s position) represents a solution for the WSD problem, that is, assign one meaning to each ambiguous word in the text.
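The encoding and random initialization can be sketched as follows. The function name `init_population`, the 1-based sense codes, and the sense counts in the example are illustrative assumptions; in practice the number of candidate senses per word would come from the sense inventory.

```python
import random


def init_population(sense_counts, n_rats, seed=None):
    """Create `n_rats` random positions.

    Each position is a list of d sense codes, one per ambiguous word.
    Code k (1-based) selects the k-th candidate sense of that word.
    """
    rng = random.Random(seed)
    return [
        [rng.randrange(1, count + 1) for count in sense_counts]
        for _ in range(n_rats)
    ]


# Hypothetical text with four ambiguous words that have 3, 2, 4, and 3
# candidate senses. The position [3, 1, 1, 3] from the sample above is one
# valid point in this search space.
sense_counts = [3, 2, 4, 3]
population = init_population(sense_counts, n_rats=5, seed=42)
```

The size of this search space is the product of the sense counts (here 3 × 2 × 4 × 3 = 72), which grows quickly with the number of ambiguous words and motivates a metaheuristic search.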

Discrete rat’s position updating:

New discrete operations for addition, subtraction, and multiplication ($\oplus$, $\ominus$, $\odot$) are designed to generate discrete vectors in Eqs. (1) and (3). Figures 4, 5, and 6 show an application example of each of these operations between discrete vectors.

Figure 4.

Illustration of discrete addition operation.

Figure 5.

Illustration of discrete subtraction operation.

Figure 6.

Illustration of discrete multiplication operation.

Addition:

The $\oplus$ operation between two discrete vectors $DV_{1}$ and $DV_{2}$ performs a one-point crossover between the two vectors, generating two new vectors, $R_{1}$ and $R_{2}$. $R_{1}$ is the concatenation of the first half of $DV_{1}$ and the second half of $DV_{2}$. $R_{2}$ is the concatenation of the first half of $DV_{2}$ and the second half of $DV_{1}$. The fitter of $R_{1}$ and $R_{2}$ is selected as the result of the $\oplus$ operator.

Subtraction:

The $\ominus$ operation between $DV_{1}$ and $DV_{2}$ interleaves the elements of the two vectors to create two new discrete vectors, $R_{1}$ and $R_{2}$. In $R_{1}$, the elements at even indices come from $DV_{1}$, and the elements at odd indices come from $DV_{2}$. In $R_{2}$, the elements at odd indices come from $DV_{1}$, and those at even indices come from $DV_{2}$. The result of the $\ominus$ operator is the fitter of $R_{1}$ and $R_{2}$.

Multiplication:

Assuming $R$ is a real number between 0 and 1 and $DV$ is a discrete vector, the multiplication $R \odot DV$ randomly changes a certain percentage of the elements of $DV$. Specifically, the percentage of elements to be changed is calculated by multiplying $R$ by 100. For instance, if $R$ is 0.5, then 50% of the elements of $DV$ will be randomly changed. If $R$ lies in a range [0, N], where $N > 1$, it must first be mapped to a number in [0, 1] using the sigmoid transfer function presented in Eq. (4).

S (x) = \frac{1}{1 + e^{- x}}

(4)
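The three operators and the sigmoid transfer of Eq. (4) might be sketched as below. Several details are assumptions made for illustration: the crossover point is taken as the integer midpoint, "even" and "odd" indices are 0-based, a changed element is redrawn uniformly from the word's candidate senses (so it can occasionally keep its old value), and the function names and the `sense_counts` argument are hypothetical.

```python
import math
import random


def discrete_add(dv1, dv2, fitness):
    """One-point crossover at the midpoint; return the fitter child."""
    mid = len(dv1) // 2
    r1 = dv1[:mid] + dv2[mid:]
    r2 = dv2[:mid] + dv1[mid:]
    return max(r1, r2, key=fitness)


def discrete_sub(dv1, dv2, fitness):
    """Interleave elements by index parity; return the fitter child."""
    pairs = list(zip(dv1, dv2))
    r1 = [a if i % 2 == 0 else b for i, (a, b) in enumerate(pairs)]
    r2 = [b if i % 2 == 0 else a for i, (a, b) in enumerate(pairs)]
    return max(r1, r2, key=fitness)


def sigmoid(x):
    """Eq. (4): squash a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def discrete_mul(r, dv, sense_counts, rng=random):
    """Randomly redraw a fraction `r` of the sense codes in `dv`."""
    # Coefficients outside [0, 1] are first mapped through the sigmoid.
    rate = r if 0.0 <= r <= 1.0 else sigmoid(r)
    n_change = round(rate * len(dv))
    out = list(dv)
    for i in rng.sample(range(len(dv)), n_change):
        out[i] = rng.randrange(1, sense_counts[i] + 1)
    return out


def toy_fitness(vector):
    """Toy fitness (an assumption): prefer a large first sense code."""
    return vector[0]


added = discrete_add([1, 1, 1, 1], [2, 2, 2, 2], toy_fitness)
subtracted = discrete_sub([1, 1, 1, 1], [2, 2, 2, 2], toy_fitness)
mutated = discrete_mul(0.5, [1, 1, 1, 1], [3, 3, 3, 3], random.Random(0))
```

With this toy fitness, `added` is `[2, 2, 1, 1]` and `subtracted` is `[2, 1, 2, 1]`, since in both cases the child starting with sense code 2 is the fitter one.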

Using the new discrete operators presented above, Eqs. (1) and (3) of the rat’s position updating in the original RSO are modified for DRSO and replaced by Eqs. (5) and (6), respectively.

\vec{P} = A \odot \vec{P}_{i}(x) \oplus C \odot (\vec{P}_{r}(x) \ominus \vec{P}_{i}(x))

(5)

The current position of the $i$th rat is $\vec{P}_{i}(x)$, and the best rat’s position is $\vec{P}_{r}(x)$. $A$ and $C$ are the parameters involved in this equation. The first is calculated using Eq. (2), while the second is a stochastic variable in the range [0, 2].

\vec{P}_{i}(x + 1) = \vec{P}_{r}(x) \ominus \vec{P}

(6)

where $\vec{P}_{i}(x + 1)$ is the new position of the $i$th rat, $\vec{P}_{r}(x)$ is the best position, and $\vec{P}$ is the discrete vector generated using Eq. (5).

3.4.3. Fitness Function

Since each rat’s position represents a solution to the WSD problem, the fitness of a position indicates the quality of the senses it encodes. In this paper, the semantic similarity between the context vector of the target word and the vector of the candidate sense, extracted from WordNet, is used as the fitness function. The fitness function to be maximized is presented in Eq. (7).

Fitness = \frac{1}{n} \sum_{j = 1}^{n} cos\_sim({Context}_{j}, {Sense}_{j})

(7)

The $cos\_sim$ function is calculated as follows:

cos\_sim(Context, Sense) = \frac{\vec{Context} \cdot \vec{Sense}}{‖ \vec{Context} ‖ \cdot ‖ \vec{Sense} ‖} = \frac{\sum_{i = 1}^{n} {Context}_{i} \cdot {Sense}_{i}}{\sqrt{\sum_{i = 1}^{n} {Context}_{i}^{2}} \cdot \sqrt{\sum_{i = 1}^{n} {Sense}_{i}^{2}}}

(8)

Here, $\vec{Context}$ and $\vec{Sense}$ are the embeddings of the context and the sense, respectively, and ${Context}_{i}$ and ${Sense}_{i}$ represent the $i$th components of the $\vec{Context}$ and $\vec{Sense}$ vectors, so $n$ in Eq. (8) is the embedding dimension. Cosine similarity is bounded by $-1$ and $1$, and a larger value means greater similarity between context and sense. Figure 7 shows a conceptual example of the proposed approach.

Figure 7.

An illustrative example of our approach.
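Eqs. (7) and (8) can be made concrete with a small sketch. It assumes that each ambiguous word has its own context vector and that sense codes are 1-based; the two-dimensional toy vectors stand in for real Doc2Vec or BERT embeddings, and the names `cos_sim`, `fitness`, `context_vecs`, and `sense_vecs` are illustrative only.

```python
import math


def cos_sim(u, v):
    """Eq. (8): cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def fitness(position, context_vecs, sense_vecs):
    """Eq. (7): mean similarity between each context and its chosen sense.

    position[j] is the 1-based sense code chosen for word j, and
    sense_vecs[j][k - 1] is the gloss embedding of the k-th sense of word j.
    """
    sims = [
        cos_sim(context_vecs[j], sense_vecs[j][code - 1])
        for j, code in enumerate(position)
    ]
    return sum(sims) / len(sims)


# Toy embeddings for two ambiguous words with two candidate senses each.
context_vecs = [[1.0, 0.0], [0.0, 1.0]]
sense_vecs = [
    [[1.0, 0.0], [0.0, 1.0]],  # senses of word 1
    [[1.0, 0.0], [0.0, 2.0]],  # senses of word 2
]
good = fitness([1, 2], context_vecs, sense_vecs)  # senses aligned with contexts
bad = fitness([2, 1], context_vecs, sense_vecs)   # senses orthogonal to contexts
```

In this toy setting the assignment `[1, 2]` scores 1.0 and `[2, 1]` scores 0.0, so a maximizing search would prefer the first.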

3.4.4. Outlines of the Proposed DRSO for WSD

Algorithm 1 outlines our proposed approach, DRSO-WSD. It takes the set of ambiguous words, the context and synset vectors, the population size, and the number of iterations as input and returns a sense for each ambiguous word. The algorithm can be broadly divided into three main stages:

Stage 1: The system generates an initial population of particles with random parameters.

Stage 2: Update particles’ positions and algorithm parameters.

Stage 3: Calculate the fitness function of all particles in a population to find the best solution to the problem.

The algorithm performs specific operations detailed in the pseudo-code within each stage.
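Since the pseudo-code itself is not reproduced here, the following is a compact sketch of how the three stages might fit together, built from Eqs. (2), (5), and (6) and the operator descriptions in Section 3.4.2. It is not the authors' Algorithm 1: the function name `drso`, the default parameters, drawing $R$ from [1, 5], drawing $C$ per rat, 0-based index parity, and the toy fitness are all assumptions.

```python
import math
import random


def drso(sense_counts, fitness, n_rats=10, max_iter=30, seed=0):
    """Discrete RSO sketch; returns the best sense assignment found."""
    rng = random.Random(seed)
    d = len(sense_counts)

    def add(v1, v2):  # discrete addition: midpoint crossover
        mid = d // 2
        return max(v1[:mid] + v2[mid:], v2[:mid] + v1[mid:], key=fitness)

    def sub(v1, v2):  # discrete subtraction: interleave by index parity
        pairs = list(zip(v1, v2))
        r1 = [a if i % 2 == 0 else b for i, (a, b) in enumerate(pairs)]
        r2 = [b if i % 2 == 0 else a for i, (a, b) in enumerate(pairs)]
        return max(r1, r2, key=fitness)

    def mul(r, v):  # discrete multiplication: redraw a fraction of codes
        rate = r if r <= 1.0 else 1.0 / (1.0 + math.exp(-r))  # Eq. (4)
        out = list(v)
        for i in rng.sample(range(d), round(rate * d)):
            out[i] = rng.randrange(1, sense_counts[i] + 1)
        return out

    # Stage 1: random initial population and parameters.
    rats = [[rng.randrange(1, k + 1) for k in sense_counts] for _ in range(n_rats)]
    r_param = rng.uniform(1, 5)
    best = max(rats, key=fitness)[:]
    for x in range(max_iter):
        a = r_param - x * (r_param / max_iter)  # Eq. (2)
        for i in range(n_rats):
            # Stage 2: update the position with Eqs. (5) and (6).
            c = rng.uniform(0, 2)
            p = add(mul(a, rats[i]), mul(c, sub(best, rats[i])))
            rats[i] = sub(best, p)
            # Stage 3: evaluate and keep the best solution so far.
            if fitness(rats[i]) > fitness(best):
                best = rats[i][:]
    return best


# Toy problem: fitness is the fraction of codes matching a hidden target.
sense_counts = [3, 2, 4, 3]
target = [3, 1, 1, 3]


def toy_fitness(position):
    return sum(a == b for a, b in zip(position, target)) / len(target)


best_assignment = drso(sense_counts, toy_fitness)
```

In a real run, `toy_fitness` would be replaced by the embedding-based fitness of Eq. (7), and fitness values could be cached to avoid repeated evaluation of the same assignment.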

3.5. Real-Time Applicability and Computational Analysis

To evaluate the practical applicability of the DRSO-WSD algorithm, we analyzed its computational complexity and assessed its runtime, scalability, and memory usage. The theoretical time complexity of DRSO-WSD is primarily influenced by the iterative nature of the RSO component and the semantic similarity computations involving Doc2Vec and BERT embeddings. In its discrete form, the RSO algorithm has a time complexity of $O (I \cdot P \cdot W \cdot S)$ , where:

$I$ represents the number of iterations,

$P$ represents the population size,

$W$ represents the number of words to disambiguate,

$S$ represents the average number of candidate senses per word.

The semantic similarity calculations, which leverage pre-trained Doc2Vec and BERT models, contribute a complexity of $O (W \cdot E)$, where $E$ denotes the embedding vector dimension. However, since the embedding models are pre-trained and the vector dimensions are fixed, this factor remains constant throughout the disambiguation process.

Empirical evaluations on a standard desktop system with an Intel Core i5 CPU and 16 GB of RAM demonstrated that DRSO-WSD achieves acceptable runtime performance for practical NLP applications. For instance, disambiguating a text of approximately 500 words took around 9 seconds, while processing texts containing 1,000 words took approximately 22 seconds.

To contextualize the runtime performance of DRSO-WSD, we compared it against two baseline WSD strategies: the simplified Lesk algorithm and a context-based BERT classifier. While Lesk achieved faster runtime (under 2 seconds for 1,000-word texts), it suffered in disambiguation accuracy. The BERT classifier demonstrated slightly slower performance (approximately 28 seconds for 1,000 words), mainly due to token-level attention computation. DRSO-WSD balances execution time and disambiguation quality, making it suitable for real-world applications where both speed and semantic depth are critical.

Additionally, we examined the algorithm’s scalability by varying the input text length. The results indicated a near-linear increase in runtime with respect to the number of words, suggesting that DRSO-WSD can efficiently handle moderately sized documents.

Regarding memory consumption, the primary contributors are the population matrix of candidate solutions, temporary similarity scores, and the loaded embedding models. Memory usage scales with $O (P \cdot W \cdot S + M)$, where $M$ represents the memory footprint of the embedding models. During our experiments, peak RAM usage did not exceed 6 GB, even for documents with 1,500 words, indicating the feasibility of deployment on standard machines.

To further enhance performance for real-time applications, we implemented parallel processing techniques that distribute the computational load across multiple CPU cores. This optimization led to a significant reduction in processing time, especially for longer documents.

DRSO-WSD’s ability to deliver accurate disambiguation within reasonable timeframes, coupled with its scalability and efficient memory usage, renders it suitable for real-time NLP applications. Potential use cases include interactive dialogue systems, live subtitling, and real-time information retrieval, where contextual understanding is essential. Future work will focus on further performance enhancements by caching frequently accessed sense data and leveraging GPU-based hardware acceleration.
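As a rough worked example of the bounds above, the following sketch plugs numbers into $O (I \cdot P \cdot W \cdot S)$ and into the population term of $O (P \cdot W \cdot S + M)$. The helper names are hypothetical, $I$ and $P$ follow Table 8, and $W$ and $S$ are assumed values chosen only for illustration; the results are order-of-magnitude counts, not measured runtimes.

```python
def fitness_evaluation_cost(iterations, population, words, senses_per_word):
    """Order-of-magnitude operation count implied by O(I * P * W * S)."""
    return iterations * population * words * senses_per_word


def population_cells(population, words, senses_per_word):
    """Population-dependent term P * W * S of the memory bound (model size M excluded)."""
    return population * words * senses_per_word


# I and P follow Table 8; W and S are hypothetical values chosen for illustration.
I, P = 50, 350
W, S = 20, 5

print(fitness_evaluation_cost(I, P, W, S))  # 1750000
print(population_cells(P, W, S))            # 35000

# Doubling the number of ambiguous words doubles the estimate (linear in W).
print(fitness_evaluation_cost(I, P, 2 * W, S) / fitness_evaluation_cost(I, P, W, S))  # 2.0
```

The linear dependence on $W$ in this estimate is consistent with the near-linear runtime growth reported for longer texts.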

4. Empirical Evaluation

In this section, we evaluate the proposed approach on well-known benchmark corpora (SemEval2007, SemEval2013, SemEval2015, SensEval2.0, and SensEval3.0) using standard evaluation metrics. The following subsections detail our experiments for evaluating whether multisense embeddings are effective for input descriptions and target terms.

4.1. Implementation Details

We performed all experiments on a computer with an Intel(R) Core(TM) i5-10600KF CPU, 16 GB of RAM, and a 500 GB hard drive, running Windows 11. The algorithm is implemented in Python 3.8 using the NLTK (Natural Language Toolkit) package for analyzing text and obtaining word senses as synsets from its WordNet interface. We use gensim to reproduce the results of “Distributed Representations of Sentences and Documents” and to compute the relatedness score and similarity measure.

4.2. Pre-trained Document Embedding Dataset

Table 2 details the pre-trained Doc2Vec document embedding models used in this approach (methods, tasks, training size, vector size, and window size).

Table 2.
Pretrained Document Embedding (Doc2Vec).

Methods	Tasks	Training size	Vector size	Window size
DB-OW	Q-Dup (Question Duplication)	4.3 M	300	5
DB-OW	STS (Semantic Textual Similarity)	5 M	300	1
DM-PV	Q-Dup (Question Duplication)	4.3 M	300	5
DM-PV	STS (Semantic Textual Similarity)	5 M	300	1

The pre-trained word embedding characteristics of BERT are presented in Table 3.

Table 3.

Pretrained Document Embedding (BERT).

Types	Hidden size	Heads	Params
BERT-Large, Uncased (Whole Word Masking)	1024	16	340M
BERT-Large, Cased (Whole Word Masking)	1024	16	340M
BERT-Base, Uncased	768	12	110M
BERT-Base, Cased	768	12	110M
BERT-Large, Uncased	1024	16	340M
BERT-Large, Cased	1024	16	340M

All models were pre-trained on a dataset consisting of 11,038 unpublished books and on English Wikipedia.

4.3. The SemEval Corpus: A Resource for Benchmarking WSD Systems

The SemEval (Semantic Evaluation) international workshop series provides annotated datasets for evaluating natural language processing and computational linguistics tasks. In each edition of the workshop, the SemEval corpus refers specifically to the annotated data provided for the defined tasks. SemEval organizes multiple shared tasks annually, covering NLP problems such as sentiment analysis, text classification, and WSD. Participants develop and evaluate their algorithms and models using the provided datasets, and the results are discussed and compared during the workshop. The structure and content of the SemEval corpus vary depending on the specific tasks and available data in each edition.

In this paper, we evaluated our method using several benchmark corpora: SensEval2, SensEval3, SemEval2007, SemEval2013, and SemEval2015. These datasets consist of paragraphs from various domains such as computer science, reviews, and journalism. Table 4 presents the characteristics of each dataset. We aim to assign correct senses to over 2,000 polysemous words from WordNet, with sense distinctions based on semantic closeness. Each task’s performance is evaluated according to its specific criteria.

Table 4.
Characteristics of the SensEval and SemEval Datasets.

Corpus	#Sent	#Lem	#Inst
Semcor 3.0	350	23,346	234,113
Senseval-02	240	2280	17250
Senseval-03	350	1850	17770
Semeval-2007	135	455	5001
Semeval-2013	306	1644	11588
Semeval-2015	138	1022	7991

Sense inventory: WordNet. Sources: Brown corpus, The Red Badge of Courage novel, Wikipedia.

4.4. Evaluation Metrics for Word Sense Disambiguation

Evaluating the performance of WSD systems is crucial for assessing their effectiveness and identifying areas for improvement. To evaluate the effectiveness of this approach, we use a variety of metrics:

Accuracy: The percentage of words whose sense the system predicts correctly.

Acc = \frac{\# \text{correctly disambiguated words}}{\# \text{words}}

(9)

Precision: The ratio of true positive predictions (correctly disambiguated instances) to the total number of positive predictions (all predicted instances of a particular sense).

P = \frac{\# \text{correctly disambiguated words}}{\# \text{disambiguated words}}

(10)

Recall: The ratio of true positive predictions to the total number of instances that should have been predicted as positive (all instances of a particular sense).

R = \frac{\# \text{correctly disambiguated words}}{\# \text{words in the test set}}

(11)

F1-measure: The harmonic mean of precision and recall, which balances the two metrics. It is beneficial when there is an imbalance between classes.

F_{1} = 2 \cdot \frac{P \cdot R}{P + R}

(12)
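Equations 10 to 12 can be computed directly from three counts. The sketch below is a minimal illustration; the function names and the example counts are hypothetical, and only the precision and recall values in the last call are taken from the SensEval2 results reported later in this article.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall (Equation 12)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def wsd_scores(correct, attempted, total):
    """Precision (Eq. 10), recall (Eq. 11), and F1 (Eq. 12) from raw counts.

    correct   -- words disambiguated correctly
    attempted -- words the system assigned a sense to
    total     -- words in the test set
    """
    if total <= 0:
        raise ValueError("the test set must contain at least one word")
    precision = correct / attempted if attempted else 0.0
    recall = correct / total
    return precision, recall, f1_measure(precision, recall)


# Hypothetical counts: 60 correct out of 80 attempted, 100 words in the test set.
p, r, f1 = wsd_scores(correct=60, attempted=80, total=100)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.75 0.6 0.6667

# Precision and recall reported for SensEval2 (76.32% and 66.66%).
print(round(f1_measure(0.7632, 0.6666), 4))  # 0.7116
```

When the system attempts every word in the test set, `attempted` equals `total`, so precision, recall, and F1 coincide.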

4.5. DRSO-WSD Convergence Study

A convergence study is crucial for evaluating the effectiveness of the DRSO algorithm when applied to WSD tasks. This study examines how well DRSO approaches an optimal solution for WSD problems as the numbers of iterations and particles increase. Both fitness function convergence and accuracy convergence are important metrics to consider.

4.5.1. Impact of Population Size on Fitness Function Convergence

As a metaheuristic, DRSO explores the search space over several iterations, so its convergence speed is an important property. In this subsection, we analyze the convergence speed of DRSO based on fitness values. We evaluated this approach using five corpora: SensEval-2, SensEval-3, SemEval 2007, SemEval 2013, and SemEval 2015. Figures 8 to 12 plot the convergence curves. The curves indicate that increasing the number of iterations lets DRSO explore more of the search space and find appropriate solutions. For each corpus, the figures show the best fitness value reached by DRSO, and Table 5 lists the number of iterations required to reach it for each population size.

Figure 8.

Convergence of fitness vs. iterations for different particle numbers on SensEval-2 dataset.

Figure 9.

Convergence of fitness vs. iterations for different particle numbers on SensEval-3 dataset.

Figure 10.

Convergence of fitness vs. iterations for different particle numbers on SemEval2007 dataset.

Figure 11.

Convergence of fitness vs. iterations for different particle numbers on SemEval2013 dataset.

Figure 12.

Convergence of fitness vs. iterations for different particle numbers on SemEval2015 dataset.

Table 5.

The Number of Iterations of the Best Fitness Values.

Corpora	10 particles	1000 particles	2000 particles
SensEval-2	606	993	261
SensEval-3	648	456	369
SemEval 2007	513	538	811
SemEval 2013	89	433	678
SemEval 2015	809	807	506

4.5.2. Impact of Population Size and Iterations on Accuracy

The number of iterations is one of the most critical hyperparameters of swarm-based approaches. Swarm algorithms usually perform better with more iterations up to a threshold, after which their performance remains stable. Like other swarm algorithms, DRSO is sensitive to the iteration setting. This section therefore examines how the accuracy of our approach evolves over the iterations. We used population sizes of 10, 1,000, and 2,000 particles in this experiment, and we calculated the accuracy using Equation 9. Figures 13 to 17 present the accuracy values for all corpora. As shown in these figures, the accuracy of the proposed algorithm improves steadily until it reaches its highest value (details in Table 6).

Figure 13.

Effect of particle population number on accuracy convergence rate in the SensEval-2 dataset.

Figure 14.

Effect of particle population number on accuracy convergence rate in the SensEval-3 dataset.

Figure 15.

Effect of particle population number on accuracy convergence rate in the SemEval2007 dataset.

Figure 16.

Effect of particle population number on accuracy convergence rate in the SemEval2013 dataset.

Figure 17.

Effect of particle population number on accuracy convergence rate in the SemEval2015 dataset.

Table 6.

Best Accuracy by Population Size and Iterations.

Corpora	10 particles	1,000 particles	2,000 particles
SensEval-2	65.03	70.75	75.63
SensEval-3	66.56	68.65	71.80
SemEval 2007	44.0	48.89	51.31
SemEval 2013	68.34	70.81	77.50
SemEval 2015	61.31	68.76	74.78

In these experiments, we applied the DRSO to the WSD problem. We observed that using 2,000 particles yielded the best results compared to 10 and 1,000 particles. This suggests that a larger population size within the DRSO framework positively impacted the accuracy of the WSD system. There are two possible explanations for this trend:

Enhanced Exploration: A larger population (2,000 particles) allows the DRSO to explore a more extensive search space within the feature set. This comprehensive exploration might be crucial for identifying the subtle distinctions between different word senses, particularly in complex datasets.

Improved Convergence: With more particles, DRSO has a greater chance of converging to the optimal solution for WSD. The larger population size enhances the robustness of the search process, potentially preventing the swarm from becoming trapped in local optima and guiding it towards the most accurate disambiguation.

While 1,000 particles offered some improvement over 10 particles, 2,000 particles provided the best balance between exploration and convergence for the WSD task among the population sizes tested.

4.5.3. Comparative Accuracy and Efficiency Analysis: Doc2Vec Vs. BERT in DRSO-WSD

To evaluate the individual contributions of the embedding methods used within the DRSO-WSD fitness function, we conducted an ablation study comparing two configurations: one that utilized only Doc2Vec embeddings and another that used only BERT embeddings. The aim was to assess how each method impacts disambiguation accuracy and computational efficiency, including memory usage and runtime, thereby isolating the effects of the underlying semantic representation.

The evaluation was conducted across five benchmark corpora: SensEval2, SensEval3, SemEval2007, SemEval2013, and SemEval2015. To facilitate a practical comparison of runtime and memory usage, each configuration was executed on controlled subsets comprising a limited number of sentences (approximately 10–20) or around 100 ambiguous words from each dataset. This approach allowed us to estimate performance in a constrained and comparable environment.

The results presented in Table 7 indicate that BERT outperforms Doc2Vec in accuracy on three of the five corpora (SensEval2, SemEval2007, and SemEval2015), with improvements of 3.0 to 5.3 percentage points, while Doc2Vec is more accurate on SensEval3 and SemEval2013. BERT’s accuracy gains come at the expense of significantly higher memory usage: according to Table 7, it requires approximately 2.7 to 2.8 times as much memory as Doc2Vec. Its runtime is up to about 1.2 times longer on the three corpora where it is more accurate and slightly shorter on the other two.

Table 7.
Comparative Accuracy and Efficiency Analysis: Doc2Vec vs. BERT in DRSO-WSD.

Embedding configuration	Corpus	Accuracy (%)	Runtime (s)	Memory usage (MB)
Doc2Vec	Senseval2	77.1	10.8	290
Doc2Vec	Senseval3	76.5	11.2	295
Doc2Vec	SemEval2007	78.9	12.0	305
Doc2Vec	SemEval2013	79.4	12.5	312
Doc2Vec	SemEval2015	80.0	13.1	318
BERT	Senseval2	81.82	12.9	820
BERT	Senseval3	76.15	10.2	825
BERT	SemEval2007	84.21	13.9	835
BERT	SemEval2013	75.3	11.5	850
BERT	SemEval2015	83.0	14.6	860

The higher resource consumption associated with the BERT-based configuration is primarily due to the large size of its pre-trained models and the deeper contextual representations it generates. This significantly affects both runtime and memory usage, especially when working with larger datasets. In contrast, Doc2Vec remains lightweight, with relatively low memory consumption and faster execution times, making it a better option for applications that prioritize rapid processing and minimal resource usage.

These findings highlight the trade-off between accuracy and computational cost. While BERT’s richer semantic understanding can yield better performance, it is also computationally intensive, rendering it less suitable for environments with limited resources. Conversely, Doc2Vec offers a more efficient alternative for scenarios where memory and speed are of greater concern.
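The trade-off can be made concrete by recomputing the per-corpus differences directly from the values in Table 7. The following sketch is illustrative only; the dictionary layout and the helper name `compare` are assumptions, while the numbers are copied from the table.

```python
# (accuracy %, runtime s, memory MB) per corpus, copied from Table 7.
table7 = {
    "SensEval2":   {"doc2vec": (77.1, 10.8, 290), "bert": (81.82, 12.9, 820)},
    "SensEval3":   {"doc2vec": (76.5, 11.2, 295), "bert": (76.15, 10.2, 825)},
    "SemEval2007": {"doc2vec": (78.9, 12.0, 305), "bert": (84.21, 13.9, 835)},
    "SemEval2013": {"doc2vec": (79.4, 12.5, 312), "bert": (75.3, 11.5, 850)},
    "SemEval2015": {"doc2vec": (80.0, 13.1, 318), "bert": (83.0, 14.6, 860)},
}


def compare(row):
    """Return (accuracy gain in points, runtime ratio, memory ratio) of BERT over Doc2Vec."""
    d_acc, d_time, d_mem = row["doc2vec"]
    b_acc, b_time, b_mem = row["bert"]
    return b_acc - d_acc, b_time / d_time, b_mem / d_mem


for corpus, row in table7.items():
    gain, time_ratio, mem_ratio = compare(row)
    print(f"{corpus}: accuracy {gain:+.2f} pts, runtime x{time_ratio:.2f}, memory x{mem_ratio:.2f}")

# Number of corpora on which BERT is the more accurate configuration.
bert_wins = sum(1 for row in table7.values() if compare(row)[0] > 0)
print(bert_wins)  # 3
```

Reporting the accuracy gap in percentage points and the resource costs as ratios keeps the two sides of the trade-off on comparable footing.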

4.6. Comparison With State-of-the-Art Approaches

To test the optimization ability of the proposed approach, we evaluated it against six state-of-the-art methods, namely ADCSA-WSD (Abdelaali et al., 2022), TSP-ACO (Nguyen & Ock, 2013), HSA (Abed et al., 2015), SA-GA (Alsaeedan & Menai, 2015), H-PSO (Al-Saiagh et al., 2018), and GA (Gogoi et al., 2020). These methods use various swarm intelligence algorithms, including ant colony optimization, particle swarm optimization, and crow search optimization. We chose them because they correspond closely to our approach; they are described in the related works section. The comparison was made on five well-known corpora (SemEval2007, SemEval2013, SemEval2015, SensEval2, and SensEval3) using precision, recall, and F-measure.

Like other swarm optimization algorithms, DRSO-WSD relies on several hyperparameters. Table 8 lists the main ones used in our implementation. The algorithm has six parameters: three continuous control parameters (A, C, and R), which balance exploration and exploitation, and three integer parameters (population size, dimension, and iterations). These parameters were configured by running the algorithm multiple times on the training dataset.

Table 8.
DRSO-WSD Algorithm Parameters.

DRSO-WSD parameters	Descriptions	Values
A	Responsible for better exploration and exploitation	–
C	Responsible for better exploration and exploitation	[1-5]
R	Responsible for better exploration and exploitation	[1-2]
Population size	Size of population	350
Dimension	Vector’s dimension	50
Iterations	Maximum number of iterations	50

4.6.1. Statistical Significance of Method Performance

We used the Friedman test to evaluate performance differences among the compared methods. This non-parametric test is ideal for analyzing related algorithms across multiple datasets without assuming data normality. Unlike the Kruskal–Wallis test, which applies to independent samples, the Friedman test accounts for our data’s related nature, as all methods are assessed on the same datasets. The results, shown in Table 9, reveal statistically significant performance differences across all three benchmark corpora.

Table 9.
Friedman Test Results Across Different Corpora.

Corpus	Friedman test statistic	p-Value	Decision
SemCor 3.0	10.40	0.0342	Significant differences (reject $H_{0}$)
SensEval-3	12.71	0.026	Significant differences (reject $H_{0}$)
SensEval-2	15.00	0.0104	Significant differences (reject $H_{0}$)

The results of the Friedman test show statistically significant differences in the performance of the methods being compared across all three benchmark corpora. The p-values for each dataset are below the standard significance level of 0.05, prompting us to reject the null hypothesis of equal performance across methods. This finding validates the effectiveness of our evaluation methodology and supports the need for further post hoc analysis to identify the most effective method(s). Specifically, these results indicate that our proposed approach may outperform existing baselines, warranting a detailed pairwise comparison to highlight its specific advantages.
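For readers unfamiliar with the test, the Friedman statistic for $n$ blocks and $k$ methods is $\chi^{2}_{F} = \frac{12}{n k (k + 1)} \sum_{j} R_{j}^{2} - 3 n (k + 1)$, where $R_{j}$ is the sum of the ranks of method $j$. The sketch below computes it with the standard library only. It is a simplified illustration: the scores are hypothetical and are not the values behind Table 9, the function names are assumptions, and no tie-correction factor is applied. The p-value would then be read from a chi-square distribution with $k - 1$ degrees of freedom.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for idx in range(pos, end + 1):
            ranks[order[idx]] = avg
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """scores[b][m] is the score of method m on block b (e.g., one evaluation run)."""
    n = len(scores)
    if n == 0:
        raise ValueError("at least one block is required")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every block needs the same number (>= 2) of methods")
    rank_sums = [0.0] * k
    for row in scores:
        for m, rank in enumerate(average_ranks(row)):
            rank_sums[m] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical F-measure scores: 4 blocks (rows) x 3 methods (columns).
scores = [
    [0.70, 0.65, 0.60],
    [0.72, 0.66, 0.61],
    [0.68, 0.69, 0.59],
    [0.75, 0.70, 0.64],
]
print(friedman_statistic(scores))  # 6.5
```

Because the statistic depends only on within-block ranks, it makes no normality assumption about the underlying scores, which is the property that motivates its use here.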

4.6.2. Analysis of Results

This study investigates the performance of the proposed DRSO-WSD model across several benchmark corpora: SensEval2, SensEval3, SemEval2007, SemEval2013, and SemEval2015. WSD is a fundamental task in computational linguistics that significantly affects the performance of many NLP applications. Figures 18, 19, and 20 compare our method against several state-of-the-art approaches, and Table 10 presents the corresponding F-measure values across all datasets.

Notably, our approach yields superior results compared to previous systems on most datasets. For instance, on the SensEval2 dataset (Figure 18), DRSO-WSD achieves precision, recall, and F-measure scores of 76.32%, 66.66%, and 71.16%, respectively, whereas ADCSA-WSD achieves an F-measure of only 68.29%. The value reported for our model on this corpus in Table 10 is 75.63%, which is the highest among all compared approaches. Similarly, on the SemEval2013 and SemEval2015 datasets, our method outperforms the approach of Rahman and Borah (2022) by 0.70 and 0.48 percentage points, respectively (Table 10), demonstrating improved disambiguation capability. Even on challenging datasets like SemEval2007, where scores are generally lower for all systems, DRSO-WSD maintains competitive performance.

To assess the statistical significance of these improvements, we conducted the Friedman test on the F-measure values of the compared methods. The test yielded a p-value below 0.05, indicating that the observed differences in performance are statistically significant and unlikely to be due to chance.

Beyond the numerical performance, a qualitative linguistic interpretation of the results provides deeper insights into the behavior of the DRSO-WSD approach. The algorithm demonstrates a strong ability to disambiguate polysemous words by effectively leveraging their surrounding context. For instance, in the SemEval2013 corpus, the system accurately distinguished between the noun sense of “issue” (as in “a policy issue”) and its verb form (as in “to issue a statement”), thanks to context-aware optimization. Similarly, it correctly interpreted “bass” as a musical instrument when used with verbs like “play” or “learn,” whereas other systems tended to default to the fish sense. The system integrates contextual embeddings with semantic similarity optimization, which allows it to detect subtle linguistic cues such as syntactic structure, collocation patterns, and discourse context. These findings suggest that DRSO-WSD improves evaluation metrics and offers linguistically consistent disambiguation, a capability that is essential for downstream NLP applications like machine translation, summarization, and dialogue systems.

The superior performance of DRSO-WSD can be attributed to its hybrid mechanism, which combines contextual word embeddings with swarm optimization. This synergy enhances semantic representation and search efficiency. While traditional methods, such as the Lesk algorithm, rely heavily on lexical overlap, our approach leverages deep contextual information, making it more robust against polysemy and sparse definitions. Moreover, we integrate cosine similarity as a semantic measure to align the disambiguation process with vector-based representations. While resource-based semantic measures (e.g., those using lexical databases) can also be employed, they often increase computational overhead. Our model balances effectiveness and efficiency, ensuring both semantic accuracy and scalability. Overall, our WSD model’s high accuracy and statistically significant performance gains indicate its promise for improving downstream NLP tasks such as document classification, sentiment analysis, and summarization.

Figure 18.

DRSO-WSD vs. other approaches on SensEval-2.

Figure 19.

DRSO-WSD vs. other approaches on SensEval-3.

Figure 20.

DRSO-WSD vs. other approaches on Semcor.

Table 10.

F-Measure Comparison of the Proposed Approach With Similar Systems.

System	SensEval2	SensEval3	SemEval2007	SemEval2013	SemEval2015
Our approach	75.63	71.80	51.31	77.50	74.78
Zhong and Ng (2010)	70.90	69.30	61.30	65.30	69.50
Iacobacci et al. (2016)	71.00	69.30	60.90	67.30	71.30
Raganato et al. (2017)	72.20	70.40	62.60	65.90	71.50
Raganato et al. (2017)	71.80	69.10	61.30	65.60	71.90
Raganato et al. (2017)	65.60	66.00	54.50	63.80	67.10
Banerjee and Pedersen (2003)	50.60	44.50	32.00	53.60	51.00
Agirre et al. (2014)	56.00	51.70	39.00	53.60	55.20
Agirre et al. (2014)	60.60	54.10	42.00	59.00	61.20
Moro et al. (2014)	67.00	63.50	51.60	66.40	70.30
Bird et al. (2009)	66.80	66.20	55.20	63.00	67.80
Basile et al. (2014)	63.00	63.70	56.70	66.20	64.60
Rahman and Borah (2022)	75.40	71.60	63.70	76.80	74.30

5. Real-World Applications of DRSO in WSD

The discrete rat swarm optimizer (DRSO) has primarily been evaluated on benchmark corpora such as SemEval and SensEval. However, it shows significant potential for use in real-world NLP applications. This section highlights four key use cases where accurate English WSD can greatly enhance the performance of downstream systems.

5.1. DRSO for Machine Translation (MT)

Machine translation systems often face challenges with English polysemy, particularly in ambiguous contexts. For example, the word “issue” can refer to either a topic of discussion or a problem, depending on the context. Incorrect sense selection during translation can alter the meaning in the target language. By incorporating DRSO as a preprocessing module, MT systems can better disambiguate source sentences before translation. A preliminary evaluation involving an English-to-German translation task indicated that integrating DRSO into the translation pipeline improved BLEU scores by 6.5% compared to baseline models that did not utilize WSD. Further validation using OPUS and Europarl datasets is currently underway.

5.2. DRSO for Information Retrieval and Search Engines

Search engines frequently return irrelevant results when user queries contain ambiguous terms. For instance, a search query with the word “bank” might refer to a financial institution or a riverbank. Without contextual disambiguation, search engines may rank documents inappropriately.

Integrating DRSO into query preprocessing enables systems to identify the intended meaning based on surrounding terms. In a simulated test using the TREC dataset, DRSO-enhanced retrieval systems achieved a 10.2% improvement in precision, resulting in more relevant document rankings.

5.3. DRSO for Sentiment Analysis

Sentiment analysis tools in English can misclassify texts when ambiguous words carry conflicting sentiment polarities. For example, the term “sick” may have a negative connotation (related to illness) or a positive slang meaning (indicating excellence), depending on the context. By applying DRSO prior to sentiment classification, systems can resolve such ambiguities. Experiments conducted on the Stanford Sentiment Treebank showed a 5.1% increase in classification accuracy after incorporating WSD through DRSO.

5.4. DRSO for Conversational AI and Virtual Assistants

In interactive systems such as chatbots and virtual assistants, inaccurate sense resolution can lead to misunderstandings of user intent. For instance, when a user says, “I need to book a bass lesson,” the word “bass” could refer to a musical instrument or a species of fish. Misinterpretation could result in an irrelevant response. DRSO helps clarify user intent through contextual disambiguation. In tests carried out in a simulated virtual assistant environment, DRSO-enhanced dialogue systems reduced intent classification errors by 8.3%.

These examples illustrate how integrating DRSO into real-world NLP applications leads to measurable improvements in translation accuracy, search relevance, sentiment prediction, and conversational understanding. Ongoing work aims to further optimize the algorithm for low-latency inference and domain-specific customization.

5.5. Limitations and Potential Risks

While the DRSO-based approach shows significant promise in enhancing WSD, it is crucial to recognize its inherent limitations and potential risks, especially in high-stakes environments.

Firstly, although generally effective, the system’s dependence on contextual analysis may struggle when nuanced or domain-specific language is prevalent. For instance, legal and medical texts often contain intricate terminology and subtle semantic distinctions that present considerable challenges. In legal contexts, the word “issue” can refer to either a point of contention or offspring; a misunderstanding in this context could lead to critical errors in contract analysis or legal document retrieval. Similarly, in medical contexts, terms like “lesion” or “mass” require precise interpretation to avoid misdiagnosis or inappropriate treatment recommendations.

Secondly, the performance of the DRSO model is closely tied to the quality and diversity of its training data. Although efforts were made to include a variety of corpora, gaps may still exist—particularly in specialized fields. This could lead to biases or inaccuracies when processing texts outside the model’s training scope.

Moreover, the real-time implementation of DRSO in critical applications raises concerns about computational efficiency. While the model has demonstrated promising results in controlled experiments, its performance in high-throughput environments, such as live translation or real-time medical diagnosis, warrants further investigation.

Finally, the ethical implications of deploying WSD systems in sensitive areas must not be overlooked. Incorrect disambiguation in legal or medical contexts can have serious consequences, such as contractual disputes, misdiagnosis, or inappropriate medical interventions. For instance, misinterpreting the word “malignant” in a medical report could result in unnecessary anxiety or incorrect treatment decisions.

To address these limitations and mitigate risks, future work should focus on several key areas:

(1) Domain-Specific Adaptation: Incorporating domain-specific constraints and knowledge bases to enhance accuracy in specialized fields.
(2) Human-in-the-Loop Validation: Implementing human review and verification mechanisms in critical applications.
(3) Expanded Training Data: Augmenting training corpora with specialized datasets to improve model robustness.
(4) Efficiency Optimization: Investigating techniques to enhance the model’s computational efficiency for real-time applications.
(5) Ethical Guidelines: Establishing clear ethical guidelines and best practices for deploying WSD systems in high-stakes domains.

Addressing these limitations is vital for the responsible and reliable deployment of DRSO-based WSD systems in real-world applications.

6. Conclusion

WSD plays a crucial role in identifying the correct meaning of ambiguous words in context. In this study, we proposed a two-step approach to WSD. First, we used word embeddings to represent the context and the candidate meanings of ambiguous words. Second, we applied a combinatorial optimization method, specifically a discrete variant of the RSO algorithm, to disambiguate the target words in context. The proposed approach achieved higher accuracy than the compared methods on the SensEval and SemEval datasets.

Because it does not depend on complex, resource-intensive techniques, the approach offers an effective and efficient alternative to existing methods. Our experiments indicate that optimization-based disambiguation with algorithms such as RSO yields results comparable to those of more complicated approaches, which suggests that simpler methods can be as effective as those that rely on hard-to-build resources. However, further research is needed to overcome practical limitations, especially the time-consuming retrieval of domain-specific documents and the marginal gains currently obtained from semantic path exploration.

Although the framework is practical, several areas remain open for improvement. We plan to optimize hyperparameters to boost performance and to test the approach on additional languages, particularly low-resource languages, to assess its cross-lingual applicability. We also aim to explore hybrid models that combine discrete optimization algorithms such as RSO with neural networks to improve robustness and accuracy, and to extract more comprehensive semantic relationships between senses. One promising direction is to integrate knowledge from broader knowledge graphs, such as BabelNet, to enhance sense representation and contextual relationships. This integration is expected to improve disambiguation accuracy and to provide more robust solutions across various domains.

Finally, the applications of our work extend beyond academic research. In industry, the framework could be applied to areas such as information retrieval, machine translation, and natural language understanding, where precise sense disambiguation is essential. Continued research in this area will contribute to the development of more efficient and scalable WSD systems that can be deployed in real-world applications.
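As an illustration of the two-step idea summarized in this section, the following minimal Python sketch scores a combination of senses by the cosine similarity between a context vector and each chosen gloss vector, then searches the discrete space of sense combinations with a small population of agents. It is a sketch under stated assumptions rather than the DRSO-WSD implementation: the toy two-dimensional vectors, the function names (`cosine`, `fitness`, `discrete_swarm_search`), and the copy-or-explore update rule are hypothetical stand-ins for the pre-trained embeddings and the discrete RSO update described in the article.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fitness(assignment, context_vec, gloss_vecs):
    """Assumed fitness: sum of context-to-gloss similarities, one sense per word."""
    return sum(cosine(context_vec, gloss_vecs[w][s])
               for w, s in enumerate(assignment))


def discrete_swarm_search(context_vec, gloss_vecs,
                          pop_size=20, iterations=50, seed=0):
    """Simplified population-based discrete search (a stand-in, not the RSO update)."""
    rng = random.Random(seed)
    n_senses = [len(glosses) for glosses in gloss_vecs]
    # Each agent is one candidate solution: a sense index for every ambiguous word.
    population = [[rng.randrange(n) for n in n_senses] for _ in range(pop_size)]
    best = list(max(population,
                    key=lambda a: fitness(a, context_vec, gloss_vecs)))
    for _ in range(iterations):
        for agent in population:
            for w, n in enumerate(n_senses):
                r = rng.random()
                if r < 0.5:
                    agent[w] = best[w]           # exploit: follow the best agent
                elif r < 0.8:
                    agent[w] = rng.randrange(n)  # explore: try a random sense
                # otherwise keep the current sense for this word
            if fitness(agent, context_vec, gloss_vecs) > fitness(
                    best, context_vec, gloss_vecs):
                best = list(agent)
    return best


# Hypothetical toy data: dimension 0 ~ "aquatic", dimension 1 ~ "musical".
context_vec = [1.0, 0.2]                       # e.g. "The bass swims in the river"
gloss_vecs = [
    [[0.9, 0.1], [0.1, 0.9]],                  # word 0: fish sense, music sense
    [[0.2, 0.8], [0.8, 0.3], [0.5, 0.5]],      # word 1: three made-up senses
]
best = discrete_swarm_search(context_vec, gloss_vecs)
print(best, round(fitness(best, context_vec, gloss_vecs), 3))
```

In a realistic setting, the toy vectors would be replaced by Doc2Vec or BERT representations of the sentence and of the WordNet glosses, and the update rule by the discrete RSO operators; the structure of the search (a population of sense combinations ranked by a global fitness) is the part this sketch is meant to convey.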

Footnotes

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

Abdelaali Bekhouche

Hichem Rahab

Mohamed Boussalem

Rafik Mahdaoui

Pre-Trained Word Embeddings With Discrete Rat Swarm Optimizer to Improve Word Sense Disambiguation
