Abstract
Confused drug names are a common cause of medication errors and are related to look-alike and sound-alike drug names. To identify confused drug name pairs, individual similarity measures between drug names are used. In the state of the art, logistic regression trained with the standard learning algorithm has been used to combine individual similarity measures. However, only three similarity measures have been combined, and the resulting combined measure does not outperform any individual measure with statistical significance. In addition, the dataset of potentially confused drug name pairs has a highly unbalanced class distribution, which makes the problem hard for supervised machine learning models. In this paper, we present an improved combined logistic regression measure based on 21 individual measures and trained with the standard learning algorithm. We also present an evolutionary learning method for the combined logistic regression measure that can learn from an unbalanced dataset. In experiments on a gold standard dataset, our proposed combined measures outperform previous research with statistical significance in identifying pairs of confused drug names. Finally, we present rankings of the individual and combined similarity measures.
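To illustrate what a combined logistic regression measure does, here is a minimal sketch. It computes a few individual string similarity measures for a pair of drug names and combines them with the logistic function, $\sigma(b + \sum_i w_i s_i)$. The three measures used here (normalized edit distance, bigram Dice coefficient, common-prefix ratio) are illustrative choices and are not the paper's 21 measures. The names `HYPOTHETICAL_WEIGHTS` and `HYPOTHETICAL_BIAS` are made-up values rather than learned ones. In the paper, the weights would be fitted either by the standard learning algorithm or by the proposed evolutionary method, and neither fitting procedure is shown here.

```python
import math

def levenshtein(a: str, b: str) -> int:
    # Dynamic-programming edit distance (insert, delete, substitute each cost 1)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def edit_similarity(a: str, b: str) -> float:
    # Normalized to [0, 1]; 1.0 means identical strings
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def bigrams(s: str) -> set:
    # Pad with spaces so the first and last letters form their own bigrams
    padded = f" {s} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def dice_bigram(a: str, b: str) -> float:
    # Dice coefficient over the sets of character bigrams
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))

def prefix_similarity(a: str, b: str) -> float:
    # Length of the shared prefix divided by the longer name's length
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else n / longest

# Illustrative individual measures (assumption: not the paper's actual set)
MEASURES = [edit_similarity, dice_bigram, prefix_similarity]

# Hypothetical parameters; a real model would learn these from labeled pairs
HYPOTHETICAL_WEIGHTS = [4.0, 3.0, 2.0]
HYPOTHETICAL_BIAS = -6.0

def combined_score(a: str, b: str, weights, bias) -> float:
    # Logistic combination: sigmoid(bias + sum of weight * measure)
    a, b = a.lower(), b.lower()
    z = bias + sum(w * m(a, b) for w, m in zip(weights, MEASURES))
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    for pair in [("Celebrex", "Cerebyx"), ("Celebrex", "Zantac")]:
        print(pair, round(combined_score(*pair, HYPOTHETICAL_WEIGHTS, HYPOTHETICAL_BIAS), 4))
```

With these made-up weights, the look-alike pair ("Celebrex", "Cerebyx") receives a higher combined score than an unrelated pair. In practice, a decision threshold on the score would be chosen with the class imbalance in mind, since confused pairs are rare among all possible name pairs.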
