Abstract
To address the shortage of labeled data in the medical domain, this article proposes a named entity recognition (NER) model that introduces a counterfactual mechanism to enhance the vocabulary. Following the idea of semi-supervised learning, the model starts from a small amount of labeled data and builds a counterfactual vocabulary generator that, by introducing and improving the counterfactual mechanism of the structural causal model, captures more dependencies and augments the medical data. Furthermore, a vocabulary information fusion recognizer is constructed to verify the effectiveness of the generated data. The recognizer integrates character feature embeddings, embeddings of vocabulary information from the training data, and position feature embeddings. In addition to enhancing the medical vocabulary, the model improves the accuracy of entity recognition. Comparative and ablation experiments show that the proposed model achieves F1 scores of 84.67% and 86.15% on the CCKS2019 and CCKS2020 datasets, respectively, which are 0.22%–4.57% and 0.315–5.04% higher than other related models and 3.82% and 3.86% higher than traditional counterfactual generators, demonstrating the effectiveness of the model.
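As a reading aid for the F1 figures reported above, the following is a minimal sketch of an exact-match, entity-level F1 computation of the kind commonly used for NER. The abstract does not state the evaluation protocol, so the exact-match criterion, the function name `entity_f1`, the `(start, end, type)` representation, and the example entity types are all assumptions made for illustration, not the article's implementation.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], pred: Set[Entity]) -> float:
    """Exact-match entity-level F1: a prediction counts as correct only if
    its span boundaries and its type both match a gold entity."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: the third prediction has a wrong start boundary,
# so it is counted as an error under exact matching.
gold = {(0, 2, "disease"), (5, 8, "drug"), (10, 12, "symptom")}
pred = {(0, 2, "disease"), (5, 8, "drug"), (9, 12, "symptom")}
score = entity_f1(gold, pred)
```

Here two of the three predictions match exactly, so precision and recall are both 2/3 and the F1 score is also 2/3. A relaxed (overlap-based) matching criterion would score the same predictions higher, which is why the matching rule matters when comparing reported F1 values.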
