Abstract
Currently, the primary challenges in entity relation extraction are overlapping relations and cascading errors. Both CasRel and TPLinker have demonstrated competitive performance on these problems. This study explores the application of the two models to entity relation extraction from Chinese medical text. We evaluate their performance on the publicly available CMeIE dataset and further improve them by incorporating pre-trained models suited to the characteristics of the text. The experimental results show that TPLinker yields a larger and more consistent improvement than CasRel and achieves further gains when paired with stronger pre-trained models. Notably, the MacBERT + TPLinker combination is the best configuration, surpassing the benchmark model by 12.45% and outperforming ERNIE-Health 3.0, the leading model in the CBLUE challenge, by 2.31%.
Introduction
As one of the key tasks in knowledge graph construction, entity relation extraction is of great significance and affects the quality of the knowledge graph. Entity relation extraction aims to identify specific associations between two entities and construct relation triples. For example, from a disease diagnosis text we can extract the relations between “COVID-19”, “fever”, and “cough” as (COVID-19, symptom, fever) and (COVID-19, symptom, cough). At present, the common methods are neural network models such as Convolutional Neural Networks (CNN), 1 Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks, which offer autonomous feature learning and good adaptability. With the development of natural language processing (NLP) technology, advanced methods such as fusing neural network models with attention mechanisms, 2 improving the tagging framework, 3 and adjusting the triple generation order 4 have successively achieved state-of-the-art performance on entity relation extraction.
In the medical domain, entity relation extraction has gradually received attention since Informatics for Integrating Biology & the Bedside (i2b2) released a relevant task in 2010. However, because Chinese differs considerably from English, the entities and relation types in Chinese texts have their own characteristics, so techniques developed for English may be less effective on Chinese texts. Meanwhile, owing to the privacy of medical information, the professionalism of the medical field, and the high standards and requirements of medical research, there remains much room to improve entity relation extraction for Chinese medical texts.
In addition, entity relation extraction from Chinese medical texts often faces problems such as entity nesting and overlapping relations. For example, as shown above, COVID-19 can serve as the subject of relation triples with different objects, thus forming multiple triples. Meanwhile, traditional entity relation extraction adopts a pipeline approach that divides entity extraction and relation extraction into two subtasks and treats relation extraction as a classification problem, which leads to cascading errors. Therefore, this paper focuses on the entity relation extraction task for Chinese medical text in the Chinese Biomedical Language Understanding Evaluation (CBLUE). 5 CasRel 6 and TPLinker 7 are selected to address overlapping relations and cascading errors, and pre-trained models are added to improve extraction performance.
Related works
Pre-trained models
Pre-trained models are generally considered an efficient method to improve NLP tasks,8–11 especially after BERT was proposed. BERT can be fine-tuned to fit multiple downstream tasks, which accelerates and improves NLP tasks. 12
It is well known that BERT includes two main pre-training tasks, masked language modeling (MLM) and next sentence prediction (NSP), which differ from those of traditional pre-trained models and made BERT a state-of-the-art model. Numerous derivative models were later proposed to improve these tasks. For example, RoBERTa is an enhanced BERT with several changes: it replaces static masking with dynamic masking in MLM, removes the NSP task, and adopts byte-level byte-pair encoding with a larger vocabulary, along with other changes to hyperparameters and the volume of training data. 13 In addition, changing the MLM task from random token masking to whole word masking produced the BERT-wwm and RoBERTa-wwm models, which outperform the general BERT and ERNIE models on tasks such as machine reading comprehension, natural language inference, and sentence pair matching. 14
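To make the whole word masking idea concrete, the following minimal Python sketch (an illustration only, assuming the sentence has already been word-segmented) masks every character of a selected word instead of masking isolated characters:

```python
import random

def whole_word_mask(words, mask_ratio=0.15, mask_token="[MASK]"):
    """Illustrative whole word masking: when a word is selected, all of its
    characters are masked, rather than masking individual characters as in
    the original character-level MLM."""
    masked = []
    for word in words:
        if random.random() < mask_ratio:
            masked.extend([mask_token] * len(word))  # mask the whole word
        else:
            masked.extend(list(word))
    return masked

# Example with a pre-segmented sentence: "新冠肺炎 引起 发热" (COVID-19 causes fever)
print(whole_word_mask(["新冠肺炎", "引起", "发热"]))
```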
However, the applicability of different pre-trained models to specific fields or tasks still needs to be considered. For example, Hofer et al. 10 compared five optimization strategies for named entity recognition with only a handful of labeled texts in specialized corpora, such as medical texts and electronic medical records, and found that pre-training on corpora similar to the target domain yielded the best improvement.
Joint extraction of entities and relations
Pipeline extraction and joint extraction are the two main approaches to entity relation extraction. Pipeline extraction first identifies named entities in sentences from a given dataset and then learns the relations between entities. The pipeline method is easy to implement and flexible because entity extraction and relation extraction are performed separately and can even use different datasets. However, it also has disadvantages. Errors in entity identification are passed to the relation extraction task and affect the following steps (i.e., cascading errors). Moreover, possible connections and dependencies between the two highly decoupled tasks may be ignored. Pairing every candidate entity pair also produces redundant pairs of unrelated entities, which raises computational complexity and introduces errors.
Joint extraction combines entity and relation extraction to mitigate possible errors and error accumulation in the pipeline method. Some studies also show that joint extraction obtains better performance than pipeline extraction.15–17 Joint extraction can be divided into multitask learning based on shared parameters and structured prediction based on joint decoding according to the difference in model design.
Multitask learning based on shared parameters
In this approach, the input features and the states of the internal hidden layers for entity and relation extraction are joined by sharing the same encoding layer of the model. Another strategy is to strengthen the interaction between the two tasks, which use separate encoders, and obtain the optimal parameters through techniques such as dependency enhancement, reinforcement learning, 18 and risk minimization. 19 Miwa and Bansal 20 and Feng et al. 18 both adopt a “BiLSTM + Tree-LSTM” structure to achieve joint extraction through parameter sharing, but the latter applies reinforcement learning to relation classification, which leads to better results than the former. Zheng et al. 21 proposed a BiLSTM-ED method for entity recognition in which BiLSTM parameters are shared in the encoders, achieving superior results compared with Miwa and Bansal 20 and other models.
In joint extraction methods based on shared parameters, most experiments use BiLSTM encoding to share parameters and then incorporate modules such as attention mechanisms and text span representations to enhance semantic association,2,3,17,22,23 followed by mapping the identified entities to relation classes. In terms of the execution order of the two subtasks, some experiments extract relations first and then map them to entities,24–26 or map the head entity of the relation triple to relations and tail entities.6,27,28
Structured prediction based on joint decoding
In multitask learning based on shared parameters, entity recognition and relation extraction are still completed separately in a certain order, which cannot avoid generating redundant information. Joint decoding methods further strengthen the interaction between the two subtask models for global optimization. There are two main solutions: one uses sequence tagging to obtain relation triples,3,7 and the other uses a sequence-to-sequence method to re-encode and decode.29–32 Joint-decoding approaches to entity relation extraction pay more attention to the overlapping problem, which can be divided into three categories: ① Normal, i.e., no overlap; ② EPO (Entity Pair Overlap), i.e., multiple relations hold between one entity pair; ③ SEO (Single Entity Overlap), i.e., different triples share a head entity or tail entity.
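As a simple illustration of this taxonomy (not part of the cited works), the following Python sketch classifies a sentence's triples, giving EPO precedence when a sentence exhibits both overlap types:

```python
from collections import Counter

def overlap_type(triples):
    """Classify (subject, relation, object) triples as Normal, EPO, or SEO."""
    pair_counts = Counter((s, o) for s, _, o in triples)
    entity_counts = Counter()
    for s, _, o in triples:
        entity_counts[s] += 1
        entity_counts[o] += 1
    if any(c > 1 for c in pair_counts.values()):
        return "EPO"    # one entity pair carries multiple relations
    if any(c > 1 for c in entity_counts.values()):
        return "SEO"    # different triples share a head or tail entity
    return "Normal"

# The COVID-19 example from the Introduction is SEO: two triples share the subject.
print(overlap_type([("COVID-19", "symptom", "fever"),
                    ("COVID-19", "symptom", "cough")]))  # -> SEO
```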
Zeng et al. 29 first proposed this classification of overlapping and designed an end-to-end model based on sequence-to-sequence learning with a copy mechanism, using either one decoder or multiple decoders in the decoding stage. According to their results, regardless of whether one decoder or multiple decoders are used, the F1 value of the model is slightly worse than that of the best NovelTagging model 3 in the non-overlapping case. However, for EPO and SEO, the one-decoder and multiple-decoder modes improve by 7.8% and 31.1%, respectively, and the multiple-decoder mode achieves the highest F1 value. Yuan et al. 25 proposed the relation-specific attention network (RSAN) model to target the overlapping problem specifically; it performs best on the SEO and EPO types, and its F1 value on the EPO type matches the current best result in the normal case. In addition, there are other optimization approaches, such as the supervised multi-head self-attention mechanism, 2 the position attention mechanism, 16 and the hybrid dual pointer network. 33
Methodology
CasRel
CasRel is a general framework proposed by Jianlin Su's team to solve the overlapping problem. Unlike approaches that regard relations as discrete labels of entity pairs, CasRel models each relation as a function mapping subjects to objects. The rationale is that treating relations as discrete labels yields unevenly distributed training data, and for overlapping labels, insufficient training data greatly reduces classifier accuracy.
Relation extraction in CasRel first identifies the subjects in a sentence and then identifies all possible relations and objects for each subject. As shown in Figure 1, the model consists of three modules: a BERT-based encoder, a subject tagger, and relation-specific object taggers. The subject tagger adopts two binary classifiers to predict the start and end indices of subjects: each classifier outputs the probability that a token is a subject start or end position, and the token is tagged as 1 when the probability exceeds a given threshold and as 0 otherwise, thereby predicting all subjects. The relation-specific object tagger takes the encoded vector of the subject into account; the vectors are passed through conditional layer normalization, where input features with different dimensions are normalized. All relations in the dataset are traversed once to obtain the objects corresponding to each subject under each relation.
Figure 1. The structure of CasRel.
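A minimal PyTorch sketch of this cascade tagging scheme is given below. It is not the authors' released implementation: the checkpoint name and the number of relations are placeholders, and the conditional layer normalization step is approximated by simply adding the subject vector to the token encodings.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class CasRelSketch(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", num_relations=53):
        super().__init__()
        self.encoder = BertModel.from_pretrained(bert_name)
        hidden = self.encoder.config.hidden_size
        # Subject tagger: two binary classifiers for start and end positions.
        self.subj_head = nn.Linear(hidden, 1)
        self.subj_tail = nn.Linear(hidden, 1)
        # Relation-specific object tagger: one start/end classifier per relation.
        self.obj_head = nn.Linear(hidden, num_relations)
        self.obj_tail = nn.Linear(hidden, num_relations)

    def forward(self, input_ids, attention_mask, subj_vec):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        # Probability that each token starts/ends a subject; tokens above a
        # threshold (e.g. 0.5) are tagged 1, the rest 0.
        subj_head_p = torch.sigmoid(self.subj_head(h)).squeeze(-1)
        subj_tail_p = torch.sigmoid(self.subj_tail(h)).squeeze(-1)
        # Condition token representations on the encoded subject (simple
        # addition here, standing in for conditional layer normalization),
        # then tag object start/end positions under every relation at once.
        h_cond = h + subj_vec.unsqueeze(1)
        obj_head_p = torch.sigmoid(self.obj_head(h_cond))
        obj_tail_p = torch.sigmoid(self.obj_tail(h_cond))
        return subj_head_p, subj_tail_p, obj_head_p, obj_tail_p
```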
TPLinker
CasRel is an effective solution to the overlapping problem, but exposure bias still exists between training and inference, whereas TPLinker tackles both overlapping and exposure bias based on the multi-head selection (MHS) 15 mechanism and thus appears more complete than the former.
Exposure bias is caused by the inconsistency between training inputs and inference inputs: during training each input token is the ground truth, whereas during inference the input comes from the model's previous predictions, which causes errors to accumulate.
The tags in TPLinker are 0, 1, and 2, where 1 indicates a link from subject to object and 2 a link from object to subject. As shown in Figure 2(a), given the example sentence “小明在北京 (Xiaoming is in Beijing)”, the cross position of the row of “小” and the column of “明” carrying tag 1 means that the sequence fragment “小明” is an entity. The cross position of the row of “小” and the column of “北” carrying tag 1 means that there is a relation between the two entities starting with “小” and “北”, respectively; that is, “小明” and “北京” are related. In this way, entity extraction and relation extraction can be performed in the same phase, without decoupling the task as in CasRel.
Figure 2. An example of tagging in TPLinker.
Due to the logical ordering of entity heads and tails, a large number of 0 tags appear in the lower-left triangular region, which wastes memory and increases the computation of the whole model. However, because the order between object and subject is not fixed, directly deleting the lower triangle would also lose some relations. TPLinker therefore proposes a new solution: tag “1” entries in the lower-left triangle are mapped to the upper-right triangle and relabeled as “2”. The final tagging result is shown in Figure 2(b). Finally, the remaining upper-triangular part is flattened into a sequence for computation.
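The mapping and flattening step can be sketched as follows (an illustration only; the actual TPLinker implementation builds such tag sequences for each link type and relation):

```python
def build_handshaking_tags(seq_len, links):
    """Enumerate the upper triangle of the token-pair matrix as a flat
    sequence and tag it: pairs in `links` with i <= j get tag 1, while
    lower-triangle pairs (i > j) are mapped to (j, i) and relabeled as 2."""
    pair_index, pairs = {}, []
    for i in range(seq_len):
        for j in range(i, seq_len):      # upper triangle only
            pair_index[(i, j)] = len(pairs)
            pairs.append((i, j))
    tags = [0] * len(pairs)
    for i, j in links:
        if i <= j:
            tags[pair_index[(i, j)]] = 1
        else:                            # swap and relabel as 2
            tags[pair_index[(j, i)]] = 2
    return pairs, tags

# "小明在北京": entity "小明" spans tokens 0-1, "北京" spans 3-4,
# and their heads (0, 3) are linked by a relation.
pairs, tags = build_handshaking_tags(5, [(0, 1), (3, 4), (0, 3)])
```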
Following this tagging strategy, TPLinker transforms joint entity relation extraction into a Token Pair Linking problem and defines three types of links: head-to-tail linking within the same entity (EH-TO-ET), head-to-head linking between different entities (SH-TO-OH), and tail-to-tail linking between different entities (ST-TO-OT).
All token pairs are enumerated and tagged through matrices to make entity extraction convenient. In decoding, the EH-TO-ET tags are used to extract all entities in the sentence, which are stored in a dictionary. For each relation type, decoding ST-TO-OT and SH-TO-OH yields, respectively, the set of tail-position pairs and the set of start-position pairs of subject–object entity pairs. TPLinker then traverses the entity dictionary using the start-position set of each entity pair and checks the tail-position set; if the tails also match, an SPO triple is successfully extracted. Figure 3 takes the text “故宫博物院坐落于中国首都北京” as an example of the token pair linking process of TPLinker.
Figure 3. An example of token pair linking in TPLinker.
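The decoding logic can be summarized by the following sketch, which assumes the three tag sequences have already been decoded into index sets (EH-TO-ET as entity spans, SH-TO-OH and ST-TO-OT as per-relation sets of head-position and tail-position pairs):

```python
def decode_triples(eh2et, sh2oh, st2ot):
    """eh2et: set of (head, tail) entity spans; sh2oh / st2ot: dicts mapping
    each relation to sets of (subject_pos, object_pos) head / tail pairs."""
    head2entities = {}
    for h, t in eh2et:                       # index all entities by head position
        head2entities.setdefault(h, []).append((h, t))
    triples = []
    for rel, head_pairs in sh2oh.items():
        tail_pairs = st2ot.get(rel, set())
        for sh, oh in head_pairs:
            for subj in head2entities.get(sh, []):
                for obj in head2entities.get(oh, []):
                    # Emit a triple only if the tails also match ST-TO-OT.
                    if (subj[1], obj[1]) in tail_pairs:
                        triples.append((subj, rel, obj))
    return triples
```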
Experiments
Dataset
The dataset used in this study is CMeIE, provided by the Chinese Biomedical Language Understanding Evaluation (CBLUE). 5 CMeIE contains 14,339 samples in the open training set, 3585 in the validation set, and 4482 in the test set, covering a corpus of 518 pediatric diseases and 109 common diseases drawn from medical textbooks and clinical practice texts, with a total of approximately 75,000 triples, 28,000 disease statements, and 53 defined schemas. The entity categories include 10 types of head entities and 11 types of tail entities, and the relation categories include one synonym sub-relation and 43 other subtypes, for a total of 53 schemas.
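For reference, the relation-category counts discussed below can be reproduced with a short script such as the following; the field names ("text", "spo_list", "predicate") follow the commonly distributed JSON-lines format of CMeIE and may need adjusting to the local copy of the dataset.

```python
import json
from collections import Counter

def count_predicates(path):
    """Count how often each relation (predicate) occurs in a CMeIE split."""
    counter = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            sample = json.loads(line)
            for spo in sample.get("spo_list", []):
                counter[spo["predicate"]] += 1
    return counter

# e.g. count_predicates("CMeIE_train.json").most_common(5)
```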
Figures 4 and 5 show the statistics of the relation categories in the training and validation sets. The distributions of the two sets are similar, and the most frequent relation categories are “临床表现 (clinical manifestation)”, “药物表现 (drug manifestation)”, “同义词 (synonym)”, and “病因 (etiology)”. However, the distribution of relation categories is unbalanced: the most frequent category, “临床表现 (clinical manifestation)”, occurs more than twice as often as the second most frequent category, “药物表现 (drug manifestation)”, and about 200 times as often as the rarest categories in the training set.
Figure 4. Relation category statistics in the training set.
Figure 5. Relation category statistics in the validation set.

Evaluation and baseline
The experiments were run on a Linux system (Spuer-HR, kernel 4.15.0-189-generic) with an Intel(R) Xeon(R) Gold 6139M CPU @ 2.30 GHz and an NVIDIA GeForce RTX 2080 Ti GPU.
The CasRel model in our experiments is adopted from Wei et al. 6 and implemented in TensorFlow. TPLinker is adopted from the optimized version TPLinker_plus by Wang et al. 7 and implemented in PyTorch. We use exact matching to evaluate whether extracted entity relations are correct and report three metrics commonly used in NLP tasks: Precision (P), Recall (R), and F1.
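Exact-match evaluation treats a predicted triple as correct only if the identical (subject, predicate, object) appears in the gold annotations. A minimal sketch of the metric computation follows; for corpus-level (micro-averaged) scores, the correct, predicted, and gold counts are simply summed over all sentences before applying the same formulas.

```python
def precision_recall_f1(pred_triples, gold_triples):
    """Exact-match precision, recall, and F1 over (subject, predicate, object) triples."""
    pred, gold = set(pred_triples), set(gold_triples)
    correct = len(pred & gold)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```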
Both models were previously evaluated on two English datasets, NYT and WebNLG, for the entity relation extraction task, with both BERT and BiLSTM encoders. 7 According to those results, TPLinker outperforms CasRel by 2.3% on the NYT dataset with BERT encoding, and the two models perform equally with BiLSTM encoding. On the WebNLG dataset, TPLinker outperforms CasRel by only 0.1% with BERT encoding but by 6.8% with BiLSTM encoding. Overall, TPLinker has a slight advantage over CasRel on English entity relation extraction, but the advantage is not stable and depends on the dataset and encoding method.
Comparison of CasRel and TPLinker with BERT-base, Chinese.
From the experimental results, the precision, recall, and F1 values of TPLinker are all higher than those of CasRel for entity relation extraction from Chinese medical text: its recall exceeds that of CasRel by 8.6% and its F1 value by 6.01%, so the advantage of TPLinker is more pronounced here.
Performance on the CBLUE benchmark. * indicates the result quoted from the ranking of the CBLUE challenge. The other 11 results are quoted from the benchmark model on GitHub.
Optimization
Optimization results with different pre-trained models. + marks results reported by Guan et al. 37
Discussion
Model comparison for joint extraction
Based on the findings presented in Table 3, for the first six sets of experiments, the TPLinker model consistently exhibits a more stable improvement over the CasRel model on the Chinese medical dataset CMeIE, irrespective of the encoding method employed. Among the encoding methods, the RoBERTa-wwm model provides better support than both BERT and BERT-wwm-ext-base.
The CasRel model achieves joint extraction by mapping entities to relations: relation extraction relies on the correspondence between subject entities and relation-specific object entities, so if object entities are untagged or incorrectly tagged, the triple cannot be extracted or is extracted incorrectly. The TPLinker model, by contrast, generates a span matrix over token pairs in the tagging phase; after decoding the span matrix, the corresponding triples are obtained from the entity and relation tag types, and entity relations are acquired through a unified encoding and decoding process. TPLinker thus maintains consistency throughout the tagging, encoding, and decoding stages, yielding better results and proving more suitable for Chinese medical text.
Optimization comparison of the pre-training model
During the optimization of the TPLinker model, the pre-trained models MacBERT, RoBERTa-wwm, and MedBERT exhibited superior performance, increasing the F1 score over the original BERT-based model by 1.77%, 1.57%, and 0.44%, respectively, whereas BERT-wwm and MC-BERT led to slight decreases. Notably, the F1 value of the optimized structure “MacBERT + TPLinker” reaches 68.35%, surpassing the F1 score of ERNIE-Health 3.0, the top model in the CBLUE challenge, by 2.31%. These findings show that pre-trained models trained on corpora resembling Chinese medical text yield better results; however, the magnitude of improvement depends on the source and volume of the pre-training data and the parameter configurations of the pre-trained models.
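In practice, swapping the pre-trained encoder leaves the tagging layers of TPLinker untouched; only the underlying checkpoint changes. The sketch below uses publicly available Hugging Face checkpoint names for illustration; the exact checkpoints and domain-specific weights used in our experiments may differ.

```python
from transformers import AutoTokenizer, AutoModel

# Illustrative checkpoint identifiers (Hugging Face Hub names).
CANDIDATES = {
    "BERT": "bert-base-chinese",
    "BERT-wwm": "hfl/chinese-bert-wwm-ext",
    "RoBERTa-wwm": "hfl/chinese-roberta-wwm-ext",
    "MacBERT": "hfl/chinese-macbert-base",
}

def load_encoder(name):
    """Load the tokenizer and encoder for the chosen pre-trained model."""
    ckpt = CANDIDATES[name]
    return AutoTokenizer.from_pretrained(ckpt), AutoModel.from_pretrained(ckpt)
```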
RoBERTa uses a much larger dataset (160 GB) in the pre-training phase, which results in a longer training time. The RoBERTa-wwm variant additionally incorporates a whole word masking mechanism, which better handles Chinese semantics and enhances its applicability to Chinese text mining. MacBERT, in contrast, is a further refinement built upon BERT and its improved variants: it masks with similar (synonymous) and random words during the MLM task, thereby reducing the discrepancy between the pre-training and fine-tuning stages. 36
Both MC-BERT and MedBERT are BERT-derived pre-trained models designed for the biomedical field. MC-BERT is trained on Chinese biomedical community question answering, the Chinese medical encyclopedia, and Chinese electronic health records (EHRs), with a training data volume of approximately 20 M, and incorporates explicit medical knowledge through whole entity masking and whole span masking. 34 MedBERT uses Cerner Health Facts, a structured EHR database of approximately 20 million records, as its training data source and pre-trains with International Classification of Diseases (ICD) codes and serialized word embeddings. 35 However, incorporating explicit medical knowledge as input restricts the model's learning capacity, whereas the more general serialized word embedding approach proves more suitable for practical use on medical texts.
Comparison with ChatGPT
Triplet results extracted by ChatGPT.
We provide ChatGPT with sample data and all available relation types for learning, then input the sample data and ask ChatGPT to output triples in (subject, predicate, object) format. 42 In Table 4, “Sample data” is the text input to ChatGPT, “ChatGPT results” are the triples extracted by ChatGPT, “Standard results” are the triples in the validation set, and “Overlap or not” indicates the triple overlap type.
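The prompting protocol can be reconstructed roughly as follows. This is a hypothetical sketch only: the experiments interacted with ChatGPT directly rather than through this exact code, the model name is an assumption, and the relation list is abbreviated from the 53 schemas.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RELATION_TYPES = ["临床表现", "药物表现", "同义词", "病因"]  # abbreviated list

def extract_triples(text, few_shot_examples):
    """Show the allowed relation types and a few annotated samples, then ask
    for (subject, predicate, object) triples for the new text."""
    prompt = (
        "Allowed relation types: " + ", ".join(RELATION_TYPES) + "\n"
        "Examples:\n" + "\n".join(few_shot_examples) + "\n"
        "Extract all triples from the following text in the format "
        "(subject, predicate, object):\n" + text
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```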
According to the results in Table 4, ChatGPT is capable of entity identification and triple extraction. For simple text (e.g., Sample data 1 and Sample data 4), its extraction results can be close to the validation set results, although they do not meet the strict matching standard. For text with compound structures (e.g., Sample data 2 and Sample data 3), ChatGPT fails to generate accurate triples. Furthermore, despite the progress of ChatGPT, comprehensive training and expert guidance are still required to apply the model to unfamiliar tasks, and its output is not always accurate. The experimental findings of Qin et al. 43 likewise illustrate the difficulties ChatGPT faces on sequence annotation tasks such as named entity recognition. Our model is designed precisely for this setting: it extracts entities and relations simultaneously while handling relation overlap in complexly structured text, and therefore offers distinct advantages on this problem.
Limitations
It is important to acknowledge that the experimental findings reveal certain limitations in the performance of CasRel and TPLinker. For instance, CasRel achieved notable improvements of 17.5% and 30.2% over the previous best results on the English datasets NYT and WebNLG, respectively, whereas in the present experiment the difference between CasRel and the benchmark is merely 4.59%, and CasRel falls short of the result obtained by ERNIE-Health 3.0 in the CBLUE challenge. TPLinker fares better: all of its optimized pre-trained variants outperform the ERNIE-Health 3.0 result in the CBLUE challenge. The potential factors underlying these results are examined as follows: (1) The Chinese medical text dataset contains a substantial quantity of lengthy text and cross-sentence SPO triples, with an uneven distribution of triples across relation categories; the dataset exhibits a prominent long-tail problem, as the “clinical manifestation” relation category is frequent whereas the “pathogenesis” category is rare. (2) The pre-trained models were not further fine-tuned, which is a significant limitation. Fine-tuning plays a crucial role in enhancing the performance of pre-trained models in natural language processing: by allowing them to adapt to specific downstream tasks, it improves their performance and efficiency on new data while capitalizing on their existing knowledge and representations, thereby reducing the need for extensive training on task-specific data.43–46
Conclusion
This study centers on entity relation extraction from Chinese medical text. To tackle overlapping relations and cascading errors, we employ CasRel and TPLinker, compare their performance on Chinese medical text, and determine the most suitable pre-trained model for optimization. The findings indicate that TPLinker achieves a more consistent improvement than CasRel and obtains superior results when supported by an advanced pre-trained model. The optimal model, MacBERT + TPLinker, surpasses the top-performing model in the CBLUE challenge, ERNIE-Health 3.0, by 2.31% and exceeds the benchmark model by 12.45%. Additionally, we conducted supplementary experiments with ChatGPT, which showed that it can generate triples; however, on Chinese medical texts with intricate structures, ChatGPT struggled to generate logically consistent triples and to handle relation overlap.
Acknowledgements
Our work is supported by the Basic Project of the Science and Technology Program of Guangzhou, China (NO. 202002020036). The results disclosed in this work are obtained using the high-performance computing infrastructure and resources of Sun Yat-Sen University and the Reporting System of China Chest Pain Center Certification (Website: ).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Our work is supported by the Basic Project of the Science and Technology Program of Guangzhou, China (NO. 202002020036).
