Characterizing pituitary adenomas in clinical notes: Corpus construction and its application in LLMs

Abstract

Objective: The complex clinical manifestations and high pathological heterogeneity of pituitary adenomas make differential diagnosis challenging. This study aims to construct a high-quality annotated corpus that characterizes pituitary adenomas in clinical notes, which contain rich diagnosis and treatment information. Methods: A dataset from a pituitary adenoma neurosurgery treatment center of a tertiary first-class hospital in China was retrospectively collected. A semi-automatic corpus construction framework was designed. A total of 2000 documents containing 9430 sentences and 524,232 words were annotated, and the text corpus of pituitary adenomas (TCPA) was constructed and analyzed. Its potential application in large language models (LLMs) was explored through fine-tuning and prompting experiments. Results: TCPA had 4782 medical entities and 28,998 tokens, and achieved good quality, with inter-annotator agreement values of 0.862–0.986. The LLM experiments showed that TCPA can be used to automatically identify clinical information from free text, and that introducing instances with clinical characteristics can effectively reduce the need for training data, thereby reducing labor costs. Conclusion: This study characterized pituitary adenomas in clinical notes, and the proposed method can serve as a reference for related research in medical natural language scenarios with highly specialized language structure and terminology.

Keywords

corpus construction fine-tuning large language models pituitary adenomas text annotation

Introduction

Pituitary adenomas are among the most common intracranial tumors. They occur in the pituitary gland, the endocrine center of the human body located within the skull, with a reported prevalence of up to 20%.¹ In the general population, approximately 1 in 1100 individuals suffers from clinically evident pituitary adenomas.² Moreover, they mostly occur in young and middle-aged adults and seriously impair patients’ growth, development, labor capacity, and fertility, resulting in physical, mental, economic and social burdens.

Clinically, the comprehensive assessment of patients with pituitary adenoma faces the challenges of complex clinical manifestations and high pathological heterogeneity, which limit health management, diagnosis and treatment plan recommendation, and recurrence risk prediction for such patients. Clinical notes contain rich diagnostic and treatment information that can be used to characterize pituitary adenomas.¹ However, since these notes are recorded in natural language, useful semantic information and features need to be identified and annotated from the free-text narratives and converted into machine-readable, computable structured knowledge.^3–5

Several excellent corpora have been presented for specific diseases with different application objectives. To evaluate future machine learning (ML)-based NLP methods, Chaturvedi et al.⁶ manually developed a pain-related corpus based on mental health electronic health records (EHRs). Given the urgent need for scientific knowledge on rare diseases, Martínez-deMiguel et al.⁷ developed a gold standard corpus through a dictionary-based approach for extracting medical information. Additionally, facing the growing demand for non-English clinical corpora, Frei et al.⁸ created a small annotated text dataset to train a medical named entity recognition (NER) model on German text, Oliveira et al.⁹ developed a semantically annotated corpus for Portuguese clinical NLP tasks, and Cai et al.¹⁰ annotated the clinical diagnostic criteria for premature ovarian decline based on Chinese medical records. Nevertheless, existing annotated corpora are still mainly in English.

Corpus annotation is a labor-intensive task, especially for purely manual annotation. ML techniques facilitate automated processing of texts, and a series of excellent models and methods have emerged. Among them, active learning^11,12 has been shown to be effective in reducing the cost of annotation compared to the typical passive framework based on random selection of samples. Recently, pre-trained language models have greatly improved the performance of NLP, but they still need to be combined with domain knowledge to support the underlying understanding and reasoning of specific tasks.¹³ To demonstrate the use of case reports for data construction and pandemic surveillance, Raza et al.¹⁴ created a gold standard annotation dataset using limited COVID-19 case reports to train BERT during the few-shot learning process. In a study on the annotation of a large multi-cancer genomic dataset, Kehl et al.¹⁵ trained deep NLP models to extract clinical outcomes based on limited manual medical record annotations derived from separate retrospective cohort studies.

To improve efficiency and reduce the labor and time cost of annotation, this study presents a semi-automatic clinical text corpus construction framework that combines ML and NLP approaches, and characterizes pituitary adenomas by annotating semantic information and constructing a clinical text corpus. Furthermore, experiments on emerging large language models (LLMs)^16–18 demonstrate the practicality of pituitary adenoma characterization and its value for medical artificial intelligence (AI). To the best of our knowledge, this is the first clinical text corpus study dedicated to characterizing pituitary adenomas.

The main contributions of this work are as follows: (1) a semi-automatic framework that combines ML and NLP methods was designed to build a corpus; (2) a novel clinical text annotation guideline was developed, according to which the text corpus of pituitary adenomas (TCPA) was constructed and common clinical information related to pituitary adenomas was identified; and (3) fine-tuning and prompting experiments were conducted, verifying the practical applicability of this study in the LLM era.

Material and methods

Dataset

We retrospectively collected the clinical notes of 500 inpatients, randomly selected from a pituitary adenoma neurosurgery treatment center of a tertiary first-class hospital in China. To reduce redundant information across notes and fully show the personalized medical characteristics of patients, we focus on four fine-grained text types identified by analyzing the different components of clinical notes (detailed in Appendix 1). The dataset therefore contains 2000 text documents, that is, four text types for each of the 500 inpatients. The data distribution is given in Table 1.

Table 1.

Statistics of data distribution.

Text type	Word count	Sentence count
CH	308,691	4885
PH	59,665	1453
CC	137,413	2534
FH	18,463	558
Total	524,232	9430

CH: current medical history; PH: past history; CC: case characteristics; FH: family history.

Methods

To characterize pituitary adenomas in clinical notes, TCPA is constructed following a semi-automatic framework based on machine-assisted manual annotation. Then, fine-tuning and prompting experiments are conducted to demonstrate the practical applicability of this study in the LLMs era. The workflow of our approach is shown in Figure 1.

Figure 1.

Workflow overview of our approach. CCNLP: Chinese clinical natural language processing; CNPL: clinical natural language processing; TCPA: text corpus of pituitary adenomas; LLMs: large language models.

Specifically, the annotated corpus construction framework of TCPA is illustrated in Figure 2 and consists of the following steps: (1) guideline development, (2) unannotated text recommendation, and (3) corpus construction. An initial guideline is drafted based on discussions among domain experts, including clinicians and information scientists, and with reference to relevant annotation specifications. In view of the special language of clinical notes, several typical medical records are selected, and the annotation specifications are discussed and revised by analyzing the linguistic and structural characteristics of the notes. The annotators are trained on the annotation guideline and manually pre-annotate part of the texts until their consistency satisfies the stability requirements. To improve annotation efficiency, an active learning method is used to query the most valuable texts among the clinical texts still to be annotated, and the top N texts are recommended for machine-assisted annotation. An ML algorithm then automatically annotates the recommended texts, so that annotators only need to make additions, deletions and modifications to the automatic annotation results. Because different annotators may understand the annotation contents differently, a multi-round annotation mode is adopted to ensure corpus quality. In addition, a senior clinician acts as reviewer to unify the annotation results before they are aggregated into the final corpus.
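The text recommendation in Step 2 can be illustrated with a minimal uncertainty-sampling sketch. The actual query strategy is specified in Appendix 2 and is not reproduced here; the entropy-based score, the function names, and the toy probability values below are illustrative assumptions rather than the algorithm implemented in CCNLP.

```python
import math
from typing import Dict, List, Sequence

# Assumption: a tagging model has already produced, for every token of every
# unannotated text, a probability distribution over the concept labels.
TokenDistributions = Sequence[Sequence[float]]


def mean_token_entropy(token_probs: TokenDistributions) -> float:
    """Average Shannon entropy of the per-token label distributions of one text."""
    if not token_probs:
        return 0.0
    total = 0.0
    for dist in token_probs:
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_probs)


def recommend_top_n(pool: Dict[str, TokenDistributions], n: int) -> List[str]:
    """Return the ids of the n texts the model is least certain about."""
    ranked = sorted(pool, key=lambda doc_id: mean_token_entropy(pool[doc_id]),
                    reverse=True)
    return ranked[:n]


# Hypothetical pool of three texts with two tokens each and three labels.
pool = {
    "doc_a": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # model is confident
    "doc_b": [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]],  # model is uncertain
    "doc_c": [[0.70, 0.20, 0.10], [0.90, 0.05, 0.05]],
}
recommended = recommend_top_n(pool, n=2)
print(recommended)
```

Texts on which the model is most uncertain are ranked first, on the assumption that correcting them is more informative for the next training round than confirming predictions the model already makes confidently.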

Figure 2.

Diagram illustrating the corpus construction framework. CCNLP: Chinese clinical natural language processing.

Annotation guideline

The annotation guideline is the basis for corpus construction. The 2010 i2b2/VA challenge defined three categories of medical concepts for general medical problems,¹³ namely medical problem, treatment, and test, and six categories of concept assertions, namely present, absent, possible, conditional, hypothetical, and not associated with the patient. To enrich the semantic information of pituitary adenomas, the concepts and assertions of interest in this study are defined in Table 2.

Table 2.

Concepts and their assertions of interest in this study.

Concept	Assertion	Example (medical concepts are in bold font)
Disease	Positive	Hypertension nephropathy for 8 years
Disease	Negative	Denial of coronary heart disease
Symptom	Positive	Recurring nausea
Symptom	Alleviated	Snoring improved after surgery
Symptom	Negative	No dizziness
Body	Positive	A small black shadow appears in the left eye
Drug	Positive	Oral levothyroxine tablets 100 ug daily
Drug	Negative	Denial of oral prednisone
Surgery	Positive	Proceed endoscopic transnasal transsphenoidal tumor resection
Disease course	Positive	Elderly female with chronic disease course
Family history	Positive	His father was suffering from hyperlipidemia

The annotation guideline of our work differs from that of 2010 i2b2/VA in the following respects: (1) the concept categories of medical problem and treatment defined in 2010 i2b2/VA are refined into disease, symptom, drug, and surgery; (2) since examination and test information has already been structured and stored in dedicated databases, the test concept is not annotated; (3) three concept categories are added to refine the annotation, namely body, disease course, and family history; and (4) the assertion annotation guideline of 2010 i2b2/VA¹⁴ is simplified from a practical perspective as follows: positive concepts in clinical texts need to be identified; negative diseases, symptoms and drugs, which provide auxiliary reference for clinical diagnosis and treatment,^19,20 need to be analyzed; and, in particular, alleviated symptoms are valuable for completing the clues to disease progression.

Annotation platform

To support the annotation and analysis of clinical information, we developed a web-based platform named Chinese clinical natural language processing (CCNLP).²¹ CCNLP provides online and incremental learning, and the active learning algorithm used in Step 2 (detailed in Appendix 2) and automatic text annotation algorithm used in Step 3 (detailed in Appendix 3) are both integrated to facilitate the visualization of man-machine interaction. Figure 3 gives a screenshot of the CCNLP interface.

Figure 3.

Screenshot of the configuration interface for the automatic machine-assisted annotation in CCNLP. The English explanations corresponding to the functions displayed on the platform interface are provided in orange font.

In CCNLP, the personalization of medical concepts and their assertions is supported. It also provides convenient annotation operations and user-friendly keyboard shortcuts for quick annotation. Figure 4 shows an example of the annotation of Chinese clinical notes using CCNLP.

Figure 4.

An annotation example of Chinese clinical notes in CCNLP. Different medical concepts are distinguished by different colors. CCNLP: Chinese clinical natural language processing.

Corpus application

In the field of clinical NLP, the fine-tuning of LLMs for specific tasks such as clinical NER^22–24 is a pivotal area of research. A novel approach in this domain integrates human-annotated data with Low-Rank Adaptation (LoRA)²⁵ to enhance the model’s performance in identifying and classifying named entities within text. Our approach takes advantage of TCPA, which serves as a gold standard and provides the model with precise examples of named entities to learn from. LoRA, which introduces low-rank matrices into the architecture of the pre-trained LLM, is then employed to fine-tune the model. Unlike traditional fine-tuning, which adjusts all parameters and risks overfitting, LoRA keeps the pre-trained weights frozen and trains only a small number of additional low-rank parameters, preserving general knowledge while adapting to the nuances of the NER task. This approach not only enhances the model’s ability to generalize across different contexts but also makes efficient use of the human-annotated data, resulting in a more accurate and robust NER system.
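The low-rank update at the core of LoRA can be sketched in a few lines of NumPy. This is only an illustration of the idea for a single linear layer: the layer sizes, the rank, and the scaling factor are arbitrary assumptions and do not correspond to the ChatGLM3-6B settings listed in Appendix 4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not the settings of Appendix 4).
d_out, d_in, rank, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_out, d_in))             # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable low-rank factor
B = np.zeros((d_out, rank))                    # trainable, zero-initialised


def lora_forward(x: np.ndarray) -> np.ndarray:
    """Frozen projection plus the scaled low-rank correction (alpha / r) * B A x."""
    return W @ x + (alpha / rank) * (B @ (A @ x))


x = rng.normal(size=d_in)

# Because B starts at zero, the adapted layer initially reproduces the
# pre-trained layer exactly; training then only updates A and B.
starts_unchanged = np.allclose(lora_forward(x), W @ x)

full_params = W.size           # parameters touched by full fine-tuning
lora_params = A.size + B.size  # parameters trained with LoRA
print(starts_unchanged, lora_params, full_params)
```

In this toy layer the trainable parameters are `A` and `B` only, which is a fraction of the size of `W`; the saving grows as the layer dimensions increase relative to the rank.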

Since the study data is in Chinese, here we use the Chinese-friendly ChatGLM3-6B²⁶ as the base model of LLM. The experimental environment and parameter settings are listed in Appendix 4.

In addition, prompting is an important method for embedding LLMs in clinical information extraction pipelines. However, prompt design is generally difficult in the clinical domain because LLMs lack medical knowledge. In our annotation work, we not only constructed a high-quality annotated dataset but also provide prompting instances recommended by clinical experts (see Appendix 5).
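The difference between prompting without and with instances can be sketched as follows. The prompt wording, the output format, and the helper name `build_prompt` are hypothetical; the expert-recommended instances themselves are given in Appendix 5 and are not reproduced here. The two demonstration sentences are taken from the examples in Table 2, and the entity spans assigned to them are assumptions.

```python
from typing import List, Optional, Tuple

# (sentence, entity, concept, assertion)
Instance = Tuple[str, str, str, str]

# Sentences from Table 2; the entity spans are assumed for illustration.
EXAMPLE_INSTANCES: List[Instance] = [
    ("Denial of coronary heart disease", "coronary heart disease", "disease", "negative"),
    ("Recurring nausea", "nausea", "symptom", "positive"),
]


def build_prompt(text: str, instances: Optional[List[Instance]] = None) -> str:
    """Assemble an extraction prompt, optionally preceded by worked instances."""
    lines = [
        "Extract the medical concepts and their assertions from the clinical note.",
        "Concepts: disease, symptom, body, drug, surgery, disease course, family history.",
        "Assertions: positive, negative, alleviated.",
    ]
    for sentence, entity, concept, assertion in instances or []:
        lines.append(f"Example: {sentence}")
        lines.append(f"Answer: {entity} | {concept} | {assertion}")
    lines.append(f"Note: {text}")
    lines.append("Answer:")
    return "\n".join(lines)


prompt_without = build_prompt("No dizziness")
prompt_with = build_prompt("No dizziness", EXAMPLE_INSTANCES)
print(prompt_with)
```

Both prompts share the same task description; only the second shows the model how a clinician would label comparable sentences.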

As shown in Figure 1, we randomly selected samples from the corpus, analyzed the performance of the fine-tuned model with different proportions of corpus participation, and compared the differences between fine-tuned LLMs prompting without instances and with instances.
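One simple way to prepare such training subsets is sketched below. The sampling unit (documents), the nested design in which each larger subset contains the smaller ones, the fractions, and the seed are all assumptions made for illustration; the text above states only that samples were selected at random.

```python
import random
from typing import Dict, List, Sequence


def nested_subsets(doc_ids: Sequence[str], fractions: Sequence[float],
                   seed: int = 42) -> Dict[float, List[str]]:
    """Shuffle once, then take prefixes so that larger subsets extend smaller ones."""
    shuffled = list(doc_ids)
    random.Random(seed).shuffle(shuffled)
    return {f: shuffled[: round(f * len(shuffled))] for f in fractions}


# Hypothetical ids standing in for the 2000 annotated documents.
corpus_ids = [f"doc_{i}" for i in range(2000)]
fractions = [0.0, 0.2, 0.45, 1.0]
subsets = nested_subsets(corpus_ids, fractions)

for f in fractions:
    print(f, len(subsets[f]))
```

Taking prefixes of a single shuffle keeps the comparison between proportions free of an extra source of variance, since each larger training set differs from a smaller one only by the added documents.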

Statistical analysis

All statistical analyses in this study were performed in Python (version 3.10.13) with the following libraries: NumPy (version 1.23.4) for numerical calculations, pandas (version 2.2.2) for data processing, and SciPy (version 1.14.0) for scientific computing. For the deep learning framework, PyTorch (version 2.1.0) was chosen with the Hugging Face’s transformers library (version 4.43.4). In addition, the following libraries were also used: datasets (version 2.20.0) for processing and loading datasets, DeepSpeed (version 0.14.0) for accelerating model training, and flash-attn (version 2.6.3) for optimizing the computational efficiency of the attention mechanism. The experimental environment and parameter settings are available in Appendix 4.

Results

Annotation consistency

The quality of an annotated corpus determines how efficiently and how deeply the clinical information can be applied. To ensure annotation quality, the principle of multi-person annotation is strictly followed, and annotation consistency is evaluated by calculating the inter-annotator agreement (IAA) value using the F1-score.^27,28

Denote $A_{i}$ as the number of tokens annotated by annotator $i$, and $m$ as the number of tokens annotated consistently by both annotators. In this study, two researchers annotate the notes through multiple rounds of annotation. The F1-score can then be formulated as

$$F_{1} = \frac{(1 + \beta^{2})\, m}{\beta^{2} A_{1} + A_{2}} \tag{3}$$

where $\beta = 1$.
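Equation (3) is the usual F-measure with one annotator treated as the reference: with precision $P = m / A_{2}$ and recall $R = m / A_{1}$, the expression $(1 + \beta^{2}) P R / (\beta^{2} P + R)$ simplifies to the form above, and for $\beta = 1$ it reduces to $2m / (A_{1} + A_{2})$. A minimal sketch of the computation follows; the function name and the token counts in the example are hypothetical and are not taken from TCPA.

```python
def iaa_f1(a1: int, a2: int, m: int, beta: float = 1.0) -> float:
    """Inter-annotator agreement as in equation (3).

    a1, a2: number of tokens annotated by annotator 1 and annotator 2.
    m: number of tokens annotated consistently by both annotators.
    """
    denominator = beta ** 2 * a1 + a2
    if denominator == 0:
        return 0.0  # neither annotator marked anything
    return (1 + beta ** 2) * m / denominator


# Hypothetical counts: 100 and 110 annotated tokens, 95 of them in agreement.
example_iaa = iaa_f1(a1=100, a2=110, m=95)
print(round(example_iaa, 3))
```

With $\beta = 1$ the measure is symmetric in the two annotators, so it does not matter which of them is treated as the reference.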

The IAA calculation results are shown in Table 3. After three rounds of training, the annotation consistency tends to be stable, and most IAA values of the medical concepts exceed 0.90 in the corpus construction stage. Disease course usually has a relatively clear description in clinical notes and has the highest annotation consistency, 0.986. The IAA values for the disease and symptom concepts are lower, but even the alleviated symptom concept, which has the least agreement, still reaches 0.862, although symptoms are expressed in various ways and are the most complex concept in free-text clinical notes. An annotation result can be regarded as reliable when the IAA value exceeds 0.80,^28,29 so the corpus is of good quality.

Table 3.

IAA value for the corpus construction.

Concept	Assertion	IAA, 1st round	IAA, 2nd round	IAA, 3rd round	IAA, corpus construction
Disease	Positive	0.839	0.908	0.917	0.891
Disease	Negative	0.826	0.901	0.904	0.883
Symptom	Positive	0.803	0.889	0.891	0.876
Symptom	Alleviated	0.784	0.874	0.879	0.862
Symptom	Negative	0.862	0.927	0.933	0.903
Body	Positive	0.843	0.923	0.928	0.902
Drug	Positive	0.893	0.961	0.962	0.947
Drug	Negative	0.877	0.943	0.958	0.931
Surgery	Positive	0.867	0.936	0.943	0.928
Disease course	Positive	0.932	0.982	0.988	0.986
Family history	Positive	0.906	0.974	0.976	0.969

Corpus statistics

TCPA contains a total of 4782 entities and 28,998 tokens. The distribution of medical concepts is shown in Table 4. The data query and visualization interfaces are available online.³⁰

Table 4.

Concept distribution in our annotated clinical texts.

Concept	Assertion	Entities	Tokens	% in all entities	% in all tokens
Disease	Positive	917	3111	19.18	10.73
Disease	Negative	47	3218	0.98	11.10
Symptom	Positive	1692	8121	35.38	28.01
Symptom	Alleviated	178	311	3.72	1.07
Symptom	Negative	779	9736	16.29	33.57
Body	Positive	334	1939	6.98	6.69
Drug	Positive	364	1103	7.61	3.80
Drug	Negative	21	41	0.44	0.14
Surgery	Positive	369	707	7.72	2.44
Disease course	Positive	14	467	0.29	1.61
Family history	Positive	67	244	1.40	0.84
Total		4782	28,998	100	100

Table 5 presents the top-10 tokens for each pituitary adenoma concept. TCPA shows typical characteristics of pituitary adenomas. The diseases in the corpus are mostly pituitary adenoma-related and sellar-related diseases. The main symptoms are head discomfort and abnormal changes in vision and weight. Symptoms describing vision are very diverse, such as visual field defects and blurred vision. Accordingly, many body entities are related to vision, such as bilateral, temporal, left eye, and right eye. The drugs that appear most often in the corpus include Shanlong, Euthyrox, Metformin, Baixintong, Bromocriptine and Aspirin. Shanlong is mainly used for acromegaly, which is highly correlated with pituitary adenomas. The other drugs are closely related to common symptoms and diseases, and are mainly used for hormone regulation and the treatment of related symptoms such as headaches. The surgery concepts document cesarean deliveries for many female patients, as well as pituitary adenoma surgery for other patients. As for disease course concepts, most of the recorded texts describe chronic disease courses and fewer describe acute ones, which is consistent with the characteristics of pituitary adenomas. The top three family history concepts are hypertension, diabetes mellitus and coronary heart disease, which are also routinely asked about when taking the history of patients with pituitary adenomas. Moreover, some concepts overlap across different assertions, such as headache, dizziness, and vision loss in the symptom concept, hypertension, diabetes, and hyperlipidemia in the disease concept, and Aspirin, Euthyrox, and Baixintong in the drug concept.

Table 5.

Top-10 tokens for each fine-grained medical concept.

Concept	Assertion	Top-10 tokens
Disease	Positive	Sellar mass, pituitary tumor, pituitary adenoma, hypertension, pituitary macroadenoma, pituitary microadenoma, diabetes mellitus, pituitary mass, history of hypertension, Cushing’s disease
Disease	Negative	Typhoid fever, malaria, tuberculosis, hepatitis, coronary heart disease, diabetes, hypertension, hyperlipidemia, liver cirrhosis, liver and kidney disease
Symptom	Positive	Headache, decreased vision, facial changes, dizziness, weight gain, blurred vision, weakness, enlarged hands and feet, visual field defects, snoring
Symptom	Alleviated	Mental, sleep, body weight, lactation, visual field, headache, purpura, visual acuity, central obesity, visual field defect
Symptom	Negative	Headache, dizziness, fatigue, weight loss, acne, menstrual recovery, vision recovery, round face, snoring, edema
Body	Positive	Bilateral, both eyes, left eye, lower limbs, right eye, temporal side, abdomen, face, double temporal side, right side
Drug	Positive	Shanlong, euthyrox, metformin, baixintong, bromocriptine, aspirin, nifedipine, betaloc, insulin, irbesartan
Drug	Negative	Aspirin, bromocriptine, euthyrox, shanlong, potassium chloride, saizhi, baixintong, furosemide, prednisone, insulin
Surgery	Positive	Cesarean section, appendectomy, cesarean section, hysterectomy, cholecystectomy, cesarean section, myomectomy, gamma knife treatment, induced abortion, pituitary tumor surgery
Disease course	Positive	Chronic course, insidious onset, insidious onset, acute course, subacute course, chronic onset, acute onset, course of disease 5 years, course of disease more than 20 years, course of disease 2 years
Family history	Positive	Hypertension, diabetes, coronary heart disease, liver cancer, lung cancer, heart disease, gastric cancer, cerebral infarction, hyperlipidemia, pituitary tumor

LLMs experiments

In clinical information extraction, corpora annotated by human experts are crucial for training robust deep learning models. In the LLM experiments, we evaluate how the annotated corpus and prompts benefit the clinical NLP task in the LLM era. The experimental results are shown in Figure 5.

Figure 5.

Performance changes of LLMs fine-tuned with different percentages of the corpus. LLMs: large language models.

The experimental results show that when TCPA is not used for fine-tuning, that is, when 0% of the corpus is used, the F1 score is 0 regardless of whether it is prompting with instances or without instances. This indicates that it is difficult for a general LLM to understand the semantics of clinical concept entities without fine-tuning with domain data.

Because the semantics of the entities are hard for LLMs to understand in this task, TCPA was used for fine-tuning, and the F1 score increased to 0.8179 (orange dotted line in Figure 5). This verifies that TCPA can significantly improve the performance of LLMs in understanding and processing pituitary adenoma-related texts, which is of great significance for clinical natural language scenarios with highly specialized language structures and terminology.

Further prompt engineering experiments show that instances provided by domain experts improve performance (solid line in Figure 5). The LLM fine-tuned on 20% of the annotated data and prompted with expert-provided instances achieves the same effect as the LLM fine-tuned on 45% of the dataset and prompted without instances. Thus, the same performance can be achieved with less annotated data, thereby reducing labor costs.

Discussion

Clinical findings

The pituitary gland, located at the base of the brain, is the most important endocrine gland in the body, and pituitary adenomas may cause significant morbidity or mortality.^1,2 Early diagnosis, timely treatment and effective prognosis can significantly improve the life and health of patients with pituitary adenomas. However, the characteristics of pituitary adenomas are only partly understood.³¹ At the international Pituitary Neoplasm Nomenclature (PANOMEN) workshop, the Pituitary Society recently recommended integrating various types of information on pituitary-related tumors, including clinical information and bioinformatics.³²

This study confirmed our hypothesis that rich information about pituitary adenomas is available in clinical notes and can be annotated and identified in a standardized, automated manner based on NLP and ML. The most common diseases, symptoms, body parts, drugs, surgeries, disease courses and family histories associated with pituitary adenomas were identified and analyzed. The study found that visual field-related symptoms are representative in the pituitary adenoma notes, and that most body concepts are related to vision, such as binocular, bilateral, and temporal. These observations are consistent with previous findings that visual field defects are typical symptoms of pituitary adenoma.^33,34 Additionally, the fine-grained annotated data in this study, such as unilateral or bilateral and temporal visual field defects, have reference value for assisting clinical diagnosis and treatment.

The annotated corpus constructed in this study is, to our knowledge, the first to provide knowledge of the clinical characteristics of pituitary adenomas; it is quality-assured and can serve as a reliable source for the future integration of multi-source and multi-type information resources. In addition, the characteristics identified are computer-readable and computable, which facilitates future development and application together with large-scale biomedical knowledge bases such as Gene Ontology (GO)³⁵ and DrugBank.³⁶

Technical significance

Annotation of clinical data is a labor-intensive task, and reducing labor costs and improving efficiency remain active research topics. The proposed semi-automatic corpus construction framework is generally applicable to medical natural language scenarios. With the growing demand for non-English clinical corpora,^8–10 the annotation guideline jointly discussed and revised by domain experts can provide a reference for tasks such as annotation, information extraction, data integration, and knowledge fusion of pituitary adenoma-related texts in other languages. The characterization approach is also applicable to research on other diseases. Additionally, the recommendation and annotation models have been integrated into the CCNLP platform, which is accessible online.

The constructed corpus not only characterizes pituitary adenomas but also facilitates secondary use in clinical NLP owing to its good computer readability and computability. LLMs have demonstrated powerful learning and reasoning capabilities in general fields. However, in clinical natural language scenarios where language structure and terminology are highly specialized, their performance is a concern, and they face the challenge of hallucination.^37,38 Some studies^39–41 have raised concerns that the clinical NLP performance of LLMs may not be as good as that of existing deep learning models. Consistent with previous studies, our study shows that the performance of LLMs without fine-tuning is indeed unsatisfactory. Meanwhile, a notable finding is that performance improves greatly after fine-tuning with TCPA, which suggests that the disease characteristic knowledge annotated in the domain corpus helps enhance the clinical utility of LLMs.

Limitations and future work

Although we have characterized pituitary adenomas and applied the resulting corpus to LLMs, our study was limited to Chinese clinical notes, and the study cohort was not fully representative of the patient population with pituitary adenomas. In future work, with the development of deep neural networks and generative language models, the constructed text corpus will be combined with larger and more diverse data, such as laboratory examinations and tests, for joint multimodal clinical data analysis and mining. The integrated application of the identified clinical knowledge with large-scale biomedical ontologies is also our next research focus. In addition, this study preliminarily explored the feasibility of using clinical knowledge of pituitary adenomas to guide LLMs in automated information extraction. In view of the positive experimental results, using LLMs to construct a larger medical corpus is a further direction, with the aim of accelerating the secondary use of clinical text through NLP and developing the potential of AI technology in clinical auxiliary diagnosis and treatment.

Conclusions

In this paper, we constructed an annotated corpus and characterized pituitary adenomas in clinical notes. To improve the efficiency of text annotation, we proposed a corpus construction framework and developed the CCNLP platform, which integrates the text recommendation algorithm and the automatic annotation algorithm. An annotation guideline defining the medical concepts and assertions found in clinical notes was developed, and fine-grained text types were selected for annotation with comprehensive consideration of the language, structure and content characteristics of clinical texts. Based on the method proposed in this paper, an annotated clinical text corpus of pituitary adenomas was constructed. Rich clinical information about the diseases, symptoms, body parts, drugs, surgeries, disease courses and family histories associated with pituitary adenomas was identified and analyzed. Additionally, this study applied the constructed TCPA to fine-tune LLMs and demonstrated the benefit of in-domain knowledge to LLMs in medical applications.

The constructed corpus has good machine readability and computability, and can be further used in AI-aided medical applications such as multi-dimensional risk assessment, postoperative recurrence prediction, and patient health management. The corpus construction framework proposed in our work is general for clinical free text and can provide a reference for the characterization of other diseases.

Supplemental Material

Supplemental Material for Characterizing pituitary adenomas in clinical notes: Corpus construction and its application in LLMs by Jiahui Hu, Jin Fu, Wanqing Zhao, Pei Lou, Ming Feng, Huiling Ren, Shanshan Feng, Yansheng Li and An Fang in Health Informatics Journal.

Footnotes

Acknowledgements

The author expressed gratitude to the relevant contributors.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research was supported by the CAMS Innovation Fund for Medical Sciences (CIFMS) (Grant No. 2021-I2M-1-056), the National Natural Science Foundation of China (Grant No. 72074222), and the National Social Science Foundation of China (Grant No. 21CTQ016).

Ethical statement

ORCID iDs

Jiahui Hu

Jin Fu

Wanqing Zhao

Pei Lou

Ming Feng

Huiling Ren

Shanshan Feng

Yansheng Li

An Fang

Supplemental Material

Supplemental material for this article is available online.

References

1. Khan, Hanrahan, Baldeweg, et al. Current and future advances in surgical therapy for pituitary adenoma. Endocr Rev 2023; 44(5): 947–959.

2. Tritos, Miller. Diagnosis and management of pituitary adenomas: a review. JAMA 2023; 329(16): 1386–1398.

3. Landolsi, Hlaoua, Ben Romdhane. Information extraction from electronic medical documents: state of the art and future research directions. Knowl Inf Syst 2023; 65(2): 463–516.

4. Chen, et al. Clinical concept extraction: a methodology review. J Biomed Inf 2020; 109: 103526.

5. Roberts, Datta, et al. Deep learning in clinical natural language processing: a methodical review. J Am Med Inf Assoc 2020; 27(3): 457–470.

6. Chaturvedi, Chance, Mirza, et al. Development of a corpus annotated with mentions of pain in mental health records: Natural Language Processing approach. JMIR Form Res 2023; 7(1): e45849.

7. Martínez-deMiguel, Segura-Bedmar, Chacón-Solano, et al. The RareDis corpus: a corpus annotated with rare diseases, their signs and symptoms. J Biomed Inf 2022; 125: 103961.

8. Frei, Kramer. Annotated dataset creation through large language models for non-English medical NLP. J Biomed Inf 2023; 145: 104478.

9. Oliveira LES, Peters, Da Silva AMP, et al. SemClinBr - a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks. J Biomed Semant 2022; 13(1): 13.

10. Cai, Chen, Guo, et al. RegEMR: a natural language processing system to automatically identify premature ovarian decline from Chinese electronic medical records. BMC Med Inf Decis Making 2023; 23(1): 126.

11. Liu, Wong ZSY. Utilizing active learning strategies in machine-assisted annotation for clinical named entity recognition: a comprehensive analysis considering annotation costs and target effectiveness. J Am Med Inf Assoc 2024; 31: ocae197.

12. Meedin, Caldera, Perera, et al. A novel annotation scheme to generate hate speech corpus through crowdsourcing and active learning. Int J Adv Comput Sci Appl 2022; 13(11): 410–417.

13. Fei, Ren, Zhang, et al. Enriching contextualized language model from knowledge graph for biomedical information extraction. Briefings Bioinf 2021; 22(3): bbaa110.

14. Raza, Schwartz. Constructing a disease database and using natural language processing to capture and standardize free text clinical information. Sci Rep 2023; 13(1): 8591.

15. Kehl, Gusev, et al. Artificial intelligence-aided clinical annotation of a large multi-cancer genomic dataset. Nat Commun 2021; 12(1): 7304.

16. Thirunavukarasu, Ting DSJ, Elangovan, et al. Large language models in medicine. Nat Med 2023; 29(8): 1930–1940.

17. Floridi, Chiriatti. GPT-3: its nature, scope, limits, and consequences. Minds Mach 2020; 30: 681–694.

18. Chen, Tworek, Jun, et al. Evaluating large language models trained on code. Ithaca, NY: Arxiv (Cornell University), 2021.

19. Funkner, Balabaeva, Kovalchuk. Negation detection for clinical text mining in Russian. Stud Health Technol Inf 2020; 270: 342–346.

20.

Lybarger

Ostendorf

Thompson

, et al. Extracting COVID-19 diagnoses and symptoms from clinical text: a new annotated corpus and neural event extraction framework. J Biomed Inf 2021; 117: 103761.

21.

Institute of Medical Information, Chinese Academy of Medical Sciences (IMICAMS) . Chinese clinical natural language processing (CCNLP) platform. https://ccnlp.imicams.ac.cn/ (2024, accessed 16 August 2024).

22.

Chowdhery

Narang

Devlin

, et al. Palm: scaling language modeling with pathways. J Mach Learn Res 2023; 24(240): 1–113.

23.

Zhao

Liu

Liang

, et al. A novel cascade instruction tuning method for biomedical NER. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), Seoul, Korea, 14–19 April 2024, pp. 11701–11705.

24.

Brown

Mann

Ryder

, et al. Language models are few-shot learners. Adv Neural Inf Process Syst 2020; 33: 1877–1901.

25.

Shen

Wallis

, et al. Lora: low-rank adaptation of large language models. Ithaca, NY: ArXiv (Cornell University), 2021.

26.

Glm

Zeng

, et al. ChatGLM: a family of large language models from GLM-130B to GLM-4 all tools. Ithaca, NY: ArXiv (Cornell University), 2024.

27.

Hasan

Paul

Mourikis

, et al. Context-aware query selection for active learning in event recognition. IEEE Trans Pattern Anal Mach Intell 2020; 42(3): 554–567.

28.

Sourati

Gholipour

, et al. Active deep learning with Fisher information for patch-wise semantic segmentation. In: International workshop on deep learning in medical image analysis, Granada, Spain, 20 September 2018, 83–91.

29.

Hasan

Paul

Mourikis

, et al. Context-aware query selection for active learning in event recognition. IEEE Trans Pattern Anal Mach Intell 2020; 42(3): 554–567.

30.

Institute of Medical Information, Chinese Academy of Medical Sciences (IMICAMS) . Text corpus of pituitary adenomas (TCPA). https://ccnlp.imicams.ac.cn/tcpa/ (2024, accessed 16 August 2024).

31.

Ilie

Vasiljevic

Bertolino

, et al. Biological and therapeutic implications of the tumor microenvironment in pituitary adenomas. Endocr Rev 2023; 44(2): 297–311.

32.

KKY

Kaiser

Chanson

, et al. Pituitary adenoma or neuroendocrine tumour: the need for an integrated prognostic classification. Nat Rev Endocrinol 2023; 19(11): 671–678.

33.

Luger

Broersen

LHA

Biermasz

, et al. ESE Clinical Practice Guideline on functioning and nonfunctioning pituitary adenomas in pregnancy. Eur J Endocrinol 2021; 185(3): G1–G33.

34.

Kwancharoen

Deerochanawong

Peerapatdit

, et al. Pituitary adenomas registry in Thailand. J Clin Neurosci 2023; 115: 138–147.

35.

Gene Ontology Consortium . The gene ontology resource: enriching a gold mine. Nucleic Acids Res 2021; 49(D1): D325–D334.

36.

Knox

Wilson

Klinger

, et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Res 2024; 52(D1): D1265–D1275.

37.

Cascella

Montomoli

Bellini

, et al. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. J Med Syst 2023; 47(1): 33.

38.

Rao

Pang

Kim

, et al. Assessing the utility of ChatGPT throughout the entire clinical workflow: development and usability study. J Med Internet Res 2023; 25: e48659.

39.

Chen

, et al. Improving large language models for clinical named entity recognition via prompt engineering. J Am Med Inf Assoc 2024; 31: ocad259.

40.

Wang

Sun

, et al. Gpt-ner: named entity recognition via large language models. Ithaca, NY: ArXiv (Cornell University), 2023.

41.

Datta

Lee

Paek

, et al. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models. J Am Med Inf Assoc 2024; 31(2): 375–385.
