Abstract
The volume and complexity of clinical research articles make it difficult for readers to access and analyze the data they contain efficiently. Artificial Intelligence (AI) and Natural Language Processing (NLP) are becoming invaluable for managing this unstructured data. The primary obstacles are the scarcity of high-quality labeled data and the specialized terminology of healthcare, which differs significantly from standard English. Conventional healthcare information extraction systems have relied heavily on human effort to write extraction rules manually or to create tagged training examples. However, given the enormous amount of data available online and the broad, ambiguous range of relationships of interest, information extraction must move away from models that depend on predefined relationships and high-quality labeled data. This study implements a framework that combines a large language model with data augmentation strategies while adhering to the semantic network of the healthcare domain. The resulting models demonstrate a substantial improvement in F1-score.
