Abstract
Objectives
This study evaluated the feasibility of using the open-source large language model (LLM) DeepSeek-R1 to generate standardized Subjective, Objective, Assessment, and Plan (SOAP)-format medication logs and its potential to support clinical pharmacists.
Materials and methods
Thirty complete oncology medication profiles were collected, from which 80 days of logs were extracted and converted into simulated pharmacist–patient dialogues. The experiment compared single-information-source inputs (dialogue only) with multi-information-source inputs (dialogue plus patient information, records, and test results), using five prompts of increasing complexity. Performance was measured using the Bidirectional Encoder Representations from Transformers (BERT) score and Recall-Oriented Understudy for Gisting Evaluation (ROUGE), alongside a blinded expert evaluation based on the Seven-Dimension Index (7DI) metric.
Results
DeepSeek-R1 effectively generated structured SOAP medication logs when integrated with multi-source information and complex prompts (especially Prompts 4 and 5). Both machine scores and manual 7DI evaluation confirmed the superiority of multi-source inputs over single-source dialogues. While Prompt 4 achieved the highest BERT-
Discussion
This study confirms DeepSeek-R1's utility in generating SOAP medication logs using multi-source data and structured prompts, potentially enhancing pharmacists’ efficiency. Limitations such as oncology-specific scope and artificial intelligence (AI) hallucinations necessitate pharmacist review and future validation across specialties, alongside comparisons with closed-source LLMs and explainable AI integration.
Conclusion
This study demonstrates that DeepSeek-R1 can generate structured SOAP-format medication logs when guided by prompt-engineered multi-source clinical information, while highlighting that output quality depends on input completeness and that pharmacist review remains essential for clinical reliability.
Introduction
The medication profile is a pharmacist's objective record of a patient's medication history and serves as an important foundation and essential source of information for pharmacists providing individualized services to patients. An excellent medication profile should comprehensively integrate both subjective and objective patient information and include a time-oriented record of medication administration and related clinical assessments throughout hospitalization; this component is referred to in this study as the medication log. The Subjective, Objective, Assessment, and Plan (SOAP) structure is a standardized medication log format widely used in clinical practice to document a patient's condition, symptoms, assessment, and treatment plan.1–3 With the widespread adoption of electronic health records (EHRs), SOAP's modular structure facilitates system entry and cross-team information sharing.4 The SOAP structure is now an important component of clinical pharmacists' written medication profiles.5 SOAP-structured documentation supports clinicians in organizing their thoughts, solving problems, and making clinical recommendations.6 Qualified SOAP-structured documentation should detail all medication-related issues, provide the most thorough assessment possible, and list the relevant objectives for each issue.7 It should also include a treatment plan for treatment-related problems, as well as a monitoring plan and patient education.7
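For illustration, the four SOAP components of a daily medication log map naturally onto a simple structured record. The schema below is our own sketch, not a standard from the literature or the study's actual data format:

```python
from dataclasses import dataclass

@dataclass
class SoapLog:
    """One day's SOAP-format medication log entry (illustrative schema only)."""
    date: str
    subjective: str   # patient-reported symptoms and complaints
    objective: str    # vital signs, lab results, administered medications
    assessment: str   # pharmacist's evaluation of medication-related issues
    plan: str         # treatment adjustments, monitoring, patient education

    def render(self) -> str:
        # Render the entry in the conventional S/O/A/P block layout
        return "\n".join([
            f"Date: {self.date}",
            f"S: {self.subjective}",
            f"O: {self.objective}",
            f"A: {self.assessment}",
            f"P: {self.plan}",
        ])
```

A rendered entry then follows the familiar four-block layout used in clinical documentation.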
However, in real-world clinical practice, pharmacists encounter many challenges when writing medication profiles. Chief among them, clinical pharmacists must complete medication profiles for multiple patients in limited time, so many profiles fail to follow a complete SOAP format and writing quality declines.8 In addition, patients with multiple conditions or those taking many medications often need detailed records, which can make medication profile writing tedious and repetitive.6 For patients with complex polypharmacy, clinical pharmacists are required to synthesize information from multiple sources, including clinical guidelines, primary literature, and patient-specific data. The increasing complexity and volume of medication-related information may challenge traditional manual workflows and increase the risk of incomplete or inconsistent documentation.9 Patients' medication information may also be dispersed across multiple sources, such as prescriptions, pharmacy records, and patient self-reports, requiring additional time and effort to aggregate into a medication profile. Hence, although writing medication profiles is one of the most essential tasks for clinical pharmacists, it remains a problematic one.
Previous studies have demonstrated that well-designed health information systems play a critical role in supporting clinical workflows, reducing cognitive burden, and improving documentation quality in medication-related processes. For example, usability-focused evaluations of electronic prescribing systems have shown that system design and information accessibility substantially influence prescribing accuracy and clinical efficiency, highlighting the importance of informatics tools in complex medication management scenarios. 10
With the rapid development of large language models (LLMs), an increasing number of studies have explored their application to clinical tasks, such as medical Q&A, simulated dialogues, and processing medical texts.11–16 DeepSeek LLM is an open-source LLM whose goal is to advance the development of open-source language models through long-termism.17 In this context, open-source LLMs represent a promising class of informatics tools for medication documentation, as they allow for flexible local deployment, data privacy protection, and customization within healthcare organizations. DeepSeek-R1 was selected in this study as a representative open-source LLM to explore this potential.17,18 To date, no studies have investigated the use of DeepSeek to assist clinical pharmacists with the medication logs within medication profiles; however, prior research suggests its theoretical feasibility. For example, one study demonstrated that DeepSeek enables rapid retrieval of the latest medical literature and clinical guidelines, and that its data collection and processing capabilities can automatically generate structured EHRs, which are considered to help doctors process and understand medical text faster.19 Thus, DeepSeek has the potential to help clinical pharmacists process medication profiles more efficiently, addressing the time-consuming nature of medication profile writing. In addition, other studies have shown that DeepSeek is capable of performing many clinical healthcare-related tasks, such as assisting in the diagnosis of diseases,20 improving disease management,21,22 and predicting drug toxicity,23 capabilities that could also support clinical pharmacists in medication analysis and pharmacovigilance.
The medication profile contains many elements, and this study focuses on the medication log portion of the medication profile. Leveraging DeepSeek's established strengths in text recognition and text generation,24,25 and guided by principles of prompt engineering,26,27 the aim of this study is to assess the feasibility of utilizing DeepSeek to generate standardized medication logs.
Methods
Research process
This study was designed as a retrospective methodological evaluation to assess the feasibility and performance of a prompt-driven open-source LLM (DeepSeek-R1) for generating SOAP-structured medication logs from single-source and multi-source clinical information. As illustrated in Figure 1, the research strategy consists of five steps: (1) converting medication logs written by clinical pharmacists into simulated pharmacist–patient dialogues from pharmacy clinical rounds; (2) dividing cases into two groups to investigate the effect of different information sources on medication log generation by DeepSeek-R1: a single-information-source group (only the simulated round dialogues were provided) and a multi-information-source group (medical record information and ancillary test results were provided in addition to the simulated round dialogues); (3) providing the two sets of information to DeepSeek-R1 together with five prompts designed to guide it in generating medication logs; (4) conducting machine and manual evaluation of the quality of the DeepSeek-R1-generated medication logs; and (5) performing manual co-checking and optimization, with check points revised based on the observed errors.

The overall structure of our study. Manually written medication logs were collected and transformed into simulated clinical pharmacist round dialogues. The different sources of information were divided into two groups and given to DeepSeek-R1 together with five prompts, and medication logs were then generated. The generated medication logs were then scored by machine metrics and critiqued by human experts. Finally, manual co-checking and optimization were performed, and check points were revised based on the observed errors.
Data acquisition and processing
We collected 30 cases of medication profiles from Jiangsu Cancer Hospital. All medication profiles were originally documented in Chinese. To avoid a decrease in the quality of LLM generation due to excessively long text, 28 and to simulate a realistic clinical scenario, we divided each complete medication log (from the patient's admission to discharge) into days and excluded days that did not involve medication changes or specific medication analyses, resulting in a total of 80 daily logs for the original data. The original medication profile information was then systematically organized according to the application scenario. All patient-identifiable information was removed to protect privacy, and the remaining content was structured into a standardized input-format dataset. This dataset served as the foundational task corpus for the application of artificial intelligence (AI) models.
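De-identification of clinical text is typically rule-based. The sketch below illustrates the idea with hypothetical patterns and field names; the study's actual de-identification rules are not described in detail, so this is an assumption-laden example, not the protocol used:

```python
import re

# Illustrative de-identification pass. These patterns are examples only,
# not the rules actually applied in the study.
PATTERNS = [
    (re.compile(r"\b\d{18}\b"), "[ID]"),                      # 18-digit national ID
    (re.compile(r"\b1\d{10}\b"), "[PHONE]"),                  # 11-digit mobile number
    (re.compile(r"(姓名|Name)[:：]\s*\S+"), r"\1: [NAME]"),   # name fields
]

def deidentify(text: str) -> str:
    """Replace patient-identifiable tokens with neutral placeholders."""
    for pattern, repl in PATTERNS:
        text = pattern.sub(repl, text)
    return text
```

In practice, such rules would be validated against the specific fields present in the source records before any text is passed to a model.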
Rather than using raw electronic medical record (EMR) text directly as model input, we adopted a dialogue-based simulation strategy. In routine clinical pharmacy practice, medication logs are not generated by simply reformatting EMR medication sections, which are often fragmented and template-driven. Instead, pharmacists actively integrate information obtained through patient communication, medication reconciliation, and review of clinical data before synthesizing structured documentation.
To ensure data controllability and prevent information distortion during the log-to-dialogue transformation, pharmacist-authored medication logs were retrospectively converted into simulated clinical pharmacy round dialogues using DeepSeek-R1; the prompt used for the conversion is shown in Supplemental Note 1. All dialogues were generated automatically and reviewed by three experienced clinical pharmacists solely to verify the completeness and clinical plausibility of medication-related content. No manual editing or rewriting was performed; dialogues that failed to meet pre-defined criteria were discarded and regenerated by the model. Only reviewer-approved, unmodified dialogues were used as input for subsequent SOAP-structured medication log generation.
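The discard-and-regenerate review policy can be summarized as a short loop. Here `generate_dialogue` and `meets_criteria` are hypothetical stand-ins for the model call and the reviewers' pre-defined criteria, neither of which is specified in code by the study:

```python
def convert_with_review(log_text, generate_dialogue, meets_criteria, max_attempts=3):
    """Convert a medication log to a simulated dialogue, regenerating
    (never hand-editing) until the reviewers' criteria are met."""
    for _ in range(max_attempts):
        dialogue = generate_dialogue(log_text)  # model call (e.g. DeepSeek-R1)
        if meets_criteria(dialogue):            # reviewer-defined completeness check
            return dialogue
    raise RuntimeError("dialogue failed review after repeated regeneration")
```

The key design choice is that failed outputs are regenerated rather than edited, so approved dialogues remain entirely model-produced.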
LLM configuration and prompt design
Our study employed the DeepSeek-R1 model (https://chat.DeepSeek.com/) with the parameters set to default values.
We designed five prompts with increasing levels of constraint to progressively guide the LLM toward generating standardized SOAP-format medication logs. This stepwise design follows “Prompt Engineering” from Google and other studies, in which task specification, role definition, and structural constraints are incrementally introduced to improve output consistency and task alignment.26,29 Prompt 1 adopted a zero-shot design and provided only a simple task description. Prompt 2 built upon Prompt 1 by adding a reminder of the clinical pharmacist's identity, and Prompt 3 further extended Prompt 2 by adding definitions of the medication profile and medication log. Prompt 4 introduced a content template for the medication log that conformed to SOAP formatting requirements. Prompt 5 utilized the “CRISPE” framework, which comprises five elements: Capacity and Role (clarifying the role the AI plays); Insight (providing sufficient background information and context so the AI can better understand the problem); Statement (clearly stating the requirements or questions); Personality (setting the language style or structure of the response); and Example (providing standard examples, which were supplied in this study). Separate prompt sets were fine-tuned for the single-information-source and multi-information-source groups. Supplemental Notes 2 and 3 provide the detailed prompt designs of this study.
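As a sketch of how the five CRISPE elements combine into a single prompt, the assembly can be expressed as a small template function. The section texts here are illustrative placeholders, not the study's actual prompts (those appear in Supplemental Notes 2 and 3):

```python
def build_crispe_prompt(capacity_role, insight, statement, personality, example):
    """Join the five CRISPE elements into one prompt string."""
    sections = [
        ("Capacity and Role", capacity_role),
        ("Insight", insight),
        ("Statement", statement),
        ("Personality", personality),
        ("Example", example),
    ]
    return "\n\n".join(f"## {name}\n{text}" for name, text in sections)

# Hypothetical example content for a medication log task
prompt = build_crispe_prompt(
    capacity_role="You are an experienced clinical pharmacist.",
    insight="The patient's round dialogue, records, and test results are provided below.",
    statement="Write today's medication log in SOAP format.",
    personality="Use concise, formal clinical language.",
    example="S: ...\nO: ...\nA: ...\nP: ...",
)
```

Structuring prompts this way makes each element auditable and easy to vary independently, which is what allows a stepwise comparison like Prompts 1 through 5.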
Assessment system
We applied both machine scoring and manual scoring approaches to evaluate the quality of the medication logs generated by DeepSeek-R1. The machine score in this study consisted of the Bidirectional Encoder Representations from Transformers (BERT) score and Recall-Oriented Understudy for Gisting Evaluation (ROUGE).30,31 They were used to quantify semantic and lexical similarity between generated medication logs and pharmacist-authored reference texts. Higher values indicate greater content overlap and information coverage, whereas lower values suggest increased divergence from the reference. As these metrics do not directly reflect clinical accuracy or reliability and lack universally accepted performance thresholds, they were interpreted as supportive indicators and complemented by expert manual evaluation.
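Both metrics compare a generated candidate against a pharmacist-authored reference. As a self-contained illustration of the lexical side, ROUGE-L can be computed at the character level (which suits unsegmented Chinese text) from the longest common subsequence; this is a minimal sketch, not the library implementation used in the study:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence, via classic dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    """Character-level ROUGE-L precision, recall, and F1."""
    cand, ref = list(candidate), list(reference)
    lcs = lcs_len(cand, ref)
    p = lcs / len(cand) if cand else 0.0
    r = lcs / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f}
```

BERT score replaces this exact-match comparison with contextual-embedding similarity, which is why the two metrics were interpreted jointly as supportive indicators rather than as measures of clinical accuracy.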
For manual scoring, we adapted a manual critique approach used for EMRs, the seven-dimension index (7DI),29 modifying two indicators to fit our study content. The seven dimensions were: information integrity, terminology use, logical consistency, grammatical structure, writing style, subjective and objective information integrity, and pharmacovigilance content accuracy. Detailed scoring criteria can be found in Supplemental Note 4. All generated medication logs were independently evaluated by three experienced clinical pharmacists. The experts then summarized the advantages and disadvantages of the DeepSeek-R1-generated text, derived check points from the common errors, and manually corrected and refined the generated texts with the highest evaluation scores according to these check points.
This process simulated the actual work of a clinical pharmacist using DeepSeek-R1 to generate medication logs and manually review and modify them.
Statistical analysis
The BERT score and ROUGE libraries were used for machine score calculations; the “bert-base-chinese” pre-trained model was used to better compute the BERT score for Chinese text. Inter-rater reliability of the total 7DI scores among the three raters was assessed using the intraclass correlation coefficient, ICC(3,1). An independent two-sample
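For reference, ICC(3,1) (two-way mixed effects, consistency, single rater) can be computed directly from the two-way ANOVA mean squares. This is a minimal pure-Python sketch of the standard formula, not the statistical package the study may have used:

```python
def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single rater.
    `ratings` is a list of rows (subjects), each with one score per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
```

On the classic Shrout and Fleiss six-subject, four-rater example dataset, this yields approximately 0.71, matching the published ICC(3,1) value for that data.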
Ethical consideration
This study was reviewed and approved by the Institutional Review Board (IRB) of Jiangsu Cancer Hospital (Ethics approval No. KY-2025-085). The study involved a retrospective analysis of de-identified patient data, for which informed consent was not required. The IRB conducted a detailed review of our data handling protocol, including the nature of the de-identification process, the specific data fields involved, the mechanism of data transfer to the third-party service, and the privacy policies of DeepSeek (as understood at the time), and determined that the study met applicable ethical standards. All procedures were performed in accordance with the relevant guidelines and regulations, and the study complied with the ethical standards of the Declaration of Helsinki.
Results
Machine scoring of generated text
We used machine scores (the BERT score and ROUGE) to assess the similarity between the generated texts and the reference texts. Overall, the multi-information-source group consistently outperformed the single-information-source group. For example, for Prompt 4 (0.68 ± 0.03 vs. 0.65 ± 0.01,
For the BERT score, Prompt 4 with multi-source inputs achieved the highest score with a recall value of 0.69 ± 0.03 and a precision of 0.67 ± 0.03 and an

BERT score for the generated medication log. Scoring the text generated by the five prompts (precision, recall, and F1).
Figure 3 demonstrates the t-distributed Stochastic Neighbor Embedding (t-SNE) plots of the embeddings of the two groups. The distributions of generated and reference texts were different. The distributions of Prompts 1, 2, and 3 were much closer together than those of Prompts 4 and 5.

t-SNE plots of generated text and reference text embeddings. (a) Single-information-source group. (b) Multi-information-source group.
Manual review of generated text
The generated text was then manually scored (7DI) by three professional clinical pharmacists to assess the content quality of the text generated by DeepSeek-R1 (Figure 4 and Supplemental Table 3). All three raters agreed that text generated from multi-information-source inputs was superior to text from single-information-source inputs, with consistently higher mean 7DI scores across prompts (e.g. Prompt 4—Rater 1: 29.35 ± 1.54 vs. 25.66 ± 1.90; Rater 2: 29.55 ± 2.92 vs. 25.15 ± 1.77; Rater 3: 30.60 ± 1.85 vs. 27.49 ± 2.07; all

7DI scores for generated medication log. Three experts rate the quality of DeepSeek-R1's generation of medication log. (a) Single-information-source group. (b) Multi-information-source group. 7DI: seven-dimension index.
Overall assessment of the generated text
Since medication logs are typically long texts (thousands of characters), we first evaluated differences in the length of the texts generated by DeepSeek-R1 across information-source conditions and prompt designs. A comprehensive evaluation revealed that the text volumes generated by Prompts 4 and 5 were substantially higher than those of Prompts 1, 2, and 3, irrespective of whether they were derived from a single source or multiple sources of information. The text produced by Prompts 4 and 5 approached the length of manually written medication logs. Richer information sources also increased the word counts of the text generated by Prompts 1 and 2 (Prompt 1: from 1567.25 ± 335.15 to 1932.34 ± 329.40; Prompt 2: from 1868.60 ± 270.03 to 2011.88 ± 296.29), but there was no significant increase for the other prompts (Figure 5).

Length of DeepSeek-R1-generated text. (a) Single-information-source group. (b) Multi-information-source group.
In addition to length, we also assessed the overall structure of the generated text. Three experts agreed that the structure of the text generated by Prompts 4 and 5 was more consistent with the SOAP format. A Prompt 4-generated example is provided in Supplemental Note 5 to demonstrate the structure and quality of DeepSeek-R1's medication log generation. To further demonstrate the role of pharmacist oversight in real-world application, the Prompt 4 medication logs generated from multi-information-source inputs were subsequently reviewed and corrected by experienced clinical pharmacists as a post-generation step. This manual review process was intended to identify and correct content-level inaccuracies and omissions, rather than to reflect improvements in the intrinsic performance of the model. Recurrent error patterns identified during pharmacist review were classified and are summarized in Table 1. Observed advantages, limitations, and pharmacist review checkpoints are provided in Supplemental Table 4. After manual modification, the BERT score and ROUGE of the generated text were both improved significantly (e.g. BERT

BERT score of optimized Prompt 4-generated text from multi-information-source inputs. Comparison of BERT precision, recall, and F1.
Classification and frequency of errors identified during pharmacist review of DeepSeek-R1-generated medication logs.
SOAP: Subjective, Objective, Assessment, and Plan.
Discussion
This study demonstrated the potential capacity of the open-source LLM DeepSeek-R1 for generating medication logs, which could accurately extract key pharmacy elements from EMRs and generate structured medication logs in SOAP format through carefully designed prompts and the construction of an information fusion mechanism that combined “pharmacist check-in dialogues-patient information-medical record information.” This work has the potential to support clinical pharmacists in documentation-related tasks and may contribute to improved workflow efficiency. Moreover, the immediacy of generation may facilitate timely identification of medication-related issues for subsequent review and communication within the healthcare team. This study also found that prompts characterized by stringent structural qualification requirements exhibited a marked superiority over zero-shot and few-shot prompts. Consequently, incorporating prompts that align with the clinical documentation workflow appears to be important.
More importantly, the consistent superiority of multi-information-source inputs over single dialogue inputs represents a key practical insight of this study. Medication log generation is inherently a multi-source cognitive task that requires the integration of patient-reported information, prescription records, laboratory results, and clinical context. A single pharmacist–patient dialogue, even when well-structured, is insufficient to fully capture this complexity. Our findings demonstrate that LLM performance in this setting is not merely sensitive to prompt design, but fundamentally constrained by the completeness and reliability of upstream information. This suggests that in real-world clinical pharmacy applications, LLMs should not be deployed as dialogue-only summarization tools, but rather as components embedded within information systems capable of aggregating and presenting multi-source clinical data.
In this sense, multi-source data integration is not an optional enhancement, but a pre-requisite for clinically reliable LLM-assisted medication documentation.
We chose DeepSeek-R1 as the LLM for this study because its open-source nature enables more flexible deployment within healthcare organizations, allowing local data to be processed without transmission over the internet and thereby avoiding many data privacy issues.32,33 However, this study only used the DeepSeek-R1 web service without localized deployment; DeepSeek-R1's performance under localized deployment is a subject for our future investigation.
In earlier studies, LLMs (e.g. ChatGPT) have been shown to be capable of producing EMRs, indicating their aptitude for extracting information from patients' verbal accounts and generating electronic documentation,29 which can help healthcare workers write medical records.34 There is also evidence that LLMs are capable of generating discharge summaries (part of the medication profile).35,36 Another study compared the abilities of open-source and closed-source LLMs to generate medical reports and found that open-source LLMs could demonstrate performance analogous, or in some cases superior, to that of GPT in certain tasks.37 A further study hypothesized that DeepSeek-R1 shares similarities with LLMs such as GPT in terms of clinical reasoning capabilities.38
It is worth noting that the machine scores in this study, which mainly evaluated the similarity between the generated text and the reference text, indicated that DeepSeek-R1-generated medication logs still exhibited gaps when compared with manually written medication logs, which was corroborated by the t-SNE plots (Figure 3). Similar t-SNE plots have been observed in other studies between the generated text and the reference text. 37 An analysis of the data suggested that this phenomenon may be attributable to the marked difference in writing style between DeepSeek-R1-generated texts and those produced manually. Other studies have explored the syntactic and lexical differences between AI-generated and human texts.39,40
During manual review of the DeepSeek-R1-generated medication logs, we similarly found a number of errors in the content, which was consistent with conclusions reported in previous studies. 28 Therefore, the review of DeepSeek-R1-generated texts by clinical pharmacists is indispensable. This study meticulously delineates the check points pertinent to the prevalent issues encountered in the DeepSeek-R1-generated text, which could assist clinical pharmacists in the more efficient identification and correction of problems generated by AI. Additionally, this study aims to empower clinical pharmacists to supplement personalized content through natural language, thus enhancing the comprehensibility and utility of the DeepSeek-R1-generated text.
Despite the encouraging results, this study has some limitations. First, the cases involved came exclusively from an oncology hospital, whose medication use is specialized; whether the findings hold for medications used in other departments, and for certain special medication classes (e.g. antimicrobials), was not analyzed. Future studies should extend the proposed workflow to additional clinical specialties to evaluate its robustness across diverse medication management contexts. Furthermore, the present study exclusively analyzed DeepSeek-R1 and did not compare it with other LLMs. The next phase of the study can address these points by expanding the range of cases and comparing DeepSeek-R1 with other LLMs. In addition, we cannot ignore issues inherent to DeepSeek-R1 itself. These include the necessity of localized deployment to circumvent the data privacy concerns referenced above, and the “black box” nature of LLMs such as DeepSeek-R1, which continues to give rise to concerns among pharmacists.41 Future work should explore locally deployed implementations to further assess feasibility under real-world data privacy and institutional constraints.
Conclusion
This study demonstrates that an open-source LLM can generate structured SOAP-format medication logs when guided by carefully designed prompts and integrated with multi-source clinical information. The findings further indicate that the quality of LLM-generated medication documentation depends primarily on the completeness and integration of input data, as well as on structured prompt design, rather than on dialogue input alone. At the same time, observed content inaccuracies and omissions underscore that such outputs should be positioned as pharmacist-support tools rather than autonomous documentation systems, with expert review remaining essential for clinical reliability. Collectively, these results clarify the conditions under which open-source LLMs may be applied to medication log generation and provide methodological guidance for future research on LLM-assisted clinical documentation.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076261428234 - Supplemental material for Generating subjective, objective, assessment, and plan (SOAP)-structured medication logs using DeepSeek-R1 through prompt engineering and multi-source clinical information
Supplemental material, sj-docx-1-dhj-10.1177_20552076261428234 for Generating subjective, objective, assessment, and plan (SOAP)-structured medication logs using DeepSeek-R1 through prompt engineering and multi-source clinical information by Yuxuan Zhu, Jizhong Zhang, Yuhao Sun, Jiayu Wen, Zhixian Liu, Xin Liu, Silu Xu, Nan Wu, Yuanyuan Zhang, Guoren Zhou and Jifu Wei in DIGITAL HEALTH
Footnotes
Acknowledgments
The authors would like to express sincere gratitude to the pharmacists from the Pharmacy Department of Jiangsu Cancer Hospital for their assistance.
Ethical considerations
This study involved a retrospective analysis of de-identified patient data. The Institutional Review Board of Jiangsu Cancer Hospital conducted an ethical review and approved the study. Specific information is provided in the Methods section of the article.
Author contributions
YXZ, YHS, and JZZ: investigation, methodology, software, data curation, formal analysis, visualization, and writing—original draft. JYW and ZXL: data curation, investigation, methodology, and writing—original draft. XL, SLX, and NW: methodology, data curation, formal analysis, validation, and writing—original draft. YYZ, GRZ, and JFW: conceptualization, supervision, and writing—review and editing. All authors have read and approved the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Yishan Project of Jiangsu Cancer Hospital (YSZD202406) and Qunfeng Project of Jiangsu Cancer Hospital (No. DFXK202501).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
Data supporting this study's findings can be obtained from the corresponding author upon reasonable request.
Guarantor
Yuanyuan Zhang or Guoren Zhou or Jifu Wei.
Supplemental material
Supplemental material for this article is available online.
References
