Abstract
Objectives
This study evaluated the feasibility of using the open-source large language model (LLM) DeepSeek-R1 to generate standardized Subjective, Objective, Assessment, and Plan (SOAP)-format medication logs and its potential to support clinical pharmacists.
Materials and methods
Thirty complete oncology medication profiles were collected, from which 80 days of logs were extracted and converted into simulated pharmacist–patient dialogues. The experiment compared single-information-source inputs (dialogue only) with multi-information-source inputs (dialogue plus patient information, records, and test results), using five prompts of increasing complexity. Performance was measured using the Bidirectional Encoder Representations from Transformers (BERT) score and Recall-Oriented Understudy for Gisting Evaluation (ROUGE), alongside a blinded expert evaluation based on the Seven-Dimension Index (7DI) metric.
Results
DeepSeek-R1 effectively generated structured SOAP medication logs when integrated with multi-source information and complex prompts (especially Prompts 4 and 5). Both machine scores and manual 7DI evaluation confirmed the superiority of multi-source inputs over single-source dialogues. While Prompt 4 achieved the highest BERT-
Discussion
This study confirms DeepSeek-R1's utility in generating SOAP medication logs using multi-source data and structured prompts, potentially enhancing pharmacists’ efficiency. Limitations such as oncology-specific scope and artificial intelligence (AI) hallucinations necessitate pharmacist review and future validation across specialties, alongside comparisons with closed-source LLMs and explainable AI integration.
Conclusion
This study demonstrates that DeepSeek-R1 can generate structured SOAP-format medication logs when guided by prompt-engineered multi-source clinical information, while highlighting that output quality depends on input completeness and that pharmacist review remains essential for clinical reliability.
Introduction
The medication profile is a pharmacist's objective record of a patient's medication history and serves as an important foundation and essential source of information for pharmacists providing individualized services to patients. An excellent medication profile should comprehensively integrate both subjective and objective patient information and include a time-oriented record of medication administration and related clinical assessments throughout hospitalization; this component is referred to in this study as the medication log. The Subjective, Objective, Assessment, and Plan (SOAP) structure is a standardized medication log format widely used in clinical practice to document a patient's condition, symptoms, assessment, and treatment plan.1–3 With the widespread adoption of electronic health records (EHRs), SOAP's modular structure facilitates system entry and cross-team information sharing.4 The SOAP structure is now an important component of clinical pharmacists' written medication profiles.5 SOAP-structured documentation supports clinicians in organizing their thoughts, solving problems, and making clinical recommendations.6 Qualified SOAP-structured documentation should detail all medication-related issues, provide the most thorough assessment possible, and list the relevant objectives for each issue.7 It should also include a treatment plan for treatment-related problems, as well as a monitoring plan and patient education.7
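For illustration, the four SOAP components of a daily medication log map naturally onto a simple structured record. The schema below is our own sketch, not a standard from the literature or the study's actual data format:

```python
from dataclasses import dataclass

@dataclass
class SoapLog:
    """One day's SOAP-format medication log entry (illustrative schema only)."""
    date: str
    subjective: str   # patient-reported symptoms and complaints
    objective: str    # vital signs, lab results, administered medications
    assessment: str   # pharmacist's evaluation of medication-related issues
    plan: str         # treatment adjustments, monitoring, patient education

    def render(self) -> str:
        # Render the entry in the conventional S/O/A/P block layout
        return "\n".join([
            f"Date: {self.date}",
            f"S: {self.subjective}",
            f"O: {self.objective}",
            f"A: {self.assessment}",
            f"P: {self.plan}",
        ])
```

A rendered entry then follows the familiar four-block layout used in clinical documentation.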
However, in real-world clinical practice, pharmacists encounter many challenges when writing medication profiles. Chief among them, clinical pharmacists must complete medication profiles for multiple patients in limited time, so many profiles fail to follow a complete SOAP format and writing quality declines.8 In addition, patients with multiple conditions or those taking many medications often need detailed records, which can make medication profile writing tedious and repetitive.6 For patients with complex polypharmacy, clinical pharmacists are required to synthesize information from multiple sources, including clinical guidelines, primary literature, and patient-specific data. The increasing complexity and volume of medication-related information may challenge traditional manual workflows and increase the risk of incomplete or inconsistent documentation.9 Patients' medication information may also be dispersed across multiple sources, such as prescriptions, pharmacy records, and patient self-reports, requiring additional time and effort to aggregate into a medication profile. Hence, although writing medication profiles is one of the most essential tasks for clinical pharmacists, it remains a problematic one.
Previous studies have demonstrated that well-designed health information systems play a critical role in supporting clinical workflows, reducing cognitive burden, and improving documentation quality in medication-related processes. For example, usability-focused evaluations of electronic prescribing systems have shown that system design and information accessibility substantially influence prescribing accuracy and clinical efficiency, highlighting the importance of informatics tools in complex medication management scenarios. 10
With the rapid development of large language models (LLMs), an increasing number of studies have explored their application to clinical tasks, such as medical Q&A, simulated dialogues, and processing medical texts.11–16 DeepSeek LLM is an open-source LLM whose goal is to advance the development of open-source language models through long-termism.17 In this context, open-source LLMs represent a promising class of informatics tools for medication documentation, as they allow for flexible local deployment, data privacy protection, and customization within healthcare organizations. DeepSeek-R1 was selected in this study as a representative open-source LLM to explore this potential.17,18 To date, no studies have investigated the use of DeepSeek to assist clinical pharmacists with the medication logs within medication profiles; however, prior research suggests its theoretical feasibility. For example, one study demonstrated that DeepSeek enables rapid retrieval of the latest medical literature and clinical guidelines, and that its data collection and processing capabilities can automatically generate structured EHRs, which are considered to help doctors process and understand medical text faster.19 Thus, DeepSeek has the potential to help clinical pharmacists process medication profiles more efficiently, addressing the time-consuming nature of medication profile writing. In addition, other studies have shown that DeepSeek is capable of performing many clinical healthcare-related tasks, such as assisting in the diagnosis of diseases,20 improving disease management,21,22 and predicting drug toxicity,23 capabilities that could also support clinical pharmacists in medication analysis and pharmacovigilance.
The medication profile contains many elements, and this study focuses on the medication log portion of the medication profile. Leveraging DeepSeek's established strengths in text recognition and text generation,24,25 and guided by principles of prompt engineering,26,27 the aim of this study is to assess the feasibility of utilizing DeepSeek to generate standardized medication logs.
Methods
Research process
This study was designed as a retrospective methodological evaluation to assess the feasibility and performance of a prompt-driven open-source LLM (DeepSeek-R1) for generating SOAP-structured medication logs from single-source and multi-source clinical information. As illustrated in Figure 1, the research strategy consists of five steps: (1) converting medication logs written by clinical pharmacists into simulated pharmacist–patient dialogues from pharmacy clinical rounds; (2) dividing cases into two groups to investigate the effect of different information sources on medication log generation by DeepSeek-R1: a single-information-source group (only the simulated round dialogues were provided) and a multi-information-source group (medical record information and ancillary test results were provided in addition to the simulated round dialogues); (3) providing the two sets of information to DeepSeek-R1 together with five prompts designed to guide it in generating medication logs; (4) conducting machine and manual evaluation of the quality of the DeepSeek-R1-generated medication logs; and (5) performing manual co-checking and optimization, with check points revised based on the observed errors.

The overall structure of our study. Manually written medication logs were collected and transformed into simulated clinical pharmacist round dialogues. The different sources of information were divided into two groups and given to DeepSeek-R1 together with five prompts, and medication logs were then generated. The generated medication logs were then scored by machine metrics and critiqued by human experts. Finally, manual co-checking and optimization were performed, and check points were revised based on the observed errors.
Data acquisition and processing
We collected 30 cases of medication profiles from Jiangsu Cancer Hospital. All medication profiles were originally documented in Chinese. To avoid a decrease in the quality of LLM generation due to excessively long text, 28 and to simulate a realistic clinical scenario, we divided each complete medication log (from the patient's admission to discharge) into days and excluded days that did not involve medication changes or specific medication analyses, resulting in a total of 80 daily logs for the original data. The original medication profile information was then systematically organized according to the application scenario. All patient-identifiable information was removed to protect privacy, and the remaining content was structured into a standardized input-format dataset. This dataset served as the foundational task corpus for the application of artificial intelligence (AI) models.
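De-identification of clinical text is typically rule-based. The sketch below illustrates the idea with hypothetical patterns and field names; the study's actual de-identification rules are not described in detail, so this is an assumption-laden example, not the protocol used:

```python
import re

# Illustrative de-identification pass. These patterns are examples only,
# not the rules actually applied in the study.
PATTERNS = [
    (re.compile(r"\b\d{18}\b"), "[ID]"),                      # 18-digit national ID
    (re.compile(r"\b1\d{10}\b"), "[PHONE]"),                  # 11-digit mobile number
    (re.compile(r"(姓名|Name)[:：]\s*\S+"), r"\1: [NAME]"),   # name fields
]

def deidentify(text: str) -> str:
    """Replace patient-identifiable tokens with neutral placeholders."""
    for pattern, repl in PATTERNS:
        text = pattern.sub(repl, text)
    return text
```

In practice, such rules would be validated against the specific fields present in the source records before any text is passed to a model.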
Rather than using raw electronic medical record (EMR) text directly as model input, we adopted a dialogue-based simulation strategy. In routine clinical pharmacy practice, medication logs are not generated by simply reformatting EMR medication sections, which are often fragmented and template-driven. Instead, pharmacists actively integrate information obtained through patient communication, medication reconciliation, and review of clinical data before synthesizing structured documentation.
To ensure data controllability and prevent information distortion during the log-to-dialogue transformation, pharmacist-authored medication logs were retrospectively converted into simulated clinical pharmacy round dialogues using DeepSeek-R1; the prompt used for the conversion is shown in Supplemental Note 1. All dialogues were generated automatically and reviewed by three experienced clinical pharmacists solely to verify the completeness and clinical plausibility of medication-related content. No manual editing or rewriting was performed; dialogues that failed to meet pre-defined criteria were discarded and regenerated by the model. Only reviewer-approved, unmodified dialogues were used as input for subsequent SOAP-structured medication log generation.
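The discard-and-regenerate review policy can be summarized as a short loop. Here `generate_dialogue` and `meets_criteria` are hypothetical stand-ins for the model call and the reviewers' pre-defined criteria, neither of which is specified in code by the study:

```python
def convert_with_review(log_text, generate_dialogue, meets_criteria, max_attempts=3):
    """Convert a medication log to a simulated dialogue, regenerating
    (never hand-editing) until the reviewers' criteria are met."""
    for _ in range(max_attempts):
        dialogue = generate_dialogue(log_text)  # model call (e.g. DeepSeek-R1)
        if meets_criteria(dialogue):            # reviewer-defined completeness check
            return dialogue
    raise RuntimeError("dialogue failed review after repeated regeneration")
```

The key design choice is that failed outputs are regenerated rather than edited, so approved dialogues remain entirely model-produced.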
LLM configuration and prompt design
Our study employed the DeepSeek-R1 model (https://chat.DeepSeek.com/) with the parameters set to default values.
We designed five prompts with increasing levels of constraint to progressively guide the LLM toward generating standardized SOAP-format medication logs. This stepwise design follows “Prompt Engineering” from Google and other studies, in which task specification, role definition, and structural constraints are incrementally introduced to improve output consistency and task alignment.26,29 Prompt 1 adopted a zero-shot design and provided only a simple task description. Prompt 2 built upon Prompt 1 by adding a reminder of the clinical pharmacist's identity, and Prompt 3 further extended Prompt 2 by adding definitions of the medication profile and medication log. Prompt 4 introduced a content template for the medication log that conformed to SOAP formatting requirements. Prompt 5 utilized the “CRISPE” framework, which comprises five elements: Capacity and Role (clarifying the role the AI plays); Insight (providing sufficient background information and context so the AI can better understand the problem); Statement (clearly stating the requirements or questions); Personality (setting the language style or structure of the response); and Example (providing standard examples, which were supplied in this study). Separate prompt sets were fine-tuned for the single-information-source and multi-information-source groups. Supplemental Notes 2 and 3 provide the detailed prompt designs of this study.
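As a sketch of how the five CRISPE elements combine into a single prompt, the assembly can be expressed as a small template function. The section texts here are illustrative placeholders, not the study's actual prompts (those appear in Supplemental Notes 2 and 3):

```python
def build_crispe_prompt(capacity_role, insight, statement, personality, example):
    """Join the five CRISPE elements into one prompt string."""
    sections = [
        ("Capacity and Role", capacity_role),
        ("Insight", insight),
        ("Statement", statement),
        ("Personality", personality),
        ("Example", example),
    ]
    return "\n\n".join(f"## {name}\n{text}" for name, text in sections)

# Hypothetical example content for a medication log task
prompt = build_crispe_prompt(
    capacity_role="You are an experienced clinical pharmacist.",
    insight="The patient's round dialogue, records, and test results are provided below.",
    statement="Write today's medication log in SOAP format.",
    personality="Use concise, formal clinical language.",
    example="S: ...\nO: ...\nA: ...\nP: ...",
)
```

Structuring prompts this way makes each element auditable and easy to vary independently, which is what allows a stepwise comparison like Prompts 1 through 5.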
Assessment system
We applied both machine scoring and manual scoring approaches to evaluate the quality of the medication logs generated by DeepSeek-R1. The machine score in this study consisted of the Bidirectional Encoder Representations from Transformers (BERT) score and Recall-Oriented Understudy for Gisting Evaluation (ROUGE).30,31 They were used to quantify semantic and lexical similarity between generated medication logs and pharmacist-authored reference texts. Higher values indicate greater content overlap and information coverage, whereas lower values suggest increased divergence from the reference. As these metrics do not directly reflect clinical accuracy or reliability and lack universally accepted performance thresholds, they were interpreted as supportive indicators and complemented by expert manual evaluation.
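Both metrics compare a generated candidate against a pharmacist-authored reference. As a self-contained illustration of the lexical side, ROUGE-L can be computed at the character level (which suits unsegmented Chinese text) from the longest common subsequence; this is a minimal sketch, not the library implementation used in the study:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence, via classic dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    """Character-level ROUGE-L precision, recall, and F1."""
    cand, ref = list(candidate), list(reference)
    lcs = lcs_len(cand, ref)
    p = lcs / len(cand) if cand else 0.0
    r = lcs / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f}
```

BERT score replaces this exact-match comparison with contextual-embedding similarity, which is why the two metrics were interpreted jointly as supportive indicators rather than as measures of clinical accuracy.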
For manual scoring, we adapted a manual critique approach used for EMRs, the seven-dimension index (7DI),29 modifying two indicators to fit our study content. The seven dimensions were: information integrity, terminology use, logical consistency, grammatical structure, writing style, subjective and objective information integrity, and pharmacovigilance content accuracy. Detailed scoring criteria can be found in Supplemental Note 4. All generated medication logs were independently evaluated by three experienced clinical pharmacists. The experts then summarized the advantages and disadvantages of the DeepSeek-R1-generated text, derived check points from the common errors, and manually corrected and refined the generated texts with the highest evaluation scores according to these check points.
This process simulated the actual work of a clinical pharmacist using DeepSeek-R1 to generate medication logs and manually review and modify them.
Statistical analysis
The BERT score and ROUGE libraries were used for machine score calculations; the “bert-base-chinese” pre-trained model was used to better compute the BERT score for Chinese text. Inter-rater reliability of the total 7DI scores among the three raters was assessed using the intraclass correlation coefficient, ICC(3,1). An independent two-sample
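For reference, ICC(3,1) (two-way mixed effects, consistency, single rater) can be computed directly from the two-way ANOVA mean squares. This is a minimal pure-Python sketch of the standard formula, not the statistical package the study may have used:

```python
def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single rater.
    `ratings` is a list of rows (subjects), each with one score per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
```

On the classic Shrout and Fleiss six-subject, four-rater example dataset, this yields approximately 0.71, matching the published ICC(3,1) value for that data.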
Ethical consideration
This study was reviewed and approved by the Institutional Review Board (IRB) of Jiangsu Cancer Hospital (Ethics approval No. KY-2025-085). The study involved a retrospective analysis of de-identified patient data, for which informed consent was not required. The IRB conducted a detailed review of our data handling protocol, including the nature of the de-identification process, the specific data fields involved, the mechanism of data transfer to the third-party service, and the privacy policies of DeepSeek (as understood at the time), and determined that the study met applicable ethical standards. All procedures were performed in accordance with the relevant guidelines and regulations, and the study complied with the ethical standards of the Declaration of Helsinki.
Results
Machine scoring of generated text
We used machine scores (the BERT score and ROUGE) to assess the similarity between the generated texts and the reference texts. Overall, the multi-information-source group consistently outperformed the single-information-source group. For example, for Prompt 4 (0.68 ± 0.03 vs. 0.65 ± 0.01,
For the BERT score, Prompt 4 with multi-source inputs achieved the highest score with a recall value of 0.69 ± 0.03 and a precision of 0.67 ± 0.03 and an

BERT score for the generated medication log. Scoring the text generated by the five prompts (precision, recall, and F1).
Figure 3 demonstrates the t-distributed Stochastic Neighbor Embedding (t-SNE) plots of the embeddings of the two groups. The distributions of generated and reference texts were different. The distributions of Prompts 1, 2, and 3 were much closer together than those of Prompts 4 and 5.

t-SNE plots of generated text and reference text embeddings. (a) Single-information-source group. (b) Multi-information-source group.
Manual review of generated text
The generated text was then manually scored (7DI) by three professional clinical pharmacists to assess the content quality of the text generated by DeepSeek-R1 (Figure 4 and Supplemental Table 3). All three raters agreed that text generated from multi-information-source inputs was superior to text from single-information-source inputs, with consistently higher mean 7DI scores across prompts (e.g. Prompt 4—Rater 1: 29.35 ± 1.54 vs. 25.66 ± 1.90; Rater 2: 29.55 ± 2.92 vs. 25.15 ± 1.77; Rater 3: 30.60 ± 1.85 vs. 27.49 ± 2.07; all

7DI scores for generated medication log. Three experts rate the quality of DeepSeek-R1's generation of medication log. (a) Single-information-source group. (b) Multi-information-source group. 7DI: seven-dimension index.
Overall assessment of the generated text
Since medication logs are typically long texts (thousands of characters), we first evaluated differences in the length of the texts generated by DeepSeek-R1 across information-source conditions and prompt designs. A comprehensive evaluation revealed that the text volumes generated by Prompts 4 and 5 were substantially higher than those of Prompts 1, 2, and 3, irrespective of whether they were derived from a single source or multiple sources of information. The text produced by Prompts 4 and 5 approached the length of manually written medication logs. Richer information sources also increased the word counts of the text generated by Prompts 1 and 2 (Prompt 1: from 1567.25 ± 335.15 to 1932.34 ± 329.40; Prompt 2: from 1868.60 ± 270.03 to 2011.88 ± 296.29), but there was no significant increase for the other prompts (Figure 5).

Length of DeepSeek-R1-generated text. (a) Single-information-source group. (b) Multi-information-source group.
In addition to length, we also assessed the overall structure of the generated text. Three experts agreed that the structure of the text generated by Prompts 4 and 5 was more consistent with the SOAP format. A Prompt 4-generated example is provided in Supplemental Note 5 to demonstrate the structure and quality of DeepSeek-R1's medication log generation. To further demonstrate the role of pharmacist oversight in real-world application, the Prompt 4 medication logs generated from multi-information-source inputs were subsequently reviewed and corrected by experienced clinical pharmacists as a post-generation step. This manual review process was intended to identify and correct content-level inaccuracies and omissions, rather than to reflect improvements in the intrinsic performance of the model. Recurrent error patterns identified during pharmacist review were classified and are summarized in Table 1. Observed advantages, limitations, and pharmacist review checkpoints are provided in Supplemental Table 4. After manual modification, the BERT score and ROUGE of the generated text were both improved significantly (e.g. BERT

BERT score of optimized Prompt 4-generated text from multi-information-source inputs. Comparison of BERT precision, recall, and F1.
Classification and frequency of errors identified during pharmacist review of DeepSeek-R1-generated medication logs.
SOAP: Subjective, Objective, Assessment, and Plan.
Discussion
This study demonstrated the potential capacity of the open-source LLM DeepSeek-R1 for generating medication logs, which could accurately extract key pharmacy elements from EMRs and generate structured medication logs in SOAP format through carefully designed prompts and the construction of an information fusion mechanism that combined “pharmacist check-in dialogues-patient information-medical record information.” This work has the potential to support clinical pharmacists in documentation-related tasks and may contribute to improved workflow efficiency. Moreover, the immediacy of generation may facilitate timely identification of medication-related issues for subsequent review and communication within the healthcare team. This study also found that prompts characterized by stringent structural qualification requirements exhibited a marked superiority over zero-shot and few-shot prompts. Consequently, incorporating prompts that align with the clinical documentation workflow appears to be important.
More importantly, the consistent superiority of multi-information-source inputs over single dialogue inputs represents a key practical insight of this study. Medication log generation is inherently a multi-source cognitive task that requires the integration of patient-reported information, prescription records, laboratory results, and clinical context. A single pharmacist–patient dialogue, even when well-structured, is insufficient to fully capture this complexity. Our findings demonstrate that LLM performance in this setting is not merely sensitive to prompt design, but fundamentally constrained by the completeness and reliability of upstream information. This suggests that in real-world clinical pharmacy applications, LLMs should not be deployed as dialogue-only summarization tools, but rather as components embedded within information systems capable of aggregating and presenting multi-source clinical data.
In this sense, multi-source data integration is not an optional enhancement, but a pre-requisite for clinically reliable LLM-assisted medication documentation.
We chose DeepSeek-R1 as the LLM for this study because its open-source nature enables more flexible deployment within healthcare organizations, allowing local data to be processed without transmission over the internet and thereby avoiding many data privacy issues.32,33 However, this study only used the DeepSeek-R1 web service without localized deployment; DeepSeek-R1's performance under localized deployment is a subject for our future investigation.
In earlier studies, LLMs (e.g. ChatGPT) have been shown to be capable of producing EMRs, indicating their aptitude for extracting information from patients' verbal accounts and generating electronic documentation,29 which can help healthcare workers write medical records.34 There is also evidence that LLMs are capable of generating discharge summaries (part of the medication profile).35,36 Another study compared the abilities of open-source and closed-source LLMs to generate medical reports and found that open-source LLMs could demonstrate performance analogous, or in some cases superior, to that of GPT in certain tasks.37 A further study hypothesized that DeepSeek-R1 shares similarities with LLMs such as GPT in terms of clinical reasoning capabilities.38
It is worth noting that the machine scores in this study, which mainly evaluated the similarity between the generated text and the reference text, indicated that DeepSeek-R1-generated medication logs still exhibited gaps when compared with manually written medication logs, which was corroborated by the t-SNE plots (Figure 3). Similar t-SNE plots have been observed in other studies between the generated text and the reference text. 37 An analysis of the data suggested that this phenomenon may be attributable to the marked difference in writing style between DeepSeek-R1-generated texts and those produced manually. Other studies have explored the syntactic and lexical differences between AI-generated and human texts.39,40
During manual review of the DeepSeek-R1-generated medication logs, we similarly found a number of errors in the content, which was consistent with conclusions reported in previous studies. 28 Therefore, the review of DeepSeek-R1-generated texts by clinical pharmacists is indispensable. This study meticulously delineates the check points pertinent to the prevalent issues encountered in the DeepSeek-R1-generated text, which could assist clinical pharmacists in the more efficient identification and correction of problems generated by AI. Additionally, this study aims to empower clinical pharmacists to supplement personalized content through natural language, thus enhancing the comprehensibility and utility of the DeepSeek-R1-generated text.
Despite the encouraging results, this study has some limitations. First, the cases involved came exclusively from an oncology hospital, whose medication use is specialized; whether the findings hold for medications used in other departments, and for certain special medication classes (e.g. antimicrobials), was not analyzed. Future studies should extend the proposed workflow to additional clinical specialties to evaluate its robustness across diverse medication management contexts. Furthermore, the present study exclusively analyzed DeepSeek-R1 and did not compare it with other LLMs. The next phase of the study can address these points by expanding the range of cases and comparing DeepSeek-R1 with other LLMs. In addition, we cannot ignore issues inherent to DeepSeek-R1 itself. These include the necessity of localized deployment to circumvent the data privacy concerns referenced above, and the “black box” nature of LLMs such as DeepSeek-R1, which continues to give rise to concerns among pharmacists.41 Future work should explore locally deployed implementations to further assess feasibility under real-world data privacy and institutional constraints.
Conclusion
This study demonstrates that an open-source LLM can generate structured SOAP-format medication logs when guided by carefully designed prompts and integrated with multi-source clinical information. The findings further indicate that the quality of LLM-generated medication documentation depends primarily on the completeness and integration of input data, as well as on structured prompt design, rather than on dialogue input alone. At the same time, observed content inaccuracies and omissions underscore that such outputs should be positioned as pharmacist-support tools rather than autonomous documentation systems, with expert review remaining essential for clinical reliability. Collectively, these results clarify the conditions under which open-source LLMs may be applied to medication log generation and provide methodological guidance for future research on LLM-assisted clinical documentation.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076261428234 - Supplemental material for Generating subjective, objective, assessment, and plan (SOAP)-structured medication logs using DeepSeek-R1 through prompt engineering and multi-source clinical information
Supplemental material, sj-docx-1-dhj-10.1177_20552076261428234 for Generating subjective, objective, assessment, and plan (SOAP)-structured medication logs using DeepSeek-R1 through prompt engineering and multi-source clinical information by Yuxuan Zhu, Jizhong Zhang, Yuhao Sun, Jiayu Wen, Zhixian Liu, Xin Liu, Silu Xu, Nan Wu, Yuanyuan Zhang, Guoren Zhou and Jifu Wei in DIGITAL HEALTH
Footnotes
Acknowledgments
The authors would like to express sincere gratitude to the pharmacists from the Pharmacy Department of Jiangsu Cancer Hospital for their assistance.
Ethical considerations
This study involved a retrospective analysis of de-identified patient data. The Institutional Review Board of Jiangsu Cancer Hospital conducted an ethical review and approved the study. Specific information is provided in the Methods section of the article.
Author contributions
YXZ, YHS, and JZZ: investigation, methodology, software, data curation, formal analysis, visualization, and writing—original draft. JYW and ZXL: data curation, investigation, methodology, and writing—original draft. XL, SLX, and NW: methodology, data curation, formal analysis, validation, and writing—original draft. YYZ, GRZ, and JFW: conceptualization, supervision, and writing—review and editing. All authors have read and approved the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Yishan Project of Jiangsu Cancer Hospital (YSZD202406) and Qunfeng Project of Jiangsu Cancer Hospital (No. DFXK202501).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
Data supporting this study's findings can be obtained from the corresponding author upon reasonable request.
Guarantor
Yuanyuan Zhang or Guoren Zhou or Jifu Wei.
Supplemental material
Supplemental material for this article is available online.
References
